In public spaces such as zoos and sports facilities, the presence of fences often annoys tourists and professional photographers. There is a demand for a post-processing tool that produces a non-occluded view from an image or video. This “de-fencing” task is divided into two stages: detecting fence regions and filling in the missing parts. For a decade or more, various methods have been proposed for video-based de-fencing; however, only a few single-image-based methods have been proposed. In this paper, we mainly focus on single-image fence removal. Conventional approaches suffer from inaccurate and non-robust fence detection and inpainting because a single image carries limited content information. To solve these problems, we combine novel methods based on a deep convolutional neural network (CNN) with classical domain knowledge from image processing. The training process requires both fence images and the corresponding non-fence ground-truth images; we therefore synthesize natural fence images from real images. Moreover, spatial filtering (e.g., a Laplacian filter and a Gaussian filter) improves the performance of the CNN for both detection and inpainting. Our proposed method automatically detects a fence and generates a clean image without any user input. Experimental results demonstrate that our method is effective for a broad range of fence images.
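A minimal sketch of the pre-filtering idea described above, assuming the filter responses are stacked with the RGB image before it enters a fence-segmentation CNN; `FenceSegNet` and `InpaintNet` are hypothetical placeholders, not the authors' networks.

```python
# Sketch: stack Laplacian and Gaussian filter responses with the RGB image
# as extra input channels for a fence-segmentation CNN (illustrative only).
import cv2
import numpy as np
import torch

def build_input(bgr):
    gray = cv2.cvtColor(bgr, cv2.COLOR_BGR2GRAY)
    lap = cv2.Laplacian(gray, cv2.CV_32F, ksize=3)       # emphasizes thin fence edges
    blur = cv2.GaussianBlur(gray, (5, 5), sigmaX=1.5)     # suppresses texture noise
    x = np.dstack([bgr.astype(np.float32) / 255.0,
                   lap[..., None] / 255.0,
                   blur[..., None].astype(np.float32) / 255.0])   # H x W x 5
    return torch.from_numpy(x).permute(2, 0, 1).unsqueeze(0)      # 1 x 5 x H x W

# mask = FenceSegNet(in_channels=5)(build_input(img))   # hypothetical fence-mask CNN
# clean = InpaintNet()(img, mask)                       # hypothetical inpainting CNN
```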
Object sizes in images are diverse; therefore, capturing multi-scale context information is essential for semantic segmentation. Existing context aggregation methods, such as the pyramid pooling module (PPM) and atrous spatial pyramid pooling (ASPP), employ different pooling sizes or atrous rates so that multi-scale information is captured. However, these pooling sizes and atrous rates are chosen empirically. Rethinking ASPP leads to our observation that learnable sampling locations of the convolution operation can endow the network with a learnable field-of-view, and thus the ability to capture object context information adaptively. Following this observation, in this paper we propose an adaptive context encoding (ACE) module based on the deformable convolution operation, in which the sampling locations of the convolution are learnable. Our ACE module can easily be embedded into other convolutional neural networks (CNNs) for context aggregation. The effectiveness of the proposed module is demonstrated on the Pascal-Context and ADE20K datasets. Although the proposed ACE module consists of only three deformable convolution blocks, it outperforms PPM and ASPP in terms of mean Intersection over Union (mIoU) on both datasets. All the experimental studies confirm that our proposed module is effective compared to state-of-the-art methods.
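A minimal sketch of an adaptive-context-style module built from three deformable convolution blocks, using `torchvision.ops.DeformConv2d`; channel sizes and block layout are assumptions for illustration, not the paper's exact configuration.

```python
# Sketch: three deformable-convolution blocks stacked as a context module.
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

class DeformBlock(nn.Module):
    def __init__(self, in_ch, out_ch, k=3):
        super().__init__()
        # a plain conv predicts 2 offsets (dx, dy) per kernel sampling location
        self.offset = nn.Conv2d(in_ch, 2 * k * k, k, padding=k // 2)
        self.dconv = DeformConv2d(in_ch, out_ch, k, padding=k // 2)
        self.bn = nn.BatchNorm2d(out_ch)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.bn(self.dconv(x, self.offset(x))))

class ACEModule(nn.Module):
    def __init__(self, in_ch=2048, mid_ch=512):
        super().__init__()
        self.blocks = nn.Sequential(DeformBlock(in_ch, mid_ch),
                                    DeformBlock(mid_ch, mid_ch),
                                    DeformBlock(mid_ch, mid_ch))

    def forward(self, x):            # x: backbone feature map
        return self.blocks(x)        # context-aggregated features

# feats = ACEModule()(backbone_output)   # a 1x1 conv head would then predict classes
```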
Classification of degraded images is very important in practice because images are usually degraded by compression, noise, blurring, etc. Nevertheless, most research on image classification focuses only on clean images without any degradation. Some papers have already proposed deep convolutional neural networks composed of an image restoration network and a classification network to classify degraded images. This paper proposes an alternative approach in which we use a degraded image together with an additional degradation parameter for classification. The proposed classification network has two inputs: the degraded image and the degradation parameter. An estimation network for the degradation parameter is also incorporated when the degradation parameters of the degraded images are unknown. Experimental results show that the proposed method outperforms a straightforward approach in which the classification network is trained with degraded images only.
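A minimal sketch of a two-input classifier in the spirit described above, assuming the degradation parameter is a scalar (e.g., a noise level or compression quality) concatenated with the image features before the classification head; the class name and layer sizes are illustrative, not the paper's architecture.

```python
# Sketch: classifier that takes a degraded image and a degradation parameter.
import torch
import torch.nn as nn
import torchvision.models as models

class DegradedImageClassifier(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        backbone = models.resnet18(weights=None)
        backbone.fc = nn.Identity()                    # expose 512-d image features
        self.backbone = backbone
        self.head = nn.Sequential(nn.Linear(512 + 1, 256), nn.ReLU(),
                                  nn.Linear(256, num_classes))

    def forward(self, image, degradation_param):
        feat = self.backbone(image)                        # B x 512
        param = degradation_param.view(-1, 1).float()      # B x 1 scalar parameter
        return self.head(torch.cat([feat, param], dim=1))  # B x num_classes

# logits = model(degraded_batch, sigma_batch)  # sigma could come from an estimation network
```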
Many archival photos are unique, existing only in a single copy. Some of them are damaged due to improper archiving (e.g., exposure to direct sunlight, humidity, insects, etc.) or have physical damage resulting in cracks, scratches, unwanted marks, spots, dust, and so on. This paper proposes a system for detecting and removing image defects based on machine learning. The method for detecting damage in an image consists of two main steps: the first step applies morphological filtering as pre-processing, and the second step applies a machine learning method to classify the pixels that produced a strong response in the pre-processing phase. The second part of the proposed method uses an adversarial convolutional neural network to reconstruct the damage detected in the previous stage. The effectiveness of the proposed method in comparison with traditional methods of defect detection and removal was confirmed experimentally.
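A minimal sketch of the pre-processing stage only, assuming a white top-hat morphological filter is used to highlight thin bright structures (scratches, cracks) as candidate pixels; the kernel size, thresholding, and downstream steps are illustrative placeholders, not the authors' exact pipeline.

```python
# Sketch: morphological pre-filtering to propose defect-candidate pixels.
import cv2

def defect_candidates(gray):
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (9, 9))
    # white top-hat keeps bright structures thinner than the structuring element
    tophat = cv2.morphologyEx(gray, cv2.MORPH_TOPHAT, kernel)
    _, mask = cv2.threshold(tophat, 0, 255,
                            cv2.THRESH_BINARY | cv2.THRESH_OTSU)
    return mask  # candidate pixels to be verified by the learned classifier

img = cv2.imread("archival_photo.png", cv2.IMREAD_GRAYSCALE)
candidates = defect_candidates(img)
# verified_mask = pixel_classifier(img, candidates)        # hypothetical ML verification
# restored = adversarial_inpainting(img, verified_mask)    # hypothetical GAN reconstruction
```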
This paper presents a new method for segmenting medical images based on Hamiltonian quaternions and their associative algebra, the active contour model, and the LPA-ICI (local polynomial approximation – intersection of confidence intervals) anisotropic gradient. Since the image is usually converted to grayscale for segmentation tasks, important information about color, saturation, and other color-related properties is lost. To solve this problem, we use the quaternion framework to represent a color image so that all three channels are considered simultaneously when segmenting the RGB image. For noise reduction, adaptive filtering based on local polynomial estimates using the ICI rule is applied. The presented approach yields clearer and more detailed boundaries of the objects of interest. Experiments performed on real medical images (Z-line detection) show that our segmentation method is more efficient than current state-of-the-art methods.
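A minimal sketch of the quaternion encoding idea, assuming each RGB pixel is represented as a pure quaternion q = r·i + g·j + b·k so that all three channels are handled jointly; the simple joint gradient magnitude below is a stand-in for the LPA-ICI anisotropic gradient, not the authors' estimator.

```python
# Sketch: pure-quaternion encoding of an RGB image and a joint edge map.
import numpy as np

def to_quaternion(rgb):                 # rgb: H x W x 3, float values in [0, 1]
    h, w, _ = rgb.shape
    q = np.zeros((h, w, 4), dtype=np.float64)
    q[..., 1:] = rgb                    # real part stays zero (pure quaternion)
    return q

def quaternion_gradient_mag(q):
    gy, gx = np.gradient(q, axis=(0, 1))
    return np.sqrt((gx ** 2 + gy ** 2).sum(axis=-1))   # joint RGB edge strength

rgb = np.random.rand(64, 64, 3)         # stand-in for a medical RGB image
edges = quaternion_gradient_mag(to_quaternion(rgb))
# `edges` would drive the active-contour evolution toward object boundaries.
```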
Advanced methodologies for transmitting compressed images, within acceptable ranges of transmission rate and information loss, make it possible to transmit a medical image over a communication channel. Most prior works on 3D medical image compression treat volumetric images as a whole but fail to account for the spatial and temporal coherence of adjacent slices. In this paper, we develop a 3D medical image compression method that extends the 3D wavelet difference reduction algorithm by computing the similarity of pixels in adjacent slices and progressively compressing only the similar slices. The proposed method achieves high efficiency on publicly available datasets of MRI scans, compressing down to one bit per voxel with PSNR and SSIM of up to 52.3 dB and 0.7578, respectively.
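A minimal sketch of the slice-grouping idea, assuming adjacent-slice similarity is measured with SSIM and runs of similar slices are grouped so the 3D wavelet coder can compress them jointly; the 0.9 threshold and the `wdr3d_compress` call are hypothetical placeholders.

```python
# Sketch: group adjacent MRI slices by similarity before 3D wavelet coding.
from skimage.metrics import structural_similarity as ssim

def group_similar_slices(volume, thresh=0.9):    # volume: (num_slices, H, W)
    groups, current = [], [0]
    for i in range(1, volume.shape[0]):
        s = ssim(volume[i - 1], volume[i],
                 data_range=volume.max() - volume.min())
        if s >= thresh:
            current.append(i)                    # coherent with the previous slice
        else:
            groups.append(current)
            current = [i]                        # start a new group of slices
    groups.append(current)
    return groups

# for g in group_similar_slices(mri_volume):
#     wdr3d_compress(mri_volume[g])              # hypothetical 3D-WDR coder
```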
In this paper, we propose a patch-based system to classify non-small cell lung cancer (NSCLC) diagnostic whole slide images (WSIs) into two major histopathological subtypes: adenocarcinoma (LUAD) and squamous cell carcinoma (LUSC). Classifying patients accurately is important for prognosis and therapy decisions. The proposed system was trained and tested on 876 subtyped NSCLC gigapixel-resolution diagnostic WSIs from 805 patients – 664 in the training set and 141 in the test set. The algorithm has modules for: 1) auto-generated tumor/non-tumor masking using a trained residual neural network (ResNet34), 2) cell-density map generation (based on color deconvolution, local drain segmentation, and watershed transformation), 3) patch-level feature extraction using a pre-trained ResNet34, 4) a tower of linear SVMs for different cell-density ranges, and 5) a majority voting module for aggregating subtype predictions on unseen test WSIs. The system was trained and tested at several WSI magnifications ranging from ×4 to ×40, with a best ROC AUC of 0.95 and an accuracy of 0.86 on test samples. This fully automated histopathology subtyping method outperforms similar published state-of-the-art methods for diagnostic WSIs.
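A minimal sketch of modules 3–5 only, assuming ImageNet-pretrained ResNet34 patch descriptors, a linear SVM, and majority voting over a slide's patches; masking, cell-density binning, and the per-range SVM tower are omitted, and all names and shapes are illustrative.

```python
# Sketch: ResNet34 patch features -> linear SVM -> majority vote per slide.
import numpy as np
import torch
import torchvision.models as models
from sklearn.svm import LinearSVC

extractor = models.resnet34(weights="IMAGENET1K_V1")
extractor.fc = torch.nn.Identity()               # 512-d descriptor per patch
extractor.eval()

@torch.no_grad()
def patch_features(patches):                     # patches: N x 3 x 224 x 224 tensor
    return extractor(patches).numpy()

svm = LinearSVC()                                # one SVM per cell-density range in the paper
# svm.fit(train_feats, train_labels)             # labels: 0 = LUAD, 1 = LUSC

def predict_slide(patches):
    votes = svm.predict(patch_features(patches))
    return int(np.round(votes.mean()))           # majority vote over the slide's patches
```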
In this paper, we propose a new fire monitoring system that automatically detects fire flames at night using a CCD camera. The proposed system consists of two cascaded steps to reliably detect fire regions. First, ELASTIC-YOLOv3 is proposed to better detect small fires; its main role is to find candidate fire regions in images. The candidate fire regions are then passed to a second verification step, which takes into account the dynamic characteristics of fire, to produce more reliable detections. To do this, we construct fire-tubes by connecting the candidate fire regions detected across several frames and extract the histogram of optical flow (HOF) from each fire-tube. Because the extracted HOF feature vector is considerably large, it is reduced with a predefined bag of features (BoF) and then fed to a fast random forest classifier, instead of a heavy recurrent neural network (RNN), to verify the final fire regions. Experiments show that the proposed method achieves a faster processing time and higher fire detection accuracy with fewer missed detections and false alarms.
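A minimal sketch of the verification features, assuming dense Farneback optical flow is computed between consecutive grayscale crops of a tracked candidate region and binned into an orientation histogram weighted by magnitude; the bin count, BoF encoding, and classifier call are illustrative placeholders.

```python
# Sketch: histogram-of-optical-flow (HOF) descriptor for one fire-tube step.
import cv2
import numpy as np

def hof_descriptor(prev_roi, next_roi, bins=8):
    # prev_roi / next_roi: 8-bit grayscale crops of the same candidate region
    flow = cv2.calcOpticalFlowFarneback(prev_roi, next_roi, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    mag, ang = cv2.cartToPolar(flow[..., 0], flow[..., 1])
    hist, _ = np.histogram(ang, bins=bins, range=(0, 2 * np.pi), weights=mag)
    return hist / (hist.sum() + 1e-8)            # normalized flow-orientation histogram

# tube_hofs = [hof_descriptor(f0, f1) for f0, f1 in zip(tube[:-1], tube[1:])]
# bof_vector = encode_bag_of_features(tube_hofs)   # hypothetical BoF encoding
# is_fire = random_forest.predict([bof_vector])    # fast final verification
```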
In this paper, we investigate person re-identification (re-ID) in a multi-camera network for surveillance applications. To this end, we create a Spatio-Temporal Multi-Camera model (ST-MC model), which exploits statistical data on a person’s entry/exit points in the multi-camera network to predict in which camera view a person will re-appear. The ST-MC model is used as a novel extension to the Multiple Granularity Network (MGN) [1], which is the current state of the art in person re-ID. Compared to existing approaches that are based solely on Convolutional Neural Networks (CNNs), our approach improves re-ID performance by considering not only appearance-based features of a person from a CNN, but also contextual information. The latter serves as scene-understanding information complementary to person re-ID. Experimental results show that, for the DukeMTMC-reID dataset [2][3], introducing our ST-MC model substantially increases the mean Average Precision (mAP) and Rank-1 score from 77.2% to 84.1% and from 88.6% to 96.2%, respectively.
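A minimal sketch of the general idea, assuming the spatio-temporal statistics are summarized as a camera-to-camera transition matrix estimated from observed exit/entry events and fused with the appearance scores of an MGN-style model; the class, the fusion weight, and the re-ranking formula are assumptions for illustration, not the authors' exact model.

```python
# Sketch: camera-transition prior fused with appearance-based re-ID scores.
import numpy as np

class STMCModel:
    def __init__(self, num_cams):
        self.counts = np.ones((num_cams, num_cams))    # Laplace-smoothed transition counts

    def observe(self, exit_cam, entry_cam):
        self.counts[exit_cam, entry_cam] += 1           # accumulate training transitions

    def transition_prob(self, exit_cam):
        return self.counts[exit_cam] / self.counts[exit_cam].sum()

def rerank(appearance_scores, gallery_cams, exit_cam, st_model, alpha=0.5):
    # appearance_scores: per-gallery-image similarity from the CNN (e.g., MGN)
    prior = st_model.transition_prob(exit_cam)[gallery_cams]
    return alpha * appearance_scores + (1 - alpha) * prior   # fused ranking score
```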