The detection of contaminants in everyday food and drinking water is crucial for global public health. For the detection of the heavy metals mercury (Hg) and arsenic (As), our group has proposed a novel paper-based microfluidic device, integrated with a mobile phone and an image analysis pipeline, to capture and analyze sensor images on-site. Still, detecting lower contamination levels remains challenging due to the small number of available data samples and the large intra-class variance in our application. To overcome this challenge, we explore traditional data augmentation and GAN-based augmentation techniques for synthesizing realistic colorimetric images, and we propose a CNN classifier for five-level contamination classification. The proposed system is trained and evaluated on a limited dataset of 126 phone-captured images spanning the five contamination levels, and it yields 88.1% classification accuracy and 91.92% precision, demonstrating the feasibility of the approach. We believe that training deep learning models on limited detection-image datasets in this way presents a clear path toward phone-based contamination-level detection.
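As a minimal sketch of the pipeline described above (not the authors' code; the transform parameters and layer widths are illustrative assumptions), the traditional-augmentation stage and a small five-class CNN could look like this:

```python
import torch
import torch.nn as nn
from torchvision import transforms

# Traditional augmentation: mostly geometric perturbations, since the class
# label is encoded in color; only a slight brightness jitter is used to mimic
# phone capture variation (all ranges are assumptions).
augment = transforms.Compose([
    transforms.RandomRotation(10),
    transforms.RandomResizedCrop(64, scale=(0.9, 1.0)),
    transforms.ColorJitter(brightness=0.1),
    transforms.ToTensor(),
])

class ContaminationCNN(nn.Module):
    """Small CNN mapping a sensor-spot crop to one of five levels."""
    def __init__(self, num_levels=5):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(64, num_levels)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))
```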
We study modern deep convolutional neural networks for image denoising, in which RGB input images are transformed into RGB output images by feed-forward convolutional networks trained with a loss defined in the RGB color space. Motivated by the gap between human visual perception and objective evaluation metrics such as PSNR or SSIM, we propose a data augmentation technique and demonstrate that it is equivalent to defining a perceptual loss function. A network trained with this augmentation produces visually pleasing denoised results. We also combine an unsupervised design with a bias-free network to counter the overfitting caused by the absence of clean images, and to improve performance when the noise level exceeds the training range.
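One plausible reading of the augmentation-as-perceptual-loss equivalence (our assumption, not necessarily the authors' exact scheme) is to apply a random near-identity color-mixing transform T identically to the noisy input and clean target: in expectation, the RGB MSE over such transformed pairs behaves like a T-weighted quadratic loss on the original pair.

```python
import torch

def random_color_transform(batch):
    """Apply a random near-identity channel-mixing matrix T to an
    (N, 3, H, W) batch; the 0.1 scale is an illustrative assumption."""
    T = torch.eye(3) + 0.1 * torch.randn(3, 3)
    return torch.einsum('ij,njhw->nihw', T, batch), T

def augmented_mse(model, noisy, clean):
    """RGB MSE on a color-transformed pair; averaging over random T
    approximates a perceptually weighted loss on the untransformed pair."""
    noisy_t, T = random_color_transform(noisy)
    clean_t = torch.einsum('ij,njhw->nihw', T, clean)
    return torch.mean((model(noisy_t) - clean_t) ** 2)
```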
The Magdalena Ridge Observatory Interferometer (MROI) uses Shack-Hartmann wavefront sensing (SH-WFS) in a unique design for the back-end stability of its beam relay systems. SH-WFS, however, is sensitive to scintillation from atmospheric turbulence, which can drastically degrade the precision with which it locates the beam profile it sees. A large number of images is normally needed to average out the turbulence effect. Here we use deep learning as an alternative to long averaging cycles: a CNN was trained to map a small number of initial images from a series of star frames to the average image of the entire series, at different positions of the beam profile. Under typical seeing conditions expected at MROI, the results showed that the network can map 10 input frames to the average of 100 within the permissible error margin of 0.1 pixels, and that it generalizes properly to beam position movements not seen during training. The network also outperforms the averaging technique when both operate on small numbers of input frames, such as 10 or 20.
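A minimal sketch of the frame-averaging surrogate described above (layer widths are assumptions, not the paper's configuration): the CNN takes N short-exposure frames stacked as channels and is trained with MSE against the long-series average.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FrameAverager(nn.Module):
    """Maps a stack of n_in SH-WFS frames to a predicted long-average frame."""
    def __init__(self, n_in=10):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(n_in, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 3, padding=1),   # predicted average frame
        )

    def forward(self, frames):                # frames: (B, n_in, H, W)
        return self.net(frames)

# Training target: the mean of the full 100-frame series, e.g.
# loss = F.mse_loss(model(frames_10), frames_100.mean(dim=1, keepdim=True))
```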
Deep neural networks have been applied to an increasing number of computer vision tasks, demonstrating superior performance. Much research has focused on making deep networks more suitable for efficient hardware implementation, targeting low-power and low-latency real-time applications. In [1], Isikdogan et al. introduced a deep neural network design that provides an effective trade-off between flexibility and hardware efficiency. The proposed solution consists of fixed-topology hardware blocks, with partially frozen/partially trainable weights, that can be configured into a full network. Initial results on a few computer vision tasks were presented in [1]. In this paper, we further evaluate this network design by applying it to several additional computer vision use cases and comparing it to other hardware-friendly networks. The experimental results presented here show that the proposed semi-fixed, semi-frozen design achieves competitive performance on a variety of benchmarks, while maintaining very high hardware efficiency.
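To illustrate the partially frozen/partially trainable idea in general terms (this is not the block design from [1]; the split between frozen and trainable parameters is a hypothetical example), a block might freeze its spatial filter bank while keeping a lightweight channel-mixing layer trainable:

```python
import torch.nn as nn

class SemiFrozenBlock(nn.Module):
    """Fixed-topology block: the 3x3 filter bank is frozen (as if baked into
    hardware), while the 1x1 mixing conv remains trainable per task."""
    def __init__(self, ch):
        super().__init__()
        self.fixed = nn.Conv2d(ch, ch, 3, padding=1, bias=False)
        self.fixed.weight.requires_grad = False   # frozen weights
        self.mix = nn.Conv2d(ch, ch, 1)           # trainable weights
        self.act = nn.ReLU()

    def forward(self, x):
        return self.act(self.mix(self.fixed(x)))
```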
Assessing the quality of images is a challenging task. To this end, images must either be evaluated by a pool of subjects following a well-defined assessment protocol, or an objective quality metric must be defined. In this contribution, an objective metric based on neural networks is proposed. The model accounts for the human visual system by computing a saliency map of the image under test. The system consists of two modules: the first is trained on normalized distorted images and learns features from the original image, the distorted image, and the estimated saliency map; it also produces an estimate of the prediction error. The second module (a non-linear regression module) is trained on the available subjective scores. The performance of the proposed metric has been evaluated on state-of-the-art quality assessment datasets, and the achieved results show the effectiveness of the proposed system in matching the subjective quality scores.
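A minimal sketch of the two-module structure (channel counts and layer sizes are our assumptions): a feature module consuming the reference image, the distorted image, and the saliency map, followed by a non-linear regression module fitted to subjective scores.

```python
import torch
import torch.nn as nn

class FeatureModule(nn.Module):
    """Learns features from the channel-concatenated reference image,
    distorted image, and saliency map (3 + 3 + 1 = 7 channels)."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(7, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )

    def forward(self, ref, dist, sal):    # (B,3,H,W), (B,3,H,W), (B,1,H,W)
        return self.conv(torch.cat([ref, dist, sal], dim=1)).flatten(1)

# Non-linear regression module mapping features to a subjective-score estimate.
regressor = nn.Sequential(
    nn.Linear(64, 32), nn.ReLU(),
    nn.Linear(32, 1),
)
```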
In this paper, we propose a novel system for remotely estimating a person's respiration rate. Periodic inhalation and exhalation during respiration cycles induce subtle upper-body movements, which appear as local image deformation over time when recorded by a digital camera. This local image deformation can be recovered by estimating the optical flow between consecutive frames. We propose using convolutional neural networks designed for general image registration to estimate the induced optical flow, whose periodicity is then leveraged to obtain the respiration rate through frequency analysis. The proposed system is robust to lighting conditions, camera type (RGB, infrared), clothing, and posture (sitting in a chair or lying in bed); it could be used by individuals with a webcam, or by healthcare centers to monitor patients at night.
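To make the frequency-analysis stage concrete, here is a sketch of the rate-estimation step. The paper uses a registration CNN to estimate the flow; classical Farneback flow (OpenCV) stands in for it here so the pipeline runs end to end, and the frequency band is an assumed plausible breathing range.

```python
import cv2
import numpy as np

def respiration_rate_bpm(gray_frames, fps):
    """Estimate breaths per minute from the periodicity of vertical flow."""
    motion = []
    for prev, curr in zip(gray_frames, gray_frames[1:]):
        flow = cv2.calcOpticalFlowFarneback(prev, curr, None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)
        motion.append(flow[..., 1].mean())    # mean vertical displacement
    motion = np.asarray(motion) - np.mean(motion)
    spectrum = np.abs(np.fft.rfft(motion))
    freqs = np.fft.rfftfreq(motion.size, d=1.0 / fps)
    band = (freqs >= 0.1) & (freqs <= 0.7)    # ~6-42 breaths per minute
    return 60.0 * freqs[band][np.argmax(spectrum[band])]
```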
Overweight vehicles are a common source of pavement and bridge damage. Mobile crane vehicles in particular often exceed legal per-axle weight limits, carrying their lifting blocks and ballast on the vehicle instead of on a separate trailer. To prevent road deterioration, detecting overweight cranes is desirable for law enforcement. Since the sources of excess crane weight are visible, we propose a camera-based detection system built on convolutional neural networks. We label our dataset iteratively to vastly reduce labeling effort, and we extensively investigate the impact of image resolution, network depth, and dataset size to choose optimal parameters during iterative labeling. We show that iterative labeling with intelligently chosen image resolutions and network depths can vastly speed up (by up to 70×) the rate at which data can be labeled for training classification systems in practical surveillance applications (see the sketch below). The experiments also provide an estimate of the optimal amount of data required to train an effective classification system, which is valuable for classification problems in general. The proposed system achieves an AUC of 0.985 for distinguishing cranes from other vehicles, and AUCs of 0.92 and 0.77 for lifting-block and ballast classification, respectively. The proposed classification system enables effective road monitoring for semi-automatic law enforcement and is attractive for rare-class extraction in general surveillance classification problems.
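A minimal sketch of one round of such an iterative-labeling scheme (the thresholds are our assumptions): a classifier trained on the current labeled set scores the unlabeled pool, confident samples are auto-labeled, and only the uncertain middle band goes to a human reviewer, which is where the labeling speed-up comes from.

```python
import numpy as np

def split_for_labeling(scores, low=0.05, high=0.95):
    """Partition an unlabeled pool by classifier score: indices to auto-label
    as positive or negative, and indices a human should review."""
    scores = np.asarray(scores)
    auto_pos = np.where(scores > high)[0]            # auto-label as crane
    auto_neg = np.where(scores < low)[0]             # auto-label as other
    review = np.where((scores >= low) & (scores <= high))[0]
    return auto_pos, auto_neg, review
```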
The Bidirectional Texture Function (BTF) is one method for reproducing realistic images in Computer Graphics (CG). It can be applied to texture mapping under changing lighting and viewing directions and can reproduce realistic appearance through simple, high-speed processing. In the BTF method, however, a large amount of texture data generally has to be measured and stored in advance. In this paper, to address the measurement time and texture data size required for BTF reproduction, we propose a method for generating a BTF image dataset using deep learning. We recover texture images under various azimuthal lighting conditions from a single texture image, applying a U-Net to this BTF recovery task. The restored and original texture images are compared using SSIM, confirming that the reproducibility of fabric and wood textures is high.
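A minimal two-level U-Net sketch for such an image-to-image recovery task (the paper's exact depth, channel widths, and light-direction conditioning are not reproduced here; all sizes are assumptions):

```python
import torch
import torch.nn as nn

def conv_block(cin, cout):
    return nn.Sequential(
        nn.Conv2d(cin, cout, 3, padding=1), nn.ReLU(),
        nn.Conv2d(cout, cout, 3, padding=1), nn.ReLU(),
    )

class TinyUNet(nn.Module):
    """Maps a single input texture to the texture under a new lighting
    azimuth; the target direction could be encoded as extra input channels."""
    def __init__(self, cin=3, cout=3):
        super().__init__()
        self.enc1 = conv_block(cin, 32)
        self.enc2 = conv_block(32, 64)
        self.pool = nn.MaxPool2d(2)
        self.up = nn.ConvTranspose2d(64, 32, 2, stride=2)
        self.dec1 = conv_block(64, 32)   # 64 = upsampled 32 + skip 32
        self.out = nn.Conv2d(32, cout, 1)

    def forward(self, x):
        e1 = self.enc1(x)
        e2 = self.enc2(self.pool(e1))
        d1 = self.dec1(torch.cat([self.up(e2), e1], dim=1))  # skip connection
        return self.out(d1)
```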
Considering the complexity of a multimedia society and the subjectivity of describing images with words, a visual search application is a valuable tool. This work implements a Content-Based Image Retrieval (CBIR) application for texture images, with the goal of comparing three deep convolutional neural networks (VGG-16, ResNet-50, and DenseNet-161) used as image descriptors by extracting global features from images. For measuring similarity among images and ranking them, we employed cosine similarity, Manhattan distance, Bray-Curtis dissimilarity, and Canberra distance. We confirm that global average pooling applied to convolutional layers provides good texture descriptors, and propose using it when extracting features from VGG-based models. Our best result uses the average pooling layer of DenseNet-161 as a 2208-dim feature vector together with Bray-Curtis dissimilarity. We achieved 73.09% mAP@1 and 76.98% mAP@5 on the Describable Textures Dataset (DTD) benchmark, adapted for image retrieval. Our mAP@1 result is comparable to the state-of-the-art classification accuracy (73.8%). We also investigate the impact on retrieval performance of reducing the number of feature components with PCA: we are able to compress the 2208-dim descriptor down to 128 components with a moderate 3.3-percentage-point drop in mAP@1.
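The best-performing configuration above can be sketched as follows (preprocessing constants are the standard ImageNet ones; `weights="DEFAULT"` assumes torchvision >= 0.13):

```python
import numpy as np
import torch
from torchvision import models, transforms
from scipy.spatial.distance import braycurtis

# Global descriptor: DenseNet-161 feature maps, globally average-pooled
# into a 2208-dim vector.
net = models.densenet161(weights="DEFAULT").eval()
prep = transforms.Compose([
    transforms.Resize((224, 224)), transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])

@torch.no_grad()
def describe(pil_image):
    fmap = net.features(prep(pil_image).unsqueeze(0))   # (1, 2208, 7, 7)
    return torch.relu(fmap).mean(dim=(2, 3)).squeeze(0).numpy()

def rank(query_desc, gallery_descs):
    """Indices of gallery images, most similar first (Bray-Curtis)."""
    dists = [braycurtis(query_desc, g) for g in gallery_descs]
    return np.argsort(dists)
```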
This paper presents a new method for no-reference mesh visual quality assessment using a convolutional neural network. We first render 2D images of the 3D mesh from multiple views. Each image is then split into small patches that are fed to a convolutional neural network. The network consists of two convolutional layers with two max-pooling layers; a multilayer perceptron (MLP) with two fully connected layers is then integrated to summarize the learned representation into a single output node. With this network structure, feature learning and regression are combined to predict the quality score of a given distorted mesh without access to the reference mesh. Experiments were successfully conducted on the LIRIS/EPFL general-purpose database. The obtained results show that the proposed method provides good correlation and competitive scores compared to some influential and effective full-reference and reduced-reference methods.
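A minimal sketch of the described patch-level network (patch size, channel counts, and the grayscale-input assumption are ours; the conv/pool/FC structure follows the abstract):

```python
import torch.nn as nn

class PatchQualityNet(nn.Module):
    """Two conv layers with two max-pooling layers, followed by a two-layer
    MLP regressing a quality score for one rendered patch; patch scores can
    then be pooled across patches and views to score the whole mesh."""
    def __init__(self, patch=32):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        flat = 64 * (patch // 4) ** 2      # spatial size after two 2x pools
        self.mlp = nn.Sequential(
            nn.Linear(flat, 128), nn.ReLU(),
            nn.Linear(128, 1),             # single output node: quality score
        )

    def forward(self, x):                  # x: (B, 1, patch, patch)
        return self.mlp(self.features(x).flatten(1))
```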