Recently, many deep learning applications have been deployed on mobile platforms. To run on mobile hardware, these networks must be quantized. The quantization of computer vision networks has been studied extensively, but there have been few studies on the quantization of image restoration networks. In a previous study, following earlier work on weight quantization for deep learning networks, we examined the effect of quantizing activations and weights on image quality. In this paper, we introduce adaptive bit-depth control of the input patch, maintaining image quality comparable to the floating-point network while achieving a greater reduction in quantization bits than our previous work. The bit depth is controlled adaptively according to the maximum pixel value of the input data block. This preserves the linearity of the values within the block, so the deep neural network does not need to be retrained for a change in data distribution. With the proposed method, we achieved a 5% reduction in hardware area and power consumption for our custom deep network hardware while maintaining image quality in both subjective and objective measurements. This is an important achievement for mobile platform hardware.
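As a rough illustration of the adaptive bit-depth idea, the sketch below rescales each block by a power of two derived from its maximum pixel value before truncating to the target bit depth; the 10-bit input, 8-bit target, and block size are illustrative assumptions, not the paper's hardware parameters.

```python
import numpy as np

def quantize_block_adaptive(block, target_bits=8, full_bits=10):
    """Illustrative sketch (not the authors' implementation): pick a
    power-of-two shift from the block's maximum pixel value, so scaling
    stays linear and the network input distribution is unchanged up to a
    known factor, then truncate LSBs down to target_bits."""
    max_val = int(block.max())
    # Number of MSBs guaranteed to be zero for every pixel in this block.
    headroom = full_bits - max(1, int(np.ceil(np.log2(max_val + 1))))
    # Shift left to use the full range, then truncate LSBs to target_bits.
    shifted = block.astype(np.int32) << headroom
    quantized = shifted >> (full_bits - target_bits)
    # Inverse mapping used when reconstructing the original scale.
    dequantized = (quantized << (full_bits - target_bits)) >> headroom
    return quantized, dequantized, headroom

# Example: a dark 10-bit block keeps more effective precision after the shift.
block = np.random.randint(0, 128, size=(8, 8))   # max value < 2**7
q, dq, shift = quantize_block_adaptive(block)
print(shift, np.abs(block - dq).max())
```

Darker blocks get a larger shift, so fewer significant bits are lost when truncating to the reduced bit depth.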
A lightweight learning-based exposure bracketing strategy is proposed in this paper for high dynamic range (HDR) imaging without access to camera RAW. Some low-cost, power-efficient cameras, such as webcams, video surveillance cameras, sport cameras, mid-tier cellphone cameras, and navigation cameras on robots, can only provide access to 8-bit low dynamic range (LDR) images. Exposure fusion is a classical approach to capturing HDR scenes by fusing images taken with different exposures into an 8-bit tone-mapped HDR image. A key question is which set of exposure settings is optimal for covering the scene dynamic range and achieving a desirable tone. The proposed lightweight neural network predicts these exposure settings for a 3-shot exposure bracketing, given input irradiance information from 1) the histograms of an auto-exposure LDR preview image, and 2) the maximum and minimum levels of the scene irradiance. By avoiding the processing of preview image streams, and the circuitous route of first estimating the scene HDR irradiance and then tone-mapping it to 8-bit images, the proposed method offers a more practical HDR enhancement for real-time, on-device applications. Experiments on a number of challenging images demonstrate the advantages of our method over other state-of-the-art methods, both qualitatively and quantitatively.
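A minimal sketch of what such a lightweight predictor could look like, assuming a 64-bin preview histogram and small fully connected layers; the paper's actual architecture and input encoding are not specified here.

```python
import torch
import torch.nn as nn

class ExposurePredictor(nn.Module):
    """Hypothetical lightweight MLP: maps a preview-image histogram plus
    scene irradiance min/max to 3 exposure values for bracketing."""
    def __init__(self, hist_bins=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(hist_bins + 2, 32),  # histogram + min/max irradiance
            nn.ReLU(),
            nn.Linear(32, 16),
            nn.ReLU(),
            nn.Linear(16, 3),              # 3 predicted exposure settings
        )

    def forward(self, hist, irr_min, irr_max):
        x = torch.cat([hist, irr_min, irr_max], dim=-1)
        return self.net(x)

model = ExposurePredictor()
hist = torch.rand(1, 64)                   # normalized preview histogram
ev = model(hist, torch.tensor([[0.01]]), torch.tensor([[3.2]]))
print(ev.shape)  # torch.Size([1, 3])
```

A network this small runs in well under a millisecond on a CPU, which is consistent with the real-time, on-device goal described above.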
Open-source intelligence (OSINT) technologies are becoming increasingly popular with investigative and government agencies, intelligence services, media companies, and corporations [22]. These OSINT technologies use sophisticated techniques and special tools to analyze the continually growing sources of information efficiently [17]. There is a great need worldwide for professional training and further education in this field. Having already presented the overall structure of a professional training concept in this field in a previous paper [25], this series of articles offers individual further-training modules for the worldwide-standard state-of-the-art OSINT tools. The modules presented here are suitable for a professional training program and for an OSINT course in a bachelor's or master's computer science or cybersecurity program at a university. In part 1 of this series of 4 articles, the OSINT tool RiskIQ PassiveTotal [26] is introduced, and its application possibilities are explained using concrete examples. In part 2, the OSINT tool Censys is explained [27]. This part 3 deals with Maltego [28], and part 4 compares the three different tools of parts 1-3 [29].
Due to the use of 3D content in various applications, Stereo Image Quality Assessment (SIQA) has attracted increasing attention to ensure a good viewing experience for users. Several methods have thus been proposed in the literature, with clear improvements achieved by deep learning-based methods. This paper introduces a new deep learning-based no-reference SIQA method using the cyclopean view hypothesis and human visual attention. First, the cyclopean image is built considering the presence of binocular rivalry, which covers the asymmetric distortion case. Second, the saliency map is computed taking the depth information into account. The latter is used to extract patches from the most perceptually relevant regions. Finally, a modified version of the pre-trained VGG-19 is fine-tuned and used to predict the quality score from the selected patches. The performance of the proposed metric has been evaluated on the 3D LIVE phase I and phase II databases. Compared with state-of-the-art metrics, our method gives better results.
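The patch-selection step might be sketched as follows, assuming non-overlapping 32x32 patches ranked by mean depth-weighted saliency; the patch size, patch count, and ranking rule are illustrative assumptions rather than the paper's exact procedure.

```python
import numpy as np

def select_salient_patches(image, saliency, patch=32, k=8):
    """Rank non-overlapping patches of the cyclopean image by mean
    saliency and keep the top-k; these would then be scored by the
    fine-tuned VGG-19 and the scores pooled into one quality value."""
    h, w = saliency.shape
    scores, coords = [], []
    for y in range(0, h - patch + 1, patch):
        for x in range(0, w - patch + 1, patch):
            scores.append(saliency[y:y + patch, x:x + patch].mean())
            coords.append((y, x))
    top = np.argsort(scores)[::-1][:k]   # most salient patches first
    return [image[coords[i][0]:coords[i][0] + patch,
                  coords[i][1]:coords[i][1] + patch] for i in top]

cyclopean = np.random.rand(128, 128, 3)   # placeholder cyclopean image
saliency = np.random.rand(128, 128)       # placeholder saliency map
patches = select_salient_patches(cyclopean, saliency)
print(len(patches), patches[0].shape)     # 8 (32, 32, 3)
```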
Activity recognition and pose estimation are in general closely related in practical applications, even though they are considered independent tasks. In this paper, we propose artificial 3D coordinates and a CNN for combining activity recognition and pose estimation with 2D and 3D static/dynamic images (a dynamic image is composed of a set of video frames). In other words, we show that the proposed algorithm can be used to solve both problems, activity recognition and pose estimation. An end-to-end optimization process shows that the proposed approach is superior to one that performs activity recognition and pose estimation separately. The performance is evaluated by measuring the recognition rate. The proposed approach enables us to perform learning procedures using different datasets.
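For readers unfamiliar with dynamic images, one common way to collapse a clip into a single image is approximate rank pooling; the sketch below uses the simple weighting alpha_t = 2t - T - 1, which is a standard approximation and may differ from how this paper constructs its dynamic images.

```python
import numpy as np

def dynamic_image(frames):
    """Collapse a list of T frames into one 'dynamic image' using simple
    approximate rank-pooling weights alpha_t = 2t - T - 1, so later
    frames contribute positively and earlier frames negatively."""
    T = len(frames)
    weights = np.array([2 * t - T - 1 for t in range(1, T + 1)],
                       dtype=np.float32)
    stacked = np.stack(frames).astype(np.float32)       # (T, H, W, C)
    di = np.tensordot(weights, stacked, axes=1)          # (H, W, C)
    # Rescale to [0, 255] so it can be fed to a regular CNN.
    di = 255 * (di - di.min()) / (di.max() - di.min() + 1e-8)
    return di.astype(np.uint8)

clip = [np.random.randint(0, 256, (64, 64, 3), dtype=np.uint8)
        for _ in range(10)]
print(dynamic_image(clip).shape)  # (64, 64, 3)
```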
Recently, stereo cameras have been widely incorporated into smartphones and autonomous vehicles thanks to their low cost and small-sized packages. Nevertheless, acquiring high-resolution (HR) stereo images is still a challenging problem. While traditional stereo image processing has mainly focused on stereo matching, stereo super-resolution (SR), which is needed for HR images, has drawn less attention. Some deep learning-based stereo image SR works have recently shown promising results. However, they have not fully exploited binocular parallax in SR, which may lead to unrealistic visual perception. In this paper, we present a novel and computationally efficient convolutional neural network (CNN) based deep SR network for stereo images, called ProPaCoL-Net, that learns parallax coherency between the left and right SR images. The proposed ProPaCoL-Net progressively learns parallax coherency via a novel recursive parallax coherency (RPC) module with shared parameters. The RPC module is designed to effectively extract parallax information for left-image SR from the right-view input images, and vice versa. Furthermore, we propose a parallax coherency loss to reliably train the ProPaCoL-Net. Extensive experiments show that the ProPaCoL-Net outperforms the very recent state-of-the-art method by 1.15 dB in average PSNR.
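The weight-sharing recursion at the heart of such an RPC module could be sketched as follows; the channel count, fusion scheme, and number of recursions are assumptions, since the exact ProPaCoL-Net layout is not given in this abstract.

```python
import torch
import torch.nn as nn

class RecursiveParallaxModule(nn.Module):
    """Hypothetical RPC-style block: one convolution is reused across
    all recursion steps (shared parameters), repeatedly fusing the
    current left features with the right-view features."""
    def __init__(self, channels=64, steps=4):
        super().__init__()
        self.steps = steps
        self.fuse = nn.Conv2d(2 * channels, channels, 3, padding=1)

    def forward(self, feat_left, feat_right):
        out = feat_left
        for _ in range(self.steps):  # same weights reused each recursion
            out = torch.relu(self.fuse(torch.cat([out, feat_right], dim=1)))
        return out

m = RecursiveParallaxModule()
left = torch.rand(1, 64, 32, 32)
right = torch.rand(1, 64, 32, 32)
print(m(left, right).shape)  # torch.Size([1, 64, 32, 32])
```

Sharing one set of weights across recursions is what keeps a progressive design like this computationally efficient: depth grows without growing the parameter count.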
Video Quality Assessment (VQA) is an essential topic in several industries, ranging from video streaming to camera manufacturing. In this paper, we present a novel method for No-Reference VQA. This framework is fast and does not require the extraction of hand-crafted features. We extract convolutional features from a 3D Convolutional Neural Network (C3D) and feed them to a trained Support Vector Regressor (SVR) to obtain a VQA score. We apply transformations to different color spaces to generate more discriminative deep features. We extract features from several layers, with and without overlap, finding the best configuration to improve the VQA score. We tested the proposed approach on the LIVE-Qualcomm dataset. We extensively evaluated the perceptual quality prediction model, obtaining a final Pearson correlation of 0.7749 ± 0.0884 with Mean Opinion Scores, and showed that it achieves good video quality prediction, outperforming other leading state-of-the-art VQA models.
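The features-to-regressor stage can be sketched with scikit-learn as below; extract_c3d_features is a hypothetical stand-in for pooling activations from a chosen C3D layer, and the kernel and feature dimension are assumptions.

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

def extract_c3d_features(video):
    """Placeholder for pooling activations from a C3D layer;
    a real implementation would run the clip through the network."""
    return np.random.rand(4096)

videos = [object()] * 50                    # placeholder clip handles
mos = np.random.uniform(0, 100, size=50)    # mean opinion scores (labels)
X = np.stack([extract_c3d_features(v) for v in videos])

# Standardize features, then regress quality scores with an RBF SVR.
model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=1.0))
model.fit(X, mos)
print(model.predict(X[:3]))                 # predicted quality scores
```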
Forensics research has developed several techniques to identify the model and manufacturer of a digital image or video's source camera. However, to the best of our knowledge, no work has been performed to identify the manufacturer and model of the scanner that captured an MRI image. MRI source identification can have several important applications, ranging from discovering scientific fraud and exposing issues around the anonymity and privacy of medical records, to protecting against malicious tampering of medical images and validating AI-based diagnostic techniques whose performance varies across different MRI scanners. In this paper, we propose a new CNN-based approach to learn the forensic traces left by an MRI scanner and use these traces to identify the manufacturer and model of the scanner that captured an MRI image. Additionally, we identify an issue called weight divergence that can occur when training CNNs using a constrained convolutional layer, and we propose three new correction functions to protect against it. Our experimental results show we can identify an MRI scanner's manufacturer with 97.88% accuracy and its model with 91.07% accuracy. Additionally, we show that our proposed correction functions can noticeably improve our CNN's accuracy when performing scanner model identification.
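Constrained convolutional layers of this kind typically follow the Bayar-Stamm prediction-error constraint, where the center tap is fixed to -1 and the remaining taps are normalized to sum to 1 so each filter learns noise-like forensic traces. A sketch of that projection is shown below; the paper's three correction functions for weight divergence are its own contribution and are not reproduced here.

```python
import torch

def enforce_prediction_error_constraint(weight):
    """Project conv kernels onto the constrained form: center tap = -1,
    off-center taps summing to 1, so each filter acts as a prediction-
    error filter. Repeated projections can drift the weights (the
    'weight divergence' issue the paper corrects)."""
    out_c, in_c, kh, kw = weight.shape
    cy, cx = kh // 2, kw // 2
    with torch.no_grad():
        weight[:, :, cy, cx] = 0
        sums = weight.sum(dim=(2, 3), keepdim=True)
        weight /= sums + 1e-12          # off-center taps now sum to 1
        weight[:, :, cy, cx] = -1       # fix the center tap
    return weight

w = torch.randn(3, 1, 5, 5)
w = enforce_prediction_error_constraint(w)
print(w[0, 0].sum().item())  # ~0: each constrained filter sums to zero
```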
Road traffic signs provide vital information about traffic rules, road conditions, and route directions to assist drivers in safe driving. Recognition of traffic signs is one of the key features of Advanced Driver Assistance Systems (ADAS). In this paper, we present a Convolutional Neural Network (CNN) based approach for robust Traffic Sign Recognition (TSR) that can run in real time on low-power embedded systems. To achieve this, we propose a two-stage network: in the first stage, a generic traffic sign detection network localizes the positions of traffic signs in the video footage, and in the second stage a country-specific classification network classifies the detected signs. The network sub-blocks were retrained to generate an optimal network that runs in real time on the Nvidia Tegra platform. The network's computational complexity and model size were further reduced to make it deployable on low-power embedded platforms. Methods such as network customization, weight pruning, and quantization schemes were used to achieve an 8X reduction in computational complexity. The pruned and optimized network was further ported and benchmarked on embedded platforms such as the Texas Instruments Jacinto TDA2x SoC and Qualcomm's Snapdragon 820 Automotive platform.
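Of the techniques named above, magnitude-based weight pruning is the easiest to illustrate; the sketch below zeroes the smallest weights of a layer, with the 70% sparsity level chosen arbitrarily for the example.

```python
import numpy as np

def magnitude_prune(weights, sparsity=0.7):
    """Zero out the smallest-magnitude weights of a layer. Combined with
    quantization and layer customization, this kind of pruning is what
    drives the reported reduction in computational complexity."""
    threshold = np.quantile(np.abs(weights).ravel(), sparsity)
    mask = np.abs(weights) >= threshold
    return weights * mask, mask

w = np.random.randn(64, 32)            # toy fully connected layer
pruned, mask = magnitude_prune(w)
print(f"kept {mask.mean():.0%} of weights")
```

In practice the pruned network is fine-tuned afterwards to recover accuracy, and the sparsity pattern is exploited by the embedded inference runtime.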
Change detection from ground vehicles has various applications, such as the detection of roadside Improvised Explosive Devices (IEDs). Although IEDs are hidden, they are often accompanied by visible markers, which can be any kind of object. Because of this, any suspicious change in the environment compared to an earlier moment in time should be detected. Little work has been published on solving this ill-posed problem with deep learning. This paper shows the feasibility of applying convolutional neural networks (CNNs) to HD video to accurately predict the presence and location of such markers in real time. The network is trained for the detection of pixel-level changes in HD video relative to an earlier reference recording. We investigate Siamese CNNs in combination with an encoder-decoder architecture and introduce a modified double-margin contrastive loss function to achieve pixel-level change detection results. Our dataset consists of seven pairs of challenging real-world recordings with geo-tagged test objects. The proposed network architecture can compare two images of 1920×1440 pixels in 150 ms on a GTX 1080 Ti GPU. The proposed network significantly outperforms state-of-the-art networks and algorithms on our dataset in terms of F-1 score, by 0.28 on average.
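A double-margin contrastive loss over per-pixel feature distances could look like the following sketch; the margin values, and whether this matches the paper's exact modification, are assumptions.

```python
import torch

def double_margin_contrastive(dist, label, m_pos=0.3, m_neg=1.0):
    """Double-margin contrastive loss on pixel-wise feature distances:
    unchanged pixels (label 0) are penalized only beyond m_pos, changed
    pixels (label 1) only within m_neg, leaving a dead zone between the
    margins that a single-margin loss does not have."""
    pos = (1 - label) * torch.clamp(dist - m_pos, min=0) ** 2
    neg = label * torch.clamp(m_neg - dist, min=0) ** 2
    return (pos + neg).mean()

# dist: per-pixel L2 distance between the two Siamese encoder outputs.
dist = torch.rand(1, 1, 64, 64)
label = (torch.rand(1, 1, 64, 64) > 0.9).float()  # 1 = changed pixel
print(double_margin_contrastive(dist, label).item())
```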