Recently, many works have proposed fusing radar data into monocular depth estimation models as an additional perceptual signal, because radar is robust under varied lighting and weather conditions. Although prior works have reported positive results, it remains hard to tell how much depth information radar can actually contribute to a depth estimation model. In this paper, we propose radar inference and supervision experiments to investigate the intrinsic depth capability of radar data, using state-of-the-art depth estimation models on the nuScenes dataset. In the inference experiment, the model predicts depth from radar input alone to demonstrate the inference capability of radar data. In the supervision experiment, a monocular depth estimation model is trained under radar supervision to show the intrinsic depth information that radar can contribute. Our experiments demonstrate that a model with only sparse radar input can recover the shape of the surroundings to a certain extent in the predicted depth. Furthermore, a monocular depth estimation model supervised by preprocessed radar achieves performance comparable to a baseline model trained with sparse lidar supervision.
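As a rough illustration of the two experiments, the sketch below is a minimal PyTorch toy, not the paper's actual architecture: the module names, layer sizes, and the projection of radar points into a sparse H x W depth map are all assumptions. It shows a radar-only encoder-decoder together with a loss masked to valid sparse returns, which covers both the radar-as-input and radar-as-supervision settings.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RadarOnlyDepthNet(nn.Module):
    """Toy encoder-decoder predicting dense depth from a sparse radar
    depth map alone (radar points projected onto the image plane)."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU())
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1), nn.ReLU())

    def forward(self, radar_depth):      # (B, 1, H, W), zeros where no radar return
        return self.decoder(self.encoder(radar_depth))

def masked_depth_loss(pred, sparse_gt):
    """Supervise only at pixels where the sparse signal (radar or lidar) is valid."""
    mask = sparse_gt > 0
    return F.l1_loss(pred[mask], sparse_gt[mask])

# Radar-only inference: a few sparse returns in, a dense depth map out.
net = RadarOnlyDepthNet()
radar = torch.zeros(2, 1, 64, 128)
radar[:, :, ::16, ::32] = 20.0           # hypothetical sparse radar depths in metres
pred = net(radar)
loss = masked_depth_loss(pred, radar)    # same masked loss usable for radar supervision
```

In the supervision experiment, the same masked loss would be applied to a camera-based model's prediction, with preprocessed radar standing in for the sparse ground truth.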
In this paper we investigate applying two deep generative frameworks to digital halftoning, with the aim of generating halftones whose quality is comparable to those produced by the direct binary search (DBS) algorithm. In the first framework, we apply conditional generative adversarial networks (cGANs) using two discriminators with different receptive field sizes and a generator built from densely connected blocks. In the second framework, deep autoregressive (AR) models, we propose mapping input images into a feature space with a single forward pass of a deep neural network and then applying a shallow autoregressive model on top of the output features. Our methods show promising results: halftones generated with our algorithms are less noisy than those generated with a DBS screen and do not contain the artifacts commonly associated with error-diffusion-type algorithms.
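To make the first framework concrete, here is a minimal PyTorch sketch, not the authors' exact networks: the channel counts, layer depths, and input shapes are illustrative assumptions. It pairs a densely connected generator with two PatchGAN-style discriminators whose depth controls their receptive field size.

```python
import torch
import torch.nn as nn

class DenseBlock(nn.Module):
    """Densely connected conv block: each layer sees all previous feature maps."""
    def __init__(self, ch, growth, n_layers):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.Sequential(nn.Conv2d(ch + i * growth, growth, 3, padding=1), nn.ReLU())
            for i in range(n_layers))

    def forward(self, x):
        feats = [x]
        for layer in self.layers:
            feats.append(layer(torch.cat(feats, dim=1)))
        return torch.cat(feats, dim=1)

class HalftoneGenerator(nn.Module):
    """Maps a grayscale image to a (soft) halftone via a dense block."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            DenseBlock(16, growth=16, n_layers=3),
            nn.Conv2d(16 + 3 * 16, 1, 3, padding=1), nn.Sigmoid())  # near-binary output

    def forward(self, x):
        return self.net(x)

def patch_discriminator(n_downsamples):
    """PatchGAN-style critic; more downsampling -> larger receptive field."""
    layers, ch, out = [], 2, 32          # input: gray image + halftone, concatenated
    for _ in range(n_downsamples):
        layers += [nn.Conv2d(ch, out, 4, stride=2, padding=1), nn.LeakyReLU(0.2)]
        ch, out = out, out * 2
    layers += [nn.Conv2d(ch, 1, 3, padding=1)]   # per-patch real/fake logits
    return nn.Sequential(*layers)

d_local = patch_discriminator(2)   # small receptive field: dot-level texture
d_global = patch_discriminator(4)  # large receptive field: overall tone reproduction

gray = torch.rand(1, 1, 64, 64)
half = HalftoneGenerator()(gray)
logits_local = d_local(torch.cat([gray, half], dim=1))
logits_global = d_global(torch.cat([gray, half], dim=1))
```

Combining both discriminators' losses lets the shallow one police local dot statistics while the deep one judges tone over larger regions.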
This paper presents an effective tuning framework between a CMOS Image Sensor (CIS) and an Image Signal Processor (ISP) based on user preference feedback. A key issue in ISP tuning is how to apply an individual's subjective perception of Image Quality (IQ) in a systematic way. To mitigate this issue, we propose a framework that efficiently surveys user preferences on IQ and selects ISP parameters based on those preferences. The overall process operates on a large-scale image database generated by an ISP simulator. In the preference-survey stage, we build clusters of perceptually similar images and gather the user's feedback on a representative image from each cluster. Next, to learn user preference, we train a DNN model on general preferences and fine-tune it to an individual's preferences based on that user's feedback. The model then provides the ISP parameter candidate that best matches the preferences. To assess performance, the proposed framework was evaluated on a state-of-the-art CIS and ISP system. The experimental results indicate that the proposed framework converges the IQ score according to user feedback and finds ISP parameters that yield higher IQ than hand-tuned results.
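The survey-then-fine-tune loop might look like the following sketch. The random features and scores are stand-ins, and the clustering method, model size, and learning rate are illustrative assumptions rather than the paper's implementation: k-means picks cluster representatives, and a small preference DNN is fine-tuned on the user's feedback.

```python
import numpy as np
import torch
import torch.nn as nn
from sklearn.cluster import KMeans

# 1) Survey: cluster perceptually similar simulator outputs, show one per cluster.
feats = np.random.rand(500, 128)          # stand-in perceptual features of ISP outputs
km = KMeans(n_clusters=10, n_init=10).fit(feats)
reps = [np.where(km.labels_ == c)[0][0] for c in range(10)]  # one representative each
user_scores = np.random.rand(10)          # stand-in user feedback on representatives

# 2) Preference model: pretrained on general preference, fine-tuned to this user.
model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.Adam(model.parameters(), lr=1e-4)  # small lr: adapt, don't overwrite
x = torch.tensor(feats[reps], dtype=torch.float32)
y = torch.tensor(user_scores, dtype=torch.float32).unsqueeze(1)
for _ in range(100):
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(x), y)
    loss.backward()
    opt.step()

# 3) Select the ISP parameter set whose output the model scores highest.
with torch.no_grad():
    best = torch.argmax(model(torch.tensor(feats, dtype=torch.float32)))
```

Surveying only cluster representatives keeps the amount of feedback the user must give small relative to the size of the simulated database.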
In this paper, we propose a method for automatically estimating three typical human-impression factors, "hard-soft", "flashy-sober", and "stable-unstable", which objects evoke, by analyzing their three-dimensional shapes. With this method, a designer's intent can be reflected directly in an object's shape during the design process. The focus here is on strongly correlating human impressions with the three-dimensional shape representation of objects. Previous work includes a method for estimating human impressions using specially designed features and linear classifiers. However, it handles only the "hard-soft" impression factor because its feature was optimized for that impression. The performance of this method is relatively low, although its processing time is short. In addition, there is a serious problem in that the method applies to only a particular impression factor. The purpose of this research is to propose a new method that applies to all three typical impression factors mentioned above. First, we use a single RGB image acquired from a specific view direction instead of general three-dimensional mesh data from a range finder. This enables a very simple system consisting of a single camera. Second, we use a deep neural network as a nonlinear classifier. For our experiment, a large number of training sample images annotated with numerical human-impression factors were used. To annotate correct impression factors as ground truths, we utilized the SD (semantic differential) method, which is very popular in the field of psychological statistics. We show that the success rate of the proposed method is 83% for "hard-soft", 78% for "flashy-sober", and 80% for "stable-unstable" on test images not included in the training data.
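A minimal sketch of the single-camera pipeline follows; the layer sizes and the three-factor output head are assumptions for illustration, not the paper's actual network, and the SD-method annotations are represented only as numeric targets.

```python
import torch
import torch.nn as nn

class ImpressionNet(nn.Module):
    """Toy CNN mapping a single RGB view of an object to scores for the
    three impression factors: hard-soft, flashy-sober, stable-unstable."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(1))
        self.head = nn.Linear(64, 3)     # one score per impression factor

    def forward(self, x):
        return self.head(self.features(x).flatten(1))

net = ImpressionNet()
image = torch.rand(1, 3, 128, 128)       # single RGB view from a fixed direction
scores = net(image)                      # trained against SD-method annotations
```

Because the input is a single RGB image rather than a mesh, the capture side of such a system reduces to one calibrated camera.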