Solid-state lidar cameras produce 3D images, useful in applications such as robotics and self-driving vehicles. However, range is limited by the lidar laser power, and features such as perpendicular surfaces and dark objects pose difficulties. We propose the use of intensity images, inherent in lidar camera data from the total laser and ambient light collected in each pixel, to extract additional depth information and boost ranging performance. Using a pair of off-the-shelf lidar cameras and a conventional stereo depth algorithm to process the intensity images, we demonstrate an increase in the native lidar maximum depth range of 2× in an indoor environment and of almost 10× outdoors. Depth information is also extracted from features in the environment, such as dark objects, floors and ceilings, which are otherwise not detected by the lidar sensor. While the specific technique presented is useful in applications involving multiple lidar cameras, the principle of extracting depth data from lidar camera intensity images could also be extended to standalone lidar cameras using monocular depth techniques.
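As a rough illustration of the core idea in this work, a conventional stereo matcher can be run directly on a rectified pair of lidar intensity images and the resulting disparity converted to depth. The sketch below uses OpenCV's semi-global block matcher; the file names, baseline, and focal length are placeholders rather than values from the paper.

# Minimal sketch: depth from a rectified pair of lidar-camera intensity images
# using a conventional stereo matcher (OpenCV SGBM). File names, baseline and
# focal length are illustrative placeholders.
import cv2
import numpy as np

left = cv2.imread("left_intensity.png", cv2.IMREAD_GRAYSCALE)    # intensity image, camera 1
right = cv2.imread("right_intensity.png", cv2.IMREAD_GRAYSCALE)  # intensity image, camera 2

matcher = cv2.StereoSGBM_create(
    minDisparity=0,
    numDisparities=128,      # must be a multiple of 16
    blockSize=7,
    P1=8 * 7 * 7,
    P2=32 * 7 * 7,
    uniquenessRatio=10,
)
disparity = matcher.compute(left, right).astype(np.float32) / 16.0  # fixed-point to pixels

baseline_m = 0.20   # assumed separation between the two lidar cameras
focal_px = 600.0    # assumed focal length in pixels
valid = disparity > 0
depth_m = np.zeros_like(disparity)
depth_m[valid] = baseline_m * focal_px / disparity[valid]  # z = f * B / d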
In recent years, several deep learning-based architectures have been proposed to compress Light Field (LF) images as pseudo video sequences. However, most of these techniques employ conventional compression-focused networks. In this paper, we introduce a version of a previously designed deep learning video compression network, adapted and optimized specifically for LF image compression. We enhance this network by incorporating an in-loop filtering block, along with additional adjustments and fine-tuning. By treating LF images as pseudo video sequences and deploying our adapted network, we address the challenges presented by the unique features of LF images, such as high resolution and large data sizes. Our method compresses these images effectively while preserving their quality and unique characteristics. With thorough fine-tuning and the inclusion of the in-loop filtering network, our approach shows improved performance in terms of Peak Signal-to-Noise Ratio (PSNR) and Mean Structural Similarity Index Measure (MSSIM) compared to other existing techniques. Our method provides a feasible path for LF image compression and may contribute to the emergence of new applications and advancements in this field.
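A minimal sketch of the pseudo-video step, assuming the light field is available as a grid of sub-aperture views; the serpentine scan order shown here is a common choice and is illustrative rather than the exact ordering used in the paper.

# Form a pseudo video sequence from light-field sub-aperture views using a
# serpentine (zig-zag) scan order (illustrative; the paper's ordering may differ).
import numpy as np

def pseudo_video(lf_views):
    # lf_views: array of shape (U, V, H, W, C) holding the sub-aperture images.
    U, V = lf_views.shape[:2]
    frames = []
    for u in range(U):
        cols = range(V) if u % 2 == 0 else range(V - 1, -1, -1)  # reverse every other row
        frames.extend(lf_views[u, v] for v in cols)
    return np.stack(frames)  # (U*V, H, W, C) frame sequence fed to the video codec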
Neural Radiance Fields (NeRF) have attracted particular attention due to their exceptional capability in virtual view generation from a sparse set of input images. However, their scope is constrained by the substantial number of images required for training. This work introduces a data augmentation methodology to train NeRF using external depth information. The approach generates new virtual images at different positions using MPEG's reference view synthesizer (RVS) to augment the training image pool for NeRF. Results demonstrate a substantial enhancement in output quality when the generated views are employed, compared to a scenario in which they are omitted.
Recent advancements in 3D data capture have enabled the real-time acquisition of high-resolution 3D range data, even in mobile devices. However, this type of high bit-depth data remains difficult to transmit efficiently over a standard broadband connection. The most successful techniques for tackling this data problem thus far have been image-based depth encoding schemes that leverage modern image and video codecs. To our knowledge, no published work has directly optimized the end-to-end losses of a depth encoding scheme passing through a lossy image compression codec. In contrast, our compression-resilient neural depth encoding method leverages deep learning to efficiently encode depth maps into 24-bit RGB representations that minimize end-to-end depth reconstruction errors when compressed with JPEG. Our approach employs a fully differentiable pipeline, including a differentiable approximation of JPEG, allowing it to be trained end-to-end on the FlyingThings3D dataset with randomized JPEG qualities. On a Microsoft Azure Kinect depth recording, the neural depth encoding method significantly outperformed an existing state-of-the-art depth encoding method in terms of both root-mean-square error (RMSE) and mean absolute error (MAE) across a wide range of image qualities, all with over 20% lower average file sizes. Our method offers an efficient solution for emerging 3D streaming and 3D telepresence applications, enabling high-quality 3D depth data storage and transmission.
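To make the end-to-end evaluation concrete, the sketch below runs a depth map through a depth-to-RGB encoder, JPEG compression and decoding, and reports RMSE and MAE. The naive high/low-byte packing used here is only a stand-in for the learned encoder described above, and the file name and quality setting are placeholders.

# Round-trip evaluation of a depth-to-RGB encoding through lossy JPEG.
# The naive 16-bit high/low byte split stands in for the neural encoder.
import cv2
import numpy as np

def encode_depth_naive(depth_u16):
    # Pack 16-bit depth into an 8-bit, 3-channel image (naive baseline encoder).
    high = (depth_u16 >> 8).astype(np.uint8)
    low = (depth_u16 & 0xFF).astype(np.uint8)
    return np.dstack([high, low, np.zeros_like(high)])

def decode_depth_naive(img_u8):
    return (img_u8[..., 0].astype(np.uint16) << 8) | img_u8[..., 1].astype(np.uint16)

depth = cv2.imread("kinect_depth.png", cv2.IMREAD_UNCHANGED).astype(np.uint16)

ok, jpeg_bytes = cv2.imencode(".jpg", encode_depth_naive(depth),
                              [cv2.IMWRITE_JPEG_QUALITY, 85])
decoded = cv2.imdecode(jpeg_bytes, cv2.IMREAD_UNCHANGED)
depth_hat = decode_depth_naive(decoded).astype(np.float32)

err = depth_hat - depth.astype(np.float32)
print("RMSE:", np.sqrt(np.mean(err ** 2)), "MAE:", np.mean(np.abs(err)))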
Quantification of the image sensor signal and noise is essential to derive key image quality performance indicators. Image sensors in automotive cameras are predominantly operated in high dynamic range (HDR) mode; however, legacy procedures to quantify image sensor noise were optimized for operation in standard dynamic range mode. This work discusses the theoretical background and the workflow of the photon-transfer curve (PTC) test. It then presents example implementations of the PTC test and its derivatives according to legacy procedures and according to procedures optimized for image sensors in HDR mode.
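As a concrete illustration of the basic PTC workflow in the standard-dynamic-range case (the HDR-optimized procedures discussed in the paper extend this), one point of the curve can be derived from a pair of flat-field frames, with fixed-pattern noise removed by differencing the pair.

# One photon-transfer-curve point from a pair of flat-field frames captured at
# the same exposure; repeating over increasing exposures yields the full curve.
import numpy as np

def ptc_point(frame_a, frame_b, dark_level=0.0):
    # Mean signal and temporal noise variance for one exposure level.
    mean_signal = 0.5 * (frame_a.mean() + frame_b.mean()) - dark_level
    diff = frame_a.astype(np.float64) - frame_b.astype(np.float64)
    temporal_var = diff.var() / 2.0   # pair difference cancels fixed-pattern noise
    return mean_signal, temporal_var

# In the shot-noise-limited region the variance grows linearly with the mean
# signal, and the slope of that line gives the conversion gain (DN per electron).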
Noise Equivalent Quanta (NEQ) is an objective Fourier metric that evaluates the performance of an imaging system by detailing the effective equivalent quanta of an exposure versus spatial frequency. Calculated from the modulation transfer function (MTF) and noise power spectrum (NPS), it is a valuable precursor for ranking the detection capabilities of systems and a fundamental metric that combines the sharpness and noise performance of an imaging system into a single curve in a physically meaningful way. The dead leaves measurement technique can provide an estimate of the MTF and NPS of an imaging system using a single target, and is therefore a potentially convenient method for the assessment of NEQ. This work validates the use of the dead leaves technique to measure NEQ, first through simulation of an imaging system with known MTF and NPS, then via measurement of camera systems, both in the RAW domain and post-ISP. The dead leaves approach is shown to be a highly effective and practical method for estimating NEQ, ranking imaging system performance both pre- and post-ISP.
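For reference, the underlying relation is NEQ(f) = S^2 * MTF(f)^2 / NPS(f), where S is the large-area mean signal. A minimal sketch, assuming the MTF and NPS have already been estimated from the dead leaves target on a common spatial-frequency axis:

# Noise Equivalent Quanta from a measured MTF and noise power spectrum.
import numpy as np

def neq(mean_signal, mtf, nps):
    # NEQ(f) = S^2 * MTF(f)^2 / NPS(f), evaluated elementwise over frequency.
    mtf = np.asarray(mtf, dtype=float)
    nps = np.asarray(nps, dtype=float)
    return (mean_signal ** 2) * mtf ** 2 / nps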
The Modulation Transfer Function (MTF) is an important image quality metric typically used in the automotive domain. However, although optical quality affects the performance of computer vision in vehicle automation, this metric is unknown for many public datasets. Additionally, wide field-of-view (FOV) cameras have become increasingly popular, particularly for low-speed vehicle automation applications. To investigate image quality in datasets, this paper proposes an adaptation of the Natural Scenes Spatial Frequency Response (NS-SFR) algorithm to suit cameras with a wide field-of-view.
Vehicle-borne cameras vary greatly in imaging properties, e.g., angle of view, working distance and pixel count, to meet the diverse requirements of various applications. In addition, auto parts must tolerate dramatic variations in ambient temperature. These factors pose considerable challenges to the automotive industry when it comes to evaluating the imaging performance of automotive cameras. In this paper, an integrated and fully automated system, developed specifically to address these issues, is described. The key components include a collimator unit incorporating an LED light source and a transmissive test target, a mechanical structure that holds and moves the collimator and the camera under test, and a software suite that communicates with the controllers and processes the images captured by the camera. With this multifunctional system, the imaging performance of cameras can be measured conveniently and with a high degree of accuracy, precision and compatibility. The results are consistent with those obtained from tests conducted with conventional methods. Preliminary results demonstrate the potential of the system in terms of functionality and flexibility as development continues.
Multi-modal pedestrian detection has been actively developed in the research field over the past few years. Multi-modal pedestrian detection with visible and thermal modalities outperforms visible-only pedestrian detection by improving robustness to lighting effects and cluttered backgrounds, because it can simultaneously use complementary information from visible and thermal frames. However, many existing multi-modal pedestrian detection algorithms assume that image pairs are perfectly aligned across the modalities, and their detection performance often degrades when misalignment is present. This paper proposes a multi-modal pedestrian detection network for a one-stage detector enhanced by a dual regressor, together with a new algorithm for learning from multi-modal data, termed object-based training. This study focuses on the Single Shot MultiBox Detector (SSD), one of the most common one-stage detectors. Experiments demonstrate that the proposed method outperforms current state-of-the-art methods on artificial data with large misalignment and is comparable or superior to existing methods on existing aligned datasets.
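A conceptual sketch, not the authors' implementation, of what a dual-regressor prediction head for an SSD-style detector could look like: a shared classifier with separate box regressors for the visible and thermal modalities, so that each modality's boxes can be regressed independently when the frames are misaligned.

# Hypothetical dual-regressor head for one SSD feature map (illustrative only).
import torch.nn as nn

class DualRegressorHead(nn.Module):
    def __init__(self, in_channels, num_anchors, num_classes):
        super().__init__()
        self.cls = nn.Conv2d(in_channels, num_anchors * num_classes, 3, padding=1)
        self.reg_visible = nn.Conv2d(in_channels, num_anchors * 4, 3, padding=1)
        self.reg_thermal = nn.Conv2d(in_channels, num_anchors * 4, 3, padding=1)

    def forward(self, feat):
        # One confidence map shared across modalities, one box regressor each.
        return self.cls(feat), self.reg_visible(feat), self.reg_thermal(feat)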
Naturalistic driving studies consist of drivers using their personal vehicles and provide valuable real-world data, but privacy issues must be handled very carefully. Drivers sign a consent form when they elect to participate, but passengers do not, for a variety of practical reasons. However, their privacy must still be protected. One large study includes a blurred image of the entire cabin, which allows reviewers to find passengers in the vehicle; this protects privacy while still providing a means of answering questions regarding the impact of passengers on driver behavior. A method for automatically counting the passengers would have scientific value for transportation researchers. We investigated different image analysis methods for automatically locating and counting the non-drivers, including simple face detection, fine-tuned image classification methods, and a published object detection method. We also compared image classification using convolutional neural network and vision transformer backbones. Our studies show that the image classification method appears to work best in terms of absolute performance, although we note that the closed nature of our dataset and the nature of the imagery make the application somewhat niche, and that object detection methods also have advantages. We perform some analysis to support our conclusion.
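A hedged sketch of the fine-tuned image-classification setup, with the passenger count treated as a class label and both a CNN and a vision-transformer backbone prepared for comparison; the class binning and backbone choices are assumptions, not the study's exact configuration.

# Fine-tuning two torchvision backbones to classify the number of passengers.
import torch
import torch.nn as nn
from torchvision import models

NUM_CLASSES = 4  # assumed binning, e.g. 0, 1, 2, 3+ passengers

# CNN backbone
cnn = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
cnn.fc = nn.Linear(cnn.fc.in_features, NUM_CLASSES)

# Vision-transformer backbone for comparison
vit = models.vit_b_16(weights=models.ViT_B_16_Weights.DEFAULT)
vit.heads.head = nn.Linear(vit.heads.head.in_features, NUM_CLASSES)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.AdamW(cnn.parameters(), lr=1e-4)
# A standard supervised fine-tuning loop over labelled cabin images would follow.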