Solid-state lidar cameras produce 3D images, useful in applications such as robotics and self-driving vehicles. However, range is limited by the lidar laser power, and features such as perpendicular surfaces and dark objects pose difficulties. We propose the use of intensity images, inherent in lidar camera data from the total laser and ambient light collected in each pixel, to extract additional depth information and boost ranging performance. Using a pair of off-the-shelf lidar cameras and a conventional stereo depth algorithm to process the intensity images, we demonstrate a 2× increase in the native lidar maximum depth range in an indoor environment and almost 10× outdoors. Depth information is also extracted from features in the environment, such as dark objects, floors, and ceilings, which are otherwise not detected by the lidar sensor. While the specific technique presented is useful in applications involving multiple lidar cameras, the principle of extracting depth data from lidar camera intensity images could also be extended to standalone lidar cameras using monocular depth techniques.
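A minimal sketch of this pipeline, assuming a rectified pair of intensity images; the file names, matcher parameters, and calibration values are illustrative placeholders, not the values from the authors' setup.

    # Stereo depth from a rectified pair of lidar intensity images using
    # OpenCV's semi-global block matching (illustrative parameters).
    import cv2
    import numpy as np

    left = cv2.imread("left_intensity.png", cv2.IMREAD_GRAYSCALE)
    right = cv2.imread("right_intensity.png", cv2.IMREAD_GRAYSCALE)

    stereo = cv2.StereoSGBM_create(
        minDisparity=0,
        numDisparities=128,          # must be divisible by 16
        blockSize=5,
        P1=8 * 5 * 5,                # smoothness penalties (OpenCV convention)
        P2=32 * 5 * 5,
    )
    disparity = stereo.compute(left, right).astype(np.float32) / 16.0  # fixed-point

    # Convert disparity to metric depth: depth = f * B / d, where f is the
    # focal length in pixels and B the stereo baseline in meters.
    f_px, baseline_m = 600.0, 0.30   # hypothetical calibration values
    valid = disparity > 0
    depth = np.zeros_like(disparity)
    depth[valid] = f_px * baseline_m / disparity[valid]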
In recent years, several deep learning-based architectures have been proposed to compress Light Field (LF) images as pseudo video sequences. However, most of these techniques employ conventional compression-focused networks. In this paper, we introduce a version of a previously designed deep learning video compression network, adapted and optimized specifically for LF image compression. We enhance this network by incorporating an in-loop filtering block, along with additional adjustments and fine-tuning. By treating LF images as pseudo video sequences and deploying our adapted network, we address the challenges presented by the unique features of LF images, such as high resolution and large data sizes. Our method compresses these images effectively, preserving their quality and unique characteristics. With thorough fine-tuning and the inclusion of the in-loop filtering network, our approach shows improved performance in terms of Peak Signal-to-Noise Ratio (PSNR) and Mean Structural Similarity Index Measure (MSSIM) compared to existing techniques. Our method provides a feasible path for LF image compression and may contribute to the emergence of new applications and advancements in this field.
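One common way to form the pseudo video sequence is a serpentine scan over the sub-aperture view grid, sketched below; the scan order and array shapes are a usual convention in this literature, not necessarily the ordering used by this network.

    # Arrange light-field sub-aperture views as a pseudo video sequence via
    # a serpentine (boustrophedon) scan over the U x V view grid.
    import numpy as np

    def lf_to_pseudo_video(lf):
        """lf: array of shape (U, V, H, W, C) of sub-aperture images."""
        U, V = lf.shape[:2]
        frames = []
        for u in range(U):
            cols = range(V) if u % 2 == 0 else range(V - 1, -1, -1)
            for v in cols:
                frames.append(lf[u, v])
        return np.stack(frames)  # (U*V, H, W, C): fed to the video codec

    # Example: a 9x9 view grid of 64x64 RGB patches becomes an 81-frame clip.
    video = lf_to_pseudo_video(np.zeros((9, 9, 64, 64, 3), dtype=np.uint8))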
The Modulation Transfer Function (MTF) is an important image quality metric typically used in the automotive domain. However, although optical quality has an impact on the performance of computer vision in vehicle automation, this metric is unknown for many public datasets. Additionally, wide field-of-view (FOV) cameras have become increasingly popular, particularly for low-speed vehicle automation applications. To investigate image quality in datasets, this paper proposes an adaptation of the Natural Scenes Spatial Frequency Response (NS-SFR) algorithm to suit cameras with a wide FOV.
Naturalistic driving studies involve drivers using their personal vehicles and provide valuable real-world data, but privacy issues must be handled very carefully. Drivers sign a consent form when they elect to participate, but passengers do not, for a variety of practical reasons. However, their privacy must still be protected. One large study includes a blurred image of the entire cabin, which allows reviewers to find passengers in the vehicle; this protects privacy while still providing a means of answering questions about the impact of passengers on driver behavior. A method for automatically counting the passengers would have scientific value for transportation researchers. We investigated different image analysis methods for automatically locating and counting non-drivers, including simple face detection, fine-tuned image classification methods, and a published object detection method. We also compared image classification using convolutional neural network and vision transformer backbones. Our studies show that the image classification method works best in terms of absolute performance, although the closed nature of our dataset and the nature of the imagery make the application somewhat niche, and object detection methods also have advantages. We present analysis to support this conclusion.
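The simplest of the compared approaches, face detection, can be sketched with OpenCV's bundled Haar cascade as below; the file name is a placeholder, and this is a generic baseline rather than the study's fine-tuned models.

    # Count occupants via face detection, then subtract the driver.
    import cv2

    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

    def count_faces(image_path):
        gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
        faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
        return len(faces)

    # Passengers = detected faces minus the driver (when the driver is in frame).
    n_passengers = max(count_faces("cabin_frame.png") - 1, 0)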
Predicting the trajectory of an ego vehicle is a critical component of autonomous driving systems. Current state-of-the-art methods typically rely on Deep Neural Networks (DNNs) and sequential models to process front-view images for future trajectory prediction. However, these approaches often struggle with perspective issues affecting object features in the scene. To address this, we advocate for the use of Bird’s Eye View (BEV) perspectives, which offer unique advantages in capturing spatial relationships and object homogeneity. In our work, we leverage Graph Neural Networks (GNNs) and positional encoding to represent objects in a BEV, achieving competitive performance compared to traditional DNN-based methods. While the BEV-based approach loses some of the detailed information inherent to front-view images, we compensate by enriching the BEV data, representing it as a graph in which the relationships between objects in a scene are captured effectively.
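A minimal sketch of such a graph representation, assuming nodes carry BEV positions and learned features; the layer and the sinusoidal encoding below are illustrative stand-ins, not the architecture used in this work.

    # A single mean-aggregation message-passing layer over a BEV object graph,
    # plus a sinusoidal positional encoding of BEV coordinates.
    import torch
    import torch.nn as nn

    class BEVGraphLayer(nn.Module):
        def __init__(self, dim):
            super().__init__()
            self.msg = nn.Linear(2 * dim, dim)

        def forward(self, h, adj):
            # h: (N, dim) node features; adj: (N, N) float adjacency matrix.
            agg = adj @ h / adj.sum(-1, keepdim=True).clamp(min=1)  # neighbor mean
            return torch.relu(self.msg(torch.cat([h, agg], dim=-1)))

    def pos_encode(xy, num_freqs=4):
        # xy: (N, 2) BEV coordinates -> (N, 4 * num_freqs) encoding.
        freqs = 2.0 ** torch.arange(num_freqs) * torch.pi
        ang = xy.unsqueeze(-1) * freqs                 # (N, 2, F)
        return torch.cat([ang.sin(), ang.cos()], dim=-1).flatten(1)

    # Usage: h = BEVGraphLayer(32)(node_features, adjacency)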
In digital cameras, the wavelength dependency of images captured through lenses and filters affects the spatial resolution characteristics of the images, which adversely impacts image quality. Previously, lens design and image processing techniques have been considered to address this problem. Although aberrations could be improved, it was difficult to completely analyze the wavelength dependency of the resolution characteristics. This study aims to reduce the wavelength dependency of the spatial resolution of digital cameras. Edge-based modulation transfer function (MTF) measurements using a 2D spectroradiometer were used to obtain wavelength-specific MTFs and quantitatively reveal the wavelength dependency of the spatial resolution characteristics. Moreover, we experimentally confirmed that adjusting the MTFs of the CIEXYZ images, obtained by combining spectral images, to be closer to one another reduces the difference in spatial resolution among color images while minimizing the color change before and after the adjustment.
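The core of an edge-based MTF measurement can be sketched as follows; a full slanted-edge SFR implementation additionally supersamples the edge-spread function across the tilted edge, which is omitted here.

    # Edge-based MTF from a 1-D edge profile: differentiate the edge-spread
    # function (ESF) to get the line-spread function (LSF), then take the
    # normalized Fourier magnitude.
    import numpy as np

    def mtf_from_esf(esf):
        lsf = np.gradient(esf)
        lsf *= np.hanning(lsf.size)      # window to suppress truncation ripple
        mtf = np.abs(np.fft.rfft(lsf))
        return mtf / mtf[0]              # normalize DC response to 1

    # Comparing mtf_from_esf(esf_450nm), mtf_from_esf(esf_550nm), ... exposes
    # the per-wavelength resolution differences this study quantifies.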
Variability in the encoding of unique hues within the color vision system results in significant diversity in hue appearance among individuals. This paper introduces a novel algorithm that equalizes hue appearance for individual observers by incorporating their unique hue perception and replacing the default NCS unique hues in a modern color appearance model with individualized selections. By customizing the color representation to match the observer's perception, the algorithm enhances the consistency and personalized rendering of hue appearance. The findings of this study contribute to advancing color perception research and have practical implications in domains such as graphic design and visual color reproduction.
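One simple way to realize such an individualized substitution is a piecewise-linear remapping of hue angle between an observer's measured unique hues and the model's default anchors; the sketch below uses placeholder angles, not the NCS values or the paper's exact algorithm.

    # Remap hue angles so an observer's measured unique hues (in degrees)
    # align with the model's default unique-hue anchors.
    import numpy as np

    DEFAULT_HUES = np.array([20.0, 90.0, 164.0, 238.0])   # hypothetical anchors

    def equalize_hue(h, observer_hues):
        # Wrap both hue scales so interpolation covers the full circle.
        src = np.concatenate([observer_hues, observer_hues[:1] + 360.0])
        dst = np.concatenate([DEFAULT_HUES, DEFAULT_HUES[:1] + 360.0])
        h_shift = (h - src[0]) % 360.0 + src[0]
        return np.interp(h_shift, src, dst) % 360.0

    # An observer whose unique yellow sits at 85 deg has hues near 85 nudged
    # toward the 90 deg anchor, equalizing appearance across observers.
    print(equalize_hue(85.0, np.array([20.0, 85.0, 160.0, 240.0])))  # -> 90.0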
Prognosis for melanoma patients is traditionally determined with a tumor depth measurement called Breslow thickness. However, Breslow thickness fails to account for cross-sectional area, which is more useful for prognosis. We propose using segmentation methods to estimate the cross-sectional area of invasive melanoma in whole-slide images. First, we design a custom segmentation model from a transformer pretrained on breast cancer images and adapt it for melanoma segmentation. Second, we fine-tune a segmentation backbone pretrained on natural images. Our proposed models produce quantitatively superior results compared to previous approaches, and qualitatively better results as verified by a dermatologist.
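Given a segmentation mask, the area estimate itself is straightforward; the sketch below assumes a known microns-per-pixel value for the slide level at which the mask was produced.

    # Cross-sectional area from a binary mask of invasive melanoma.
    import numpy as np

    def cross_sectional_area_mm2(mask, mpp):
        """mask: boolean array (H, W); mpp: microns per pixel at this level."""
        pixel_area_mm2 = (mpp / 1000.0) ** 2   # one pixel's footprint in mm^2
        return mask.sum() * pixel_area_mm2

    # e.g., at 0.5 mpp each pixel covers 0.25e-6 mm^2, so a one-million-pixel
    # mask corresponds to 0.25 mm^2 of invasive tumor.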
Acquisition of mass-per-charge (m/z) spectrometry data from tissue samples at high spatial resolutions using Mass Spectrometry Imaging (MSI) requires hours to days. The Deep Learning Approach for Dynamic Sampling (DLADS) and Supervised Learning Approach for Dynamic Sampling with Least-Squares (SLADS-LS) algorithms follow compressed sensing principles to minimize the number of physical measurements performed, generating low-error reconstructions from spatially sparse data. Measurement locations are actively determined during scanning, according to which locations a machine learning model estimates will provide the most relevant information to the intended reconstruction process. Preliminary results for DLADS and SLADS-LS simulations with Matrix-Assisted Laser Desorption/Ionization (MALDI) MSI match the 70% throughput improvements previously achieved in nanoscale Desorption Electro-Spray Ionization (nano-DESI) MSI. A new multimodal DLADS variant incorporates optical imaging for a 5% improvement in final reconstruction quality, with DLADS holding a 4% advantage over SLADS-LS in regression performance. Further, a Forward Feature Selection (FFS) algorithm replaces expert-based determination of the m/z channels targeted during scans, with negligible impact on location selection and reconstruction quality.
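The greedy measure-score-select loop common to SLADS/DLADS-style methods can be sketched as follows, with a generic scoring model standing in for the papers' trained networks; all names and the seed size are illustrative.

    # Greedy dynamic sampling: repeatedly measure the location the model
    # estimates to be most informative for the reconstruction.
    import numpy as np

    def dynamic_sampling(sample_fn, model, shape, budget):
        measured = np.zeros(shape, dtype=bool)
        values = np.zeros(shape)
        # Seed with a small random set of initial measurements.
        seed = np.unravel_index(
            np.random.choice(measured.size, 32, replace=False), shape)
        for y, x in zip(*seed):
            measured[y, x], values[y, x] = True, sample_fn(y, x)
        while measured.sum() < budget:
            scores = model(values, measured)   # expected-information map (H, W)
            scores[measured] = -np.inf         # never re-measure a location
            y, x = np.unravel_index(np.argmax(scores), shape)
            measured[y, x], values[y, x] = True, sample_fn(y, x)
        return values, measured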
The dynamic range of the intensity of long-wave infrared (LWIR) cameras is often more than 8 bits, so their images must be visualized using techniques such as histogram equalization. Many visualization methods do not consider the effects of noise, which must be handled in real-world situations. We propose a novel LWIR image visualization method based on gradient-domain processing, or gradient mapping. Processing based on intensity and gradient power in the gradient domain enables visualization of LWIR images with simultaneous noise reduction. We evaluate the proposed method quantitatively and qualitatively and show its effectiveness.
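A minimal sketch of gradient-domain processing in this spirit, not the authors' exact method: attenuate large gradients, zero out weak noise gradients, then reconstruct the image by solving the Poisson equation with a DCT-based solver. The attenuation exponent and noise floor are illustrative.

    # Gradient-domain tone mapping of a high-dynamic-range LWIR frame.
    import numpy as np
    from scipy.fft import dctn, idctn

    def gradient_domain_tonemap(img, alpha=0.8, noise_floor=1e-3):
        img = img.astype(np.float64)
        img = (img - img.min()) / np.ptp(img)          # normalize raw counts
        gy, gx = np.gradient(img)
        mag = np.hypot(gx, gy) + 1e-8
        scale = (mag / mag.mean()) ** (alpha - 1.0)    # compress large gradients
        scale[mag < noise_floor] = 0.0                 # drop weak noise gradients
        gx, gy = gx * scale, gy * scale
        # Divergence of the modified gradient field.
        div = np.gradient(gx, axis=1) + np.gradient(gy, axis=0)
        # Poisson solve with Neumann boundaries via the 2-D DCT.
        h, w = img.shape
        d = dctn(div, norm="ortho")
        ky, kx = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
        denom = 2.0 * (np.cos(np.pi * ky / h) + np.cos(np.pi * kx / w) - 2.0)
        denom[0, 0] = 1.0                              # DC offset fixed by rescale
        out = idctn(d / denom, norm="ortho")
        return (out - out.min()) / np.ptp(out)         # display range [0, 1]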