
We present a comparative study of pose-based vs. video-based Human Action Recognition (HAR) methods for driver monitoring in car cockpits. In this context, comparisons of neural network architectures from the field of deep learning-based video understanding are scarce, even though pose- and video-based HAR has significant potential for advanced driver-assistance systems in semi-autonomous driving on public roads. We compare prediction performance, per-class false-negative rate, model size, computational requirements, and inference latency on the established Drive&Act and the proprietary Driver Action Insight datasets. Although the differing diversity and scale of the available datasets make comparisons challenging, the results suggest that both approaches benefit from pretraining, while pose- and video-based techniques perform differently on specific action classes, such as those that depend on body motion or on the appearance of objects.
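
As an illustration of one of the metrics compared above, the per-class false-negative rate can be read off a confusion matrix as FNR_c = FN_c / (TP_c + FN_c), i.e. one minus the per-class recall. The minimal sketch below assumes that rows index ground-truth classes and columns index predictions; the function name and the 3-class matrix are hypothetical and are not results from Drive&Act or Driver Action Insight.

```python
import numpy as np

def per_class_fnr(confusion: np.ndarray) -> np.ndarray:
    """Per-class false-negative rate from a confusion matrix.

    Assumes confusion[i, j] counts samples of true class i predicted as class j,
    so FNR_i = FN_i / (TP_i + FN_i) = 1 - recall_i.
    """
    confusion = np.asarray(confusion, dtype=float)
    true_positives = np.diag(confusion)
    support = confusion.sum(axis=1)            # TP + FN for each true class
    false_negatives = support - true_positives
    # Classes without ground-truth samples get NaN instead of a division error.
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(support > 0, false_negatives / support, np.nan)

# Hypothetical 3-class example (e.g. "drinking", "phoning", "reading").
cm = np.array([[8, 1, 1],
               [2, 6, 2],
               [0, 0, 5]])
print(per_class_fnr(cm))  # [0.2 0.4 0. ]
```

Reporting FNR per class rather than a single accuracy figure is what exposes the class-dependent differences between pose- and video-based models.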

Trajectory prediction is crucial for autonomous systems, but traditional deep learning models, typically trained on specific pre-collected trajectories, often fail to generalize to unseen scenarios due to distribution shifts. Recent approaches address this by integrating online learning for adaptive deployment. However, existing online learning methods face two major challenges: (1) long training times, which prevent real-time execution, and (2) failure to account for variations in input data speed, which degrades performance in high-speed dynamic scenarios. To overcome these limitations, we introduce a latent-space predictor that forecasts future trajectories by aligning learned latent representations with the encoded ground truth. This approach enhances robustness to distribution shifts while reducing reliance on direct coordinate regression. Additionally, we incorporate a lightweight online learning module that enables efficient real-time adaptation without full model retraining. We evaluate our method on the nuScenes, Waymo, and Lyft L5 datasets, focusing on distribution-shift scenarios. Experimental results demonstrate that our model outperforms state-of-the-art online learning methods, improving trajectory prediction accuracy by approximately 9.9% while reducing optimization time by up to 54%.
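
The two ideas above, predicting in latent space against an encoded ground truth and adapting only a lightweight module online, can be sketched generically. This is not the authors' architecture: the frozen linear encoder, the linear head, the dimensions, the squared-error alignment loss, and the normalized (NLMS-style) gradient step are all assumptions chosen to keep the example small.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 8-d history feature, 12-d future (6 future x,y points), 4-d latent.
HIST_DIM, FUT_DIM, LATENT_DIM = 8, 12, 4

# Frozen encoder for ground-truth futures (assumed linear here for simplicity).
ENCODER = rng.normal(size=(LATENT_DIM, FUT_DIM)) / np.sqrt(FUT_DIM)

def encode_future(future: np.ndarray) -> np.ndarray:
    """Map a flattened future trajectory to its latent target."""
    return ENCODER @ future

class OnlineLatentHead:
    """Small linear head adapted online; everything else in the model stays frozen."""

    def __init__(self, lr: float = 0.5):
        self.W = np.zeros((LATENT_DIM, HIST_DIM))
        self.lr = lr

    def predict_latent(self, history_feat: np.ndarray) -> np.ndarray:
        return self.W @ history_feat

    def update(self, history_feat: np.ndarray, future: np.ndarray) -> float:
        """One normalized gradient step on 0.5 * ||W h - E(y)||^2; returns the pre-update loss."""
        residual = self.predict_latent(history_feat) - encode_future(future)
        loss = 0.5 * float(residual @ residual)
        grad = np.outer(residual, history_feat)          # d loss / d W
        # Dividing by ||h||^2 (NLMS-style) keeps the step stable for 0 < lr < 2.
        self.W -= self.lr * grad / (history_feat @ history_feat + 1e-8)
        return loss

# Simulated post-shift stream: the future depends linearly on the history features.
shifted_map = rng.normal(size=(FUT_DIM, HIST_DIM))
head = OnlineLatentHead()
losses = []
for _ in range(300):
    h = rng.normal(size=HIST_DIM)
    losses.append(head.update(h, shifted_map @ h))
```

Each update touches only the small matrix W, which is why this kind of module can adapt per observation without retraining the full model; decoding latents back to coordinates is not modeled here.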

Autonomous vehicles currently rely on High-Definition (HD) maps for precise localization and path planning. However, traditional HD mapping approaches suffer from high costs, inherent rigidity, and slow update cycles, making them inadequate for dynamic urban environments. This paper presents a novel lightweight collaborative mapping architecture that enables real-time map updates through multi-agent cooperation. Our approach combines Joint Compatibility Branch and Bound (JCBB) for data association, Dempster-Shafer Theory (DST) for uncertainty quantification and landmark classification, and an Extended Kalman Filter (EKF) for landmark pose estimation. Experimental validation in the CARLA simulator demonstrates accurate landmark classification and localization. Furthermore, collaborative data fusion reduces false positives and improves overall system reliability.
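
For readers unfamiliar with DST, a minimal sketch of Dempster's rule of combination, the standard way to fuse two bodies of evidence about the same landmark: m(A) = Σ_{B∩C=A} m1(B) m2(C) / (1 − K), where K = Σ_{B∩C=∅} m1(B) m2(C) is the conflicting mass. The landmark classes, mass values, and function name are hypothetical, and the paper's frame of discernment and conflict handling may differ.

```python
from itertools import product

def dempster_combine(m1: dict, m2: dict) -> dict:
    """Combine two mass functions (frozenset hypothesis -> mass) with Dempster's rule."""
    combined: dict = {}
    conflict = 0.0
    for (b, mass_b), (c, mass_c) in product(m1.items(), m2.items()):
        intersection = b & c
        if intersection:
            combined[intersection] = combined.get(intersection, 0.0) + mass_b * mass_c
        else:
            conflict += mass_b * mass_c   # mass on contradictory hypotheses
    if conflict >= 1.0:
        raise ValueError("Total conflict: the two sources cannot be combined")
    # Renormalize so the non-conflicting masses sum to 1.
    return {a: m / (1.0 - conflict) for a, m in combined.items()}

SIGN, POLE = frozenset({"sign"}), frozenset({"pole"})
EITHER = SIGN | POLE   # explicit ignorance: "sign or pole"

# Hypothetical observations of the same landmark by two vehicles.
agent_a = {SIGN: 0.6, POLE: 0.1, EITHER: 0.3}
agent_b = {SIGN: 0.5, POLE: 0.2, EITHER: 0.3}
fused = dempster_combine(agent_a, agent_b)
```

Here the conflicting mass is 0.17, and the fused belief in "sign" (0.63 / 0.83 ≈ 0.76) exceeds either agent's individual mass, which illustrates how agreement between agents sharpens a classification.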

Accurate registration of subsea Light Detection and Ranging (LiDAR) point clouds is critical for offshore metrology, where millimeter-level errors can significantly impact operational cost and risk. This study evaluates automated registration methods for static scan positions acquired by Kraken Robotics systems. Two approaches were implemented in Open3D: a hierarchical tree-based Iterative Closest Point (ICP) method and a pose-graph multiway registration framework. The methods were tested on multiple real subsea datasets containing 19–33 high-density scans per subsea scene and on synthetic datasets generated with Digital Imaging and Remote Sensing Image Generation (DIRSIG) to enable ground-truth evaluation. Results show that multiway registration provides improved global consistency, lower adjacent-scan root mean square error (RMSE), and reduced processing time compared to tree-based ICP. Ground-truth analysis demonstrated sub-sampling-level performance, corresponding to an expected 1–4 mm alignment accuracy for the real datasets. Global coarse registration provided no measurable benefit for well-initialized static surveys. The final analysis demonstrates that multiway registration enables accurate, efficient, and fully automated subsea LiDAR alignment, reducing manual effort and improving metrology reliability.
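
To show what a single pairwise ICP alignment computes, here is a minimal point-to-point ICP in NumPy: brute-force nearest-neighbor matching followed by a least-squares rigid fit (Kabsch/SVD), repeated until the RMSE stops improving. This is a sketch, not Open3D's implementation or the paper's pipeline; the brute-force matching is only suitable for small clouds, and the synthetic cloud and transform are hypothetical. Like the well-initialized static surveys above, it assumes a good initial alignment.

```python
import numpy as np

def best_rigid_transform(src: np.ndarray, dst: np.ndarray):
    """Least-squares R, t such that R @ src[i] + t ≈ dst[i] (Kabsch algorithm via SVD)."""
    src_mean, dst_mean = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_mean).T @ (dst - dst_mean)        # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))            # guard against a reflection
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_mean - R @ src_mean
    return R, t

def match(points: np.ndarray, target: np.ndarray):
    """Brute-force nearest neighbours (O(N*M) time and memory) and RMSE of matched pairs."""
    d2 = ((points[:, None, :] - target[None, :, :]) ** 2).sum(axis=2)
    idx = d2.argmin(axis=1)
    rmse = float(np.sqrt(d2[np.arange(len(points)), idx].mean()))
    return idx, rmse

def icp(source: np.ndarray, target: np.ndarray, max_iter: int = 50, tol: float = 1e-12):
    """Point-to-point ICP; returns cumulative (R, t) mapping source onto target and final RMSE."""
    R_total, t_total = np.eye(3), np.zeros(3)
    current = source.copy()
    prev_rmse = np.inf
    for _ in range(max_iter):
        idx, rmse = match(current, target)
        if prev_rmse - rmse < tol:                   # no meaningful improvement: converged
            break
        prev_rmse = rmse
        R, t = best_rigid_transform(current, target[idx])
        current = current @ R.T + t
        R_total, t_total = R @ R_total, R @ t_total + t
    _, rmse = match(current, target)
    return R_total, t_total, rmse

# Hypothetical test: a random cloud and a slightly rotated and shifted copy of it.
rng = np.random.default_rng(42)
source = rng.uniform(-1.0, 1.0, size=(100, 3))
angle = np.deg2rad(1.0)
R_true = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                   [np.sin(angle),  np.cos(angle), 0.0],
                   [0.0, 0.0, 1.0]])
t_true = np.array([0.01, -0.005, 0.008])
target = source @ R_true.T + t_true
R_est, t_est, rmse = icp(source, target)
```

Multiway registration differs in that it optimizes all scan poses jointly over a pose graph instead of chaining pairwise results, which is consistent with the better global consistency reported above.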

Physically grounded, point spread function (PSF)-based image degradation is necessary for studying the robustness of object detectors to optical effects, but existing workflows typically rely on offline dataset generation and integrate poorly with GPU-resident frameworks such as MMDetection. We present CIDPL, a CUDA-accelerated Python library that adapts our standalone Image Degradation Application (IDA) into a real-time, framework-integrated pipeline for MMDetection. CIDPL couples Python and C++ via PyBind11, performs degradation directly on GPU tensors in the DataPreprocessor, and organizes multiple optical variants in a traceable Super-Batch format. Numerical validation shows exact agreement with IDA for TIFF inputs, while optical validation reproduces KrakenOS-based spatial frequency response (SFR) trends. In throughput tests, CIDPL improves mean degradation speed over IDA by 4.7x on a single GPU and 7.6x on two GPUs, enabling real-time processing at 117 FPS with negligible overhead during both training and inference. KITTI experiments further show that the integration enables practical detector-level robustness studies under varying defocus conditions.
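
At its simplest, a spatially invariant PSF degradation is a 2D convolution of the image with a PSF normalized to unit sum. The sketch below performs that convolution with NumPy FFTs and uses a Gaussian PSF as a stand-in for defocus. It runs on the CPU and is not CIDPL's CUDA code; the Gaussian shape, its size and sigma, and the circular boundary handling are assumptions, and real lens PSFs are generally not Gaussian and vary across the field of view.

```python
import numpy as np

def gaussian_psf(size: int, sigma: float) -> np.ndarray:
    """Hypothetical Gaussian PSF (odd size) normalized to unit sum so energy is preserved."""
    ax = np.arange(size) - (size - 1) / 2.0
    xx, yy = np.meshgrid(ax, ax)
    psf = np.exp(-(xx**2 + yy**2) / (2.0 * sigma**2))
    return psf / psf.sum()

def degrade(image: np.ndarray, psf: np.ndarray) -> np.ndarray:
    """Convolve a 2D image with an odd-sized PSF via FFT (circular boundary)."""
    h, w = image.shape
    ph, pw = psf.shape
    kernel = np.zeros((h, w))
    kernel[:ph, :pw] = psf
    # Move the PSF centre to pixel (0, 0) so the output is not shifted.
    kernel = np.roll(kernel, (-(ph // 2), -(pw // 2)), axis=(0, 1))
    return np.real(np.fft.ifft2(np.fft.fft2(image) * np.fft.fft2(kernel)))

img = np.zeros((32, 32))
img[16, 16] = 1.0                                  # point source
blurred = degrade(img, gaussian_psf(size=7, sigma=1.5))
```

Degrading the same batch with several PSFs (for example, several defocus levels) and stacking the results is one simple way to produce multiple optical variants per input image.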

LED flicker is a persistent artifact in imaging, where lights modulated via Pulse Width Modulation (PWM) above 90 Hz appear steady to humans but produce temporal intensity variations in captured video. While hardware mitigations like split-pixel architectures reduce flicker, they introduce a fundamental trade-off with motion blur. Progress in learned LED flicker mitigation (LFM) is currently hindered by a lack of public ground-truth datasets. We address this gap with ISET-LFM, an open-source physics-based simulation framework that models LED flicker in driving scenes. Built on the ISET ecosystem, our pipeline combines camera motion simulation with an analytical flicker model to generate realistic dual-exposure frame sequences alongside flicker-free ground truth. We provide a synthetic dataset of scene radiance, enabling benchmarking and training of LFM algorithms across diverse sensor and ISP architectures. The code and dataset are available at https://github.com/AyushJam/iset-lfm and https://purl.stanford.edu/wd776hn7919, respectively.
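
Why a short exposure flickers while a long one does not can be seen from a simple analytical model: the recorded LED intensity is proportional to the overlap between the exposure window and the LED's PWM on-periods. The sketch below computes that overlap in closed form for an ideal square wave. It is a generic illustration, not ISET-LFM's flicker model, and it ignores rolling shutter, sensor noise, and camera motion; the 100 Hz LED, 20% duty cycle, 30 fps frame rate, and 1 ms / 10 ms dual exposures are hypothetical values.

```python
import math

def pwm_on_time(t: float, freq_hz: float, duty: float, phase_s: float = 0.0) -> float:
    """Cumulative LED on-time from phase_s up to time t for an ideal PWM square wave.

    The LED is on during [phase_s + k/freq_hz, phase_s + (k + duty)/freq_hz) for every
    integer k. Only differences of this function are meaningful.
    """
    period = 1.0 / freq_hz
    u = t - phase_s
    cycles = math.floor(u / period)
    remainder = u - cycles * period
    return cycles * duty * period + min(remainder, duty * period)

def captured_fraction(start_s: float, exposure_s: float, freq_hz: float,
                      duty: float, phase_s: float = 0.0) -> float:
    """Fraction of the exposure window during which the LED is on (0.0 means it looks off)."""
    on = (pwm_on_time(start_s + exposure_s, freq_hz, duty, phase_s)
          - pwm_on_time(start_s, freq_hz, duty, phase_s))
    return on / exposure_s

# Hypothetical setup: 100 Hz LED at 20% duty, 30 fps camera, dual exposures of 1 ms and 10 ms.
FRAME_RATE, LED_HZ, DUTY = 30.0, 100.0, 0.2
short_exp = [captured_fraction(k / FRAME_RATE, 0.001, LED_HZ, DUTY) for k in range(6)]
long_exp = [captured_fraction(k / FRAME_RATE, 0.010, LED_HZ, DUTY) for k in range(6)]
```

With these values, the 1 ms exposure alternates between seeing the LED fully on and fully off from frame to frame, which is flicker, while the 10 ms exposure spans one full PWM period and always records the 20% duty cycle.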

Ray tracing combined with Monte Carlo simulation is an accurate method for simulating optical systems in virtual 3D scenes. Because Monte Carlo simulation relies on random sampling, many samples per pixel must be computed to obtain a noise-free image, which results in high computational effort. Even the fastest ray tracers can trace only a few samples per pixel in real time. A common solution in computer graphics is to render the image with a few samples per pixel and apply a Monte Carlo denoiser to remove the noise. Since the denoiser alters the image, the question arises to what extent this influences the quality of the simulation. Following the “Simulating tests to test simulation” methodology, we measure the spatial frequency response (SFR) curve of a simulation denoised with the NVIDIA OptiX Denoiser and compare it with a highly sampled baseline simulation. Although denoising alters the image, denoised ray tracing simulations yield more realistic results for real-time rendering than a Gaussian blur, at the cost of significant texture loss.
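
The reason many samples per pixel are needed is that the noise of a Monte Carlo estimate falls only as 1/sqrt(N): halving the noise costs four times as many samples. The sketch below demonstrates this with a hypothetical per-sample radiance drawn uniformly from [0, 1] (standard deviation sqrt(1/12)). It models only the sampling noise, not a ray tracer or the OptiX denoiser.

```python
import numpy as np

def pixel_estimates(spp: int, n_pixels: int, rng: np.random.Generator) -> np.ndarray:
    """Monte Carlo pixel values: each pixel is the mean of `spp` random radiance samples.

    The per-sample 'radiance' is a hypothetical uniform(0, 1) variable (true pixel value 0.5),
    standing in for the path contributions a ray tracer would return.
    """
    samples = rng.uniform(0.0, 1.0, size=(n_pixels, spp))
    return samples.mean(axis=1)

rng = np.random.default_rng(0)
for spp in (1, 4, 16, 64, 256):
    noise = pixel_estimates(spp, 10_000, rng).std()
    predicted = np.sqrt(1.0 / 12.0) / np.sqrt(spp)   # std of uniform(0, 1) is sqrt(1/12)
    print(f"{spp:4d} spp: measured noise {noise:.4f}, predicted {predicted:.4f}")
```

Going from 4 to 64 samples per pixel, a 16-fold cost increase, reduces the noise only about fourfold, which is why real-time renderers at a few samples per pixel rely on denoisers instead.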

Automotive vision is a key component of advanced driver assistance systems (ADAS), enhancing road safety and improving vehicle operation for drivers. A critical requirement for automotive vision is faster object detection to ensure higher levels of safety. However, the detection speed achievable with CMOS Image Sensors (CIS) is limited by their frame rate. While increasing the CIS frame rate enables faster object detection, it also raises the sensor data rate and significantly increases power consumption. In our previous work, we demonstrated that combining event-based pixels (which offer sparse spatial resolution but high temporal resolution) with a low CIS frame rate is an effective alternative for faster object detection in automotive vision. Hybrid sensor data (a low-frame-rate CIS plus an event-based sensor (EVS)) achieves performance comparable to a high CIS frame rate, but with reduced data rates and power consumption. Specifically, our previous study showed that 7 fps CIS data combined with EVS data delivers the same performance as 20 fps CIS data with a 40% lower data rate. In this work, we apply post-training quantization (PTQ) and quantization-aware training (QAT) to automotive vision models trained on hybrid (CIS+EVS) sensor data. This enables models using hybrid (CIS+EVS) sensors to reduce both sensor data rates and power consumption during inference, particularly when deployed on Neural Processing Units (NPUs).
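
For context on the two techniques, the sketch below shows per-tensor symmetric int8 quantization, a common PTQ baseline: scale = max|w| / 127, q = clip(round(w / scale), −127, 127), and dequantization as q · scale. QAT inserts the same quantize-dequantize round trip ("fake quantization") into the forward pass during training so the model learns to tolerate the rounding error. The bit width, per-tensor granularity, and layer shape are assumptions; the abstract does not state the configuration used, and NPU toolchains typically provide their own quantizers.

```python
import numpy as np

def quantize_symmetric_int8(x: np.ndarray):
    """Per-tensor symmetric int8 quantization; returns int8 values and the float scale."""
    max_abs = float(np.max(np.abs(x)))
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Map int8 values back to float32."""
    return q.astype(np.float32) * scale

def fake_quantize(x: np.ndarray) -> np.ndarray:
    """Quantize-dequantize round trip, as inserted in the forward pass during QAT."""
    q, scale = quantize_symmetric_int8(x)
    return dequantize(q, scale)

rng = np.random.default_rng(0)
weights = rng.normal(0.0, 0.05, size=(64, 64)).astype(np.float32)   # hypothetical layer
q, scale = quantize_symmetric_int8(weights)
error = np.abs(dequantize(q, scale) - weights)
print(f"int8 storage: {q.nbytes} bytes vs float32: {weights.nbytes} bytes")
print(f"max abs error {error.max():.2e} (bound: scale/2 = {scale / 2:.2e})")
```

The fourfold reduction in weight storage and the switch to integer arithmetic are what make quantized models cheaper to run on NPUs; the rounding error per weight is bounded by half the scale.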

The automotive industry has developed several high-sensitivity color filter arrays (CFAs), such as RCCB, RCCG, and RYYCy, by substituting lighter colors for the primary color filters in the standard Bayer pattern. This has had the side effect of lowering color accuracy and, more importantly, the color separation of important traffic features such as traffic lights and lane markers. All high-sensitivity automotive CFAs retain the red filter, given the importance of red lights and signs. Counter-intuitively, this is a sub-optimal strategy: the ideal red spectral response, being a difference of two Gaussians, has a large negative lobe that no color filter can accurately approximate. A more accurate way to capture red is as the difference of yellow and green, analogous to the difference between the L and M retinal cones that the human visual system uses. Remarkably, this improves both sensitivity and color accuracy instead of trading one off against the other.
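
The key argument is that a physical filter's transmittance lies between 0 and 1 and therefore cannot have a negative lobe, whereas the difference of two captured, non-negative channel responses can. The sketch below models hypothetical yellow and green responses as Gaussians loosely shaped like L- and M-cone sensitivities (centers and widths chosen for illustration only, not measured CFA curves) and shows that red derived as yellow minus green has a positive red peak and a negative lobe at shorter wavelengths.

```python
import numpy as np

wavelengths = np.arange(400, 701, 5)   # nm

def gaussian(lam: np.ndarray, centre: float, width: float) -> np.ndarray:
    """Unit-peak Gaussian spectral response."""
    return np.exp(-0.5 * ((lam - centre) / width) ** 2)

# Hypothetical channel responses loosely mimicking L- and M-cone shapes (not measured data).
yellow = gaussian(wavelengths, 565.0, 50.0)
green = gaussian(wavelengths, 535.0, 45.0)

# Computed per pixel after capture, so negative values are allowed.
red_derived = yellow - green

print("negative lobe:", red_derived.min().round(3), "at", wavelengths[red_derived.argmin()], "nm")
print("positive peak:", red_derived.max().round(3), "at", wavelengths[red_derived.argmax()], "nm")
```

No single filter with transmittance in [0, 1] can reproduce such a curve, while the subtraction reproduces its sign structure directly; in a real sensor, the yellow and green values would first be interpolated to a common pixel location.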

Road markings have been standardized for human perception for over a century. With the rapid expansion of autonomous vehicles that rely on machine vision, new challenges emerge: markings optimized for human drivers may fail automated perception systems under low lighting, adverse weather, or high-retroreflectivity conditions. Drawing on imaging science, vehicle dynamics, and experimental results from road-paint sample analysis, we identify four critical design factors, namely spatial characteristics, color, contrast, and retroreflectivity, and provide concrete recommendations for evolving road marking standards. These recommendations augment rather than replace existing human-centric requirements and are developed in alignment with the IEEE P2020 Automotive Image Quality Standards working group.