
We present a comparative study of pose-based and video-based Human Action Recognition (HAR) methods for driver monitoring in car cockpits. Pose- and video-based HAR has significant potential for advanced driver-assistance systems in semi-autonomous driving on public roads, yet comparisons of deep learning-based video understanding architectures in this context are scarce. We compare prediction performance, per-class false-negative rate, model size, computational requirements, and inference latency on the established Drive&Act dataset and the proprietary Driver Action Insight dataset. Although the diversity and scale of available datasets make comparisons challenging, the results suggest that both approaches benefit from pretraining, while pose- and video-based techniques perform differently on specific action classes, such as those that depend on body motion or on the appearance of objects.
Lukas Brunner, Dominik Schörkhuber, "A Comparative Analysis of Video- and Pose-based Action Recognition for In-cabin Driver Monitoring," in Electronic Imaging, 2026, pp. 100-1 to 100-7, https://doi.org/10.2352/EI.2026.38.16.AVM-100