A Comparative Analysis of Video- and Pose-based Action Recognition for In-cabin Driver Monitoring

Lukas Brunner; Dominik Schörkhuber

doi:10.2352/EI.2026.38.16.AVM-100

Abstract

We present a comparative study of pose-based and video-based Human Action Recognition (HAR) methods for driver monitoring in car cockpits. Although pose- and video-based HAR has significant potential for advanced driver-assistance systems in semi-autonomous driving on public roads, comparisons of deep learning-based video understanding architectures in this context are scarce. We compare prediction performance, per-class false-negative rate, model size, computational requirements, and inference latency on the established Drive&Act dataset and the proprietary Driver Action Insight dataset. While the diversity and scale of available datasets make comparisons challenging, our results suggest that both approaches benefit from pretraining, but that pose- and video-based techniques perform differently on specific action classes, such as those that depend on body motion or on the appearance of objects.

Electronic Imaging

2470-1173

Society for Imaging Science and Technology

IS&T 7003 Kilworth Lane, Springfield, VA 22151 USA

AVM-100

Proceedings Paper

Lukas Brunner

TU Wien, Austria

Dominik Schörkhuber

TU Wien, Austria

132026

AVM

Autonomous Vehicles and Machines 2026

100-1

100-7

2026

human action recognition; driver monitoring; ADAS; autonomous vehicles; computer vision
