Practical video analytics systems that are deployed in bandwidth constrained environments like autonomous vehicles perform computer vision tasks such as face detection and recognition. In an end-to-end face analytics system, inputs are first compressed using popular video codecs like HEVC and then passed onto modules that perform face detection, alignment, and recognition sequentially. Previously, the modules of these systems have been evaluated independently using task-specific imbalanced datasets that can misconstrue performance estimates. In this paper, we perform a thorough end-to-end evaluation of a face analytics system using a driving-specific dataset, which enables meaningful interpretations. We demonstrate how independent task evaluations and dataset imbalances can overestimate system performance. We propose strategies to balance the evaluation dataset and to make its annotations consistent across multiple analytics tasks and scenarios. We then evaluate the end-to-end system performance sequentially to account for task interdependencies. Our experiments show that our approach provides a true estimate of the end-to-end performance for critical real-world systems.
Praneet Singh, Edward J. Delp, Amy R. Reibman, "End-to-end evaluation of practical video analytics systems for face detection and recognition" in Electronic Imaging, 2023, pp 111-1 - 111-6, https://doi.org/10.2352/EI.2023.35.16.AVM-111