Multi-modal pedestrian detection has been an active research area in recent years. By simultaneously exploiting complementary information from visible and thermal frames, multi-modal pedestrian detection outperforms visible-only pedestrian detection through improved robustness to lighting conditions and cluttered backgrounds. However, many existing multi-modal pedestrian detection algorithms assume that the image pairs are perfectly aligned across modalities, so their detection performance often degrades when misalignment occurs. This paper proposes a multi-modal pedestrian detection network that enhances a one-stage detector with a dual-regressor, together with a new training algorithm for multi-modal data, called object-based training. This study focuses on the Single Shot MultiBox Detector (SSD), one of the most common one-stage detectors. Experiments demonstrate that the proposed method outperforms current state-of-the-art methods on artificial data with large misalignment and is comparable to or better than existing methods on existing aligned datasets.
Napat Wanchaitanawong, Masayuki Tanaka, Takashi Shibata, Masatoshi Okutomi, "Multi-Modal Pedestrian Detection Via Dual-Regressor and Object-Based Training for One-Stage Object Detection Network," in Electronic Imaging, 2024, pp. 111-1 to 111-6, https://doi.org/10.2352/EI.2024.36.17.AVM-111
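The abstract does not spell out the dual-regressor structure, but a natural reading is an SSD-style prediction head with two parallel box-regression branches, one regressed toward visible-frame boxes and one toward thermal-frame boxes, so that misaligned pairs can each receive a well-fitted box. The sketch below illustrates that reading only; the module name, layer shapes, and fusion input are assumptions for illustration, not the authors' implementation.

```python
# Hypothetical sketch of a dual-regressor SSD prediction head (PyTorch).
# Assumption: a single fused feature map feeds one classifier and two
# parallel box regressors, one per modality (visible / thermal).
import torch
import torch.nn as nn


class DualRegressorHead(nn.Module):
    """SSD-style head emitting class scores plus two sets of box offsets
    per anchor: one against visible-frame boxes, one against thermal-frame
    boxes, so misalignment between modalities can be absorbed."""

    def __init__(self, in_channels: int, num_anchors: int, num_classes: int):
        super().__init__()
        self.cls = nn.Conv2d(in_channels, num_anchors * num_classes, 3, padding=1)
        self.reg_visible = nn.Conv2d(in_channels, num_anchors * 4, 3, padding=1)
        self.reg_thermal = nn.Conv2d(in_channels, num_anchors * 4, 3, padding=1)

    def forward(self, fused_feat: torch.Tensor):
        scores = self.cls(fused_feat)            # (N, A*num_classes, H, W)
        boxes_vis = self.reg_visible(fused_feat)  # (N, A*4, H, W) for visible frame
        boxes_thm = self.reg_thermal(fused_feat)  # (N, A*4, H, W) for thermal frame
        return scores, boxes_vis, boxes_thm


if __name__ == "__main__":
    # Toy usage: one SSD feature level with 6 anchors per location.
    head = DualRegressorHead(in_channels=256, num_anchors=6, num_classes=2)
    feat = torch.randn(1, 256, 38, 38)
    scores, boxes_vis, boxes_thm = head(feat)
    print(scores.shape, boxes_vis.shape, boxes_thm.shape)
```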