In this study, we propose a new multi-pedestrian tracking (MPT) method that tracks pedestrians quickly and efficiently in a real-time system. The proposed method combines shallow convolutional neural networks (CNNs) with an ensemble learning method, Siamese random forests (SRF). Unlike conventional methods, and to improve the robustness of the ensemble, we apply a feature transformation that uses shallow networks to extract rich appearance features from still images. We formulate the multi-object tracking (MOT) problem in a structured learning framework based on SRF. Each forest learns the differences between random feature pairs extracted in the preceding step, which improves robustness to situations that commonly occur in a moving vehicle. Compared with conventional tracking algorithms, the proposed SRF-based approach has the advantages of being lightweight and efficient. The proposed lightweight multi-pedestrian tracker was successfully applied to benchmark datasets and achieved performance similar to or better than that of state-of-the-art methods.
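The idea of a forest that learns from feature differences can be illustrated with a minimal sketch. This is not the SRF described above: it assumes depth-1 trees (stumps), absolute per-dimension differences, and a majority-style vote, none of which are specified in the abstract. All names (`pair_difference`, `DifferenceStump`, `SiameseStumpForest`) are hypothetical, and the feature vectors stand in for the shallow-CNN appearance features.

```python
import random


def pair_difference(feat_a, feat_b, dims):
    """Absolute per-dimension difference of two feature vectors on `dims`."""
    return [abs(feat_a[d] - feat_b[d]) for d in dims]


class DifferenceStump:
    """Depth-1 'tree' (an assumption): thresholds the mean difference
    over a random subset of feature dimensions."""

    def __init__(self, n_features, n_dims, rng):
        self.dims = rng.sample(range(n_features), n_dims)
        self.threshold = 0.0

    def score(self, feat_a, feat_b):
        diffs = pair_difference(feat_a, feat_b, self.dims)
        return sum(diffs) / len(diffs)

    def fit(self, pairs, labels):
        # labels: 1 = same pedestrian, 0 = different pedestrians
        scores = [self.score(a, b) for a, b in pairs]
        ordered = sorted(set(scores))
        # Candidate thresholds are midpoints between consecutive scores.
        candidates = [(lo + hi) / 2 for lo, hi in zip(ordered, ordered[1:])] or ordered
        best_err = None
        for t in candidates:
            err = sum((s <= t) != bool(y) for s, y in zip(scores, labels))
            if best_err is None or err < best_err:
                best_err, self.threshold = err, t

    def predict(self, feat_a, feat_b):
        return 1 if self.score(feat_a, feat_b) <= self.threshold else 0


class SiameseStumpForest:
    """Ensemble of stumps; the vote fraction acts as a similarity score."""

    def __init__(self, n_features, n_trees=15, n_dims=3, seed=0):
        rng = random.Random(seed)
        self.trees = [DifferenceStump(n_features, n_dims, rng) for _ in range(n_trees)]

    def fit(self, pairs, labels):
        for tree in self.trees:
            tree.fit(pairs, labels)
        return self

    def same_probability(self, feat_a, feat_b):
        votes = sum(tree.predict(feat_a, feat_b) for tree in self.trees)
        return votes / len(self.trees)
```

In a tracking-by-detection loop, a score such as `same_probability(track_feature, detection_feature)` could serve as the affinity between an existing track and a new detection. The random choice of dimensions per stump is what gives the ensemble its diversity in this sketch.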
In this paper, we introduce a multi-pedestrian tracking algorithm for tracking from a moving vehicle. The method is based on online learning of a random ferns (RF) tracker model using the output features of a convolutional neural network (CNN). For real-time application in vehicles, an online method is applied within the tracking-by-detection framework, in which data association between detections and trackers is conducted online. To predict each tracker's position, we perform particle filtering with tracker models inferred from a shallow CNN. You Only Look Once (YOLO), a real-time object detection system, was adopted as the pre-trained model. Although the YOLO network is accurate for object classification, it is not suited to real-time multi-pedestrian tracking. Therefore, we modify YOLO to obtain a shallow version (S-YOLO) with fewer convolutional layers and fewer filters in those layers. To update the tracker in every frame, positive and negative samples are fed to S-YOLO and retraining is performed. We then extract feature descriptors from the first fully connected layer of S-YOLO to train the RF tracker models. The proposed algorithm was successfully applied to various pedestrian video sequences and yielded more accurate tracking performance than other existing methods.
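The online RF tracker model can be illustrated with a generic random ferns classifier. This is a minimal sketch and not the authors' implementation: the binary tests (pairwise comparisons of feature dimensions), the Laplace-smoothed counts, and the averaging of fern posteriors are assumptions. The class names `Fern` and `RandomFerns` are hypothetical, and the input vector stands in for a descriptor taken from the first fully connected layer of S-YOLO.

```python
import random


class Fern:
    """One fern: a fixed list of binary tests that maps a feature vector to a leaf."""

    def __init__(self, n_features, n_tests, rng):
        # Each test compares two distinct, randomly chosen feature dimensions.
        self.tests = [tuple(rng.sample(range(n_features), 2)) for _ in range(n_tests)]
        n_leaves = 2 ** n_tests
        # Counts start at 1 (Laplace smoothing) so unseen leaves give 0.5.
        self.pos = [1] * n_leaves
        self.neg = [1] * n_leaves

    def leaf(self, x):
        index = 0
        for i, j in self.tests:
            index = (index << 1) | (1 if x[i] > x[j] else 0)
        return index

    def update(self, x, is_positive):
        counts = self.pos if is_positive else self.neg
        counts[self.leaf(x)] += 1

    def posterior(self, x):
        k = self.leaf(x)
        return self.pos[k] / (self.pos[k] + self.neg[k])


class RandomFerns:
    """Ensemble of ferns, updated online with positive and negative samples."""

    def __init__(self, n_features, n_ferns=20, n_tests=4, seed=0):
        rng = random.Random(seed)
        self.ferns = [Fern(n_features, n_tests, rng) for _ in range(n_ferns)]

    def update(self, x, is_positive):
        for fern in self.ferns:
            fern.update(x, is_positive)

    def confidence(self, x):
        # Mean of per-fern posteriors; a product of posteriors is another common choice.
        return sum(fern.posterior(x) for fern in self.ferns) / len(self.ferns)
```

Because an update only increments counters, the model can be refreshed in every frame at low cost. In a particle-filter setting, `confidence(x)` evaluated on the descriptor of each particle's candidate box could serve as that particle's weight.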