This paper presents a new pedestrian detection descriptor, the Histogram of Oriented Phase and Gradient (HOPG), which combines Histogram of Oriented Phase (HOP) features with Histogram of Oriented Gradient (HOG) features. The proposed descriptor extracts image information using both the gradient and the phase congruency concepts. Although HOG-based methods have been widely used in human detection systems, they fail to deal effectively with images affected by illumination variations and cluttered backgrounds. By fusing HOP and HOG features, more structural information can be identified and localized, yielding descriptors that are more robust and less sensitive to lighting variations. The phase congruency and the gradient of each pixel in the image are extracted with respect to its neighborhood. Histograms of the phase congruency and the gradients of local segments in the image are computed with respect to their orientations, and these histograms are concatenated to construct the HOPG descriptor. The performance of the proposed descriptor was evaluated on the INRIA and DaimlerChrysler datasets, with a linear support vector machine (SVM) classifier used to train the pedestrian detector. The experimental results show that the human detection system based on the proposed features achieves lower error rates and better detection performance than a set of state-of-the-art feature extraction methods.
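A minimal sketch of the gradient-histogram half of such a descriptor: gradient magnitudes are accumulated into unsigned orientation bins per cell, and the per-cell histograms would then be concatenated (the phase-congruency half follows the same binning scheme but is omitted here). The cell size and bin count below are illustrative assumptions, not the paper's exact parameters.

```python
import math

def gradient_orientation_histogram(cell, n_bins=9):
    """Accumulate gradient magnitudes of one image cell into unsigned
    orientation bins (0-180 degrees), as in a HOG-style descriptor."""
    h = [0.0] * n_bins
    rows, cols = len(cell), len(cell[0])
    for y in range(1, rows - 1):
        for x in range(1, cols - 1):
            gx = cell[y][x + 1] - cell[y][x - 1]   # central differences
            gy = cell[y + 1][x] - cell[y - 1][x]
            mag = math.hypot(gx, gy)
            ang = math.degrees(math.atan2(gy, gx)) % 180.0
            h[int(ang / 180.0 * n_bins) % n_bins] += mag
    return h

# A vertical edge: all gradient energy falls into the 0-degree bin.
cell = [[0, 0, 0, 9, 9, 9] for _ in range(6)]
hist = gradient_orientation_histogram(cell)
```

The full HOPG descriptor would concatenate such histograms over all cells, for both the gradient and the phase-congruency maps.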
In this research, we present a novel Fuzzy Finite Automaton (FFA) for predicting a pedestrian’s intention in an advanced driver assistance system. Because dangerous pedestrians generally move faster and more laterally than a standing pedestrian, we estimate the pedestrian’s state probability by combining the tracking trajectory in the temporal domain with spatial cues such as the pedestrian’s face (looking back or not). To capture these characteristics over the temporal and spatial domains, the ‘distance between the pedestrian and the curb’, the ‘distance between the pedestrian and the vehicle’, the ‘head orientation and its variation’, and the ‘speed of the pedestrian’ are used to generate probability density functions for the state transition values. The four states connected by the transitions of the FFA are defined as Walking-SW, Standing, W-Crossing, and R-Crossing, corresponding to “walking on the sidewalk,” “standing on the sidewalk,” “walking while crossing,” and “running while crossing,” respectively. The state changes are controlled by various transition probabilities. Because there is no standard dataset for evaluating prediction performance with a stereo thermal camera, we created the KMU prediction dataset. The proposed algorithm was successfully applied to various pedestrian video sequences from this dataset and showed accurate prediction performance.
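A toy sketch of how fuzzy memberships over such cues can drive a state transition: two triangular membership functions ("fast" and "near the curb") are combined with a minimum t-norm to give the degree with which a sidewalk-walking pedestrian transitions toward a crossing state. The membership breakpoints and the specific combination are illustrative assumptions, not the paper's fitted probability density functions.

```python
def tri(x, a, b, c):
    """Triangular fuzzy membership function over [a, c] peaking at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

def crossing_transition(speed_mps, dist_to_curb_m):
    """Fuzzy degree with which a 'Walking-SW' pedestrian transitions
    toward a crossing state: fast movement near the curb is riskier.
    Breakpoints are illustrative, not taken from the paper."""
    fast = tri(speed_mps, 0.5, 2.0, 4.0)
    near_curb = tri(dist_to_curb_m, -0.1, 0.0, 1.5)
    return min(fast, near_curb)   # fuzzy AND (minimum t-norm)

states = ["Walking-SW", "Standing", "W-Crossing", "R-Crossing"]
risk = crossing_transition(speed_mps=1.8, dist_to_curb_m=0.4)
```

In the full FFA, one such transition value would exist per state pair, with the cues above feeding each membership function.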
Several methods for 3D tracking rely on prior knowledge of the scenario, such as workspace geometry or active markers, to make three-dimensional tracking of objects feasible. In uncontrolled scenarios, however, guaranteeing a reliable and robust method is a great challenge. Parallel tracking with pan-tilt (PT) cameras is complicated because several conditions affect motion detection (lighting, object displacement, and PT-camera velocity, to mention a few). This work proposes a strategy for tracking an object and estimating its three-dimensional position using a PT-camera array. The camera array provides redundancy aimed at reducing the estimation error. The method consists of simultaneously tracking the target object in all cameras, with pan and tilt used as the parameters of vectors in spherical coordinates. Tracking is performed via active contours: a set of markers enclosing the target object, with the contour treated as a high-energy zone. Tracking is then posed as a Newton-Raphson optimization that locates the maximum-energy zone by superposing the latest reference position over the newest position in a given pair of images. Finally, the approach is tested in a controlled scenario in which luminance conditions are controlled and local references are used to compare the estimated position against the real position.
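The geometry behind treating pan and tilt as spherical-coordinate parameters can be sketched as follows: each camera's pan/tilt pair defines a ray, and the target position is recovered as the midpoint of the shortest segment between two rays. This closed-form two-camera triangulation is an illustrative stand-in; the paper's redundant multi-camera estimation would combine more rays.

```python
import math

def ray(pan_deg, tilt_deg):
    """Unit direction vector from pan (azimuth) and tilt (elevation)."""
    p, t = math.radians(pan_deg), math.radians(tilt_deg)
    return (math.cos(t) * math.cos(p), math.cos(t) * math.sin(p), math.sin(t))

def triangulate(c1, d1, c2, d2):
    """Midpoint of the shortest segment between two camera rays
    c1 + s*d1 and c2 + t*d2 (d1, d2 assumed unit length)."""
    dot = lambda u, v: sum(a * b for a, b in zip(u, v))
    w0 = tuple(a - b for a, b in zip(c1, c2))
    b, d, e = dot(d1, d2), dot(d1, w0), dot(d2, w0)
    denom = 1.0 - b * b                      # unit directions: a = c = 1
    s = (b * e - d) / denom
    t = (e - b * d) / denom
    p1 = tuple(c + s * u for c, u in zip(c1, d1))
    p2 = tuple(c + t * u for c, u in zip(c2, d2))
    return tuple((a + b) / 2 for a, b in zip(p1, p2))

# Two PT cameras 10 m apart, both fixating a target at (5, 5, 0).
est = triangulate((0, 0, 0), ray(45, 0), (10, 0, 0), ray(135, 0))
```

With more than two cameras, a least-squares intersection over all rays would use the same per-ray parameterization.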
The paper proposes a pose-based real-time system for inferring the engagement of a shopper with a retail shelf by recognizing atomic shelf-interaction actions: examining the shelf, reaching for an object, taking an object, reading a product’s label, and placing it in a cart for check-out. A novel pose representation that is robust to the large intra-class variations observed while performing these retail actions is proposed in this work. The paper also extends the framework to real-time action segmentation, abnormal-action detection, and configurable privacy protection of shoppers. The abnormality detection also offers scope for learning new, unmodelled actions through crowdsourcing. Although the system currently relies on a Kinect (RGBD) sensor for computing the joints of the human body, it can work with a combination of an RGB surveillance camera and any 2D video-based pose-tracking algorithm. The system achieves an accuracy of 90% in recognizing the five actions considered in this work and exhibits a latency of about 1 s relative to the real-world action. This has large potential for optimizing store resources and improving the customer’s shopping experience.
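One common way a pose representation is made robust to intra-class variation is to express joints relative to a body-centered origin and scale by a body dimension, removing position and size differences between shoppers. The sketch below uses hip-centering and torso-length scaling as an illustrative normalization; the paper's exact pose encoding may differ.

```python
def normalize_pose(joints, hip="hip_center", shoulder="shoulder_center"):
    """Translate 2D joints to the hip and scale by torso length so the
    representation is invariant to the shopper's position and body size.
    Joint names and the normalization choice are illustrative."""
    hx, hy = joints[hip]
    sx, sy = joints[shoulder]
    torso = ((sx - hx) ** 2 + (sy - hy) ** 2) ** 0.5
    return {k: ((x - hx) / torso, (y - hy) / torso)
            for k, (x, y) in joints.items()}

# Image coordinates (y grows downward), shoulder 60 px above the hip.
pose = {"hip_center": (120, 200), "shoulder_center": (120, 140),
        "right_hand": (160, 150)}
norm = normalize_pose(pose)
```

After normalization the hip is always at the origin and the shoulder at unit distance, so a classifier sees the same "reaching" configuration regardless of where the shopper stands.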
This paper proposes a real-time vehicle tracking and type recognition system. An object tracker is employed to detect vehicles within CCTV video footage. Subsequently, the vehicle region of interest within each frame is analysed using a feature set consisting of region features, Histogram of Oriented Gradient (HOG) features, and Local Binary Pattern (LBP) histogram features. Finally, a Support Vector Machine (SVM) is used as the classification tool to categorize vehicles into two classes: cars and vans. The proposed technique was tested on a dataset of 60 vehicles comprising a mix of frontal/rear and angular views. Experimental results show that the proposed technique offers a very high level of accuracy, promising applicability in real-life situations.
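The LBP histogram component of such a feature set can be sketched in a few lines: each pixel is encoded by thresholding its 8 neighbours against it, and the 256-bin histogram of codes over the region of interest becomes part of the feature vector fed to the SVM. This is the basic (non-uniform, single-radius) LBP variant, chosen here for brevity.

```python
def lbp_histogram(img):
    """256-bin histogram of basic 8-neighbour Local Binary Patterns
    over a grayscale patch (list of equal-length rows)."""
    offs = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
            (1, 1), (1, 0), (1, -1), (0, -1)]
    hist = [0] * 256
    for y in range(1, len(img) - 1):
        for x in range(1, len(img[0]) - 1):
            code = 0
            for bit, (dy, dx) in enumerate(offs):
                if img[y + dy][x + dx] >= img[y][x]:
                    code |= 1 << bit
            hist[code] += 1
    return hist

# A flat 5x5 patch: every neighbour equals the centre, so each of the
# nine interior pixels produces the all-ones code 255.
hist = lbp_histogram([[7] * 5 for _ in range(5)])
```

The resulting histogram would be concatenated with the region and HOG features before SVM classification.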
Detection and classification of vehicles are paramount tasks in surveillance frameworks and in traffic management and control. The type of transportation infrastructure, road conditions, traffic trends, and illumination conditions are some of the key factors that affect these tasks. This paper explores the performance of existing detection and classification techniques on local, daytime, complex urban traffic videos with high free-flowing vehicle volume. Three traffic datasets with varying levels of complexity are used for analysis. Scene complexity is governed by factors such as vehicle speed, the type and size of dynamic objects, the direction of vehicle motion, the number of lanes, occlusion, length, and camera viewing angle. The datasets include large classification volumes: 1516 vehicles in NIPA (a customized local dataset) and 1009 vehicles in TOLL PLAZA (a customized local dataset), along with a publicly available dataset of 51 vehicles, HIGHWAY II. Existing detection algorithms such as blob analysis, Kalman filter tracking, and detection lines were applied to all three datasets, and experimental results are presented. The results show that the algorithms perform well under low density, low speed, little shadow, good image resolution, an appropriate camera viewing angle, good lighting conditions, and occlusion-free zones; as scene complexity increases, however, several detection errors appear. Furthermore, obtaining robust and invariant features of local vehicle designs proved challenging. A custom GUI was built to analyze the results of the algorithms. The detection work is further extended to the classification of 231 vehicles from the NIPA dataset, a highly complex urban traffic scenario.
Vehicles are classified as Small Vehicle (SV), Large Vehicle (LV), and Motorcycle (M) using both an area-threshold-based classifier and a dense Scale Invariant Feature Transform (SIFT) with Artificial Neural Network (ANN) classifier. A detailed comparison of the two classifiers shows that the SIFT-and-ANN classifier performs better for classification tasks in highly complex urban scenarios, and also points out that practical systems still require a more robust classification scheme to achieve more than 80% accuracy.
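The area-threshold baseline is the simpler of the two classifiers and amounts to bucketing the detected blob's pixel area. The sketch below shows the idea; the threshold values are illustrative assumptions that would in practice be tuned per camera viewpoint and resolution.

```python
def classify_by_area(blob_area_px, moto_max=800, small_max=3500):
    """Area-threshold vehicle classifier into the three classes used
    in the paper. Thresholds are illustrative, not the paper's values."""
    if blob_area_px <= moto_max:
        return "M"       # Motorcycle
    if blob_area_px <= small_max:
        return "SV"      # Small Vehicle
    return "LV"          # Large Vehicle

labels = [classify_by_area(a) for a in (500, 2000, 9000)]
```

Its sensitivity to perspective (the same vehicle shrinks with distance) is one reason the appearance-based SIFT-and-ANN classifier fares better in complex scenes.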
Human body detection is an important research area with potential for numerous applications, including search and rescue missions, safe driving, surveillance, and security. Here, we propose and experimentally validate a novel concept named ‘arrayed laser image contrast evaluation’ (Alice) for detecting the human body based on the unique optical properties of human skin. In the Alice system, a near-infrared (NIR) dot-array laser is used for illumination, and the irradiated area is imaged with an NIR camera. Human skin has characteristic optical properties in the NIR region: relatively low light absorption and high light scattering. When human skin is illuminated with focused laser dots, the NIR light penetrates deeply and is scattered multiple times inside the skin before it is re-emitted, so the intensity distribution of the reflected light tends to be diffuse. Human skin can therefore be easily identified using arrayed laser image contrasts calculated from the reflected light intensity distribution. With the Alice system, an almost entirely hidden person can be successfully detected using information from even a tiny patch of skin. The Alice human body detection system thus has potential for use in a wide range of applications.
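The core measurement can be sketched as a simple contrast statistic per laser-dot region: a surface that scatters light subsurface (skin) blurs the dot, giving a low std/mean contrast, while a sharply reflecting surface keeps the dot concentrated, giving a high contrast. The contrast definition and the threshold below are illustrative assumptions, not the paper's calibrated values.

```python
def dot_contrast(intensities):
    """Contrast (std/mean) of reflected light inside one laser-dot
    region of the NIR image."""
    n = len(intensities)
    mean = sum(intensities) / n
    var = sum((v - mean) ** 2 for v in intensities) / n
    return (var ** 0.5) / mean

def looks_like_skin(intensities, threshold=0.4):
    """Multiple scattering in skin diffuses the dot, lowering contrast;
    the threshold here is illustrative."""
    return dot_contrast(intensities) < threshold

sharp_dot = [5, 5, 5, 250, 5, 5, 5]             # little scattering
diffuse_dot = [80, 95, 110, 120, 110, 95, 80]   # skin-like spreading
```

A full detector would evaluate this contrast for every dot in the array and flag clusters of low-contrast dots as skin.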
City traffic often exhibits regional characteristics, such as large trucks frequently appearing in the suburbs and the paths to playgrounds generally being congested on weekends. Discovering and visualizing these hidden traffic regions, inside which roads share similar traffic conditions, simplifies the modeling of whole-city traffic conditions and therefore contributes significantly to city planning. Unfortunately, such traffic regions always have irregular shapes and are time-varying, which makes their discovery extremely complicated. In addition, establishing a method to visualize and explore the traffic regions interactively remains challenging. In this article, the authors propose a latent Dirichlet allocation (LDA)-based approach to the discovery of underlying traffic regions (or region topics) from vehicle trajectories captured by surveillance devices installed along roadsides. They treat vehicle trajectories as documents and the values of different traffic features, such as locations, directions, speeds, and vehicle types, as the corresponding words. After applying the LDA model, they obtain a list of region topics with combined feature values, in which the different feature values are clustered with probabilistic assignments. Meanwhile, they build a prototype system to explore the surveillance-device-based vehicle trajectories according to the discovered region topics. The prototype system, which consists of map view, cloud view, treemap view, and matrix-table view, visualizes the feature values of the hidden traffic regions. Finally, the authors present a real case study based on traffic data from Wenzhou City, a large city in eastern China with a population of more than nine million, investigating 157 surveillance devices and approximately 750,000 moving vehicles. The case demonstrates the effectiveness of both the proposed approach and the prototype system. © 2016 Society for Imaging Science and Technology.
[DOI: 10.2352/J.ImagingSci.Technol.2016.60.2.020403]
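The "trajectories as documents" analogy can be made concrete by discretizing each trajectory point's features into vocabulary words before fitting LDA. The feature names, bin sizes, and word format below are illustrative assumptions, not the paper's exact vocabulary.

```python
def trajectory_to_document(points, speed_bin=20, heading_bin=45):
    """Convert one vehicle trajectory into a bag of 'words' for LDA
    by discretising location cell, heading, speed and vehicle type.
    Bin sizes and word names are illustrative."""
    words = []
    for cell_id, heading_deg, speed_kmh, vtype in points:
        words.append(f"loc_{cell_id}")
        words.append(f"dir_{int(heading_deg // heading_bin) * heading_bin}")
        words.append(f"spd_{int(speed_kmh // speed_bin) * speed_bin}")
        words.append(f"veh_{vtype}")
    return words

# A two-point truck trajectory observed by roadside devices.
doc = trajectory_to_document([(17, 92.0, 54.0, "truck"),
                              (18, 88.0, 61.0, "truck")])
```

A standard LDA implementation would then be run over the corpus of such documents, and each learned topic, being a distribution over location/direction/speed/type words, corresponds to one region topic.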
We present a novel methodology for accurately registering a vector road map to wide area motion imagery (WAMI) gathered from an oblique perspective, by exploiting the local motion associated with vehicular movement. Specifically, we identify and compensate for frame-to-frame global motion in the WAMI, which then allows ready detection of local motion that corresponds strongly with the locations of moving vehicles along the roads. Minimizing the chamfer distance between these identified locations and the network of road lines in the vector road map provides an accurate alignment between the vector road map and the WAMI frame under consideration. The methodology provides a significant improvement over the approximate geo-tagging provided by on-board sensors and effectively side-steps the challenge of matching features between the completely different data modalities and viewpoints of the vector road map and the captured WAMI frames. Results on a test WAMI dataset indicate the effectiveness of the proposed methodology: both visual comparison and numerical alignment-accuracy metrics are significantly better for the proposed method than for existing alternatives.
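A toy illustration of chamfer-distance alignment: for each detected mover, take the distance to its nearest road point, and search for the translation that minimizes the mean of these distances. Real systems use a precomputed distance transform and richer transforms than pure translation; the brute-force search below is an illustrative stand-in for the paper's optimization.

```python
def chamfer(points, road_pixels):
    """Mean nearest-neighbour distance (squared Euclidean, for
    simplicity) from detected movers to the rasterised road network."""
    return sum(min((px - rx) ** 2 + (py - ry) ** 2
                   for rx, ry in road_pixels)
               for px, py in points) / len(points)

def best_shift(points, road_pixels, search=3):
    """Exhaustive search over small integer translations for the
    offset that minimises the chamfer distance."""
    return min(((dx, dy)
                for dx in range(-search, search + 1)
                for dy in range(-search, search + 1)),
               key=lambda s: chamfer([(x + s[0], y + s[1])
                                      for x, y in points], road_pixels))

road = [(0, 0), (3, 0), (7, 0)]     # rasterised road-map points
movers = [(1, 2), (4, 2), (8, 2)]   # detections offset by (1, 2)
shift = best_shift(movers, road)
```

Here the recovered shift of (-1, -2) exactly cancels the misregistration, mirroring how the vehicle detections pull the road map into alignment.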