Hand hygiene is essential to food safety for food handlers. Maintaining proper hand hygiene can improve food safety and promote public welfare. However, traditional methods of evaluating hygiene during the food handling process, such as visual auditing by human experts, can be costly and inefficient compared to a computer vision system. Because of the varying conditions and locations of real-world food processing sites, computer vision systems for recognizing handwashing actions can be susceptible to changes in lighting and environments. Therefore, we design a robust and generalizable ResNet50-based video system that includes a hand extraction method and a two-stream network for classifying handwashing actions. More specifically, our hand extraction method eliminates the background and helps the classifier focus on hand regions under changing lighting conditions and environments. Our results demonstrate that the system with the hand extraction method improves action recognition accuracy and generalizes better: when evaluated on completely unseen data, it achieves over a 20% improvement in overall classification accuracy.
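The abstract does not give implementation details, so the following is only a minimal sketch of the described two-stream design, assuming PyTorch/torchvision, a simple skin-color mask as the hand extraction step, and late fusion of per-frame features from the two ResNet50 streams; the masking rule, fusion scheme, and the `num_actions` value are illustrative assumptions, not the authors' exact method.

```python
# Minimal sketch of a two-stream ResNet50 classifier for handwashing actions.
# Assumptions (not from the abstract): PyTorch/torchvision, a skin-color-based
# hand mask as the extraction step, and late fusion of per-frame features.
import torch
import torch.nn as nn
import torchvision.models as models
import cv2
import numpy as np


def extract_hands(frame_bgr: np.ndarray) -> np.ndarray:
    """Hypothetical hand extraction: keep skin-colored pixels, zero the background."""
    ycrcb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2YCrCb)
    mask = cv2.inRange(ycrcb, (0, 133, 77), (255, 173, 127))  # rough skin range
    return cv2.bitwise_and(frame_bgr, frame_bgr, mask=mask)


class TwoStreamHandwashNet(nn.Module):
    """One ResNet50 stream for the full frame, one for the hand-extracted frame."""
    def __init__(self, num_actions: int = 12):
        super().__init__()
        self.rgb_stream = models.resnet50(weights=None)
        self.hand_stream = models.resnet50(weights=None)
        feat_dim = self.rgb_stream.fc.in_features          # 2048 per stream
        self.rgb_stream.fc = nn.Identity()
        self.hand_stream.fc = nn.Identity()
        self.classifier = nn.Linear(2 * feat_dim, num_actions)

    def forward(self, rgb: torch.Tensor, hands: torch.Tensor) -> torch.Tensor:
        fused = torch.cat([self.rgb_stream(rgb), self.hand_stream(hands)], dim=1)
        return self.classifier(fused)


# Usage on a single (dummy) frame pair:
model = TwoStreamHandwashNet(num_actions=12)
rgb = torch.randn(1, 3, 224, 224)
hands = torch.randn(1, 3, 224, 224)
logits = model(rgb, hands)          # shape: (1, 12)
```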
Skeleton-based action recognition plays a critical role in computer vision research, and its applications have been widely deployed in many areas. Recently, the performance of this task has improved dramatically thanks to graph convolutional networks (GCNs), which are powerful at modeling non-Euclidean data. However, most of these works are designed for clean skeleton data, whereas in reality such data are usually noisy, since they are typically obtained with depth cameras or even estimated from RGB cameras rather than recorded by a high-quality but extremely costly Motion Capture (MoCap) [1] system. Under these circumstances, we propose a novel GCN framework with adversarial training to deal with noisy skeleton data. Guided by clean data at the semantic level, a reliable graph embedding can be extracted from noisy skeleton data. In addition, a discriminator is introduced so that the feature representation is further improved through adversarial learning. We empirically evaluate the proposed framework on the two largest current skeleton-based action recognition datasets. Comparison results show the superiority of our method over state-of-the-art methods under noisy settings.
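The abstract only outlines the framework, so the sketch below illustrates the general idea under stated assumptions: a single spatial graph convolution in place of the authors' full GCN backbone, and a discriminator trained to separate embeddings of clean skeletons from those of noisy skeletons while the encoder learns to fool it; the layer sizes, losses, and training schedule are assumptions for illustration only.

```python
# Minimal sketch of adversarial training for a skeleton GCN on noisy data.
# Assumptions (not from the abstract): PyTorch, one spatial graph convolution,
# and a discriminator that tells clean-data embeddings from noisy-data embeddings.
import torch
import torch.nn as nn


class GraphConv(nn.Module):
    """Spatial graph convolution: X' = relu(A_norm @ X @ W) over skeleton joints."""
    def __init__(self, in_dim, out_dim, adjacency: torch.Tensor):
        super().__init__()
        deg = adjacency.sum(dim=1, keepdim=True).clamp(min=1)
        self.register_buffer("a_norm", adjacency / deg)   # row-normalized adjacency
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, x):                                  # x: (batch, joints, in_dim)
        return torch.relu(self.linear(self.a_norm @ x))


class SkeletonGCN(nn.Module):
    """Encoder producing a graph embedding plus an action classification head."""
    def __init__(self, adjacency, in_dim=3, hidden=64, num_classes=60):
        super().__init__()
        self.gcn1 = GraphConv(in_dim, hidden, adjacency)
        self.gcn2 = GraphConv(hidden, hidden, adjacency)
        self.classifier = nn.Linear(hidden, num_classes)

    def embed(self, x):                                    # mean-pool over joints
        return self.gcn2(self.gcn1(x)).mean(dim=1)

    def forward(self, x):
        return self.classifier(self.embed(x))


# Discriminator: does this embedding come from clean or noisy skeletons?
discriminator = nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 1))


def adversarial_step(model, clean_x, noisy_x, labels, opt_g, opt_d):
    """One training step: classify noisy data while fooling the discriminator."""
    bce = nn.BCEWithLogitsLoss()
    # 1) Update the discriminator on detached embeddings (clean = 1, noisy = 0).
    d_loss = bce(discriminator(model.embed(clean_x).detach()), torch.ones(len(clean_x), 1)) + \
             bce(discriminator(model.embed(noisy_x).detach()), torch.zeros(len(noisy_x), 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()
    # 2) Update the encoder: correct action labels plus embeddings that look "clean".
    g_loss = nn.functional.cross_entropy(model(noisy_x), labels) + \
             bce(discriminator(model.embed(noisy_x)), torch.ones(len(noisy_x), 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```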
On a cattle farm, it is important to monitor the activity of cattle to assess their health condition and prevent accidents. Conventional methods use sensors to recognize cattle activity, but attaching sensors to the animals may cause stress. Cameras have also been used to recognize cattle activity, but it is difficult to identify individual cattle because they look similar, especially black or brown cattle. We propose a new method to identify cattle and recognize their activity using surveillance cameras. The cattle are first detected by a CNN-based deep learning method, which recognizes the face and body areas of each animal as well as its sitting or standing state separately and at the same time. Image samples of both day and night were collected to train the model so that cattle can be recognized 24 hours a day. Initial ID numbers are assigned to the recognized cattle in the first frame of the video to identify each animal, and a particle filter object tracker is then used to track them. By combining the recognition and tracking results, the ID numbers of the cattle are carried over to the following frames of the video. Cattle activity is recognized using multiple frames of the video: active or static behavior is recognized in the face and body areas of the cattle, and the activity times for these areas are output as the activity recognition results. Cattle identification and activity recognition experiments were conducted at a cattle farm using wide-angle surveillance cameras. Evaluation results demonstrate the effectiveness of our proposed method.
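As a rough illustration of the ID-keeping and activity-timing logic described above, the sketch below uses a hypothetical detector exposed as `detect_cattle(frame)`, greedy IoU matching in place of the particle filter tracker, and a simple motion threshold to decide active versus static; none of these specifics come from the abstract.

```python
# Minimal sketch of the identification-and-activity pipeline described above.
# Assumptions (not from the abstract): a pretrained detector `detect_cattle(frame)`
# returning (x, y, w, h) boxes, greedy IoU matching instead of the particle filter,
# and a per-frame motion threshold for the active/static decision.
from dataclasses import dataclass


@dataclass
class Track:
    track_id: int
    box: tuple                      # (x, y, w, h) of the body area
    active_frames: int = 0
    static_frames: int = 0


def iou(a, b):
    """Intersection-over-union of two (x, y, w, h) boxes."""
    ax1, ay1, ax2, ay2 = a[0], a[1], a[0] + a[2], a[1] + a[3]
    bx1, by1, bx2, by2 = b[0], b[1], b[0] + b[2], b[1] + b[3]
    iw = max(0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union else 0.0


def update_tracks(tracks, detections, motion_thresh=5.0):
    """Keep ID numbers across frames and accumulate active/static time per animal."""
    for det in detections:
        best = max(tracks, key=lambda t: iou(t.box, det), default=None)
        if best is None or iou(best.box, det) < 0.3:
            continue                                    # unmatched detection: ignored here
        dx = abs(det[0] - best.box[0])
        dy = abs(det[1] - best.box[1])
        if (dx * dx + dy * dy) ** 0.5 > motion_thresh:  # box moved: count as active
            best.active_frames += 1
        else:
            best.static_frames += 1
        best.box = det
    return tracks


# Initial IDs from the first frame, then per-frame updates (detector is hypothetical):
# tracks = [Track(i, box) for i, box in enumerate(detect_cattle(first_frame))]
# for frame in video_frames:
#     tracks = update_tracks(tracks, detect_cattle(frame))
```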