The examination of fetal well-being during routine prenatal care plays a crucial role in preventing pregnancy complications and reducing the risks of miscarriage, birth defects and other health problems. However, conventional prenatal screening and diagnosis are conducted by medical professionals in a clinical environment, and are therefore subject to limitations of manpower, medical devices, location, and the time and cost of services. This paper presents a new approach that detects and monitors fetal movement safely and reliably without constraints of time, environment or cost. Unlike the conventional method, our contribution includes a novel soft sensor pad that automatically and non-intrusively detects fetal movement and uterine contractions, together with robust data analysis software that monitors pregnancy health and screens for abnormalities with quantitative assessment. The monitoring belt embedded with the soft sensor pad is wearable, non-intrusive, radiation-free and washable. The new algorithms are robust for noise removal, feature extraction, time-sequence data analysis and decision support, enabling personalized care. Both the design of the soft sensor pad and the functions of the belt are original and unique. The results of preliminary clinical trials demonstrate the feasibility and advantages of our prototype.
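The abstract leaves the signal-processing pipeline unspecified; purely as an illustration of the noise-removal and event-detection steps it mentions, the sketch below band-pass filters a one-dimensional sensor signal and flags candidate movement peaks. The sampling rate, pass band and threshold are assumptions, not values from the paper.

```python
# Hypothetical sketch: band-pass filtering and event detection on a
# 1-D pressure signal from a sensor pad. Sampling rate, cutoff
# frequencies, and threshold are illustrative assumptions.
import numpy as np
from scipy.signal import butter, filtfilt, find_peaks

def detect_movement_events(signal, fs=100.0):
    """Return sample indices of candidate fetal-movement events."""
    # Band-pass 0.5-5 Hz: suppress baseline drift and high-frequency
    # noise (the real pass band would be chosen from clinical data).
    b, a = butter(4, [0.5, 5.0], btype="band", fs=fs)
    filtered = filtfilt(b, a, signal)
    # Flag peaks that rise well above the noise floor.
    threshold = 3.0 * np.std(filtered)
    peaks, _ = find_peaks(filtered, height=threshold, distance=int(0.5 * fs))
    return peaks

# Example: one minute of synthetic sensor data at 100 Hz.
sig = 0.1 * np.random.randn(6000)
sig[3000] += 2.0  # injected "movement" spike
print(detect_movement_events(sig))
```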
In this paper we present a Cluster Aggregation Network (CAN) for face set recognition. The network takes as input a set of face images, which may be a face video or a cluster containing a varying number of face images, and produces a compact, fixed-dimensional feature representation of the set for recognition. The network consists of two modules: a face feature embedding module and a face feature aggregation module. The first is a deep Convolutional Neural Network (CNN) that maps each face image to a fixed-dimensional vector. The second is also a CNN, trained to automatically assess the quality of the input face images and assign corresponding weights to their feature vectors. The aggregated feature vector representing the input set then lies inside the convex hull formed by the individual face image features. Because the quality assessment is invariant to both the order of images within a set and the number of images in the set, the aggregation is invariant to these factors as well. CAN is trained with a standard classification loss and no additional supervision, and we found that the network automatically favors high-quality face images while suppressing low-quality ones, such as blurred, occluded and non-frontal faces. We trained our networks on the CASIA and YouTube Faces datasets, and experiments on the IJB-C video face recognition benchmark show that our method outperforms current state-of-the-art feature aggregation methods as well as our challenging baseline aggregation method.
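As a minimal sketch of the aggregation mechanism described above, the following code turns per-image quality scores into convex (softmax) weights, so the set representation lies inside the convex hull of the input features. In the actual CAN the quality scores are produced by a learned CNN; a plain vector stands in for that module here.

```python
# Minimal sketch of quality-weighted feature aggregation. The
# `quality_logits` vector is an assumed stand-in for the output of
# CAN's learned quality-assessment CNN.
import numpy as np

def aggregate(features, quality_logits):
    """features: (n, d) per-image embeddings; quality_logits: (n,)."""
    # Softmax turns the scores into convex weights, so the aggregate
    # lies inside the convex hull of the input features and is
    # invariant to the order and number of images.
    w = np.exp(quality_logits - quality_logits.max())
    w /= w.sum()
    return w @ features  # (d,) set-level representation

feats = np.random.randn(5, 128)                 # five face embeddings
logits = np.array([2.0, -1.0, 0.5, 1.5, -2.0])  # predicted qualities
print(aggregate(feats, logits).shape)           # (128,)
```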
Micro-expression (ME) analysis has recently become an attractive topic. Nevertheless, most ME studies focus on the recognition task, while the spotting task is rarely addressed. While ME recognition methods have obtained promising results by applying deep learning techniques, the performance of ME spotting still needs substantial improvement. Most approaches still rely on traditional techniques, such as distance measurements between handcrafted features of frames, which are not robust enough to detect ME locations correctly. In this paper, we propose a novel method for ME spotting based on a deep sequence model. Our framework consists of two main steps: 1) from each position in the video, we extract a spatio-temporal feature that discriminates MEs from extrinsic movements; 2) we use an LSTM network that exploits both local and global correlations of the extracted features to predict the score of the ME apex frame. Experiments on two public ME spotting databases demonstrate the effectiveness of the proposed method.
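A hedged sketch of the scoring step described above: an LSTM reads the sequence of extracted features and emits a per-frame apex score. The feature dimension and hidden size are illustrative choices, not the paper's configuration.

```python
# Illustrative apex-scoring model: an LSTM over per-frame
# spatio-temporal features, with a linear head producing one score
# per frame. Dimensions are assumptions for the sketch.
import torch
import torch.nn as nn

class ApexScorer(nn.Module):
    def __init__(self, feat_dim=256, hidden=128):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):            # x: (batch, frames, feat_dim)
        h, _ = self.lstm(x)          # temporal context for each frame
        return self.head(h).squeeze(-1)  # (batch, frames) apex scores

scores = ApexScorer()(torch.randn(2, 100, 256))
print(scores.shape)  # torch.Size([2, 100])
```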
Emotion plays an important role in daily life, helping people communicate with and understand each other more effectively. Facial expressions can be classified into seven categories: angry, disgust, fear, happy, neutral, sad and surprise. Detecting and recognizing these seven emotions has become a popular topic over the past decade. In this paper, we develop a deep learning based emotion recognition system that operates on both still images and real-time video. We build our emotion recognition classification and regression system from scratch, including dataset collection, data preprocessing, model training and testing. Given an image or a real-time video stream, our system displays the classification and regression results for all seven emotions. The proposed system is tested on two different datasets and achieves an accuracy of over 80%. Moreover, the real-time testing results demonstrate the feasibility of using convolutional neural networks to detect emotions accurately and efficiently in real time.
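For illustration only, the sketch below shows a seven-way classification head of the kind such a system might use, assuming 48x48 grayscale face crops (a common facial expression recognition setup); the paper's actual architecture is not specified here.

```python
# Assumed small CNN for 7-way emotion classification on 48x48
# grayscale face crops; not the paper's architecture.
import torch
import torch.nn as nn

EMOTIONS = ["angry", "disgust", "fear", "happy", "neutral", "sad", "surprise"]

model = nn.Sequential(
    nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(64 * 12 * 12, len(EMOTIONS)),
)

face = torch.randn(1, 1, 48, 48)            # one preprocessed face crop
probs = torch.softmax(model(face), dim=1)   # per-emotion confidences
print(EMOTIONS[int(probs.argmax())])
```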
We present a practical 3D-assisted face alignment framework based on cascaded regression. The 3D information embedded in a 2D face image is used to compute two novel components that improve the performance of 2D methods for unconstrained face alignment: projected local patches and per-landmark visibility. First, we propose to extract landmark-related features from local patches projected onto the 2D image from the corresponding 3D face model. Local patches of a given landmark in the 3D face models of different 2D images cover the same anatomical region of the face, so the extracted features are more accurate for the subsequent regression of landmark locations. Second, we propose to estimate the visibility of 2D landmarks from the 3D face model, which proves vital for addressing large-pose face alignment. We adopt Local Binary Features (LBF) to extract landmark-related features in the proposed framework, and name the new method 3D-Assisted LBF (3DALBF). An extensive evaluation on two face databases shows that 3DALBF achieves better alignment results than the original 2D method while maintaining the speed advantage of 2D methods over 3D methods.
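As a toy illustration of the visibility component, the following sketch marks a landmark visible when its surface normal (taken from a fitted 3D model) faces the camera; the paper's exact visibility test may differ, and the normals and viewing direction are assumed inputs.

```python
# Sketch of landmark visibility estimation from a fitted 3D face
# model: a landmark is treated as visible when its surface normal
# faces the camera. Inputs are assumptions for illustration.
import numpy as np

def landmark_visibility(normals, view_dir=np.array([0.0, 0.0, 1.0])):
    """normals: (n_landmarks, 3) unit surface normals in camera space."""
    # Positive dot product with the viewing direction => front-facing.
    return normals @ view_dir > 0.0

normals = np.array([[0.0, 0.0, 1.0],    # nose tip, facing the camera
                    [0.9, 0.0, -0.4]])  # self-occluded cheek landmark
print(landmark_visibility(normals))     # [ True False]
```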
This paper addresses the problem of face recognition using a graphical representation to identify structure common to pairs of images. Matching graphs are constructed in which nodes correspond to local brightness gradient directions and edges depend on the relative orientation of the nodes. Similarity is determined from the size of maximal matching cliques in pattern pairs. The method uses a single reference face image to obtain recognition without a training stage. Results on samples from MegaFace achieve 100% correct recognition.
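The sketch below illustrates the matching idea on toy data: pair up locations whose gradient directions agree across the two images, connect pairs whose relative orientations are consistent, and take the largest clique size as the similarity score. The angle tolerance and the brute-force construction are illustrative simplifications, not the paper's algorithm.

```python
# Toy association-graph matching scored by maximal clique size.
# Thresholds and graph construction are assumed simplifications.
import networkx as nx
import numpy as np

def clique_similarity(dirs_a, dirs_b, tol=0.2):
    """dirs_*: (n,) gradient directions (radians) at sampled points."""
    # Nodes: compatible cross-image pairings of gradient directions.
    pairs = [(i, j) for i in range(len(dirs_a)) for j in range(len(dirs_b))
             if abs(dirs_a[i] - dirs_b[j]) < tol]
    g = nx.Graph()
    g.add_nodes_from(range(len(pairs)))
    for u in range(len(pairs)):
        for v in range(u + 1, len(pairs)):
            (i, j), (k, l) = pairs[u], pairs[v]
            # Edge if the two pairings preserve relative orientation.
            if i != k and j != l and \
               abs((dirs_a[i] - dirs_a[k]) - (dirs_b[j] - dirs_b[l])) < tol:
                g.add_edge(u, v)
    # Size of the largest matching clique as the similarity score.
    return max((len(c) for c in nx.find_cliques(g)), default=0)

print(clique_similarity(np.array([0.1, 0.5, 1.0]),
                        np.array([0.12, 0.52, 1.02])))  # 3
```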
Considering the complexity of a multimedia society and the subjectivity of describing images with words, a visual search application is a valuable tool. This work implements a Content-Based Image Retrieval (CBIR) application for texture images with the goal of comparing three deep convolutional neural networks (VGG-16, ResNet-50 and DenseNet-161) used as image descriptors by extracting global features from images. To measure similarity among images and rank them, we employ cosine similarity, Manhattan distance, Bray-Curtis dissimilarity and Canberra distance. We confirm that global average pooling applied to convolutional layers provides good texture descriptors, and propose its use when extracting features from VGG-based models. Our best result uses the average pooling layer of DenseNet-161 as a 2208-dimensional feature vector together with Bray-Curtis dissimilarity, achieving 73.09% mAP@1 and 76.98% mAP@5 on the Describable Textures Dataset (DTD) benchmark, adapted for image retrieval. Our mAP@1 result is comparable to the state-of-the-art classification accuracy (73.8%). We also investigate the impact on retrieval performance of reducing the number of feature components with PCA: a 2208-dimensional descriptor can be compressed down to 128 components with a moderate drop of 3.3 percentage points in mAP@1.
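Assuming the global descriptors have already been extracted (e.g., 2208-dimensional average-pooled DenseNet-161 features), the ranking step reduces to sorting the gallery by Bray-Curtis dissimilarity, as the sketch below shows with synthetic features.

```python
# Retrieval ranking with Bray-Curtis dissimilarity over precomputed
# global descriptors. The synthetic features are placeholders for
# average-pooled CNN activations.
import numpy as np
from scipy.spatial.distance import braycurtis

def rank_gallery(query, gallery):
    """query: (d,); gallery: (n, d). Returns indices, best match first."""
    d = np.array([braycurtis(query, g) for g in gallery])
    return np.argsort(d)  # smaller dissimilarity => better match

rng = np.random.default_rng(0)
gallery = np.abs(rng.normal(size=(100, 2208)))  # non-negative pooled feats
query = gallery[42] + 0.01 * np.abs(rng.normal(size=2208))
print(rank_gallery(query, gallery)[:5])  # index 42 should rank first
```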
In the competitive online fashion marketplace, sellers commonly add artificial elements to their product images in the hope of improving their aesthetic quality. Among the numerous types of artificial elements, this paper focuses on detecting artificial frames in fashion images, and we propose a novel algorithm based on traditional image processing techniques for this purpose. Although deep learning methods have proven powerful and effective for many image processing tasks in recent years, they have drawbacks in some cases that render them less effective than our method for this particular task. Experimental results on 1000 test images show that our algorithm performs comparably to several state-of-the-art deep learning models used for classification.
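The paper's algorithm is not reproduced here; as a toy stand-in for a traditional, non-learning frame check, the sketch below flags an image whose outer rows and columns are nearly uniform. The border width and variance threshold are illustrative assumptions.

```python
# Toy frame check: an artificial border tends to make the image's
# outer margins nearly uniform. Parameters are assumptions, not the
# paper's algorithm.
import numpy as np

def has_artificial_frame(img, border=10, var_thresh=25.0):
    """img: (h, w, 3) uint8 array. True if all four margins are flat."""
    strips = [img[:border], img[-border:], img[:, :border], img[:, -border:]]
    return all(s.reshape(-1, 3).var(axis=0).mean() < var_thresh
               for s in strips)

framed = np.full((200, 200, 3), 255, dtype=np.uint8)
framed[20:-20, 20:-20] = np.random.randint(0, 255, (160, 160, 3))
print(has_artificial_frame(framed))  # True: uniform white border
```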
A barcode is a representation of data encoding information about goods offered for sale, and it frequently appears on manufactured items. In online fashion markets such as Poshmark (a second-hand fashion marketplace), the barcodes on the tags of sale items encode identifying information such as the producer and manufacturer. The market needs a system that automatically detects and decodes barcodes in real time. However, existing methods have limitations when detecting 1-D barcodes against the varied backgrounds of fashion images, including tassels, stripes and clustered text. In this research, we focus on identifying barcodes in fashion images and distinguishing them from similar non-barcode image content. We accomplish this by applying a Convolutional Neural Network (CNN) to this object detection problem, and we also implement a traditional method based on hand-crafted features for comparison; our results compare the performance of our algorithm against a previous method. For decoding, we use a package that supports the common barcode symbologies to decode the detected barcodes. Because it fails on strongly skewed barcode images, we add a pre-processing step that warps skewed images to increase decoding success.
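A sketch of the decode-with-unskewing step, with two stated assumptions: pyzbar stands in for the unnamed decoding package, and the corner points of the skewed barcode are taken as given (e.g., from the detector's output).

```python
# Assumed pipeline: perspective-warp a skewed barcode region to a
# frontal rectangle, then decode it. pyzbar is a stand-in choice for
# the decoding package; corners are assumed detector output.
import cv2
import numpy as np
from pyzbar.pyzbar import decode

def decode_skewed(image, corners, out_w=400, out_h=150):
    """corners: (4, 2) barcode corners, clockwise from top-left."""
    dst = np.float32([[0, 0], [out_w, 0], [out_w, out_h], [0, out_h]])
    # Rectify the skewed region before handing it to the decoder.
    m = cv2.getPerspectiveTransform(np.float32(corners), dst)
    rectified = cv2.warpPerspective(image, m, (out_w, out_h))
    return decode(rectified)  # list of decoded symbols (may be empty)
```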