IS&T | Library

Abstract

We welcome research papers and presentations that explore advances in deep learning and imaging in a hybrid edge-cloud processing environment. Research topics include face detection and recognition; human detection and tracking; human behavior and action recognition; biometrics; gesture analysis; event detection; anomaly detection; object tracking and recognition; 3D and depth information for object recognition and reconstruction; pose estimation; multimedia and multimodal inference and understanding; and content analysis, indexing, search, and retrieval.

Digital Library: EI

Published Online: January 2022

VR facial expression tracking via action unit intensity regression model

expression
VR
blendshape

Xiaoyu Ji, Justin Yang, Jishang Wei, Yvonne Huang, Qian Lin, Jan P. Allebach, Fengqing Zhu

DOI

10.2352/EI.2022.34.8.IMAGE-255

Volume 34

Issue 8

Abstract

Virtual Reality (VR) Head-Mounted Displays (HMDs), also known as VR headsets, are powerful devices that enable interaction between people and a computer-generated virtual 3D world. For an immersive VR experience, realistic facial animation of the participant is crucial. However, facial expression tracking has been one of the major challenges of facial animation. Existing face tracking methods often rely on a statistical model of the entire face, which is not feasible in VR because the HMD inevitably occludes part of the face. In this paper, we provide an overview of the current state of VR facial expression tracking and discuss the bottlenecks for VR expression retargeting. We introduce a baseline method for expression tracking from single-view, partially occluded facial infrared (IR) images captured by the camera of the HP Reverb G2 VR headset. The experiment shows good visual prediction results for mouth-region expressions of a single person.
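The keywords list blendshapes, and the title refers to regressing action unit (AU) intensities. A common way to turn such intensities into facial animation is the linear delta-blendshape model, in which each intensity weights the offset of one expression target from the neutral mesh. The sketch below assumes a one-to-one mapping from regressed intensities to blendshape targets and intensities in [0, 1]; the function name `apply_blendshapes` and the array layouts are illustrative assumptions, not the paper's retargeting pipeline.

```python
import numpy as np


def apply_blendshapes(neutral, targets, weights):
    """Linear delta-blendshape model: neutral + sum_k w_k * (target_k - neutral).

    neutral: (V, 3) vertex positions of the neutral face mesh
    targets: (K, V, 3) vertex positions of K expression targets
    weights: (K,) regressed intensities, clipped to [0, 1]
    """
    neutral = np.asarray(neutral, dtype=float)
    targets = np.asarray(targets, dtype=float)
    w = np.clip(np.asarray(weights, dtype=float), 0.0, 1.0)
    deltas = targets - neutral[None, :, :]            # (K, V, 3) offsets from neutral
    return neutral + np.tensordot(w, deltas, axes=1)  # weighted sum over the K targets


# Toy mesh with 2 vertices and 2 targets (values are illustrative only).
neutral = np.zeros((2, 3))
targets = np.array([[[1.0, 0.0, 0.0], [0.0, 0.0, 0.0]],
                    [[0.0, 0.0, 0.0], [0.0, 2.0, 0.0]]])
print(apply_blendshapes(neutral, targets, [0.5, 1.0]))
```

Clipping keeps out-of-range regression outputs from pushing the mesh past its sculpted targets.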

Digital Library: EI

Published Online: January 2022

Efficient real-time portrait video segmentation with temporal guidance

Portrait
Semantic segmentation
Temporal
Video
Real-time

Weichen Xu, Yezhi Shen, Qian Lin, Jan P. Allebach, Fengqing Zhu

DOI

10.2352/EI.2022.34.8.IMAGE-263

Volume 34

Issue 8

Abstract

Virtual background has become an increasingly important feature of online video conferencing due to the popularity of remote work in recent years. To enable a virtual background, a segmentation mask of the participant must be extracted from the real-time video input. Most previous works have focused on image-based methods for portrait segmentation. However, portrait video segmentation poses additional challenges due to complicated backgrounds, body motion, and the need for inter-frame consistency. In this paper, we use temporal guidance to improve video segmentation and propose several methods to address these challenges, including a prior mask, optical flow, and visual memory. We incorporate our temporally guided methods into an existing portrait segmentation model, PortraitNet. Experimental results show that our methods improve segmentation performance on portrait videos with minimal latency.
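One common way to realize prior-mask guidance is to feed the mask predicted for the previous frame back into the model as an extra input channel alongside the current RGB frame. The minimal sketch below assumes exactly that; the abstract does not specify how the prior mask enters PortraitNet, and `model` is a hypothetical callable standing in for the segmentation network.

```python
import numpy as np


def add_prior_mask_channel(frame_rgb, prev_mask=None):
    """Stack the previous frame's mask onto the RGB frame as a 4th channel.

    frame_rgb: (H, W, 3) float image
    prev_mask: (H, W) float mask in [0, 1], or None for the first frame
    """
    h, w, _ = frame_rgb.shape
    if prev_mask is None:
        prev_mask = np.zeros((h, w), dtype=frame_rgb.dtype)  # no history yet
    return np.concatenate([frame_rgb, prev_mask[..., None]], axis=-1)


def segment_video(frames, model):
    """Run a per-frame model, feeding each prediction back as the next prior.

    `model` is a hypothetical callable mapping an (H, W, 4) input to an (H, W) mask.
    """
    masks, prev = [], None
    for frame in frames:
        prev = model(add_prior_mask_channel(frame, prev))
        masks.append(prev)
    return masks
```

Because the prior is only one extra channel, this form of guidance adds almost no compute per frame, which fits the low-latency goal stated in the abstract.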

Digital Library: EI

Published Online: January 2022

Mix-loss trained bias-removed blind image denoising network

Image denoising network
Perceptual trained
Convolutional neural network

Yi Yang, Chih-Hsien Chou, Jan P. Allebach

DOI

10.2352/EI.2022.34.8.IMAGE-288

Volume 34

Issue 8

Abstract

We study modern deep convolutional neural networks for image denoising, in which feed-forward networks trained with a loss defined in the RGB color space transform RGB input images into RGB output images. Considering the difference between human visual perception and objective evaluation metrics such as PSNR or SSIM, we propose a data augmentation technique and demonstrate that it is equivalent to defining a perceptual loss function. A network trained with this technique produces visually pleasing denoised results. We also combine an unsupervised design with a bias-free network to address the overfitting caused by the absence of clean images and to improve performance when the noise level exceeds the training range.
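Removing every additive bias term from a ReLU network makes it positively homogeneous: f(a x) = a f(x) for any a > 0, because ReLU(a z) = a ReLU(z). This scaling property is commonly given as one reason bias-free denoisers generalize to noise levels outside the training range. The toy two-layer `BiasFreeMLP` below is a hypothetical stand-in for a convolutional denoiser, written only to make the property checkable; it is not the network from the paper.

```python
import numpy as np


class BiasFreeMLP:
    """Toy two-layer ReLU network with every additive bias term removed."""

    def __init__(self, dim, hidden, seed=0):
        rng = np.random.default_rng(seed)
        self.w1 = rng.standard_normal((hidden, dim)) / np.sqrt(dim)
        self.w2 = rng.standard_normal((dim, hidden)) / np.sqrt(hidden)

    def __call__(self, x):
        h = np.maximum(self.w1 @ x, 0.0)  # ReLU, no bias added
        return self.w2 @ h                # linear output layer, no bias added


net = BiasFreeMLP(dim=8, hidden=16)
x = np.random.default_rng(1).standard_normal(8)
for a in (0.5, 2.0, 10.0):
    # Positive homogeneity: scaling the input scales the output by the same factor.
    print(a, np.allclose(net(a * x), a * net(x)))
```

Adding any nonzero bias to either layer breaks this identity, which is why bias-free designs remove biases everywhere, including in normalization layers of a real CNN.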

Digital Library: EI

Published Online: January 2022

Towards the creation of a nutrition and food group based image database

Food
Nutrition
Database

Zeman Shao, Jiangpeng He, Ya-Yuan Yu, Luotao Lin, Alexandra E. Cowan, Heather A. Eicher-Miller, Fengqing Zhu

DOI

10.2352/EI.2022.34.8.IMAGE-301

Volume 34

Issue 8

Abstract

Food classification is critical to analyzing the nutrient content of foods reported in dietary assessment. Advances in mobile and wearable sensors, combined with new image-based methods, particularly deep learning-based approaches, have shown great promise in improving the accuracy of food classification for assessing dietary intake. However, these approaches are data-hungry, and their performance relies heavily on the quantity and quality of the datasets available for training the food classification model. Existing food image datasets are not suitable for fine-grained food classification and subsequent nutrition analysis because they lack fine-grained, transparently derived food-group-based identification, which is often provided by trained dietitians with expert domain knowledge. In this paper, we propose a framework to create a nutrition and food group based image database that contains both visual and hierarchical food categorization information to enhance links to the nutrient profile of each food. We design a protocol for linking food group based food codes in the U.S. Department of Agriculture’s (USDA) Food and Nutrient Database for Dietary Studies (FNDDS) to a food image dataset, and implement a web-based annotation tool for efficient deployment of this protocol. Our proposed method is used to build a nutrition and food group based image database of 16,114 food images representing the 74 most frequently consumed What We Eat in America (WWEIA) food sub-categories in the United States, with 1,865 USDA food codes matched to the USDA FNDDS nutrient database.
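The protocol attaches two levels of labels to each image: a coarse WWEIA sub-category and a fine-grained USDA food code that connects the image to its nutrient profile in FNDDS. A minimal sketch of that two-level link as a data structure follows; the class names, fields, and placeholder labels and codes are hypothetical and reflect neither the authors' web-based annotation tool nor real FNDDS entries.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    image_id: str
    wweia_subcategory: str  # coarse food-group label
    food_code: int          # fine-grained USDA food code that links to nutrient data


class FoodImageDB:
    """Hypothetical in-memory store linking images to sub-categories and food codes."""

    def __init__(self):
        self._by_image = {}

    def annotate(self, image_id, subcategory, food_code):
        # Re-annotating an image overwrites its previous label.
        ann = Annotation(image_id, subcategory, food_code)
        self._by_image[image_id] = ann
        return ann

    def food_code(self, image_id):
        return self._by_image[image_id].food_code

    def codes_in(self, subcategory):
        """All distinct food codes currently annotated under one sub-category."""
        return {a.food_code for a in self._by_image.values()
                if a.wweia_subcategory == subcategory}

    def images_in(self, subcategory):
        return sorted(a.image_id for a in self._by_image.values()
                      if a.wweia_subcategory == subcategory)

    def __len__(self):
        return len(self._by_image)


db = FoodImageDB()
db.annotate("img_001", "Subcategory A", 1001)  # placeholder labels and codes
db.annotate("img_002", "Subcategory A", 1002)
db.annotate("img_003", "Subcategory B", 2001)
```

Keeping the food code on every image, rather than only the sub-category, is what lets a classifier's fine-grained prediction be looked up directly in a nutrient database.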

Digital Library: EI

Published Online: January 2022

Cultural assets identification using transfer learning

cultural assets identification
transfer learning
object classification
deep learning

Simon Bugert, Huajian Liu, Waldemar Berchtold, Martin Steinebach

DOI

10.2352/EI.2022.34.8.IMAGE-273

Volume 34

Issue 8

Abstract

Identifying cultural assets is a challenging task that requires specific expertise. In this paper, a deep learning based solution to identify archaeological objects is proposed. Several additions to the ResNet CNN architecture are introduced that consolidate features from different intermediate layers by applying global pooling operations. Unlike general object recognition, identifying archaeological objects poses new challenges. To meet the special requirements of classifying antiques, a hybrid network architecture, which includes a classification network and a regression network, is used to learn the characteristics of objects via transfer learning. The regression network predicts the age of objects, which improves the overall performance compared with manually classifying the age of objects. The proposed scheme is evaluated on a public database of cultural assets, and the experimental results demonstrate its strong performance in identifying antique objects.
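The abstract describes consolidating features from several intermediate ResNet layers with global pooling and feeding them to a classification network and an age regression network. A NumPy sketch of that data flow follows, assuming global average pooling and a single linear layer for each head; which layers are pooled, the pooling type, and the head architectures are assumptions, and the weights are fixed toy values rather than trained parameters.

```python
import numpy as np


def global_avg_pool(fmap):
    """(C, H, W) feature map -> (C,) vector by averaging over spatial positions."""
    return fmap.mean(axis=(1, 2))


def consolidate(feature_maps):
    """Pool each intermediate feature map and concatenate the pooled vectors."""
    return np.concatenate([global_avg_pool(f) for f in feature_maps])


def hybrid_heads(features, w_cls, w_age):
    """Linear classification head (softmax) and linear regression head (age)."""
    logits = w_cls @ features
    exp = np.exp(logits - logits.max())  # subtract the max for numerical stability
    probs = exp / exp.sum()
    age = float(w_age @ features)
    return probs, age


# Two intermediate maps with different channel counts and spatial sizes.
fmaps = [np.ones((2, 3, 3)), np.full((4, 2, 2), 2.0)]
features = consolidate(fmaps)  # shape (6,)
probs, age = hybrid_heads(features, w_cls=np.zeros((3, 6)), w_age=np.ones(6))
```

Global pooling is what makes the concatenation possible: it removes the spatial dimensions, so maps of different resolutions reduce to vectors whose lengths depend only on their channel counts.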

Digital Library: EI

Published Online: January 2022

Correspondences for image and video reconstruction

image processing
video reconstruction
computer vision
super-resolution
video frame interpolation

Xiaoyu Xiang, Yapeng Tian

DOI

10.2352/EI.2022.34.8.IMAGE-287

Volume 34

Issue 8

Abstract

Correspondences are prevalent among different frames of natural videos, as well as within sets of images sharing a common attribute. Dense correspondences are central to the core problem of many natural image and video reconstruction tasks: recovering texture details with high fidelity. In this paper, we discuss recent methods for learning and utilizing such correspondences in image and video reconstruction. Specifically, we decompose the network design into several switchable components with different purposes and discuss their applications to various image and video restoration tasks, such as super-resolution, denoising, and video frame interpolation. In this way, we analyze performance and identify generic and efficient network designs. Benefiting from these investigations, our proposed methods achieve state-of-the-art performance on multiple tasks with fewer parameters. Our findings could inspire future network designs for multiple image and video reconstruction tasks.
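One standard way to exploit a dense correspondence field, such as optical flow between two frames, is backward warping: each target pixel bilinearly samples the reference image at its corresponding location, bringing the reference's texture into alignment. The generic NumPy sketch below assumes a single-channel image and a flow field given in pixels; it illustrates the operation only and is not one of the switchable components proposed in the paper.

```python
import numpy as np


def backward_warp(src, flow):
    """Bilinearly sample src (H, W) at positions given by a dense flow field.

    flow[y, x] = (dx, dy) means target pixel (x, y) corresponds to src at
    (x + dx, y + dy). Sample positions are clamped to the image border.
    """
    h, w = src.shape
    ys, xs = np.mgrid[0:h, 0:w].astype(float)
    sx = np.clip(xs + flow[..., 0], 0, w - 1)
    sy = np.clip(ys + flow[..., 1], 0, h - 1)
    x0, y0 = np.floor(sx).astype(int), np.floor(sy).astype(int)
    x1, y1 = np.minimum(x0 + 1, w - 1), np.minimum(y0 + 1, h - 1)
    ax, ay = sx - x0, sy - y0  # fractional offsets used as bilinear weights
    top = (1 - ax) * src[y0, x0] + ax * src[y0, x1]
    bottom = (1 - ax) * src[y1, x0] + ax * src[y1, x1]
    return (1 - ay) * top + ay * bottom


src = np.arange(12, dtype=float).reshape(3, 4)
shift = np.zeros((3, 4, 2))
shift[..., 0] = 1.0  # every target pixel looks one column to the right
warped = backward_warp(src, shift)
```

Bilinear sampling is differentiable with respect to both the image and the flow, which is why warping of this kind is a common building block inside learned reconstruction networks.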

Digital Library: EI

Published Online: January 2022

Automatic facial skin feature detection for everyone

Skin feature detection
Acne
Pigmentation
Wrinkle
Deep learning

Qian Zheng, Ankur Purwar, Heng Zhao, Guang Liang Lim, Ling Li, Debasish Behera, Qian Wang, Min Tan, Rizhao Cai, Jennifer Werner, Dennis Sng, Maurice van Steensel, Weisi Lin, Alex C. Kot

DOI

10.2352/EI.2022.34.8.IMAGE-300

Volume 34

Issue 8