Page ,  © Society for Imaging Science and Technology 2022
Volume 34
Issue 8
Abstract

We welcome research papers and presentations that explore advances in deep learning and imaging in a hybrid edge-cloud processing environment. Research topics include face detection and recognition; human detection and tracking; human behavior and action recognition; biometrics; gesture analysis; event detection; anomaly detection; object tracking and recognition; 3D and depth information for object recognition and reconstruction; pose estimation; multimedia and multimodal inference and understanding; content analysis, indexing, search, and retrieval.

Digital Library: EI
Published Online: January  2022
Pages 255-1 - 255-7, © 2022, Society for Imaging Science and Technology
Volume 34
Issue 8
Abstract

Virtual Reality (VR) Head-Mounted Displays (HMDs), also known as VR headsets, are powerful devices that provide interaction between people and the virtual 3D world generated by a computer. For an immersive VR experience, realistic facial animation of the participant is crucial. However, facial expression tracking has been one of the major challenges of facial animation. Existing face tracking methods often rely on a statistical model of the entire face, which is not feasible because occlusions arising from HMDs are inevitable. In this paper, we provide an overview of the current state of VR facial expression tracking and discuss bottlenecks for VR expression re-targeting. We introduce a baseline method for expression tracking from single-view, partially occluded facial infrared (IR) images, which are captured by the HP Reverb G2 VR headset camera. The experiment shows good visual prediction results for mouth region expressions from a single person.

Digital Library: EI
Published Online: January  2022
Pages 263-1 - 263-7, © 2022, Society for Imaging Science and Technology
Volume 34
Issue 8
Abstract

Virtual background has become an increasingly important feature of online video conferencing due to the popularity of remote work in recent years. To enable virtual background, a segmentation mask of the participant needs to be extracted from the real-time video input. Most previous works have focused on image-based methods for portrait segmentation. However, portrait video segmentation poses additional challenges due to complicated backgrounds, body motion, and inter-frame consistency. In this paper, we utilize temporal guidance to improve video segmentation, and propose several methods to address these challenges, including prior mask, optical flow, and visual memory. We incorporate our temporally guided methods into an existing portrait segmentation model, PortraitNet. Experimental results show that our methods can achieve improved segmentation performance on portrait videos with minimal latency.
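
One way to picture the prior-mask form of temporal guidance is as an extra input channel: the mask predicted for the previous frame is stacked onto the current RGB frame before it enters the segmentation network. The sketch below is only an illustration under that assumption. The function name `stack_prior_mask` and the nested-list image layout are hypothetical and are not taken from the paper or from PortraitNet.

```python
from typing import List, Optional

Pixel = List[float]
Image = List[List[Pixel]]   # height x width x channels
Mask = List[List[float]]    # height x width, values in [0, 1]


def stack_prior_mask(frame: Image, prior_mask: Optional[Mask]) -> Image:
    """Append the previous frame's mask as a fourth channel (hypothetical helper).

    For the first frame of a video no prior exists, so an all-zero mask is used.
    The input frame is left unmodified.
    """
    height, width = len(frame), len(frame[0])
    if prior_mask is None:
        prior_mask = [[0.0] * width for _ in range(height)]
    if len(prior_mask) != height or any(len(row) != width for row in prior_mask):
        raise ValueError("prior mask must match the frame size")
    # Build new pixel lists so the caller's frame is not mutated.
    return [
        [frame[y][x] + [prior_mask[y][x]] for x in range(width)]
        for y in range(height)
    ]
```

In a video loop, the mask returned by the network for frame t would be passed as `prior_mask` for frame t+1, which is what ties consecutive predictions together.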

Digital Library: EI
Published Online: January  2022
Pages 288-1 - 288-7, © 2022, Society for Imaging Science and Technology
Volume 34
Issue 8
Abstract

We studied the modern deep convolutional neural networks used for image denoising, where RGB input images are transformed into RGB output images via feed-forward convolutional neural networks that use a loss defined in the RGB color space. Considering the difference between human visual perception and objective evaluation metrics such as PSNR or SSIM, we propose a data augmentation technique and demonstrate that it is equivalent to defining a perceptual loss function. We trained a network based on this and obtained visually pleasing denoised results. We also combine an unsupervised design and the bias-free network to deal with the overfitting due to the absence of clean images, and improve performance when the noise level exceeds the training range.
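
For readers unfamiliar with the objective metrics mentioned above, PSNR is computed directly from the mean squared error between a reference image and an estimate, which is why it can disagree with perceived quality. The following is a minimal sketch of the standard definition, assuming images are given as flattened sequences of pixel values and a peak value of 255; the function name `psnr` is chosen here for illustration and does not come from the paper.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[float], estimate: Sequence[float], peak: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB: 10 * log10(peak^2 / MSE)."""
    if len(reference) != len(estimate) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all pixel values.
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)
```

Because the score depends only on pixel-wise squared differences, two denoised images with the same PSNR can look quite different to a viewer.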

Digital Library: EI
Published Online: January  2022
Pages 301-1 - 301-6, © 2022, Society for Imaging Science and Technology
Volume 34
Issue 8
Abstract

Food classification is critical to the analysis of nutrients comprising foods reported in dietary assessment. Advances in mobile and wearable sensors, combined with new image-based methods, particularly deep learning-based approaches, have shown great promise to improve the accuracy of food classification to assess dietary intake. However, these approaches are data-hungry and their performance is heavily reliant on the quantity and quality of the available datasets for training the food classification model. Existing food image datasets are not suitable for fine-grained food classification and the subsequent nutrition analysis because they lack fine-grained and transparently derived food group-based identification, which is often provided by trained dietitians with expert domain knowledge. In this paper, we propose a framework to create a nutrition and food group-based image database that contains both visual and hierarchical food categorization information to enhance links to the nutrient profile of each food. We design a protocol for linking food group-based food codes in the U.S. Department of Agriculture’s (USDA) Food and Nutrient Database for Dietary Studies (FNDDS) to a food image dataset, and implement a web-based annotation tool for efficient deployment of this protocol. Our proposed method is used to build a nutrition and food group-based image database including 16,114 food images representing the 74 most frequently consumed What We Eat in America (WWEIA) food sub-categories in the United States, with 1,865 USDA food codes matched to the USDA FNDDS nutrient database.

Digital Library: EI
Published Online: January  2022
Pages 273-1 - 273-4,  © Society for Imaging Science and Technology 2022
Volume 34
Issue 8
Abstract

Identifying cultural assets is a challenging task that requires specific expertise. In this paper, a deep learning-based solution to identify archaeological objects is proposed. Several additions to the ResNet CNN architecture are introduced which consolidate features from different intermediate layers by applying global pooling operations. Unlike general object recognition, identifying archaeological objects poses new challenges. To meet the special requirements in classifying antiques, a hybrid network architecture, which includes a classification network and a regression network, is used to learn the characteristics of objects using transfer learning. With the help of the regression network, the age of objects can be predicted, which improves the overall performance in comparison to manually classifying the age of objects. The proposed scheme is evaluated using a public database of cultural assets and the experimental results demonstrate its significant performance in identifying antique objects.
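
The idea of consolidating intermediate-layer features with global pooling can be sketched as follows: each intermediate feature map is reduced to one value per channel, and the resulting vectors are concatenated into a single descriptor that a classification or regression head could consume. This is a rough illustration only. The abstract does not say which pooling operation or which layers are used, so global average pooling is assumed here, and the names `global_average_pool` and `consolidate` are hypothetical.

```python
from typing import List

FeatureMap = List[List[List[float]]]  # channels x height x width


def global_average_pool(features: FeatureMap) -> List[float]:
    """Reduce each channel of a feature map to its mean value."""
    pooled = []
    for channel in features:
        values = [v for row in channel for v in row]
        pooled.append(sum(values) / len(values))
    return pooled


def consolidate(intermediate: List[FeatureMap]) -> List[float]:
    """Concatenate pooled vectors from several layers (assumed design)."""
    descriptor: List[float] = []
    for features in intermediate:
        descriptor.extend(global_average_pool(features))
    return descriptor
```

Pooling removes the spatial dimensions, so feature maps of different resolutions from different depths of the network can be joined into one fixed-length vector.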

Digital Library: EI
Published Online: January  2022
Pages 287-1 - 287-10,  © Society for Imaging Science and Technology 2022
Volume 34
Issue 8
Abstract

Correspondences are prevalent in natural videos among different frames, as well as in sets of images sharing a common attribute. Dense correspondences are important for the core problem of many natural image and video reconstruction tasks: recovering texture details with high fidelity. In this paper, we discuss recent methods for learning and utilizing such correspondences in image and video reconstruction. Specifically, we decompose the network design into several switchable components with different purposes and discuss their applications to different image and video restoration tasks such as super-resolution, denoising, and video frame interpolation. In this way, we can analyze the performance and uncover a generic and efficient network design. Benefiting from the above investigations, our proposed methods achieve state-of-the-art performance on multiple tasks with fewer parameters. Our findings could inspire future network designs for multiple image and video reconstruction tasks.

Digital Library: EI
Published Online: January  2022
Pages 300-1 - 300-6,  © Society for Imaging Science and Technology 2022
Volume 34
Issue 8
Abstract

Automatic assessment and understanding of facial skin condition have several applications, including the early detection of underlying health problems, lifestyle and dietary treatment, skin-care product recommendation, etc. Selfies in the wild serve as an excellent data resource to democratize skin quality assessment, but suffer from several data collection challenges. The key to guaranteeing an accurate assessment is accurate detection of different skin features. We present an automatic facial skin feature detection method that works across a variety of skin tones and age groups for selfies in the wild. To be specific, we annotate the locations of acne, pigmentation, and wrinkles for selfie images with different skin tone colors, severity levels, and lighting conditions. The annotation is conducted in a two-phase scheme with the help of a dermatologist to train volunteers for annotation. We employ Unet++ as the network architecture for feature detection. This work shows that the two-phase annotation scheme can robustly detect the accurate locations of acne, pigmentation, and wrinkles for selfie images with different ethnicities, skin tone colors, severity levels, age groups, and lighting conditions.

Digital Library: EI
Published Online: January  2022
