Pages A07-1 - A07-9, © 2023, Society for Imaging Science and Technology
Volume 35
Issue 7
Abstract

Recent progress at the intersection of deep learning and imaging has created a new wave of interest in imaging and multimedia analytics topics, from social media sharing to augmented reality, from food and nutrition to health surveillance, from remote sensing and agriculture to wildlife and environmental monitoring. Compared to many subjects in traditional imaging, these topics are more multi-disciplinary in nature. This conference will provide a forum for researchers and engineers from various related areas, both academic and industrial, to exchange ideas and share research results in this rapidly evolving field.

Digital Library: EI
Published Online: January  2023
Pages 268-1 - 268-6, © 2023, Society for Imaging Science and Technology
Volume 35
Issue 7
Abstract

Generative Adversarial Networks (GANs) have been widely investigated for image synthesis because of their powerful representation learning ability. In this work, we explore StyleGAN and its application to synthetic food image generation. Despite the impressive performance of GANs for natural image generation, food images suffer from high intra-class diversity and inter-class similarity, resulting in overfitting and visual artifacts in synthetic images. Therefore, we aim to explore the capability and improve the performance of GAN methods for food image generation. Specifically, we first choose StyleGAN3 as the baseline method to generate synthetic food images and analyze its performance. Then, we identify two issues that can cause performance degradation on food images during the training phase: (1) inter-class feature entanglement when training on multiple food classes and (2) loss of high-resolution detail during image downsampling. To address both issues, we propose to train one food category at a time to avoid feature entanglement and to leverage image patches cropped from high-resolution datasets to retain fine details. We evaluate our method on the Food-101 dataset and show improved quality of generated synthetic food images compared with the baseline. Finally, we demonstrate the potential of improving the performance of downstream tasks, such as food image classification, by including high-quality synthetic training samples in data augmentation.
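As a minimal sketch of the patch-based idea described above (not the authors' code; paths, patch size, and file layout are assumptions), the snippet below crops random high-resolution patches from a single food class so that fine detail is preserved instead of being lost to whole-image downsampling before per-class StyleGAN3 training.

```python
# Hypothetical pre-processing sketch: crop random high-resolution patches from
# one food category's images so a per-class StyleGAN3 model can be trained on
# fine-detail data. Directory names and parameters are illustrative only.
import random
from pathlib import Path
from PIL import Image

def crop_random_patches(image_dir, out_dir, patch_size=256, patches_per_image=4):
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for img_path in sorted(Path(image_dir).glob("*.jpg")):
        img = Image.open(img_path).convert("RGB")
        w, h = img.size
        if w < patch_size or h < patch_size:
            continue  # skip images smaller than the patch size
        for i in range(patches_per_image):
            x = random.randint(0, w - patch_size)
            y = random.randint(0, h - patch_size)
            patch = img.crop((x, y, x + patch_size, y + patch_size))
            patch.save(out / f"{img_path.stem}_p{i}.png")

# Example (hypothetical paths): build a patch set for one Food-101 class, then
# train one generator on that folder alone to avoid inter-class entanglement.
# crop_random_patches("food-101/images/pizza", "patches/pizza")
```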

Digital Library: EI
Published Online: January  2023
Pages 269-1 - 269-6, © 2023, Society for Imaging Science and Technology
Volume 35
Issue 7
Abstract

Food image classification is the groundwork for image-based dietary assessment, which is the process of monitoring what kinds of food and how much energy is consumed using captured food or eating scene images. Existing deep learning based methods learn the visual representation for food classification from human annotations of each food image. However, most food images captured in real life are obtained without labels, so human annotation is required to train deep learning based methods, an approach that is not feasible for real-world deployment due to its high cost. To make use of the vast amount of unlabeled images, many existing works focus on unsupervised or self-supervised learning to learn the visual representation directly from unlabeled data. However, none of these existing works focuses on food images, which are more challenging than general objects due to their high inter-class similarity and intra-class variance. In this paper, we focus on two items: the comparison of existing models and the development of an effective self-supervised learning model for food image classification. Specifically, we first compare the performance of existing state-of-the-art self-supervised learning models, including SimSiam, SimCLR, SwAV, BYOL, MoCo, and the Rotation Pretext Task, on food images. The experiments are conducted on the Food-101 dataset, which contains 101 different classes of food with 1,000 images per class. Next, we analyze the unique features of each model and compare their performance on food images to identify the key factors in each model that can help improve accuracy. Finally, we propose a new model for unsupervised visual representation learning on food images for the classification task.
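To illustrate the kind of objective these self-supervised baselines optimize, here is a minimal sketch of a SimSiam-style loss (one of the compared methods), assuming PyTorch is available; `encoder` and `predictor` are hypothetical user-supplied networks, not components defined in the paper.

```python
# Sketch of the SimSiam-style objective: two augmented views of the same food
# image are encoded, and a negative cosine similarity with a stop-gradient on
# one branch prevents representation collapse.
import torch
import torch.nn.functional as F

def simsiam_loss(p1, p2, z1, z2):
    """p*: predictor outputs, z*: encoder outputs (detached = stop-gradient)."""
    def d(p, z):
        return -F.cosine_similarity(p, z.detach(), dim=-1).mean()
    return 0.5 * d(p1, z2) + 0.5 * d(p2, z1)

# Usage inside a training step (view1/view2 are two augmentations of the same
# batch of Food-101 images; encoder and predictor are hypothetical modules):
# z1, z2 = encoder(view1), encoder(view2)
# p1, p2 = predictor(z1), predictor(z2)
# loss = simsiam_loss(p1, p2, z1, z2)
```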

Digital Library: EI
Published Online: January  2023
Pages 271-1 - 271-6, © 2023, Society for Imaging Science and Technology
Volume 35
Issue 7
Abstract

In this paper, we present a method for agglomerative clustering of characters in a video. Given an edited video featuring people, we seek to associate each person with the character they portray. The proposed method is based on agglomerative clustering of deep face features using first neighbour relations. First, the heads and faces of each person are detected and tracked in each shot of the video. Then, we create a feature vector for each tracked person in a shot. Finally, we compare the feature vectors and use first neighbour relations to group them into distinct characters. The main contribution of this work is a person re-identification framework based on agglomerative clustering and applied to edited videos with large scene variations.
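The following is a minimal sketch (not the authors' implementation) of first-neighbour grouping over per-shot face descriptors: each track is linked to its nearest neighbour under cosine similarity, and connected components of that graph become the character clusters. Variable names are assumptions.

```python
# First-neighbour agglomerative grouping of face feature vectors.
import numpy as np

def first_neighbour_clusters(features):
    """features: (N, D) array of face descriptors, one per tracked person/shot."""
    X = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = X @ X.T
    np.fill_diagonal(sim, -np.inf)          # exclude self-matches
    nn = sim.argmax(axis=1)                 # first (nearest) neighbour of each track

    # Union-find over the "i <-> nn[i]" links yields the connected components.
    parent = list(range(len(X)))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i
    for i, j in enumerate(nn):
        parent[find(i)] = find(int(j))
    return np.array([find(i) for i in range(len(X))])

# Example: labels = first_neighbour_clusters(track_features)
# Tracks sharing a label are treated as the same character.
```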

Digital Library: EI
Published Online: January  2023
Pages 272-1 - 272-6, © 2023, Society for Imaging Science and Technology
Volume 35
Issue 7
Abstract

Real-time video super-resolution (VSR) has been considered a promising solution for improving video quality in video conferencing and media playback, both of which require low latency and short inference time. Although state-of-the-art VSR methods with well-designed architectures have been proposed, many of them cannot be turned into real-time VSR models because of their high computational complexity and memory footprint. In this work, we propose a lightweight recurrent network for this task, in which motion compensation offsets are estimated by an optical flow estimation network, features extracted from the previous high-resolution output are aligned to the current target frame, and a hidden state is used to propagate long-term information. We show that the proposed method is efficient for real-time video super-resolution. We also carefully study the effect of including an optical flow estimation module in a lightweight recurrent VSR model and compare two ways of training the models. We further compare four different motion estimation networks that have been used in lightweight VSR approaches and demonstrate the importance of reducing information loss in motion estimation.
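The sketch below outlines one step of a recurrent VSR cell of this kind, assuming PyTorch; `flow_net`, `fusion_net`, and `upsample_net` are hypothetical placeholders for the paper's modules, and only the flow-based warping is spelled out.

```python
# Sketch of flow-based alignment in a lightweight recurrent VSR step.
import torch
import torch.nn.functional as F

def flow_warp(x, flow):
    """Backward-warp tensor x (N,C,H,W) with optical flow (N,2,H,W) in pixels."""
    n, _, h, w = x.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    grid = torch.stack((xs, ys), dim=0).float().to(x.device)   # base pixel grid
    coords = grid.unsqueeze(0) + flow                           # sampling positions
    # Normalise coordinates to [-1, 1] as required by grid_sample.
    coords_x = 2.0 * coords[:, 0] / max(w - 1, 1) - 1.0
    coords_y = 2.0 * coords[:, 1] / max(h - 1, 1) - 1.0
    grid_norm = torch.stack((coords_x, coords_y), dim=-1)       # (N,H,W,2)
    return F.grid_sample(x, grid_norm, align_corners=True)

# One recurrent step (hypothetical modules):
# flow   = flow_net(lr_t, lr_prev)                   # motion estimation
# hidden = flow_warp(hidden, flow)                   # align long-term state
# feat   = fusion_net(torch.cat([lr_t, hidden], 1))  # fuse frame + state
# sr_t, hidden = upsample_net(feat), feat            # output frame, new state
```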

Digital Library: EI
Published Online: January  2023
Pages 273-1 - 273-6, © 2023, Society for Imaging Science and Technology
Volume 35
Issue 7
Abstract

Video conferencing usage increased dramatically during the pandemic and is expected to remain high in hybrid work. One of the key aspects of the video experience is background blur or background replacement, which relies on good-quality portrait segmentation in each frame. Software and hardware manufacturers have worked together to utilize depth sensors to improve the process, and existing solutions have incorporated the depth map into post-processing to generate a more natural blurring effect. In this paper, we propose to collect background features with the help of the depth map to improve the segmentation result obtained from the RGB image. Our results show significant improvements over methods using RGB-based networks, and our method runs faster than model-based background feature collection models.
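As a hedged illustration of how a depth map can support portrait segmentation (this is an assumption about one possible refinement step, not the paper's exact pipeline), the sketch below estimates a background depth from pixels the RGB network already considers background, then keeps only foreground pixels that sit clearly in front of it.

```python
# Hypothetical depth-assisted refinement of an RGB portrait segmentation mask.
import numpy as np

def refine_portrait_mask(rgb_prob, depth, bg_percentile=60.0):
    """rgb_prob: (H,W) foreground probability from an RGB segmentation network.
    depth: (H,W) depth values (larger = farther from the camera)."""
    # Pixels the RGB network is confident are background give an estimate of
    # the depth of the scene behind the person.
    bg_pixels = depth[rgb_prob < 0.2]
    if bg_pixels.size == 0:
        return rgb_prob > 0.5
    bg_depth = np.percentile(bg_pixels, bg_percentile)
    # Keep foreground pixels that are clearly in front of that background depth.
    return (rgb_prob > 0.5) & (depth < 0.9 * bg_depth)

# mask = refine_portrait_mask(prob_map, depth_map)   # boolean portrait mask
```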

Digital Library: EI
Published Online: January  2023
Pages 275-1 - 275-6, © 2023, Society for Imaging Science and Technology
Volume 35
Issue 7
Abstract

Hand hygiene is essential for food safety and for food handlers; maintaining proper hand hygiene can improve food safety and promote public welfare. However, traditional methods of evaluating hygiene during the food handling process, such as visual auditing by human experts, can be costly and inefficient compared to a computer vision system. Because of the varying conditions and locations of real-world food processing sites, computer vision systems for recognizing handwashing actions can be susceptible to changes in lighting and environments. Therefore, we design a robust and generalizable video system, based on ResNet50, that includes a hand extraction method and a two-stream network for classifying handwashing actions. More specifically, our hand extraction method eliminates the background and helps the classifier focus on hand regions under changing lighting conditions and environments. Our results demonstrate that our system with the hand extraction method improves action recognition accuracy and generalizes better to completely unseen data, achieving over 20% improvement in overall classification accuracy.
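One common way to isolate hand regions before classification is skin-colour masking; the sketch below (an assumption for illustration, not the paper's extraction method) suppresses non-skin pixels with a YCrCb range so the classifier sees mostly hands, and notes how the two streams could be combined.

```python
# Hypothetical hand extraction via YCrCb skin-colour masking (OpenCV).
import cv2
import numpy as np

def extract_hands(frame_bgr, lower=(0, 135, 85), upper=(255, 180, 135)):
    """frame_bgr: uint8 BGR frame. Returns the frame with background suppressed."""
    ycrcb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2YCrCb)
    mask = cv2.inRange(ycrcb, np.array(lower, np.uint8), np.array(upper, np.uint8))
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, np.ones((5, 5), np.uint8))
    return cv2.bitwise_and(frame_bgr, frame_bgr, mask=mask)

# In a two-stream setup, one stream would take the original frames and the
# other the hand-extracted frames; their class scores are fused at the end.
```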

Digital Library: EI
Published Online: January  2023
Pages 276-1 - 276-6, © 2023, Society for Imaging Science and Technology
Volume 35
Issue 7
Abstract

Simulating the effects of skincare products on the face is a potential new mode of product promotion that also helps consumers choose the right product. Furthermore, such simulations enable consumers to anticipate their skin condition and better manage their skin health. However, effective simulations are lacking today. In this paper, we propose the first simulation model to reveal facial pore changes after using skincare products. Our simulation pipeline consists of two steps: training data establishment and facial pore simulation. To establish the training data, we collect face images with various pore quality indexes from short-term (8-week) clinical studies. People experience significant skin fluctuations (due to natural rhythms, external stressors, etc.) that introduce large perturbations, so we propose a sliding window mechanism to clean the data and select representative indexes that reflect facial pore changes. The facial pore simulation stage consists of three modules: a UNet-based segmentation module to localize facial pores; a regression module to predict time-dependent warping hyperparameters; and a deformation module that takes the warping hyperparameters and pore segmentation labels as inputs and precisely deforms the pores accordingly. The proposed simulation renders realistic facial pore changes. This work will pave the way for future research in facial skin simulation and skincare product development.
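As one possible reading of the sliding-window cleaning step (an assumption for illustration; the window size and use of a median are not taken from the paper), the sketch below smooths a noisy series of weekly pore-quality indexes so short-term skin fluctuations do not dominate the training target.

```python
# Hypothetical sliding-window cleaning of weekly pore-quality indexes.
import numpy as np

def representative_index(weekly_index, window=3):
    """weekly_index: 1-D sequence of pore-quality scores over an 8-week study."""
    x = np.asarray(weekly_index, dtype=float)
    half = window // 2
    padded = np.pad(x, half, mode="edge")
    # The median inside each window is robust to single-week outliers.
    return np.array([np.median(padded[i:i + window]) for i in range(len(x))])

# cleaned = representative_index([3.1, 3.4, 2.2, 3.0, 2.9, 2.8, 2.5, 2.6])
```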

Digital Library: EI
Published Online: January  2023
Pages 278-1 - 278-5, © 2023, Society for Imaging Science and Technology
Volume 35
Issue 7
Abstract

We present a head-mounted holographic display system for thermographic image overlay, biometric sensing, and wireless telemetry. The system is lightweight and reconfigurable for multiple field applications, including object contour detection and enhancement, breathing rate detection, and telemetry over a mobile phone for peer-to-peer communication and an incident command dashboard. Due to the limited computing power of the embedded system, we developed a lightweight image processing algorithm for edge detection and breathing rate detection, as well as an image compression codec. The system can be integrated into a helmet or personal protection equipment such as a face shield or goggles. It can be applied to firefighting, medical emergency response, and other first-response operations. Finally, we present a case study of "Cold Trailing" for forest fire prevention in the wild.
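To make the breathing-rate idea concrete, here is a minimal sketch of one standard approach (an assumption, not the deployed firmware): track the mean temperature of a nostril region over successive thermal frames and take the dominant frequency in a physiologically plausible band.

```python
# Hypothetical breathing-rate estimation from a thermal region-of-interest trace.
import numpy as np

def breathing_rate_bpm(roi_means, fps):
    """roi_means: 1-D series of mean ROI temperatures; fps: thermal frame rate."""
    x = np.asarray(roi_means, dtype=float)
    x = x - x.mean()                              # remove the DC component
    spectrum = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fps)  # frequencies in Hz
    band = (freqs >= 0.1) & (freqs <= 0.7)        # ~6 to 42 breaths per minute
    if not band.any():
        return 0.0
    return float(freqs[band][spectrum[band].argmax()] * 60.0)

# bpm = breathing_rate_bpm(temperature_trace, fps=9)   # e.g. a 9 Hz thermal camera
```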

Digital Library: EI
Published Online: January  2023
Pages 279-1 - 279-6, © 2023, Society for Imaging Science and Technology
Volume 35
Issue 7
Abstract

The ability to identify individual cows quickly and readily in the barn would enable real-time monitoring of their behavior, health, eating habits, and more, all of which could save time, money, and effort. This work focuses on creating an eidetic recognition, or re-identification (ReID), algorithm that learns to recognize individual cows from just a single training example per cow and with near-zero time to learn a new cow, both features that existing cattle ReID systems lack. Our algorithm is designed to improve recognition robustness to the deformations in cow bodies that occur when they are walking, turning, or seen slightly off-angle. Individual cows are first detected and localized using popular keypoint and mask detection techniques, then aligned to a fixed template, pixelated, binarized to reduce lighting effects, and serialized to obtain bit-vectors. Bit-vectors from cows at inference time are matched to those from training time using Hamming distance. To improve results, we add modules to verify the validity of detected keypoints, interpolate missing keypoints, and combine predictions from multiple frames using a majority vote. The video-level accuracy exceeds 60% for a set of nearly 150 Holstein cows.
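The sketch below illustrates the pixelate-binarize-match chain described above (grid size and the median threshold are assumptions, not the authors' settings): an aligned cow crop becomes a bit-vector, gallery matching uses Hamming distance, and a per-video majority vote combines frame-level predictions.

```python
# Hypothetical bit-vector matching for one-example-per-cow re-identification.
from collections import Counter
import numpy as np

def to_bitvector(aligned_gray, grid=(32, 64)):
    """aligned_gray: 2-D grayscale crop already aligned to the fixed template."""
    h, w = grid
    ys = np.linspace(0, aligned_gray.shape[0] - 1, h).astype(int)
    xs = np.linspace(0, aligned_gray.shape[1] - 1, w).astype(int)
    pixelated = aligned_gray[np.ix_(ys, xs)]                   # coarse grid sample
    return (pixelated > np.median(pixelated)).flatten()        # binarize vs. median

def match(bits, gallery):
    """gallery: dict cow_id -> bit-vector from the single training example."""
    return min(gallery, key=lambda cid: np.count_nonzero(bits ^ gallery[cid]))

def video_identity(frame_bits, gallery):
    votes = Counter(match(b, gallery) for b in frame_bits)
    return votes.most_common(1)[0][0]                          # majority vote
```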

Digital Library: EI
Published Online: January  2023
