Keywords: 3D Modeling, 3D-CNN, Bed Exit, behavior analysis, Blockchain, CAD, Computer vision, Convolutional neural network, Deep Learning, Drowsiness detection, Egocentric Image, End-to-end System, Eye detection, Face tracking, Facial landmark, Fingerprinting, Food and Computer Vision, Food Detection, Human Detection, Image-based dietary assessment, imaging, Industrial defect detection, machine learning, mobile, multimedia analysis, multi-object tracking, object detection, Oculus Headset, Optical flow, Patient monitoring, Recognition, remote heart rate measurement, Respiration rate, rPPG prediction, video analysis, video analytics, video representation, video summarization, video thumbnail, Virtual Reality, Visual documents tracking, web
Pages A08-1 - A08-6, © Society for Imaging Science and Technology 2021
Digital Library: EI
Published Online: January 2021
Pages 232-1 - 232-7, © Society for Imaging Science and Technology 2021
Volume 33
Issue 8

In this paper, we propose a video analytics system to identify the behavior of turkeys. Turkey behavior provides evidence for assessing turkey welfare, which can be negatively impacted by uncomfortable ambient temperatures and various diseases. In particular, healthy and sick turkeys behave differently in the duration and frequency of activities such as eating, drinking, preening, and aggressive interactions. Our system incorporates recent advances in object detection and tracking to automate the identification and analysis of turkey behavior captured by commercial-grade cameras. We combine deep learning and traditional image processing methods to address the challenges of this practical agricultural problem. Our system also includes a web-based user interface that visualizes the automated analysis results. Together, these components provide an improved tool for turkey researchers to assess turkey welfare without time-consuming and labor-intensive manual inspection.
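For concreteness, the following minimal sketch shows the detect-then-track pattern such a pipeline builds on. The paper's code is not published: `detect_turkeys` is a hypothetical stand-in for its deep-learning detector, and the greedy IoU association is one simple tracking scheme, not necessarily the one the authors use.

```python
# Minimal detect-then-track sketch. `detect_turkeys` is a hypothetical
# stand-in for a deep-learning detector (e.g., Mask R-CNN or YOLO); it
# must return a list of (x, y, w, h) boxes for one frame.

def iou(a, b):
    """Intersection-over-union of two (x, y, w, h) boxes."""
    ax1, ay1, ax2, ay2 = a[0], a[1], a[0] + a[2], a[1] + a[3]
    bx1, by1, bx2, by2 = b[0], b[1], b[0] + b[2], b[1] + b[3]
    iw = max(0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union else 0.0

def track(frames, detect_turkeys, iou_thresh=0.3):
    """Greedy IoU tracker: associate each detection with the
    best-overlapping existing track, or start a new one. Returns
    per-track lifetimes in frames."""
    tracks = {}     # track_id -> last seen box
    lifetime = {}   # track_id -> number of frames observed
    next_id = 0
    for frame in frames:
        for box in detect_turkeys(frame):
            best_id, best_iou = None, iou_thresh
            for tid, prev in tracks.items():
                score = iou(box, prev)
                if score > best_iou:
                    best_id, best_iou = tid, score
            if best_id is None:
                best_id, next_id = next_id, next_id + 1
            tracks[best_id] = box
            lifetime[best_id] = lifetime.get(best_id, 0) + 1
    return lifetime
```

Per-track activity labels (eating, drinking, preening, and so on) would then be accumulated over these lifetimes to produce the duration and frequency statistics the abstract describes.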

Digital Library: EI
Published Online: January 2021
Pages 233-1 - 233-8, © Society for Imaging Science and Technology 2021
Volume 33
Issue 8

Drowsy driving is one of the major causes of deadly traffic accidents in the United States of America. This paper proposes a system that detects different levels of drowsiness, giving drivers enough time to respond to sleepiness. Furthermore, we use distinct sound alarms to warn the user early and prevent accidents. The proposed approach considers symptoms of drowsiness, including the amount of eye closure, yawning, eye blinking, and head position, to classify the level of drowsiness. We design a method to extract eye and mouth features from 68 facial landmark keypoints. These features allow the system to detect the level of drowsiness in a real-time video stream based on the different symptoms. The experimental results show that the system, which can detect the drowsiness intensity scale under different lighting conditions, achieves an average accuracy of approximately 96.6%.
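The abstract does not give the exact eye and mouth features; a common choice over the 68-point landmark layout it mentions is the eye aspect ratio (EAR) of Soukupová and Čech (2016), which drops toward zero as the eye closes. A minimal sketch, assuming landmarks indexed in the standard 68-point order:

```python
import numpy as np

# Eye aspect ratio over the standard 68-point facial-landmark layout
# (points 36-41 and 42-47 are the two eye contours). A low EAR sustained
# over consecutive frames signals eye closure. This is the common
# formulation for this landmark layout, not necessarily the paper's.

def eye_aspect_ratio(eye):
    """eye: (6, 2) array of one eye's landmarks in contour order."""
    v1 = np.linalg.norm(eye[1] - eye[5])   # first vertical distance
    v2 = np.linalg.norm(eye[2] - eye[4])   # second vertical distance
    h = np.linalg.norm(eye[0] - eye[3])    # horizontal distance
    return (v1 + v2) / (2.0 * h)

def eyes_closed(landmarks, thresh=0.2):
    """landmarks: (68, 2) array from any 68-point detector (e.g., dlib)."""
    right = eye_aspect_ratio(landmarks[36:42])
    left = eye_aspect_ratio(landmarks[42:48])
    return (left + right) / 2.0 < thresh
```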

Digital Library: EI
Published Online: January 2021
Pages 267-1 - 267-11, © Society for Imaging Science and Technology 2021
Volume 33
Issue 8

In this paper, we propose a novel system for remotely estimating a person's respiration rate. Periodic inhalation and exhalation during respiration cycles induce subtle upper-body movements, which are reflected in local image deformation over time when recorded by a digital camera. This local image deformation can be recovered by estimating the optical flow between consecutive frames. We propose using convolutional neural networks designed for general image registration to estimate the induced optical flow, whose periodicity is then leveraged to obtain the respiration rate by frequency analysis. The proposed system is robust to lighting conditions, camera type (RGB, infrared), clothing, and posture (sitting in a chair or lying in bed); it could be used by individuals with a webcam, or by healthcare centers to monitor patients at night.
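A minimal sketch of the flow-then-frequency-analysis pipeline described above. The paper estimates flow with a CNN designed for image registration; to stay self-contained, this sketch substitutes OpenCV's classical Farneback dense flow, which is a plain swap-in, not the authors' network:

```python
import numpy as np
import cv2

def respiration_rate(gray_frames, fps):
    """gray_frames: list of grayscale (H, W) uint8 frames covering ~30 s.
    Returns estimated breaths per minute."""
    signal = []
    for prev, curr in zip(gray_frames, gray_frames[1:]):
        # Dense optical flow between consecutive frames.
        flow = cv2.calcOpticalFlowFarneback(
            prev, curr, None, 0.5, 3, 15, 3, 5, 1.2, 0)
        signal.append(flow[..., 1].mean())   # mean vertical motion
    signal = np.asarray(signal) - np.mean(signal)
    # Frequency analysis: pick the dominant peak in the breathing band.
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fps)
    band = (freqs >= 0.1) & (freqs <= 0.7)   # ~6-42 breaths/min
    peak = freqs[band][np.argmax(spectrum[band])]
    return peak * 60.0
```

Restricting the search to a physiological band is what makes the estimate tolerant of the lighting and clothing variation the abstract mentions: broadband noise lands outside the band.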

Digital Library: EI
Published Online: January 2021
Pages 268-1 - 268-7, © Society for Imaging Science and Technology 2021
Volume 33
Issue 8

Heart rate, the speed of the heartbeat, is regarded as one of the most important measurements for evaluating one's health. It can be used to measure anxiety, stress, and illness; abnormalities in heart rate usually indicate potential disease. Recent studies have shown that it is possible to measure the heart rate directly from a sequence of images containing a person's face. Requiring only a webcam, this approach greatly simplifies traditional methods, which require a pulse oximeter attached to the fingertip to measure the PPG signal, or electrodes placed on the skin to measure the ECG signal. However, this recent method, though attracting a lot of interest, still suffers from sudden head movements or the subject turning away from the camera. In this paper, we propose a novel, robust method for generating reliable PPG signals and measuring the heart rate in real time from face videos alone, invariant to head movement. We have also studied how different factors (lighting conditions, head angle, and the distance of the head from the camera) affect the heart rate predictions. A thorough analysis shows that our method produces accurate, robust, and promising results.
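For intuition, here is the classic green-channel rPPG baseline this line of work builds on: blood-volume changes modulate skin color, so the mean green intensity of a face region pulses at the heart rate. The paper's movement-invariant method is more sophisticated; this sketch only illustrates the shared signal path (mean skin color over time, then frequency analysis):

```python
import numpy as np

def heart_rate(face_rois, fps):
    """face_rois: list of (H, W, 3) RGB face crops, one per frame.
    Returns estimated beats per minute."""
    # Green channel carries the strongest pulsatile component.
    green = np.array([roi[..., 1].mean() for roi in face_rois])
    green = green - green.mean()
    spectrum = np.abs(np.fft.rfft(green))
    freqs = np.fft.rfftfreq(len(green), d=1.0 / fps)
    band = (freqs >= 0.7) & (freqs <= 4.0)   # ~42-240 bpm
    return freqs[band][np.argmax(spectrum[band])] * 60.0
```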

Digital Library: EI
Published Online: January 2021
Pages 269-1 - 269-8, © Society for Imaging Science and Technology 2021
Volume 33
Issue 8

Among hospitalized patients, getting up from bed can lead to fall injuries, 20% of which are severe cases such as broken bones or head injuries. To monitor patients' bedside status, we propose a deep neural network model, the Bed Exit Detection Network (BED Net), for bed-exit behavior recognition. The BED Net consists of two sub-networks: a Posture Detection Network (Pose Net) and an Action Recognition Network (AR Net). The Pose Net leverages state-of-the-art neural-network-based keypoint detection algorithms to detect human postures in color camera images. The output sequences from the Pose Net are passed to the AR Net for bed-exit behavior recognition. Using a pre-trained model as an intermediary, we train the proposed network on a newly collected small dataset, the HP-BED-Dataset. We present the results of the proposed BED Net.
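The AR Net architecture is not specified in the abstract; under that caveat, one minimal form such a sequence classifier could take is an LSTM over flattened per-frame keypoint vectors. A sketch in PyTorch, with the three class labels assumed for illustration:

```python
import torch
import torch.nn as nn

# Two-stage structure: a pose network yields per-frame keypoints, and a
# sequence model classifies the keypoint sequence, e.g. as "in bed" /
# "exiting" / "out of bed" (hypothetical labels). An LSTM is one
# plausible minimal sequence classifier, not the paper's published AR Net.

class ActionRecognizer(nn.Module):
    def __init__(self, n_keypoints=17, n_classes=3, hidden=64):
        super().__init__()
        # each frame is a flat (x, y) vector over all keypoints
        self.lstm = nn.LSTM(n_keypoints * 2, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, keypoint_seq):
        """keypoint_seq: (batch, frames, n_keypoints * 2) tensor."""
        _, (h, _) = self.lstm(keypoint_seq)
        return self.head(h[-1])              # class logits

logits = ActionRecognizer()(torch.randn(1, 30, 34))  # one 30-frame clip
```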

Digital Library: EI
Published Online: January 2021
Pages 279-1 - 279-7, © Society for Imaging Science and Technology 2021
Volume 33
Issue 8

As the boundary between professional and personal smartphone usage gradually disappeared over the last decade, the present study is devoted to tracking visual documents scanned by a personal mobile phone for professional purposes. By a visual document we mean a composition of text, graphics, and images corresponding to various physical-world documents (invoices, calls for tenders, legal contracts, etc.). As the scanning (capturing) conditions cannot be reproduced exactly, the main issue is to unambiguously and securely identify the various digital representations of the same physical document. A second issue relates to the inherent resource constraints of such a task in a mobile/embedded environment. To jointly solve these issues, we advance a solution that couples blockchain technologies with visual fingerprinting principles. The novel elements brought to light relate to (1) the coupling of fingerprint and blockchain solutions, (2) the generation and management of unitary smart contracts (with illustrations for the Tezos blockchain), and (3) an on-chain/off-chain work-balancing solution for coping with the constraints of the mobile world. Experiments on a database of more than 10,000 visual documents yielded an F1 score of up to 0.98 while remaining compatible with low-resource computing environments (Raspberry Pi).
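As a toy illustration of the fingerprinting half (the Tezos smart-contract half is omitted), a difference hash (dHash) yields a compact, capture-tolerant fingerprint whose 64-bit size makes an on-chain payload cheap. This is a generic perceptual hash, not the paper's actual fingerprint:

```python
from PIL import Image

def dhash(path, size=8):
    """64-bit perceptual fingerprint of an image file: compare each
    pixel of a downscaled grayscale image with its right neighbor."""
    img = Image.open(path).convert("L").resize((size + 1, size))
    px = list(img.getdata())
    bits = 0
    for row in range(size):
        for col in range(size):
            left = px[row * (size + 1) + col]
            right = px[row * (size + 1) + col + 1]
            bits = (bits << 1) | (left > right)
    return bits

def hamming(a, b):
    """Two scans of the same physical document should have a small
    Hamming distance between their fingerprints."""
    return bin(a ^ b).count("1")
```

Storing only such a short fingerprint on-chain, and keeping the document itself off-chain, is the generic version of the on-chain/off-chain balance the abstract describes.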

Digital Library: EI
Published Online: January 2021
Pages 280-1 - 280-7, © Society for Imaging Science and Technology 2021
Volume 33
Issue 8

The immersive Virtual Reality (VR) environment transforms the way people learn, work, and communicate. While traditional modeling tools such as Blender and AutoCAD are commonly used for industrial design today, handling 3D objects on a two-dimensional screen with a keyboard and mouse is very challenging. In this work, we introduce a VR modeling system named The Virtual Workshop, which supports the design and manipulation of various 3D objects in virtual environments. The proposed system was developed for the Oculus Rift platform and allows a user to design 3D objects efficiently and precisely using both hands directly. A user-friendly GUI lets the user create new 3D objects from scratch using premade "basic objects", or alternatively import an existing 3D object made in other applications. The finished 3D models are stored in the standard OBJ file format and can later be exported for developing 3D scenarios in other applications, such as the Unity engine. From concept to design, the VR modeling system provides an open platform where designers and clients can better share their ideas and interactively refine the rendered virtual models in collaboration.
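For reference, the OBJ format the system exports is plain text ("v x y z" vertex lines and 1-indexed "f i j k" face lines), which is why it interchanges so easily with Unity or Blender. A minimal illustrative writer, not the system's own exporter:

```python
def write_obj(path, vertices, faces):
    """vertices: list of (x, y, z); faces: list of 1-indexed index tuples."""
    with open(path, "w") as f:
        for x, y, z in vertices:
            f.write(f"v {x} {y} {z}\n")       # one vertex per line
        for idx in faces:
            f.write("f " + " ".join(str(i) for i in idx) + "\n")

# a unit square as two triangles
write_obj("quad.obj",
          [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)],
          [(1, 2, 3), (1, 3, 4)])
```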

Digital Library: EI
Published Online: January 2021
Pages 281-1 - 281-7, © Society for Imaging Science and Technology 2021
Volume 33
Issue 8

In this work, we propose a method that detects and segments manufacturing defects in objects using only RGB images. The method comprises three integrated modules: object detection, pose estimation, and defect segmentation. The first two modules are deep-learning-based approaches and were trained exclusively on synthetic data generated with a 3D rendering engine. The first module, the object detector, is based on the Mask R-CNN method and outputs the classification and segmentation of the object of interest. The second module, the pose estimator, takes the object's category and the detection coordinates as input and estimates the pose with six degrees of freedom using an autoencoder-based approach. The reference 3D CAD model can then be rendered with the estimated pose over the detected object, allowing the real object to be compared with its virtual model. The third and last module uses only image processing techniques, such as morphology operations and dense alignment, to compare the segmentation of the detected object from the first module with the mask of the rendered object from the second. The output is an image with the shape defects highlighted. We evaluate our method on a custom test set using the intersection-over-union metric, and our results indicate the method is robust to small imprecisions in each module.
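A minimal sketch of the third module's core comparison, assuming the two binary masks are already aligned: shape defects are where the detected mask and the rendered CAD mask disagree, and a morphological opening discards the thin slivers that small pose or segmentation errors leave behind. The exact operations the authors use are not published; this shows the generic pattern:

```python
import numpy as np
import cv2

def defect_mask(detected_mask, rendered_mask, kernel_size=5):
    """Both inputs: aligned binary (H, W) uint8 masks in {0, 1}.
    Returns a mask of blob-like disagreements (candidate defects)."""
    diff = cv2.absdiff(detected_mask, rendered_mask)       # symmetric difference
    kernel = np.ones((kernel_size, kernel_size), np.uint8)
    # Opening removes thin bands left by small misalignment, keeping
    # only the larger regions where the real shape deviates from CAD.
    return cv2.morphologyEx(diff, cv2.MORPH_OPEN, kernel)
```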

Digital Library: EI
Published Online: January 2021
Pages 283-1 - 283-7, © Society for Imaging Science and Technology 2021
Volume 33
Issue 8

With the availability of fast internet and convenient imaging devices such as smartphones, videos have become increasingly popular and important content on social media platforms. They are widely adopted for various purposes including, but not limited to, advertisement, education, and entertainment. One important problem in understanding videos is thumbnail generation: selecting one or a few images, typically frames, that are representative of the given video. These thumbnails can then be used not only as a summary display for videos, but also for representing them in downstream content models. Thus, thumbnail selection plays an important role in a user's experience when exploring and consuming videos. Given the large scale of the data, automatic thumbnail generation methods are desired, since it is impossible to manually select thumbnails for all videos. In this paper, we propose a practical thumbnail generation method designed to select representative, high-quality frames as thumbnails. Specifically, to capture the semantic information of video frames, we leverage embeddings generated by a state-of-the-art convolutional neural network pretrained in a supervised manner on external image data, and use them to find representative frames in a semantic space. To evaluate the quality of each frame efficiently, we train a linear model on top of the embeddings to predict quality instead of computing it from raw pixels. We conduct experiments on real videos and show that the proposed algorithm generates relevant and engaging thumbnails.
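A rough sketch of such a selection step, assuming the CNN frame embeddings and the linear quality weights are given. The clustering choice (plain k-means over the semantic space) is illustrative, since the abstract does not specify the exact representativeness criterion:

```python
import numpy as np

def pick_thumbnail(embeddings, quality_weights, n_clusters=5, iters=20):
    """embeddings: (n_frames, d) CNN features; quality_weights: (d,)
    linear quality model. Returns the index of the chosen frame."""
    embeddings = np.asarray(embeddings, dtype=float)
    rng = np.random.default_rng(0)
    centers = embeddings[rng.choice(len(embeddings), n_clusters, replace=False)]
    for _ in range(iters):                                  # plain k-means
        d = np.linalg.norm(embeddings[:, None] - centers[None], axis=2)
        labels = d.argmin(axis=1)
        for k in range(n_clusters):
            if (labels == k).any():
                centers[k] = embeddings[labels == k].mean(axis=0)
    biggest = np.bincount(labels, minlength=n_clusters).argmax()
    candidates = np.flatnonzero(labels == biggest)          # most representative cluster
    quality = embeddings[candidates] @ quality_weights      # linear quality score
    return candidates[quality.argmax()]
```

Scoring quality on the embeddings rather than raw pixels is what makes the step cheap: the expensive CNN pass is shared between the representativeness and quality computations.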

Digital Library: EI
Published Online: January 2021
