IS&T | Library

Abstract

Image Processing: Algorithms and Systems continues the tradition of the past conference, Nonlinear Image Processing and Pattern Analysis, in exploring new image processing algorithms. Specifically, the conference aims at highlighting the importance of the interaction between transform-, model-, and learning-based approaches for creating effective algorithms and building modern imaging systems for new and emerging applications. It also reverberates the growing call for integration of the theoretical research on image processing algorithms with the more applied research on image processing systems.

Digital Library: EI

Published Online: January 2023

ORCA: An end-to-end video object removal framework with cropping interested region and quality assessment

181 68

video processing
end-to-end
inpainting
IQA
VQA

Minseong Son, Hansol Lee, Sungkeun Kwak, Jihwan Woo

DOI

10.2352/EI.2023.35.9.IPAS-285

Volume 35

Issue 9

Abstract

Recently, various types of Video Inpainting models have been released. Video Inpainting is used to naturally erase the object you want to erase in the video. However, to use inpainting models, we usually need frames extracted from a video and masks and most people make these data manually. We propose a novel End-to-End Video Object Removal framework with Cropping Interested Region and Video Quality Assessment (ORCA). ORCA is built in an end-to-end way by combining the Detection, Segmentation, and Inpainting modules. The characteristics of proposed framework are going through the cropping step before inpainting step. In addition, We propose our own video quality assessment since ORCA use two models for inpainting. Our new metric indicates the higher quality of the results between two models. Experimental results show the superior performance of the proposed methods.

Digital Library: EI

Published Online: January 2023

Detection of object throwing behavior in surveillance videos

216 70

Anomaly Detection
Throwing Behavior
CNN

Ivo P.C. Kersten, Erkut Akdag, Egor Bondarev, Peter H. de With

DOI

10.2352/EI.2023.35.9.IPAS-286

Volume 35

Issue 9

Abstract

Anomalous behavior detection is a challenging research area within computer vision. One such behavior is throwing action in traffic flow, which is one of the unique requirements of our Smart City project to enhance public safety. This paper proposes a solution for throwing action detection in surveillance videos using deep learning. At present, datasets for throwing actions are not publicly available. To address the use-case of our Smart City project, we first generate the novel public 'Throwing Action' dataset, consisting of 271 videos of throwing actions performed by traffic participants, such as pedestrians, bicyclists, and car drivers, and 130 normal videos without throwing actions. Second, we compare the performance of different feature extractors for our anomaly detection method on the UCF-Crime and Throwing-Action datasets. Finally, we improve the performance of the anomaly detection algorithm by applying the Adam optimizer instead of Adadelta, and we propose a mean normal loss function that yields better anomaly detection performance. The experimental results reach an area under the ROC curve of 86.10 for the Throwing-Action dataset, and 80.13 on the combined UCF-Crime+Throwing dataset, respectively.

Digital Library: EI

Published Online: January 2023

Hybrid diffractive optics (DOE & refractive lens) for broadband EDoF imaging

444 129

Diffractive imaging
Encoded phase mask
Hybrid diffractive optics
Fourier optics
Inverse imaging
Joint design of diffractive optics and image processing
Hardware-In-the Loop imaging

SeyyedReza MiriRostami, Samuel Pinilla, Igor Shevkunov, Vladimir Katkovnik, Karen Egiazarian

DOI

10.2352/EI.2023.35.9.IPAS-287

Volume 35

Issue 9

Abstract

In the considered hybrid diffractive imaging system, a refractive lens is arranged simultaneously with a multilevel phase mask (MPM) as a diffractive optical element (DOE) for Achromatic Extended-depth-of-field (EDoF) imaging. This paper proposes a fully differentiable image formation model that uses neural network techniques to maximize the imaging quality by optimizing MPM, digital image reconstruction algorithm, refractive lens parameters (aperture size, focal length), and distance between the MPM and sensor. Firstly, model-based numerical simulations and end-to-end joint optimization of imaging are used. A spatial light modulator (SLM) is employed in the second stage of the design to implement MPM optimized at the first stage, and the image processing is optimized experimentally using a learning-based approach. The third stage of optimization is targeted at joint optimization of the SLM phase pattern and image reconstruction algorithm in the hardware-in-the-loop (HIL) setup, which allows compensation for a mismatch between numerical modeling and the physical reality of optic and sensor. A comparative analysis of the imaging accuracy and quality using the optical parameters is presented. It is proved experimentally, first time to the best of our knowledge, that wavefront phase modulation can provide imaging of advanced quality as compared with some commercial multi-lens cameras.

Digital Library: EI

Published Online: January 2023

Evaluating active learning for blind imbalanced domains

197 37

machine learning
deep learning
active learning
data augmentation
out of distribution
blind imbalanced domains

Hiroshi Kuwajima, Masayuki Tanaka, Masatoshi Okutomi

DOI

10.2352/EI.2023.35.9.IPAS-288

Volume 35

Issue 9

Abstract

Deep learning, which has been very successful in recent years, requires a large amount of data. Active learning has been widely studied and used for decades to reduce annotation costs and now attracts lots of attention in deep learning. Many real-world deep learning applications use active learning to select the informative data to be annotated. In this paper, we first investigate laboratory settings for active learning. We show significant gaps between the results from different laboratory settings and describe our practical laboratory setting that reasonably reflects the active learning use cases in real-world applications. Then, we introduce a problem setting of blind imbalanced domains. Any data set includes multiple domains, e.g., individuals in handwritten character recognition with different social attributes. Major domains have many samples, and minor domains have few samples in the training set. However, we must accurately infer both major and minor domains in the test phase. We experimentally compare different methods of active learning for blind imbalanced domains in our practical laboratory setting. We show that a simple active learning method using softmax margin and a model training method using distance-based sampling with center loss, both working in the deep feature space, perform well.

Digital Library: EI

Published Online: January 2023

Deploying machine learning based segmentation for scientific imaging analysis at synchrotron facilities

281 90

Machine Learning
Image Segmentation
Image Labeling
Feature Extraction
Integrated Platform
User Facility
Deep Learning
Convolutional Neural Networks

Guanhua Hao, Eric J. Roberts, Tanny Chavez, Zhuowen Zhao, Elizabeth A. Holman, Howard Yanxon, Adam Green, Harinarayan Krishnan, Daniela Ushizima, Dylan McReynolds, Nicholas Schwarz, Petrus H. Zwart, Alexander Hexemer, Dilworth Parkinson

DOI

10.2352/EI.2023.35.9.IPAS-290

Volume 35

Issue 9

Abstract

Scientific user facilities present a unique set of challenges for image processing due to the large volume of data generated from experiments and simulations. Furthermore, developing and implementing algorithms for real-time processing and analysis while correcting for any artifacts or distortions in images remains a complex task, given the computational requirements of the processing algorithms. In a collaborative effort across multiple Department of Energy national laboratories, the "MLExchange" project is focused on addressing these challenges. MLExchange is a Machine Learning framework deploying interactive web interfaces to enhance and accelerate data analysis. The platform allows users to easily upload, visualize, label, and train networks. The resulting models can be deployed on real data while both results and models could be shared with the scientists. The MLExchange web-based application for image segmentation allows for training, testing, and evaluating multiple machine learning models on hand-labeled tomography data. This environment provides users with an intuitive interface for segmenting images using a variety of machine learning algorithms and deep-learning neural networks. Additionally, these tools have the potential to overcome limitations in traditional image segmentation techniques, particularly for complex and low-contrast images.

Digital Library: EI

Published Online: January 2023

Facial expression recognition using visual transformer with histogram of oriented gradients

105 30

facial expression recognition
visual transformer
histogram of oriented gradient

Jieun Kim, Ju o Kim, Seungwan Je, Deokwoo Lee

DOI

10.2352/EI.2023.35.9.IPAS-291

Volume 35

Issue 9

Abstract

Emotions play an important role in our life as a response to our interactions with others, decisions, and so on. Among various emotional signals, facial expression is one of the most powerful and natural means for humans to convey their emotions and intentions, and it has the advantage of easily obtaining information using only a camera, so facial expression-based emotional research is being actively conducted. Facial expression recognition(FER) have been studied by classifying them into seven basic emotions: anger, disgust, fear, happiness, sadness, surprise, and normal. Before the appearance of deep learning, handcrafted feature extractors and simple classifiers such as SVM, Adaboost was used to extracted Facial emotion. With the advent of deep learning, it is now possible to extract facial expression without using feature extractors. Despite its excellent performance in FER research, it is still challenging task due to external factors such as occlusion, illumination, and pose, and similarity problems between different facial expressions. In this paper, we propose a method of training through a ResNet [1] and Visual Transformer [2] called FViT and using Histogram of Oriented Gradients(HOGs) [3] data to solve the similarity problem between facial expressions.

Digital Library: EI

Published Online: January 2023

Face expression understanding by geometrical characterization of deep human face representation

84 23

Face expression analysis
Explainable Artificial Intelligence
Graph Representation Learning

Adrien Raison, Théo Biardeau, Pascal Bourdon, David Helbert

DOI

10.2352/EI.2023.35.9.IPAS-292

Volume 35

Issue 9

Abstract

Face expressions understanding is a key to have a better understanding of the human nature. In this contribution we propose an end-to-end pipeline that takes color images as inputs and produces a semantic graph that encodes numerically what are facial emotions. This approach leverages low-level geometric details as face representation which are numerical representations of facial muscle activation patterns to build this emotional understanding. It shows that our method recovers social expectations of what characterize facial emotions.

Digital Library: EI

Published Online: January 2023

Crowd counting using deep learning based head detection

360 111

Object detection
Convolutional Neural Networks
Deep Learning
YOLO
Yolov5
Precision
Mean average Precision

Maryam Hassan, Farhan Hussain, Sultan Daud Khan, Mohib Ullah, Mudassar Yamin, Habib Ullah

Pages 293--1 - 293-6, January 2023, This work is licensed under the Creative Commons Attribution 4.0 International License. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/. 2023

DOI

10.2352/EI.2023.35.9.IPAS-293

Volume 35

Issue 9

Abstract

Scale invariance and high miss detection rates for small objects are some of the challenging issues for object detection and often lead to inaccurate results. This research aims to provide an accurate detection model for crowd counting by focusing on human head detection from natural scenes acquired from publicly available datasets of Casablanca, Hollywood-Heads and Scut-head. In this study, we tuned a yolov5, a deep convolutional neural network (CNN) based object detection architecture, and then evaluated the model using mean average precision (mAP) score, precision, and recall. The transfer learning approach is used for fine-tuning the architecture. Training on one dataset and testing the model on another leads to inaccurate results due to different types of heads in different datasets. Another main contribution of our research is combining the three datasets into a single dataset, including every kind of head that is medium, large and small. From the experimental results, it can be seen that this yolov5 architecture showed significant improvements in small head detections in crowded scenes as compared to the other baseline approaches, such as the Faster R-CNN and VGG-16-based SSD MultiBox Detector.

Digital Library: EI

Published Online: January 2023

ILIAC: Efficient classification of degraded images using knowledge distillation with cutout data augmentation

318 56

Knowledge Distillation
Image Degradation
Convolutional Neural Networks
Image Classification
Data Augmentation
Efficient Neural Networks

Dinesh Daultani, Masayuki Tanaka, Masatoshi Okutomi, Kazuki Endo

DOI

10.2352/EI.2023.35.9.IPAS-296

Volume 35

Issue 9