IS&T | Library

Abstract

This conference brings together real-world practitioners and researchers in intelligent robots and computer vision to share recent applications and developments. Topics of interest include the integration of imaging sensors supporting hardware, computers, and algorithms for intelligent robots, manufacturing inspection, characterization, and/or control. The decreased cost of computational power and vision sensors has motivated the rapid proliferation of machine vision technology in a variety of industries, including aluminum, automotive, forest products, textiles, glass, steel, metal casting, aircraft, chemicals, food, fishing, agriculture, archaeological products, medical products, artistic products, etc. Other industries, such as semiconductor and electronics manufacturing, have been employing machine vision technology for several decades. Machine vision supporting handling robots is another main topic. With respect to intelligent robotics another approach is sensor fusion – combining multi-modal sensors in audio, location, image and video data for signal processing, machine learning and computer vision, and additionally other 3D capturing devices. There is a need for accurate, fast, and robust detection of objects and their position in space. Their surface, background, and illumination are uncontrolled, and in most cases the objects of interest are within a bulk of many others. For both new and existing industrial users of machine vision, there are numerous innovative methods to improve productivity, quality, and compliance with product standards. There are several broad problem areas that have received significant attention in recent years. For example, some industries are collecting enormous amounts of image data from product monitoring systems. New and efficient methods are required to extract insight and to perform process diagnostics based on this historical record. 
Regarding the physical scale of the measurements, microscopy techniques are nearing resolution limits in fields such as semiconductors, biology, and other nano-scale technologies. Techniques such as resolution enhancement, model-based methods, and statistical imaging may provide the means to extend these systems beyond current capabilities. Furthermore, obtaining real-time and robust measurements in-line or at-line in harsh industrial environments is a challenge for machine vision researchers, especially when the manufacturer cannot make significant changes to their facility or process.

Digital Library: EI

Published Online: January 2022

Deep learning based wheat ears count in robot images for wheat phenotyping

Wheat Spikes
Deep Learning
Faster RCNN
EfficientDet

Ehsan Ullah, Mohib Ullah, Muhammad Sajjad, Faouzi Alaya Cheikh

Pages 264-1 - 264-6, January 2022, © Society for Imaging Science and Technology 2022

DOI

10.2352/EI.2022.34.6.IRIACV-264

Volume 34

Issue 6

Abstract

The number of spikes, the number of spikelets per spike, and the number of spikes per square meter are essential metrics for plant breeders and researchers in predicting wheat crop yield. Evaluating crop yield based on wheat ear counting is still done manually, which is labor-intensive, tedious, and costly. Thus, there is a significant need for a real-time wheat spike/ear counting system that lets plant breeders predict crop yield effectively and efficiently. This paper proposes two deep learning-based methods, built on EfficientDet and Faster R-CNN, to detect and count the spikes. The images are taken using high-throughput phenotyping techniques under natural field conditions, and the algorithms localize and automatically count wheat spikes/ears. Faster R-CNN with ResNet50 as the backbone architecture produced an overall accuracy of 88.7% on the test images. We also used the recent state-of-the-art models EfficientDet-D5 and EfficientDet-D7, which have the backbone architectures EfficientNet-B5 and EfficientNet-B7, respectively. A comprehensive quantitative analysis is performed on the standard performance metrics. In the analysis, the EfficientDet-D5 model produces an accuracy of 92.7% on the test images, and EfficientDet-D7 produces an accuracy of 93.6%.
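
As a rough illustration of how detector output can become a spike count, the sketch below keeps detections whose confidence meets a threshold and compares the resulting count with a manual one. The function names, the 0.5 threshold, and the relative-error definition of accuracy are assumptions made for illustration; the abstract does not state how its accuracy figures were computed.

```python
def count_spikes(scores, score_threshold=0.5):
    """Count detections whose confidence meets the threshold (assumed value)."""
    return sum(1 for score in scores if score >= score_threshold)


def counting_accuracy(predicted, actual):
    """Hypothetical accuracy: 1 minus the relative counting error, floored at 0."""
    if actual <= 0:
        raise ValueError("actual count must be positive")
    return max(0.0, 1.0 - abs(predicted - actual) / actual)


# Made-up confidence scores for the detections in one image.
scores = [0.95, 0.88, 0.42, 0.71, 0.30]
predicted_count = count_spikes(scores)
```

In a real pipeline the scores would come from a trained detector after non-maximum suppression, and the threshold would be tuned on validation images.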

Digital Library: EI

Published Online: January 2022

Incremental two-network approach to develop a purity analyzer system for canola seeds

Artificial intelligence
neural network
object recognition
content detection
canola seeds
image analysis

Kuldeep Singh, Fernando Saccon, Dileepan Joseph

Pages 265-1 - 265-7, January 2022, © Society for Imaging Science and Technology 2022

DOI

10.2352/EI.2022.34.6.IRIACV-265

Volume 34

Issue 6

Abstract

Given a suitable dataset, transfer learning using deep convolutional neural networks is an effective method to develop a system to detect and classify objects. Although models pretrained on large general-purpose datasets are available, the requirement to manually label an application-specific dataset remains a limiting factor in system development. We consider this wider problem in the context of the purity analysis of canola seeds, where end users wish to distinguish species of interest from contaminants in images taken with optical microscopes. We use a Detector network, trained only to detect seeds, to help label the dataset used to train an Analyzer network, capable of both seed detection and classification. We present results over three experiments that involve 25 contaminant species, including Primary and Secondary Noxious Weed Seeds (as per the Canadian Weed Seeds Order), to validate our incremental approach. We also compare the proposed system to competing ones in a literature review.
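
One way to picture the incremental idea is that boxes proposed by the Detector serve as pre-annotations, so a labeler only has to supply a species name per box before the Analyzer is trained. The sketch below illustrates that reading; the `Box` type, the `pre_annotate` helper, and the callback that stands in for the human labeler are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Box:
    """Axis-aligned box in pixels (hypothetical representation)."""
    x: float
    y: float
    width: float
    height: float


def pre_annotate(
    detector_boxes: List[Box],
    assign_species: Callable[[Box], str],
) -> List[Tuple[Box, str]]:
    """Pair each Detector box with a species label supplied by a labeler.

    The result is a detection-plus-classification dataset of the kind an
    Analyzer network could be trained on.
    """
    return [(box, assign_species(box)) for box in detector_boxes]


# Toy stand-in for the human labeler: small seeds are called canola.
boxes = [Box(10, 10, 20, 20), Box(50, 40, 60, 55)]
labeled = pre_annotate(
    boxes, lambda b: "canola" if b.width < 30 else "contaminant"
)
```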

Digital Library: EI

Published Online: January 2022

Quantitative analysis of deep learning based multi-target tracking algorithms

Multi target tracking
Deep learning
Computer vision

Sanam Nisar Mangi, Mohib Ullah, Faouzi Alaya Cheikh

Pages 274-1 - 274-6, January 2022, © Society for Imaging Science and Technology 2022

DOI

10.2352/EI.2022.34.6.IRIACV-274

Volume 34

Issue 6

Abstract

Multi-object tracking is an active computer vision problem that has attracted consistent interest due to its wide range of applications in areas such as surveillance, autonomous driving, entertainment, and gaming. In the age of deep learning, many computer vision tasks have benefited from convolutional neural networks (CNNs). They have been optimized through rapid development, whereas multi-target tracking remains challenging. A variety of models have drawn on the representational power of deep learning to tackle this issue. This paper inspects three CNN-based models that have achieved state-of-the-art performance on this problem. Each of the three models follows a different paradigm and provides key insight into the development of the field. We examined the models and conducted experiments on all three using the benchmark dataset. The quantitative results from the state-of-the-art models are reported in the standard metrics and provide a basis for future research in the field.
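
The abstract does not name the metrics it reports. One metric commonly used for multi-target tracking is Multiple Object Tracking Accuracy (MOTA), which sums false negatives, false positives, and identity switches over all frames and divides by the total number of ground-truth objects. The sketch below assumes per-frame counts are already available; the function name and inputs are illustrative.

```python
def mota(false_negatives, false_positives, id_switches, ground_truth):
    """MOTA = 1 - (FN + FP + IDSW) / GT, with each term summed over frames.

    Each argument is a sequence holding one count per frame.
    """
    total_ground_truth = sum(ground_truth)
    if total_ground_truth == 0:
        raise ValueError("no ground-truth objects in the sequence")
    errors = sum(false_negatives) + sum(false_positives) + sum(id_switches)
    return 1.0 - errors / total_ground_truth


# Two frames with five ground-truth objects each: one miss, one false alarm.
score = mota([1, 0], [0, 1], [0, 0], [5, 5])
```

MOTA can be negative when the number of errors exceeds the number of ground-truth objects.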

Digital Library: EI

Published Online: January 2022

Leveraging gradient weighted class activation mapping to improve classification effectiveness: Case study in transportation infrastructure characterization

Active learning
automated labeling
explainability
image analysis

Thomas P. Karnowski, Deniz Aykac, Regina K. Ferrell, Christy Gambrell, Zach Langford, Lauren Torkelson

Pages 275-1 - 275-6, January 2022, © Society for Imaging Science and Technology 2022

DOI

10.2352/EI.2022.34.6.IRIACV-275

Volume 34

Issue 6

Abstract

Roadway “corners” are common for pedestrian use, whether designated with markings or not. Different types of markings have been deployed, ranging from simple parallel lines to more complex designs. Understanding the impact of different types of crosswalks is important for public safety. In this work we explore methods to improve the logging of marked crosswalk types. We used the Roadway Information Database from the Second Strategic Highway Research Project and used active learning methods with transfer learning to identify the crosswalk types (marked or unmarked). Upon completion we found our classifiers were unable to perform above roughly 94% correct classifications. To improve their efficacy, we separated the crosswalks into their “fine grained” types and used Gradient-Weighted Class Activation Mapping to isolate and study the features that classified the crosswalks. We compared this with sampled manually marked crosswalks and present findings. We believe this use case can represent a process to improve the active learning method for some visual machine learning applications.
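
Gradient-Weighted Class Activation Mapping (Grad-CAM) weights each feature map of a convolutional layer by the spatial average of the class-score gradient for that map, sums the weighted maps, and applies a ReLU. The minimal sketch below assumes the activations and gradients have already been extracted from a network and are passed in as nested lists; the function name and input layout are illustrative and do not describe the authors' implementation.

```python
def grad_cam(activations, gradients):
    """Compute a Grad-CAM heat map.

    activations, gradients: K feature maps, each an H x W nested list.
    gradients holds d(class score) / d(activation) for the same layer.
    """
    num_rows = len(activations[0])
    num_cols = len(activations[0][0])
    # Channel weights: global average of the gradients over each map.
    weights = [
        sum(sum(row) for row in grad_map) / (num_rows * num_cols)
        for grad_map in gradients
    ]
    # Weighted sum of the feature maps, then ReLU to keep positive evidence.
    return [
        [
            max(0.0, sum(w * fmap[i][j] for w, fmap in zip(weights, activations)))
            for j in range(num_cols)
        ]
        for i in range(num_rows)
    ]


# Two 2x2 feature maps with constant gradients of +1 and -2.
activations = [[[1, 2], [3, 4]], [[1, 1], [1, 1]]]
gradients = [[[1, 1], [1, 1]], [[-2, -2], [-2, -2]]]
heat_map = grad_cam(activations, gradients)
```

In practice the heat map is upsampled to the input image size and overlaid on the image to show which regions drove the classification.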

Digital Library: EI

Published Online: January 2022

Deep learning-based multiple animal pose estimation

pose estimation
Coco format
data visualization

Brage Arnkærn, Sigurd Schoeler, Mohib Ullah, Faouzi Alaya Cheikh

DOI

10.2352/EI.2022.34.6.IRIACV-276

Volume 34

Issue 6

Abstract

We propose a deep learning-based approach for pig keypoint detection. In a nutshell, we explored transfer learning to adapt a human pose estimation model to pigs. In total, we tested three different models and eventually trained OpenPose on the pig data. For training, the data is annotated in COCO format. Additionally, we visualized the pixel-level response of the network, the part affinity fields (PAFs), on the test frames to highlight the model's learning capabilities. The trained model shows promising results and opens a new door for further research.
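
For readers unfamiliar with COCO keypoint annotations, each instance stores its keypoints as a flat list of (x, y, v) triplets, where v is 0 for not labeled, 1 for labeled but not visible, and 2 for labeled and visible. The sketch below builds one such record; the helper name, the example coordinates, and the use of the bounding-box area for the `area` field are assumptions for illustration, and the pig keypoint set used in the paper is not given in the abstract.

```python
def make_coco_keypoint_annotation(annotation_id, image_id, category_id, bbox, points):
    """Build one COCO-style keypoint annotation.

    bbox:   [x, y, width, height] in pixels.
    points: list of (x, y, v) triplets with v in {0, 1, 2}.
    """
    x, y, width, height = bbox
    flat_keypoints = [value for point in points for value in point]
    num_labeled = sum(1 for _, _, visibility in points if visibility > 0)
    return {
        "id": annotation_id,
        "image_id": image_id,
        "category_id": category_id,
        "bbox": [x, y, width, height],
        "area": width * height,  # assumption: bbox area stands in for mask area
        "iscrowd": 0,
        "keypoints": flat_keypoints,
        "num_keypoints": num_labeled,
    }


# Made-up example: two labeled keypoints and one unlabeled keypoint.
annotation = make_coco_keypoint_annotation(
    1, 10, 1, [100, 60, 200, 120], [(120, 80, 2), (140, 95, 1), (0, 0, 0)]
)
```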

Digital Library: EI

Published Online: January 2022

Efficient landslide detection by UAV-based multi-temporal visual analysis

landslide detection
SLAM
Direct Sparse Odometry
CNN
UAV
SfM

Yosuke Yamaguchi, Kai Matsui, Jun Ohya, Katsuya Hasegawa, Hiroshi Nagahashi

DOI

10.2352/EI.2022.34.6.IRIACV-307

Volume 34

Issue 6

Abstract

This paper proposes a landslide detection method using UAV-based visual analysis. The fundamental strategy is to detect ground surface elevation changes caused by landslides. Our method consists of five steps: multi-temporal image acquisition, ground surface reconstruction, georeferencing, elevation data export, and landslide detection. To improve efficiency, we use Visual Simultaneous Localization and Mapping (SLAM) for ground surface reconstruction, which can run faster than conventional methods based on Structure-from-Motion (SfM). In addition, we introduce a convolutional neural network (CNN) to detect landslides robustly in the multi-temporal elevation data. The experimental results in a simulation environment show that the proposed method runs 5.5 times as fast as the conventional methods. In addition, the CNN-based model achieved an F1 score of 0.79-0.84, showing robustness against reconstruction noise and registration error.
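
To make the elevation-change strategy and the reported metric concrete, the sketch below differences two co-registered elevation grids and computes an F1 score from detection counts. The fixed threshold is a simple stand-in used only for illustration; the paper detects landslides with a CNN, not with thresholding. The function names and the 1.0 m threshold are assumptions.

```python
def elevation_change_mask(before, after, min_change_m=1.0):
    """Flag grid cells whose elevation changed by at least min_change_m meters.

    before, after: co-registered elevation grids as nested lists of floats.
    """
    return [
        [abs(a - b) >= min_change_m for b, a in zip(row_before, row_after)]
        for row_before, row_after in zip(before, after)
    ]


def f1_score(true_positives, false_positives, false_negatives):
    """F1 = 2TP / (2TP + FP + FN), the harmonic mean of precision and recall."""
    denominator = 2 * true_positives + false_positives + false_negatives
    return 0.0 if denominator == 0 else 2 * true_positives / denominator


# Made-up elevations in meters for a 1 x 2 grid at two acquisition times.
mask = elevation_change_mask([[10.0, 10.0]], [[10.2, 8.5]])
```

A plain threshold like this is sensitive to reconstruction noise and registration error, which is the weakness the abstract says the CNN-based detector addresses.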

Digital Library: EI

Published Online: January 2022

Detecting falling rocks by estimating excavation points using single color markers

Falling rocks
Image processing
RGB-D camera

Rei Kobayashi, Yoshihiro Sato, Masaya Miura, Yuto Osada, Yue Bao

DOI

10.2352/EI.2022.34.6.IRIACV-308

Volume 34

Issue 6