Using a new materials system, we developed invisible passive infrared markers that can take on various visual foreground patterns and colors, including white. The material can be coated onto many different surfaces, such as paper, plastic, wood, and metal, among others. We demonstrate dual-purpose signs in which the visual foreground is intended for human view while the infrared background is intended for machine view. By hiding digital information in the infrared spectral range, we enable fiducial markers to enter public spaces without introducing intrusive visual features for humans. These fiducial markers are robust and easy to detect with off-the-shelf near-infrared cameras to assist robot positioning and object identification. This can lower the barrier for low-cost robots, currently deployed in warehouses and factories, to enter offices, stores, and other public spaces and to work alongside people.
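As a rough illustration of how such markers might be read with an off-the-shelf camera, the sketch below detects ArUco-style fiducials in a near-infrared frame using OpenCV's ArUco module (4.7+ API). The marker family and input path are assumptions for illustration, not the paper's actual marker design or detection pipeline.

```python
import cv2

# Illustrative sketch: detect ArUco-style fiducials in a near-infrared frame.
# The paper's marker family and detector may differ; ArUco is assumed here.
nir_frame = cv2.imread("nir_frame.png", cv2.IMREAD_GRAYSCALE)  # assumed input path

aruco_dict = cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_4X4_50)
params = cv2.aruco.DetectorParameters()
detector = cv2.aruco.ArucoDetector(aruco_dict, params)  # OpenCV >= 4.7 interface

corners, ids, _rejected = detector.detectMarkers(nir_frame)
if ids is not None:
    for marker_id, quad in zip(ids.flatten(), corners):
        # Each marker yields an ID plus four corner points usable for pose/position.
        print(f"marker {marker_id} at corners {quad.reshape(-1, 2)}")
```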
Modern warehouses utilize fleets of robots for inventory management. To ensure efficient and safe operation, real-time localization of each agent is essential. Most robots follow metal tracks buried in the floor and use a grid of precisely mounted RFID tags for localization. As robotic agents in warehouses and manufacturing plants become ubiquitous, it would be advantageous to eliminate the need for these metal wires and RFID tags: not only do they incur significant installation costs, but removing the wires would also allow agents to travel to any area inside the building. Sensors such as cameras and LiDAR have provided meaningful localization information for many positioning system implementations, but fusing localization features from multiple sensor sources is challenging, especially when the target localization task's dataset is small. We propose a deep-learning-based localization system that fuses features from an omnidirectional camera image and a 3D LiDAR point cloud to create a robust robot positioning model. Although the use of vision and LiDAR eliminates the need for precisely installed RFID tags, it does require the collection and annotation of ground-truth training data. Deep neural networks thrive on large amounts of supervised data, and collecting this data can be time-consuming. Using a dataset collected in a warehouse environment, we evaluate the localization accuracy of two individual sensor models. To minimize the need for extensive ground-truth data collection, we introduce a self-supervised pretraining regimen that populates the image feature extraction network with meaningful weights before training on the target localization task with limited data. We demonstrate that this self-supervised pretraining improves the accuracy and convergence of localization models without requiring additional sample annotation.
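For intuition, the following minimal PyTorch sketch shows one plausible late-fusion architecture: an image branch and a point-cloud branch whose features are concatenated and regressed to a 2D position. The backbones, feature dimensions, and output parameterization are illustrative assumptions; the paper's actual network and its self-supervised pretext task are not reproduced here.

```python
import torch
import torch.nn as nn

class FusionLocalizer(nn.Module):
    """Illustrative two-branch model: image and LiDAR features are concatenated
    and regressed to a 2D position (x, y). Stand-in encoders only."""
    def __init__(self, img_dim=512, lidar_dim=256):
        super().__init__()
        self.img_encoder = nn.Sequential(          # stand-in for a pretrained CNN
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, img_dim))
        self.lidar_encoder = nn.Sequential(        # stand-in point-wise MLP encoder
            nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, lidar_dim))
        self.head = nn.Sequential(
            nn.Linear(img_dim + lidar_dim, 128), nn.ReLU(), nn.Linear(128, 2))

    def forward(self, image, points):
        f_img = self.img_encoder(image)                        # (B, img_dim)
        f_pts = self.lidar_encoder(points).max(dim=1).values   # max-pool over points
        return self.head(torch.cat([f_img, f_pts], dim=1))

model = FusionLocalizer()
xy = model(torch.randn(2, 3, 128, 128), torch.randn(2, 1024, 3))  # predicted (x, y)
```

In a self-supervised setting, the image encoder would first be trained on a pretext task over unlabeled warehouse imagery and its weights reused here before fine-tuning on the small labeled localization set.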
Modern robotic systems make it possible to automate processes and increase employee productivity. Such systems are built from finite state machines (sensor systems) and machine vision systems, and one area of application is the development of robotic systems within the INDUSTRY 4.0 framework. The article proposes an approach based on fusing data obtained from cameras operating in different electromagnetic ranges. The approach fuses data on the shape, boundaries, and parameters of objects. The search for object boundaries and shapes is based on layer-by-layer simplification of the images, with local features extracted at each level. Local features are found by identifying locally stationary sections, extracting object boundaries, determining their parameters, and combining the information in a single information space. Boundary detection uses a combined image-analysis method with a joint analysis of the L2-norm criterion: the measure of discrepancy is the square of the difference between the input value and the resulting estimate. As an example, results of fusing infrared and RGB camera data are presented.
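A minimal sketch of the stated discrepancy measure, assuming a per-pixel comparison between an input image and its simplified (estimated) counterpart; the layer-by-layer simplification itself is only stubbed here.

```python
import numpy as np

def l2_discrepancy(observed, estimate):
    """Squared-difference discrepancy between the input values and the resulting
    estimate, per pixel and summed, as described in the abstract."""
    diff = observed.astype(np.float64) - estimate.astype(np.float64)
    per_pixel = diff ** 2
    return per_pixel, per_pixel.sum()

# Illustrative use: compare a frame against a coarsely simplified estimate.
frame = np.random.rand(64, 64)
simplified = frame.round(1)        # stand-in for a layer-by-layer simplification
per_pixel, total = l2_discrepancy(frame, simplified)
```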
To accomplish one of the tasks required of disaster response robots, this paper proposes a method for locating the points on 3D-structured switches that a robot must press in disaster sites, using RGB-D images acquired by a Kinect sensor attached to our disaster response robot. Our method consists of the following five steps: 1) obtain RGB and depth images using an RGB-D sensor; 2) detect the bounding box of the switch area in the RGB image using YOLOv3; 3) generate 3D point cloud data of the target switch by combining the bounding box and the depth image; 4) detect the center position of the switch button in the RGB image within the bounding box using a convolutional neural network (CNN); 5) estimate the center of the button's face in real space from the detection result in step 4) and the 3D point cloud data generated in step 3). In the experiment, the proposed method is applied to two types of 3D-structured switch boxes to evaluate its effectiveness. The results show that the proposed method can locate the switch button accurately enough for robot operation.
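Step 3 can be illustrated with a standard pinhole back-projection of the depth pixels inside the detected bounding box; the sketch below assumes known camera intrinsics and is not the paper's exact procedure.

```python
import numpy as np

def bbox_to_point_cloud(depth, bbox, fx, fy, cx, cy):
    """Back-project depth pixels inside a detected bounding box into 3D points
    using the pinhole model. Intrinsics (fx, fy, cx, cy) are assumed to come
    from the RGB-D sensor's calibration."""
    x0, y0, x1, y1 = bbox                      # bounding box in pixel coordinates
    points = []
    for v in range(y0, y1):
        for u in range(x0, x1):
            z = depth[v, u]                    # depth in meters
            if z > 0:                          # skip invalid / missing depth
                x = (u - cx) * z / fx
                y = (v - cy) * z / fy
                points.append((x, y, z))
    return np.array(points)

# Example with Kinect-like intrinsics (illustrative values only).
depth = np.full((480, 640), 1.2, dtype=np.float32)
cloud = bbox_to_point_cloud(depth, (300, 200, 340, 260), 525.0, 525.0, 319.5, 239.5)
```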
Face detection is crucial to computer vision and many related applications, and the past decades have witnessed great progress on this problem. In contrast to traditional methods, many researchers have recently proposed a variety of CNN (convolutional neural network) based methods and have reported impressive results in diverse settings. Although many comprehensive evaluations and reviews of face detection are available, very few focus on small-face detection strategies. In this paper, we systematically survey some of the prevailing methods, divide them into two categories, and compare them quantitatively on three real-world image data sets in terms of mAP. The experimental results show that a feature pyramid with multiple predictors produces better performance, which points to a helpful direction for future research.
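To make the "feature pyramid with multiple predictors" strategy concrete, here is a minimal PyTorch sketch of a top-down pyramid with one predictor per level. The channel sizes and head layout are illustrative assumptions, not taken from any surveyed detector.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyFPNHead(nn.Module):
    """Minimal feature pyramid with one predictor per level: fine levels help
    small faces, coarse levels help large ones."""
    def __init__(self, in_channels=(64, 128, 256), out_channels=64, num_anchors=3):
        super().__init__()
        self.lateral = nn.ModuleList([nn.Conv2d(c, out_channels, 1) for c in in_channels])
        self.predict = nn.ModuleList(
            [nn.Conv2d(out_channels, num_anchors * 5, 3, padding=1) for _ in in_channels])

    def forward(self, feats):                       # feats ordered fine -> coarse
        laterals = [l(f) for l, f in zip(self.lateral, feats)]
        # Top-down pathway: upsample coarser maps and add them to finer laterals.
        for i in range(len(laterals) - 2, -1, -1):
            laterals[i] = laterals[i] + F.interpolate(
                laterals[i + 1], size=laterals[i].shape[-2:], mode="nearest")
        # One predictor per pyramid level (score + 4 box offsets per anchor).
        return [p(f) for p, f in zip(self.predict, laterals)]

head = TinyFPNHead()
outs = head([torch.randn(1, 64, 80, 80), torch.randn(1, 128, 40, 40),
             torch.randn(1, 256, 20, 20)])
```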
Overweight vehicles are a common source of pavement and bridge damage. Mobile crane vehicles, in particular, are often beyond legal per-axle weight limits, carrying their lifting blocks and ballast on the vehicle instead of on a separate trailer. To prevent road deterioration, detecting overweight cranes is desirable for law enforcement. Because the sources of crane weight are visible, we propose a camera-based detection system built on convolutional neural networks. We label our dataset iteratively to vastly reduce labeling effort, and we extensively investigate the impact of image resolution, network depth, and dataset size to choose optimal parameters during iterative labeling. We show that iterative labeling with intelligently chosen image resolutions and network depths can vastly improve (up to 70×) the speed at which data can be labeled, in order to train classification systems for practical surveillance applications. The experiments also provide an estimate of the optimal amount of data required to train an effective classification system, which is valuable for classification problems in general. The proposed system achieves an AUC of 0.985 for distinguishing cranes from other vehicles and AUCs of 0.92 and 0.77 for lifting block and ballast classification, respectively. The proposed classification system enables effective road monitoring for semi-automatic law enforcement and is attractive for rare-class extraction in general surveillance classification problems.
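The iterative-labeling idea can be sketched as a simple train-propose-verify loop; the classifier, selection rule, and batch size below are assumptions for illustration, not the paper's exact procedure.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def human_verify(images, proposed_labels):
    """Placeholder for manual verification: an annotator accepts or corrects the
    model's proposed labels. Here the proposals are simply accepted."""
    return proposed_labels

def iterative_labeling(x_labeled, y_labeled, x_pool, rounds=3, batch=100):
    """Train on the current labels, propose labels for the most confident pool
    samples, have a human verify them, grow the training set, and repeat."""
    model = LogisticRegression(max_iter=1000)
    for _ in range(rounds):
        model.fit(x_labeled, y_labeled)
        probs = model.predict_proba(x_pool)[:, 1]
        confident = np.argsort(np.abs(probs - 0.5))[::-1][:batch]   # most confident
        new_y = human_verify(x_pool[confident], (probs[confident] > 0.5).astype(int))
        x_labeled = np.vstack([x_labeled, x_pool[confident]])
        y_labeled = np.concatenate([y_labeled, new_y])
        x_pool = np.delete(x_pool, confident, axis=0)
    return model
```

Verifying proposed labels is far faster than labeling from scratch, which is where the reported speed-up in labeling throughput comes from.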
State departments of transportation often maintain extensive "video logs" of their roadways that include signs and lane markings, as well as non-image-based information such as grade and curvature. In this work we use the Roadway Information Database (RID), developed for the Second Strategic Highway Research Program, as a surrogate for a video log to design and test algorithms that detect rumble strips in roadway images. Rumble strips are grooved patterns at the lane extremities designed to produce an audible cue to drivers who are in danger of lane departure. The RID contains 6,203,576 images of roads at six locations across the United States with extensive ground-truth information and measurements, but the rumble strip measurements (length and spacing) were not recorded. We use an image correction process along with automated feature extraction and convolutional neural networks to detect rumble strip locations and measure their length and pitch. Based on independent measurements, we estimate our true positive rate to be 93% and our false positive rate to be 10%, with errors in length and spacing on the order of 0.09 m RMS and 0.04 m RMS, respectively. Our results illustrate the feasibility of this approach for adding value to video logs after initial capture, as well as identifying potential methods for autonomous navigation.
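As a rough illustration of the length and pitch measurement, the sketch below post-processes detected groove positions that have already been converted from corrected image coordinates to meters along the road; the paper's actual measurement pipeline may differ.

```python
import numpy as np

def strip_length_and_pitch(groove_positions_m):
    """Given along-road positions (meters) of detected rumble-strip grooves,
    estimate the strip length and the pitch (spacing between grooves)."""
    pos = np.sort(np.asarray(groove_positions_m))
    length = pos[-1] - pos[0]
    pitch = np.median(np.diff(pos))    # median is robust to an occasional missed groove
    return length, pitch

# Example: grooves detected roughly every 0.3 m over a 3 m section.
detections = np.linspace(0.0, 3.0, 11) + np.random.normal(0, 0.01, 11)
length, pitch = strip_length_and_pitch(detections)
```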
We present a novel method for super-resolution (SR) of license plate images based on an end-to-end convolutional neural network (CNN) that combines generative adversarial networks (GANs) and optical character recognition (OCR). License plate SR systems play an important role in a number of security applications, such as improving road safety, traffic monitoring, and surveillance. The task requires not only realistic-looking reconstructed images but also preservation of the text information; standard CNN SR methods and GANs fail to meet this requirement. Incorporating the OCR pipeline into the method also allows the network to be trained without ground-truth high-resolution data, which enables easy training on real data with all real image degradations, including compression.
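One way to combine the adversarial and recognition objectives is sketched below: a generator loss that adds a CTC-based OCR term to the adversarial term, so no high-resolution ground-truth image is required, only the known plate text. The loss weights and the recognizer interface are assumptions for illustration, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def generator_loss(disc_fake_logits, ocr_log_probs, target_text, text_lengths,
                   adv_weight=1e-3):
    """Illustrative combined SR-generator objective: adversarial term (fool the
    discriminator) plus a CTC recognition term that keeps the plate readable."""
    # Adversarial term: generated images should be scored as real (label 1).
    adv = F.binary_cross_entropy_with_logits(
        disc_fake_logits, torch.ones_like(disc_fake_logits))
    # OCR term: CTC loss between recognizer output on the SR image and the known
    # plate string; no high-resolution ground-truth image is needed.
    input_lengths = torch.full((ocr_log_probs.size(1),), ocr_log_probs.size(0),
                               dtype=torch.long)
    ocr = F.ctc_loss(ocr_log_probs, target_text, input_lengths, text_lengths)
    return adv_weight * adv + ocr

# Shapes: T time steps, N batch, C charset size (incl. blank), S text length.
T, N, C, S = 20, 4, 37, 7
loss = generator_loss(torch.randn(N, 1),
                      torch.randn(T, N, C).log_softmax(-1),
                      torch.randint(1, C, (N, S)),
                      torch.full((N,), S, dtype=torch.long))
```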
In this work, we explore the ability to estimate vehicle fuel consumption using imagery from overhead fisheye-lens cameras deployed as traffic sensors. We use this information to simulate vision-based control of a traffic intersection, with the goal of improving fuel economy with minimal impact on mobility. We introduce the ORNL Overhead Vehicle Data set (OOVD), consisting of paired, labeled vehicle images from a ground-based camera and an overhead fisheye-lens traffic camera. The data set includes segmentation masks based on Gaussian mixture models for vehicle detection. We demonstrate the data set's utility through three applications: estimation of fuel consumption based on segmentation bounding boxes, vehicle discrimination for vehicles with large bounding boxes, and fine-grained classification on a limited number of vehicle makes and models using a set of pre-trained convolutional neural network models. We compare these results with estimates based on a large open-source data set of web-scraped imagery. Finally, we show the utility of the approach with reinforcement learning in the open-source Simulation of Urban Mobility (SUMO) traffic simulator. Our results demonstrate the feasibility of controlling traffic lights for better fuel efficiency based solely on visual vehicle estimates from commercial fisheye-lens cameras.
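As an illustration of GMM-based vehicle segmentation, the sketch below applies OpenCV's MOG2 background subtractor (a per-pixel Gaussian mixture model) to an overhead video and extracts bounding boxes from the resulting masks. The video path and post-processing steps are assumptions; the data set's actual segmentation pipeline may differ.

```python
import cv2

# Illustrative GMM-based foreground segmentation for an overhead traffic feed.
subtractor = cv2.createBackgroundSubtractorMOG2(history=500, varThreshold=16,
                                                detectShadows=True)
cap = cv2.VideoCapture("overhead_fisheye.mp4")       # assumed input video path
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
while True:
    ok, frame = cap.read()
    if not ok:
        break
    mask = subtractor.apply(frame)                    # per-pixel vehicle/background mask
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)   # remove small noise blobs
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    boxes = [cv2.boundingRect(c) for c in contours]   # bounding boxes per detected vehicle
cap.release()
```

The resulting bounding boxes are the kind of visual vehicle estimates that the fuel consumption model and the simulated signal controller would consume.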