IS&T | Library

Conference Overview and Papers Program

24 0

Multimedia Analysis
Machine Learning
Mobile
Imaging
Web

Pages A08-1 - A08-6, January 2020, © Society for Imaging Science and Technology 2020

DOI

10.2352/ISSN.2470-1173.2020.8.IMAWM-A08

Volume 32

Issue 8

Digital Library: EI

Published Online: January 2020

A New Training Model for Object Detection in Aerial Images

167 7

Aerial image analysis
Deep learning
RetinaNet
Feature extraction
Object recognition

Geng YANG, Yu Geng, Qin LI, Jane YOU, Mingpeng Cai

Pages 84-1 - 84-5, January 2020, © Society for Imaging Science and Technology 2020

DOI

10.2352/ISSN.2470-1173.2020.8.IMAWM-084

Volume 32

Issue 8

This paper presents a new training model for orientation invariant object detection in aerial images by extending a deep learning based RetinaNet which is a single-stage detector based on feature pyramid networks and focal loss for dense object detection. Unlike R3Det which applies feature refinement to handle rotating objects, we proposed further improvement to cope with the densely arranged and class imbalance problems in aerial imaging, on three aspects: 1) All training images are traversed in each iteration instead of only one image in each iteration in order to cover all possibilities; 2) The learning rate is reduced if losses are not reduced; and 3) The learning rate is reduced if losses are not changed. The proposed method was calibrated and validated by comprehensive for performance evaluation and benchmarking. The experiment results demonstrate the significant improvement in comparison with R3Dec approach on the same data set. In addition to the well-known public data set DOTA for benchmarking, a new data set is also established by considering the balance between the training set and testing set. The map of losses which dropped down smoothly without jitter and overfitting also illustrates the advantages of the proposed newmodel.

Digital Library: EI

Published Online: January 2020

Small Object Bird Detection in Infrared Drone Videos Using Mask R-CNN Deep Learning

404 94

deep learning
small object detection
wildlife survey
bird population

Yasmin M. Kassim, Michael E. Byrne, Cristy Burch, Kevin Mote, Jason Hardin, David R. Larsen, Kannappan Palaniappan

Pages 85-1 - 85-8, January 2020, © Society for Imaging Science and Technology 2020

DOI

10.2352/ISSN.2470-1173.2020.8.IMAWM-085

Volume 32

Issue 8

A reliable method to estimate population sizes of wild turkeys (Meleagris gallopavo) using unmanned aerial vehicles and thermal video imaging data collected at several field sites in Texas is described. Automating the data processing of airborne survey videos provides a fast and reproducible way to count wild turkeys for wildlife management and conservation. A deep learning semantic segmentation pipeline is developed to detect and count roosting Rio-Grande wild turkeys (M.g. intermedia) which appear as small faint objects in drone-based thermal IR videos. The proposed approach to detect roosting turkeys that appear as small objects, relies on Mask R-CNN, a deep architecture semantic segmentation algorithm. This is followed by a post-processing data association and filtering (DAF) process for counting the number of roosting birds. DAF was used to eliminate false positives like rocks and other small bright objects, which often have noisy detections across temporally adjacent video frames, that can be filtered using appearance association and distance-based gating across time. Transfer learning was used to train the Mask R-CNN network by initializing using ImageNet weights. Drone-based thermal IR videos are extremely challenging due to the complexity of the natural environment including weather effects, occlusion of birds, terrain, trees, complex tree shapes, rocks, water and thermal inversion. The transect videos were collected at night at several times and altitudes to optimize data collection opportunities without disturbing the roosting turkeys. Preliminary performance evaluation using 280 video frames is promising.

Digital Library: EI

Published Online: January 2020

High-quality multispectral image generation using Conditional GANs

105 21

remote sensing
image-to-image translation
conditional image generation
conditional GANs
multispectral imagery
synthetic imagery

Ayush Soni, Alexander Loui, Scott Brown, Carl Salvaggio

Pages 86-1 - 86-7, January 2020, © Society for Imaging Science and Technology 2020

DOI

10.2352/ISSN.2470-1173.2020.8.IMAWM-086

Volume 32

Issue 8

In this paper, we demonstrate the use of a Conditional Generative Adversarial Networks (cGAN) framework for producing high-fidelity, multispectral aerial imagery using low-fidelity imagery of the same kind as input. The motivation behind is that it is easier, faster, and often less costly to produce low-fidelity images than high-fidelity images using the various available techniques, such as physics-driven synthetic image generation models. Once the cGAN network is trained and tuned in a supervised manner on a data set of paired low- and high-quality aerial images, it can then be used to enhance new, lower-quality baseline images of similar type to produce more realistic, high-fidelity multispectral image data. This approach can potentially save significant time and effort compared to traditional approaches of producing multispectral images.

Digital Library: EI

Published Online: January 2020

LambdaNet: A Fully Convolutional Architecture for Directional Change Detection

79 8

Change Detection
Siamese Network
Fully Convolutional Network
Semantic Segmentation

Bryan Blakeslee, Andreas Savakis

Pages 114-1 - 114-7, January 2020, © Society for Imaging Science and Technology 2020

DOI

10.2352/ISSN.2470-1173.2020.8.IMAWM-114

Volume 32

Issue 8

Change detection in image pairs has traditionally been a binary process, reporting either “Change” or “No Change.” In this paper, we present LambdaNet, a novel deep architecture for performing pixel-level directional change detection based on a four class classification scheme. LambdaNet successfully incorporates the notion of “directional change” and identifies differences between two images as “Additive Change” when a new object appears, “Subtractive Change” when an object is removed, “Exchange” when different objects are present in the same location, and “No Change.” To obtain pixel annotated change maps for training, we generated directional change class labels for the Change Detection 2014 dataset. Our tests illustrate that LambdaNet would be suitable for situations where the type of change is unstructured, such as change detection scenarios in satellite imagery.

Digital Library: EI

Published Online: January 2020

Deep Learning for Printed Mottle Defect Grading

87 7

Printed Defect
Image Classification
Deep learning

Jianhang Chen, Qian Lin, Jan P. Allebach

Pages 184-1 - 184-9, January 2020, © Society for Imaging Science and Technology 2020

DOI

10.2352/ISSN.2470-1173.2020.8.IMAWM-184

Volume 32

Issue 8

In this paper, we propose a new method for printed mottle defect grading. By training the data scanned from printed images, our deep learning method based on a Convolutional Neural Network (CNN) can classify various images with different mottle defect levels. Different from traditional methods to extract the image features, our method utilizes a CNN for the first time to extract the features automatically without manual feature design. Different data augmentation methods such as rotation, flip, zoom, and shift are also applied to the original dataset. The final network is trained by transfer learning using the ResNet-34 network pretrained on the ImageNet dataset connected with fully connected layers. The experimental results show that our approach leads to a 13.16% error rate in the T dataset, which is a dataset with a single image content, and a 20.73% error rate in a combined dataset with different contents.

Digital Library: EI

Published Online: January 2020

A Local-Global Aggregate Network for Facial Landmark Localization

84 2

facial landmark localization
face alignment
convolutional neural network

Ruiyi Mao, Qian Lin, Jan P. Allebach

Pages 185-1 - 185-6, January 2020, © Society for Imaging Science and Technology 2020

DOI

10.2352/ISSN.2470-1173.2020.8.IMAWM-185

Volume 32

Issue 8

Facial landmark localization plays a critical role in many face analysis tasks. In this paper, we present a novel local-global aggregate network (LGA-Net) for robust facial landmark localization of faces in the wild. The network consists of two convolutional neural network levels which aggregate local and global information for better prediction accuracy and robustness. Experimental results show our method overcomes typical problems of cascaded networks and outperforms state-of-the-art methods on the 300-W [1] benchmark.

Digital Library: EI

Published Online: January 2020

The Blessing and the Curse of the Noise behind Facial Landmark Annotations

45 1

face alignment
facial landmark
deep neural network
face tracking

Xiaoyu Xiang, Yang Cheng, Shaoyuan Xu, Qian Lin, Jan Allebach

DOI

10.2352/ISSN.2470-1173.2020.8.IMAWM-186

Volume 32

Issue 8

The evolving algorithms for 2D facial landmark detection empower people to recognize faces, analyze facial expressions, etc. However, existing methods still encounter problems of unstable facial landmarks when applied to videos. Because previous research shows that the instability of facial landmarks is caused by the inconsistency of labeling quality among the public datasets, we want to have a better understanding of the influence of annotation noise in them. In this paper, we make the following contributions: 1) we propose two metrics that quantitatively measure the stability of detected facial landmarks, 2) we model the annotation noise in an existing public dataset, 3) we investigate the influence of different types of noise in training face alignment neural networks, and propose corresponding solutions. Our results demonstrate improvements in both accuracy and stability of detected facial landmarks.

Digital Library: EI

Published Online: January 2020

Gun Source and Muzzle Head Detection

202 5

Gunshot Detection
Muzzlehead Detection
Mask RCNN
Video
Optical Flow

Zhong Zhou, Isak Czeresnia Etinger, Florian Metze, Alexander Hauptmann, Alexander Waibel

DOI

10.2352/ISSN.2470-1173.2020.8.IMAWM-187

Volume 32

Issue 8

There is a surging need across the world for protection against gun violence. There are three main areas that we have identified as challenging in research that tries to curb gun violence: temporal location of gunshots, gun type prediction and gun source (shooter) detection. Our task is gun source detection and muzzle head detection, where the muzzle head is the round opening of the firing end of the gun. We would like to locate the muzzle head of the gun in the video visually, and identify who has fired the shot. In our formulation, we turn the problem of muzzle head detection into two sub-problems of human object detection and gun smoke detection. Our assumption is that the muzzle head typically lies between the gun smoke caused by the shot and the shooter. We have interesting results both in bounding the shooter as well as detecting the gun smoke. In our experiments, we are successful in detecting the muzzle head by detecting the gun smoke and the shooter.

Digital Library: EI

Published Online: January 2020

Semi-supervised Multi-task Network For Image Aesthetic Assessment

110 7

image aesthetic assessment
deep neural network
multi-task network
semi-supervised method

Xiaoyu Xiang, Yang Cheng, Jianhang Chen, Qian Lin, Jan Allebach

DOI

10.2352/ISSN.2470-1173.2020.8.IMAWM-188

Volume 32

Issue 8