Semantic segmentation, the task of assigning each pixel in an image to an object class, is an important problem in image understanding. In recent years, convolutional neural networks trained on public datasets have made it possible to segment objects and understand images. However, it remains challenging to segment objects with high accuracy using a simple, small network. In this work, we describe convolutional neural networks with dilated convolutions that segment people accurately, especially near object boundaries, with the help of a data augmentation technique. Additionally, we develop a smaller network that processes each webcam video frame faster without degrading segmentation performance. Our method outperforms other segmentation techniques both numerically and visually.
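The abstract does not give the exact architecture; as an illustration of the dilated-convolution idea, a minimal PyTorch sketch of a context module for binary person/background segmentation might look as follows (channel widths, layer count, and dilation rates are assumptions, not the authors' configuration):

```python
import torch.nn as nn

class DilatedContextBlock(nn.Module):
    """Hypothetical sketch: stacked 3x3 convolutions with increasing
    dilation rates enlarge the receptive field without downsampling,
    which helps keep person boundaries sharp."""
    def __init__(self, channels=64, dilations=(1, 2, 4, 8)):
        super().__init__()
        layers = []
        for d in dilations:
            layers += [
                nn.Conv2d(channels, channels, kernel_size=3,
                          padding=d, dilation=d, bias=False),
                nn.BatchNorm2d(channels),
                nn.ReLU(inplace=True),
            ]
        self.body = nn.Sequential(*layers)
        # one output channel for the binary person/background mask
        self.head = nn.Conv2d(channels, 1, kernel_size=1)

    def forward(self, x):
        return self.head(self.body(x))
```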
In recent years, there has been growing interest in approaches that improve the coding efficiency of modern video codecs as demand for web-based video consumption increases. In this paper, we propose a model-based approach that uses texture analysis/synthesis to reconstruct blocks in texture regions of a video to achieve potential coding gains using the AV1 codec developed by the Alliance for Open Media (AOM). The proposed method uses convolutional neural networks to extract texture regions in a frame, which are then reconstructed using a global motion model. Our preliminary results show an increase in coding efficiency while maintaining satisfactory visual quality.
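As a rough illustration of the reconstruction step (not the AV1 integration itself), the sketch below warps a reference frame with a global motion model and pastes it into pixels the CNN flagged as texture; the mask and homography inputs are assumed to come from earlier stages of the pipeline:

```python
import cv2
import numpy as np

def synthesize_texture_regions(ref_frame, cur_frame, texture_mask, H):
    """Hypothetical sketch: pixels flagged as 'texture' by the CNN are
    replaced by the reference frame warped with a global (homography)
    motion model instead of being coded conventionally.
    ref_frame, cur_frame: HxWx3 uint8 images; texture_mask: HxW binary;
    H: 3x3 global motion homography estimated elsewhere."""
    warped = cv2.warpPerspective(
        ref_frame, H, (cur_frame.shape[1], cur_frame.shape[0]))
    out = cur_frame.copy()
    out[texture_mask > 0] = warped[texture_mask > 0]
    return out
```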
Convolutional neural networks (CNNs) have improved the field of computer vision in the past years and allow groundbreaking new and fast automatic results in various scenarios. However, the training behaviour of CNNs when only scarce data are available has not yet been examined in detail. Transfer learning is a technique that helps overcome training data shortages by adapting trained models to a different but related target task. We investigate the transfer learning performance of pre-trained CNN models on variably sized training datasets for binary classification problems, which resemble the discrimination between relevant and irrelevant content within a restricted context. This often plays a role in data triage applications such as screening seized storage devices for evidence. The evaluation of our work shows that even with a small number of training examples, the models can achieve promising performance of up to 96% accuracy. We apply those transferred models to data triage by using the softmax outputs of the models to rank unseen images according to their assigned probability of relevance. This provides a tremendous advantage in many application scenarios where large unordered datasets have to be screened for certain content.
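A minimal sketch of the triage ranking idea is shown below: a transferred binary classifier scores unseen images by their softmax probability of being relevant, and images are sorted by that score. The backbone (ResNet-50) and the checkpoint path are placeholders; the abstract does not name the specific pre-trained models used:

```python
import torch
import torch.nn.functional as F
from torchvision import models, transforms
from PIL import Image

# Hypothetical transferred model: ImageNet backbone re-headed for
# two classes (relevant / irrelevant) and fine-tuned elsewhere.
model = models.resnet50(weights=None)
model.fc = torch.nn.Linear(model.fc.in_features, 2)
model.load_state_dict(torch.load("triage_model.pt"))   # assumed checkpoint
model.eval()

preprocess = transforms.Compose([
    transforms.Resize(256), transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])

def relevance_score(path):
    x = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        probs = F.softmax(model(x), dim=1)
    return probs[0, 1].item()          # probability of the "relevant" class

# Rank a list of unseen image paths, most likely relevant first:
# ranked = sorted(image_paths, key=relevance_score, reverse=True)
```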
The analysis of complex structured data like video has been a long-standing challenge for computer vision algorithms. Innovative deep learning architectures like Convolutional Neural Networks (CNNs), however, are demonstrating remarkable performance in challenging image and video understanding tasks. In this work we propose an architecture for the automated detection of scored points during tennis matches. We explore two approaches based on CNNs for the analysis of video streams of broadcast tennis games. We first explore the two-stream approach, which extracts features related either to pixel intensity values from grayscale frames or to motion information encoded via optical flow. We then explore the use of higher-order 3D CNNs to simultaneously encode both spatial and temporal correlations. Furthermore, we explore the late fusion of the individual streams in order to extract and encode both structural and motion spatio-temporal dynamics. We validate the merits of the proposed scheme using a novel manually annotated dataset created from publicly available videos.
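The late-fusion step can be sketched very simply: the class posteriors of the appearance stream and of the motion stream are combined with a fusion weight. The averaging rule and the binary "point scored / no point" setup below are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

def late_fusion(spatial_logits, temporal_logits, w=0.5):
    """Hypothetical late fusion of the two streams: posteriors of the
    appearance (grayscale) stream and the motion (optical-flow) stream
    are combined with weight w."""
    p_spatial = F.softmax(spatial_logits, dim=1)
    p_temporal = F.softmax(temporal_logits, dim=1)
    return w * p_spatial + (1.0 - w) * p_temporal

# Example with dummy logits for a binary "point scored / no point" decision.
s, t = torch.randn(1, 2), torch.randn(1, 2)
print(late_fusion(s, t).argmax(dim=1))
```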
Spectral information obtained by hyperspectral sensors enables better characterization, identification and classification of the objects in a scene of interest. Unfortunately, several factors have to be addressed in the classification of hyperspectral data, including the acquisition process, the high dimensionality of spectral samples, and the limited availability of labeled data. Consequently, it is of great importance to design hyperspectral image classification schemes able to deal with the issues of the curse of dimensionality, and simultaneously produce accurate classification results, even from a limited amount of training data. To that end, we propose a novel machine learning technique that addresses the hyperspectral image classification problem by employing the state-of-the-art scheme of Convolutional Neural Networks (CNNs). The formal approach introduced in this work exploits the fact that the spatio-spectral information of an input scene can be encoded via CNNs and combined with multi-class classifiers. We apply the proposed method to a novel dataset acquired by a snapshot mosaic spectral camera and demonstrate the potential of the proposed approach for accurate classification.
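One common way to encode spatio-spectral information with a CNN, sketched below, is to classify the centre pixel of a small spatial patch while treating all spectral bands as input channels; patch size, band count, and layer widths here are illustrative assumptions rather than the authors' exact design:

```python
import torch.nn as nn

class SpatioSpectralCNN(nn.Module):
    """Minimal sketch: a small CNN classifies the centre pixel of a
    patch x patch window, with all spectral bands as input channels so
    spatial and spectral information are encoded jointly."""
    def __init__(self, bands=25, n_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(bands, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(64, n_classes)

    def forward(self, x):              # x: (N, bands, patch, patch)
        f = self.features(x).flatten(1)
        return self.classifier(f)      # multi-class logits
```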
A visual system cannot process everything with full fidelity, nor, in a given moment, perform all possible visual tasks. Rather, it must lose some information, and prioritize some tasks over others. The human visual system has developed a number of strategies for dealing with its limited capacity. This paper reviews recent evidence for one strategy: encoding the visual input in terms of a rich set of local image statistics, where the local regions grow — and the representation becomes less precise — with distance from fixation. The explanatory power of this proposed encoding scheme has implications for another proposed strategy for dealing with limited capacity: that of selective attention, which gates visual processing so that the visual system momentarily processes some objects, features, or locations at the expense of others. A lossy peripheral encoding offers an alternative explanation for a number of phenomena used to study selective attention. Based on lessons learned from studying peripheral vision, this paper proposes a different characterization of capacity limits as limits on decision complexity. A general-purpose decision process may deal with such limits by "cutting corners" when the task becomes too complicated.
Recent advances in computational models in vision science have considerably furthered our understanding of human visual perception. At the same time, rapid advances in convolutional deep neural networks (DNNs) have resulted in computer vision models of object recognition which, for the first time, rival human object recognition. Furthermore, it has been suggested that DNNs may not only be successful models for computer vision, but may also be good computational models of the monkey and human visual systems. The advances in computational models in both vision science and computer vision pose two challenges in two different and independent domains: First, because the latest computational models have much higher predictive accuracy, and competing models may make similar predictions, we require more human data to be able to statistically distinguish between different models. Thus we would like methods to acquire trustworthy human behavioural data quickly and easily. Second, we need challenging experiments to ascertain whether models show similar input-output behaviour only near "ceiling" performance, or whether their performance degrades similarly to human performance: only then do we have strong evidence that models and human observers may be using similar features and processing strategies. In this paper we address both challenges.
In recent years, Convolutional Neural Networks (CNNs) have gained huge popularity among computer vision researchers. In this paper, we investigate how features learned by these networks in a supervised manner can be used to define a measure of self-similarity, an image feature that characterizes many images of natural scenes and patterns, and is also associated with images of artworks. Compared to a previously proposed method for measuring self-similarity based on oriented luminance gradients, our approach has two advantages. Firstly, we fully take color into account, an image feature which is crucial for vision. Secondly, by using higher-layer CNN features, we define a measure of self-similarity that relies more on image content than on basic local image features, such as luminance gradients.
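A hedged sketch of a CNN-based self-similarity score is given below: pooled higher-layer features of image sub-regions are compared with the pooled features of the whole image. The choice of VGG-19, the tap layer, the 2x2 grid, and cosine similarity are assumptions for illustration, not the measure proposed in the paper:

```python
import torch
import torch.nn.functional as F
from torchvision import models

# Higher-layer feature extractor (assumed cut-off point in VGG-19).
backbone = models.vgg19(
    weights=models.VGG19_Weights.IMAGENET1K_V1).features[:27].eval()

def self_similarity(img):              # img: (1, 3, H, W), ImageNet-normalised
    with torch.no_grad():
        fmap = backbone(img)           # (1, C, h, w) higher-layer features
    whole = F.adaptive_avg_pool2d(fmap, 1).flatten(1)
    h, w = fmap.shape[-2:]
    sims = []
    for i in range(2):                 # compare each quadrant to the whole image
        for j in range(2):
            sub = fmap[..., i*h//2:(i+1)*h//2, j*w//2:(j+1)*w//2]
            desc = F.adaptive_avg_pool2d(sub, 1).flatten(1)
            sims.append(F.cosine_similarity(desc, whole).item())
    return sum(sims) / len(sims)       # higher = more self-similar
```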
Finding an objective image quality metric that matches subjective quality has always been a challenging task. We propose a new full-reference image quality metric based on features extracted from Convolutional Neural Networks (CNNs). Using a pre-trained AlexNet model, we extract feature maps of the test and reference images at multiple layers, and compare their feature similarity at each layer. Such similarity scores are then pooled across layers to obtain an overall quality value. Experimental results on four state-of-the-art databases show that our metric is either on par with or outperforms 10 other state-of-the-art metrics, demonstrating that CNN features at multiple levels are superior to the handcrafted features used in most image quality metrics in capturing aspects that matter for perception.
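A minimal sketch of the idea follows: feature maps of the reference and test images are extracted at several AlexNet layers, compared with a similarity function, and pooled into a single quality score. The chosen tap layers, cosine similarity, and the averaging pool are assumptions, not the authors' exact configuration:

```python
import torch
import torch.nn.functional as F
from torchvision import models

alexnet = models.alexnet(
    weights=models.AlexNet_Weights.IMAGENET1K_V1).features.eval()
tap_layers = {1, 4, 7, 9, 11}          # ReLU outputs of the five conv stages

def cnn_quality(ref, test):            # tensors of shape (1, 3, H, W)
    """Hypothetical full-reference metric: per-layer feature similarity
    between reference and test, averaged across tapped layers."""
    scores = []
    x, y = ref, test
    with torch.no_grad():
        for idx, layer in enumerate(alexnet):
            x, y = layer(x), layer(y)
            if idx in tap_layers:
                scores.append(F.cosine_similarity(
                    x.flatten(1), y.flatten(1)).item())
    return sum(scores) / len(scores)    # higher = closer to the reference
```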
According to the National Highway Traffic Safety Administration, one in ten fatal crashes and two in ten injury crashes were reported as distracted driver accidents in the United States during 2014. In an attempt to mitigate these alarming statistics, this paper explores using a dashboard camera along with computer vision and machine learning to automatically detect distracted drivers. We consider a dataset that incorporates drivers engaging in seven different distracting behaviors using the left and/or right hands. Traditional handcrafted features paired with a Support Vector Machine classifier are contrasted with deep Convolutional Neural Networks. The traditional features include a blend of Histogram of Oriented Gradients and Scale-Invariant Feature Transform descriptors used to create Bags of Words. The deep convolutional methods use transfer learning on AlexNet, VGG-16, and ResNet-152. The results yield 85% accuracy with ResNet and 82.5% accuracy with VGG-16, which outperformed AlexNet by almost 10%. Replacing the fully connected layers with a Support Vector Machine classifier did not improve the classification accuracy. The traditional features yielded much lower accuracy than the deep convolutional networks.
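The transfer-learning setup can be sketched as re-heading an ImageNet-pretrained backbone for the distraction classes and fine-tuning it on the dashboard-camera frames. The number of classes (safe driving plus seven distracting behaviors) and the backbone-freezing policy below are illustrative assumptions:

```python
import torch.nn as nn
from torchvision import models

NUM_CLASSES = 8   # assumed: safe driving + 7 distracting behaviors

# ImageNet-pretrained ResNet-152, re-headed for the driver-distraction task.
model = models.resnet152(weights=models.ResNet152_Weights.IMAGENET1K_V1)
for p in model.parameters():           # optionally freeze the convolutional backbone
    p.requires_grad = False
model.fc = nn.Linear(model.fc.in_features, NUM_CLASSES)   # new trainable head
# Fine-tune with a standard cross-entropy loss on the driver dataset;
# the SVM variant mentioned in the abstract would instead feed the pooled
# features into an external SVM classifier.
```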