We propose a novel architecture based on the structure of autoencoders. The paper introduces CrossEncoders, an autoencoder architecture that uses cross-connections to link layers (both adjacent and non-adjacent) on the encoder and decoder sides of the network. The network incorporates both global and local information in the lower-dimensional code. We aim for an image compression algorithm with reduced training time and better generalization. The use of cross-connections makes training our network significantly faster. The performance of the proposed framework has been evaluated on real-world data from widely used benchmark datasets such as MNIST and CIFAR-10. Furthermore, we show that the proposed architecture achieves a high compression ratio and is more robust than previously proposed architectures and PCA. The results were validated using metrics such as PSNR-HVS and PSNR-HVS-M.
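A minimal sketch of the idea, not the authors' exact architecture: a decoder that receives cross-connections from encoder layers, so the reconstruction mixes local (shallow) and global (deep) information. All layer sizes here are illustrative assumptions.

```python
import torch
import torch.nn as nn

class CrossEncoder(nn.Module):
    """Autoencoder whose decoder layers receive encoder activations
    via cross-connections (dimensions are assumptions for MNIST)."""
    def __init__(self, in_dim=784, code_dim=32):
        super().__init__()
        self.enc1 = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU())
        self.enc2 = nn.Sequential(nn.Linear(256, 64), nn.ReLU())
        self.code = nn.Linear(64, code_dim)
        # Decoder layers take cross-connected encoder activations as extra input.
        self.dec1 = nn.Sequential(nn.Linear(code_dim + 64, 64), nn.ReLU())
        self.dec2 = nn.Sequential(nn.Linear(64 + 256, 256), nn.ReLU())
        self.out = nn.Linear(256, in_dim)

    def forward(self, x):
        h1 = self.enc1(x)                       # local features
        h2 = self.enc2(h1)                      # more global features
        z = self.code(h2)                       # low-dimensional code
        d1 = self.dec1(torch.cat([z, h2], 1))   # cross-connection from enc2
        d2 = self.dec2(torch.cat([d1, h1], 1))  # cross-connection from enc1
        return torch.sigmoid(self.out(d2)), z
```

The cross-connections give the decoder direct gradient paths into the shallow encoder layers, which is one plausible explanation for the faster training reported above.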
In VP9, a 64×64 superblock can be recursively decomposed all the way down to blocks of size 4×4. The encoder performs the encoding process for each possible partitioning, and the optimal one is selected by minimizing the rate-distortion cost. This scheme ensures encoding quality, but it also incurs high computational complexity and consumes substantial CPU resources. In this paper, to speed up the partition search without sacrificing quality, we propose a multi-level machine-learning-based early termination scheme. One weighted Support Vector Machine classifier is trained for each block size. The binary classifiers determine, for a given block, whether it is necessary to continue the search down to smaller blocks or to terminate early and take the current block size as the final one. Moreover, the classifiers are trained with varying error tolerance for different block sizes, i.e., a stricter error tolerance is adopted for larger block sizes than for smaller ones to control the drop in encoder performance. Extensive experimental results demonstrate that for HD and 4K videos, the proposed framework achieves a remarkable speed-up (20-25%) with less than a 0.03% performance drop measured in the Bjøntegaard delta bit rate (BDBR) compared with the current VP9 codebase.
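A minimal sketch of a per-block-size weighted SVM in the spirit described above; the feature set, penalty weights, and thresholds are assumptions, not the paper's values. The class weight skews the classifier against wrongly terminating at large block sizes, where a mistake costs more coding efficiency.

```python
import numpy as np
from sklearn.svm import SVC

def train_termination_classifier(X, y, block_size):
    # X: per-block features (e.g., pixel variance, RD cost so far, QP);
    # y: 1 = optimal partition stops at this size, 0 = splitting helps.
    # Larger blocks get a heavier penalty on false "terminate" decisions,
    # mirroring the stricter error tolerance described in the abstract.
    miss_penalty = {64: 8.0, 32: 4.0, 16: 2.0, 8: 1.0}[block_size]
    clf = SVC(kernel="rbf", class_weight={0: miss_penalty, 1: 1.0})
    clf.fit(X, y)
    return clf

def should_terminate(clf, features):
    # True -> keep the current block size; skip the recursive split search.
    return clf.predict(np.asarray(features).reshape(1, -1))[0] == 1
```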
There has been growing interest in using different approaches to improve the coding efficiency of modern video codecs in recent years as demand for web-based video consumption increases. In this paper, we propose a model-based approach that uses texture analysis/synthesis to reconstruct blocks in texture regions of a video to achieve potential coding gains, using the AV1 codec developed by the Alliance for Open Media (AOM). The proposed method uses convolutional neural networks to extract texture regions in a frame, which are then reconstructed using a global motion model. Our preliminary results show an increase in coding efficiency while maintaining satisfactory visual quality.
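A rough sketch of the synthesis step, under stated assumptions: the texture mask is assumed to come from the CNN stage, the global motion model is represented here as a homography, and the real pipeline runs inside the AV1 encoder rather than as a post-process.

```python
import cv2

def synthesize_texture(frame, reference, texture_mask, homography):
    """Replace texture-region pixels with a globally motion-compensated
    reference instead of coding them (texture_mask: 2-D boolean array)."""
    h, w = frame.shape[:2]
    # Warp the reference frame by the global motion model.
    warped = cv2.warpPerspective(reference, homography, (w, h))
    out = frame.copy()
    out[texture_mask] = warped[texture_mask]  # synthesize texture blocks
    return out                                # non-texture blocks coded normally
```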
Encoders of the AOM/AV1 codec treat an input video sequence as a succession of frames grouped into Golden-Frame (GF) groups. The coding structure of a GF group is fixed for a given GF group size. In the current AOM/AV1 encoder, video frames are coded using a hierarchical, multilayer coding structure within one GF group. It has been observed that the use of a multilayer coding structure may result in worse coding performance when the GF group exhibits consistent stillness across its frames. This paper proposes a new approach that adaptively designs the GF group coding structure through stillness detection. To this end, we develop an automatic stillness detection scheme using three metrics extracted from each GF group. It then differentiates still GF groups from non-still ones and applies different GF coding structures accordingly. Experimental results demonstrate a consistent coding gain using the new approach.
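The abstract does not name the three metrics, so the following is only an illustrative sketch: a GF group is flagged as still when three frame-difference statistics (all assumed here) fall below thresholds, after which the encoder would select a flat rather than multilayer coding structure.

```python
import numpy as np

def is_still_gf_group(frames, t_mean=1.0, t_max=8.0, t_frac=0.01):
    """frames: list of grayscale uint8 arrays for one GF group.
    Thresholds are placeholder assumptions, not tuned values."""
    diffs = [np.abs(frames[i + 1].astype(np.int16) - frames[i].astype(np.int16))
             for i in range(len(frames) - 1)]
    mean_diff = np.mean([d.mean() for d in diffs])          # assumed metric 1
    max_diff = np.max([d.max() for d in diffs])             # assumed metric 2
    moving_frac = np.mean([(d > 4).mean() for d in diffs])  # assumed metric 3
    return mean_diff < t_mean and max_diff < t_max and moving_frac < t_frac
```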
In this paper, we propose an active-learning-based approach to event recognition in personal photo collections to tackle the challenges posed by weakly labeled data and the presence of irrelevant pictures. Conventional approaches relying on supervised learning cannot identify the relevant samples in training albums, often leading to misclassification. In our work, we use active learning to choose the most relevant samples from a collection and train a classifier on them. We also investigate the importance of relevant images in the event recognition process, and show how performance degrades if all images from an album, including the irrelevant ones, are used. The experimental evaluation is carried out on a benchmark dataset composed of a large number of personal photo albums. We demonstrate that the proposed strategy yields encouraging scores in the presence of irrelevant images in personal photo collections, advancing recent leading works.
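A minimal sketch of a standard uncertainty-sampling active learning loop; the paper's exact selection criterion and classifier may differ, and the `oracle` callback (returning ground-truth relevance for queried samples) is an assumption of this sketch.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def active_learning_loop(X_lab, y_lab, X_pool, oracle, rounds=10, k=20):
    """Iteratively query labels for the k album images the current
    classifier is least certain about, then retrain."""
    clf = LogisticRegression(max_iter=1000)
    for _ in range(rounds):
        clf.fit(X_lab, y_lab)
        proba = clf.predict_proba(X_pool)
        uncertainty = 1.0 - proba.max(axis=1)   # least-confident sampling
        pick = np.argsort(uncertainty)[-k:]     # k most uncertain samples
        X_lab = np.vstack([X_lab, X_pool[pick]])
        y_lab = np.concatenate([y_lab, oracle(X_pool[pick])])
        X_pool = np.delete(X_pool, pick, axis=0)
    return clf
```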
Historical Chinese character recognition suffers from labeling problems: not only are sufficient labeled training samples lacking, but the set of sample classes is also incomplete. The scenario is therefore "open set" recognition, where the labeling of sample classes is incomplete at training time and unknown classes can be submitted to the system during testing. This paper proposes a method for open set historical Chinese character recognition. In open set recognition, the features available in the training data cannot effectively characterize the various unknown classes. We assume that features which characterize unknown classes can be derived or learned from other, similar datasets. We utilize an auxiliary dataset combined with the open set training dataset to learn good features for representing historical Chinese characters. The auxiliary dataset is translated using Generative Adversarial Networks (GANs) so that the translated data are as close to the historical Chinese character dataset as possible. We then construct a neural network for feature extraction, trained with an alternating scheme on the translated auxiliary dataset and the incompletely labeled historical Chinese character dataset. Finally, features are extracted from a chosen layer of the trained network, and unknown samples are detected by statistically modelling the Euclidean distances between samples. Experimental results show that the proposed method is effective.
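A minimal sketch of the unknown-class detection step, with assumptions: features are taken from a chosen network layer, each known class is summarized by its centroid, and the rejection threshold is a free parameter rather than the paper's fitted statistical model.

```python
import numpy as np

def fit_centroids(features, labels):
    """Mean feature vector per known class."""
    return {c: features[labels == c].mean(axis=0) for c in np.unique(labels)}

def classify_open_set(feat, centroids, threshold):
    """Assign the nearest known class, or reject as unknown when even
    the closest class centroid is farther than `threshold`."""
    dists = {c: np.linalg.norm(feat - mu) for c, mu in centroids.items()}
    best = min(dists, key=dists.get)
    return best if dists[best] < threshold else "unknown"
```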
Convolutional neural networks (CNNs) have advanced the field of computer vision in the past years and enable groundbreaking, fast automatic results in various scenarios. However, how CNNs train when only scarce data are available has not yet been examined in detail. Transfer learning is a technique that helps overcome training data shortage by adapting trained models to a different but related target task. We investigate the transfer learning performance of pre-trained CNN models on variably sized training datasets for binary classification problems, which resemble the discrimination between relevant and irrelevant content within a restricted context. This often plays a role in data triage applications such as screening seized storage devices for evidence. Our evaluation shows that even with a small number of training examples, the models can achieve promising performance of up to 96% accuracy. We apply the transferred models to data triage by using their softmax outputs to rank unseen images according to their assigned probability of relevance. This provides a tremendous advantage in many application scenarios where large unordered datasets have to be screened for certain content.
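A minimal sketch of this setup; the backbone choice (ResNet-18) and frozen-layer policy are assumptions, not necessarily the models evaluated in the paper. The ranking step is exactly the softmax-based triage described above.

```python
import torch
import torch.nn as nn
from torchvision import models

# Adapt a pre-trained CNN to binary relevant/irrelevant classification.
model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
for p in model.parameters():
    p.requires_grad = False                     # freeze pre-trained backbone
model.fc = nn.Linear(model.fc.in_features, 2)   # new binary head (trainable)

def rank_by_relevance(model, images):
    """images: (N, 3, 224, 224) tensor, already normalized.
    Returns indices sorted by P(relevant), highest first (triage order)."""
    model.eval()
    with torch.no_grad():
        probs = torch.softmax(model(images), dim=1)[:, 1]
    return torch.argsort(probs, descending=True)
```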
Optical character recognition (OCR) automatically recognizes text in an image and converts it into machine codes such as ASCII or Unicode. Compared with the extensive OCR research on other languages, recognizing Arabic remains a challenging problem due to character connection and segmentation issues. In this work, we propose a deep-learning framework for recognizing Arabic characters based on multi-dimensional bidirectional long short-term memory (MD-BLSTM) with connectionist temporal classification (CTC). To train this framework, we generate a dataset of over one million Arabic text-line images containing Arabic digits and basic Arabic forms in both isolated and connected shapes. For comparison, we also measure the performance of other OCR software, namely Tesseract (originally developed by Hewlett-Packard and later by Google); both version 3 and version 4 are used. Results show that the deep-learning method outperforms the conventional methods in terms of recognition error rate, although the Tesseract 3.0 system was faster.
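A minimal sketch of a BLSTM + CTC recognizer; this uses a 1-D BLSTM as a stand-in for the paper's multi-dimensional variant, and all sizes (feature dimension, hidden units, alphabet size) are illustrative assumptions.

```python
import torch
import torch.nn as nn

class BLSTMCTC(nn.Module):
    """Emits a per-timestep distribution over characters plus a CTC blank;
    CTC loss aligns it to the unsegmented text-line label sequence."""
    def __init__(self, feat_dim=48, hidden=128, n_chars=120):
        super().__init__()
        self.rnn = nn.LSTM(feat_dim, hidden, num_layers=2,
                           bidirectional=True, batch_first=True)
        self.fc = nn.Linear(2 * hidden, n_chars + 1)  # +1 for the CTC blank

    def forward(self, x):                  # x: (batch, time, feat_dim)
        h, _ = self.rnn(x)
        return self.fc(h).log_softmax(-1)  # (batch, time, n_chars + 1)

# CTC loss handles the connected-script alignment without per-character
# segmentation; it expects (time, batch, classes) log-probabilities.
ctc = nn.CTCLoss(blank=0)
```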
Most sports competitions are still judged by humans; the judging process is not only demanding in skill and experience but also at risk of errors and unfairness. Advances in sensing and computing technologies have found successful applications assisting human judges with refereeing (e.g., the well-known Hawk-Eye system). Along this line of research, we propose a computer vision (CV)-based objective synchronization scoring system for synchronized diving, a relatively young Olympic sport. In synchronized diving, subjective judgement is often difficult due to the rapidity of human motion, the limited viewing angles, and the shortness of human memory, which motivates our development of an automatic and objective scoring system. Our CV-based scoring system consists of three components: (1) background estimation using color and optical flow cues, which effectively segments the silhouettes of both divers from the input video; (2) feature extraction using histograms of oriented gradients (HOG) and stick figures to obtain an abstract representation of each diver's posture that is invariant to body attributes (e.g., height and weight); (3) synchronization evaluation by training a feed-forward neural network using cross-validation. We have tested the designed system on 22 diving videos collected at the 2012 London Olympic Games. Our experimental results show that the CV-based approach can produce synchronization scores close to those given by human judges, with an MSE as low as 0.24.
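A minimal sketch of component (2) using scikit-image's HOG; the patch size and HOG parameters are assumptions. Resizing each silhouette crop to a fixed size contributes to the invariance to body attributes mentioned above.

```python
import numpy as np
from skimage.feature import hog
from skimage.transform import resize

def posture_descriptor(silhouette_crop):
    """HOG descriptor of one diver's silhouette crop; fixed resize makes
    the descriptor independent of diver height in the frame."""
    patch = resize(silhouette_crop, (128, 64))
    return hog(patch, orientations=9, pixels_per_cell=(8, 8),
               cells_per_block=(2, 2))

def pair_features(crop_a, crop_b):
    # Concatenated pair descriptor fed to the synchronization network.
    return np.concatenate([posture_descriptor(crop_a),
                           posture_descriptor(crop_b)])
```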