Digital imaging, as an archival practice, is not a "solved problem" for the cultural heritage community. As Google, publishers, and other content providers digitize and deliver resources at scale, there is an increasingly pressing demand from users to digitize the rich resources held in library special collections, archival institutions, and the vast array of invaluable content in private collections. This paper introduces a research and learning initiative, Dig4E (Digitization for Everybody), designed to bridge the knowledge gap that presently exists between well-established or emergent international standards derived from imaging science, on the one hand, and local practices for the digital reformatting of archival resources, on the other. The paper describes the rationale for the education and training initiative and summarizes the intellectual structure and technical platform of an innovative sequence of self-paced online resources that can be adapted for a variety of audiences.
In scanning-microscopy-based imaging, there is a need for novel data acquisition schemes that reduce acquisition time and minimize sample exposure to the probing radiation. Sparse sampling schemes are ideally suited for such applications because the image can be reconstructed from a sparse set of measurements. In particular, dynamic sparse sampling based on supervised learning has shown promising results in practical applications. A drawback of such methods, however, is that they require training image sets with similar information content, which may not always be available. In this paper, we introduce a Supervised Learning Approach for Dynamic Sampling (SLADS) algorithm that uses a deep neural network for training; we call this algorithm SLADS-Net. We performed simulated dynamic sampling experiments with SLADS-Net in which the training images have either similar or completely different information content compared with the testing images. We compare performance across training methods, namely least-squares regression, support vector regression, and deep neural networks, and observe that deep neural network-based training yields superior performance when the training and testing images are dissimilar. We also discuss the development of a pre-trained SLADS-Net that uses generic images for training; the neural network parameters are pre-trained so that users can apply SLADS-Net directly in imaging experiments.
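To make the dynamic sampling loop concrete, the following is a minimal Python sketch of a SLADS-style acquisition: a regressor predicts, for every unmeasured location, the expected reduction in distortion (ERD), and the location with the largest predicted ERD is measured next. The feature extractor, network size, and training data below are illustrative stand-ins, not the authors' implementation.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def extract_features(mask, values, candidates):
    """Toy features per candidate pixel: distance to the nearest
    measured pixel and that pixel's value."""
    measured = np.argwhere(mask)
    feats = []
    for r, c in candidates:
        d = np.sqrt(((measured - (r, c)) ** 2).sum(axis=1))
        j = d.argmin()
        feats.append([d[j], values[tuple(measured[j])]])
    return np.array(feats)

def dynamic_sample(image, erd_net, budget, seed_frac=0.01, rng=None):
    """Measure pixels one at a time, always picking the location with
    the largest predicted ERD (expected reduction in distortion)."""
    rng = rng or np.random.default_rng(0)
    h, w = image.shape
    mask = np.zeros((h, w), bool)
    idx = rng.choice(h * w, int(seed_frac * h * w), replace=False)
    mask.flat[idx] = True                      # random seed measurements
    values = np.where(mask, image, 0.0)
    while mask.sum() < budget:
        cand = np.argwhere(~mask)
        erd = erd_net.predict(extract_features(mask, values, cand))
        r, c = cand[erd.argmax()]              # most informative next pixel
        mask[r, c], values[r, c] = True, image[r, c]
    return mask, values

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    img = rng.random((32, 32))
    # Stand-in ERD regressor fitted on random pairs so the loop runs end
    # to end; real SLADS training pairs features with measured reductions
    # in reconstruction distortion on training images.
    net = MLPRegressor((64, 64), max_iter=300).fit(
        rng.random((500, 2)), rng.random(500))
    mask, vals = dynamic_sample(img, net, budget=200, rng=rng)
    print("sampled", mask.sum(), "of", img.size, "pixels")
```

Swapping the MLPRegressor for least-squares or support vector regression reproduces the comparison the paper describes; the acquisition loop itself is unchanged.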
Recent work on predicting the overall quality of HDR and WCG displays has shown that machine learning approaches based on physical measurements perform on par with more advanced, perceptually transformed measurements. While combining machine learning with the perceptual transforms did improve over using each technique separately, the improvement was minor. That work, however, did not explore how well these models perform when applied to display capabilities outside of the training data set. This new work examines what happens when the machine learning approaches are used to predict quality outside the training set, in terms of both extrapolation and interpolation. We consider two models: one based on physical display characteristics, and a perceptual model that transforms the physical parameters using human visual system models. We found that the perceptual transforms particularly help with extrapolation; without their tempering effect, the machine learning-based models can produce wildly unrealistic quality predictions.
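The tempering effect can be illustrated with a small, entirely synthetic sketch: the same regressor is fit once on a raw physical parameter (peak luminance) and once on a compressive transform of it, then asked to extrapolate. The quality function, the log transform standing in for a proper HVS model such as PQ, and all numbers below are invented for illustration only.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline
from sklearn.linear_model import LinearRegression

# Made-up quality-vs-luminance curve that saturates, roughly mimicking
# diminishing perceptual returns at high peak luminance.
def true_quality(nits):
    return 10 * (1 - np.exp(-nits / 800.0))

train_nits = np.linspace(100, 1000, 50)      # training range: 100-1000 nits
q = true_quality(train_nits)

# Model A: cubic fit on the raw physical parameter (cd/m^2).
phys = make_pipeline(PolynomialFeatures(3), LinearRegression())
phys.fit(train_nits[:, None], q)

# Model B: the same regressor on a compressive "perceptual" feature;
# log luminance is only a stand-in for a real visual-system transform.
perc = make_pipeline(PolynomialFeatures(3), LinearRegression())
perc.fit(np.log10(train_nits)[:, None], q)

for test in (500.0, 4000.0, 10000.0):        # interpolation, then extrapolation
    a = phys.predict([[test]])[0]
    b = perc.predict([[np.log10(test)]])[0]
    print(f"{test:7.0f} nits  true={true_quality(test):5.2f}  "
          f"raw={a:7.2f}  perceptual={b:5.2f}")
```

The point of the sketch is qualitative: the compressive transform shrinks the distance between the training range and the extrapolation points in feature space, so the fitted model is pulled less far from the data it has seen.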
Online fashion marketplaces are experiencing a boost in popularity. People see the appeal of websites where they can sell their products by providing information such as title, price, description, and pictures. With this popular new model for buying and selling fashion products comes a new set of challenges. Focusing on the product titles provided by users, this paper covers the application of natural language processing techniques and two machine learning algorithms to an online fashion marketplace, with the goal of predicting an item's category or subcategory. The paper begins with an overview of popular preprocessing techniques in the context of analyzing titles; these preprocessing steps are vital to the next step, the actual training of the models. The paper covers the development and performance of two models: one that uses a Naïve Bayes learning approach, and one that uses Support Vector Machines as the prediction model. The results from each prediction model are compared and discussed. They show that the Support Vector Machine model was more accurate, and that natural language processing techniques can be effectively applied to an online fashion marketplace to predict an item's category or subcategory.
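A minimal sketch of this kind of title classifier, assuming tf-idf as the preprocessing step, is shown below; the toy titles, categories, and the specific vectorizer settings are illustrative, not the paper's dataset or pipeline.

```python
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC

# Toy stand-ins for user-supplied product titles and their categories.
titles = ["vintage leather handbag brown", "nike running shoes size 10",
          "silk summer dress floral", "mens denim jacket blue",
          "gold hoop earrings small", "leather ankle boots black"]
labels = ["bags", "shoes", "dresses", "outerwear", "jewelry", "shoes"]

# Both models share the same preprocessing: lowercasing, tokenization,
# and tf-idf weighting, standing in for the preprocessing techniques
# the paper surveys.
nb = make_pipeline(TfidfVectorizer(), MultinomialNB()).fit(titles, labels)
svm = make_pipeline(TfidfVectorizer(), LinearSVC()).fit(titles, labels)

for title in ["black leather boots", "floral maxi dress"]:
    print(title, "->", nb.predict([title])[0], "/", svm.predict([title])[0])
```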
Task requirements for image acquisition systems vary substantially between applications: requirements for consumer photography may be irrelevant to, or may even interfere with, requirements for automotive, medical, and other applications. The imaging industry's remarkable capability to create lens and sensor designs for specific applications has been demonstrated in the mobile computing market, and we might expect the industry to innovate further if we specify the requirements of other markets. This paper explains an approach to developing image system designs that meet the task requirements of autonomous vehicle applications. It is impractical to build a large number of image acquisition systems and evaluate each of them with real driving data; we therefore assembled a simulation environment to provide guidance at an early design stage. The open-source, freely available software (isetcam, iset3d, and isetauto) uses ray tracing to compute quantitatively how scene radiance propagates through a multi-element lens to form the sensor irradiance. The software then transforms the irradiance into sensor pixel responses, accounting for a large number of sensor parameters. This enables the user to apply different image processing pipelines to generate images that are used to train and test convolutional networks for autonomous driving. We use the simulation environment to assess performance for different cameras and networks.
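To give a flavor of the irradiance-to-pixel-response stage of such a pipeline, here is a schematic numpy sketch of a simple sensor model (photon shot noise, read noise, saturation, quantization). It is not the isetcam API, which is a MATLAB toolbox; every constant below is an illustrative placeholder.

```python
import numpy as np

def sensor_response(irradiance, exposure_s=0.01, pixel_area_m2=(1.4e-6)**2,
                    qe=0.6, gain_e_per_dn=2.0, read_noise_e=2.5,
                    full_well_e=10000, bits=10, rng=None):
    """Schematic pixel model: photons -> electrons -> digital numbers.
    `irradiance` is photon flux density (photons / m^2 / s); every
    constant here is illustrative, not tied to any real sensor."""
    rng = rng or np.random.default_rng(0)
    photons = irradiance * pixel_area_m2 * exposure_s
    electrons = rng.poisson(photons * qe).astype(float)    # shot noise
    electrons += rng.normal(0.0, read_noise_e, electrons.shape)
    electrons = np.clip(electrons, 0, full_well_e)         # saturation
    dn = np.clip(electrons / gain_e_per_dn, 0, 2**bits - 1)
    return np.round(dn).astype(np.uint16)                  # quantization

# A uniform field with a brighter patch, standing in for the lens-formed
# irradiance image that the ray-tracing stage would normally produce.
irr = np.full((64, 64), 5e16)
irr[16:48, 16:48] = 2e17
print(sensor_response(irr).mean())
```

In the full simulation environment, the irradiance input comes from ray tracing through a multi-element lens model, and the digital output feeds an image processing pipeline before reaching the network under test.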
In VP9, a 64×64 superblock can be recursively decomposed all the way down to blocks of size 4×4. The encoder performs the encoding process for each possible partitioning and selects the optimal one by minimizing the rate-distortion cost. This scheme ensures encoding quality, but it also brings large computational complexity and consumes substantial CPU resources. In this paper, to speed up the partition search without sacrificing quality, we propose a multi-level machine learning-based early termination scheme. One weighted Support Vector Machine classifier is trained for each block size. Given a block, the binary classifiers determine whether it is necessary to continue the search down to smaller blocks, or to terminate early and take the current block size as final. Moreover, the classifiers are trained with varying error tolerances for different block sizes: a stricter error tolerance is adopted for larger block sizes than for smaller ones, to control the drop in encoder performance. Extensive experimental results demonstrate that, for HD and 4K videos, the proposed framework achieves a remarkable speed-up (20-25%) with less than a 0.03% performance drop measured in the Bjøntegaard delta bit rate (BDBR) compared with the current VP9 codebase.
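The sketch below illustrates the general shape of such a multi-level early-termination search, with one classifier per block size deciding "stop here" versus "keep splitting". The features, the synthetic labels, and the class-weight choices are invented for illustration and are not the paper's features or the VP9 code.

```python
import numpy as np
from sklearn.svm import SVC

def block_features(block):
    # Toy features; a real encoder would use rate-distortion statistics,
    # residual variance, motion information, etc.
    return np.array([[block.var(),
                      np.abs(np.diff(block, axis=0)).mean(),
                      np.abs(np.diff(block, axis=1)).mean()]])

def partition(block, size, classifiers, min_size=4):
    """Return a nested partition for `block`: a leaf is a block size,
    an internal node is a list of four quadrant partitions."""
    clf = classifiers.get(size)
    if size == min_size or (clf is not None and
                            clf.predict(block_features(block))[0] == 1):
        return size                      # early termination at this size
    h = size // 2                        # otherwise recurse into quadrants
    return [partition(block[r:r + h, c:c + h], h, classifiers, min_size)
            for r in (0, h) for c in (0, h)]

rng = np.random.default_rng(0)
X = rng.random((100, 3))
y = (X[:, 0] < 0.5).astype(int)          # synthetic "stop here" labels
# One weighted SVM per block size; the class weights mimic the stricter
# error tolerance at larger sizes (wrongly terminating a large block is
# costlier, so class 0, "keep splitting", is weighted up there).
clfs = {s: SVC(class_weight={0: 4.0 if s >= 32 else 1.0, 1: 1.0}).fit(X, y)
        for s in (64, 32, 16, 8)}
print(partition(rng.random((64, 64)), 64, clfs))
```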
Machine learning (ML) algorithms and ML-based software systems implicitly or explicitly involve a complex flow of information between entities such as the training data, the feature space, the validation set, and the results. The statistical distribution of this information, and how it flows from one entity to another, influences the operation and correctness of such systems, especially in large-scale applications that perform classification or prediction in real time. In this paper, we propose a visual approach to understanding and analyzing the flow of information during the model training and serving phases. We build the visualizations using Sankey diagrams - conventionally used to understand data flow among sets - to address various use cases in a machine learning system. We demonstrate how the technique, adapted to suit a classification problem, can play a critical role in better understanding the training data, the features, and the classifier performance. We also discuss how it enables diagnostic analysis of model predictions and comparative analysis of predictions from multiple classifiers. The proposed concept is illustrated with the example of categorizing millions of products in the e-commerce domain - a multi-class hierarchical classification problem.
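One of the diagnostic use cases mentioned above, visualizing how true classes flow into predicted classes, can be sketched in a few lines with plotly's Sankey trace; the toy labels below are invented stand-ins for a product categorizer's output.

```python
import plotly.graph_objects as go
from collections import Counter

# Toy true/predicted labels standing in for a product categorizer.
true = ["shoes", "shoes", "bags", "bags", "dresses", "dresses", "shoes"]
pred = ["shoes", "bags", "bags", "bags", "dresses", "shoes", "shoes"]

classes = sorted(set(true))
# Left column of nodes: true classes; right column: predicted classes.
labels = [f"true:{c}" for c in classes] + [f"pred:{c}" for c in classes]
flows = Counter(zip(true, pred))         # (true, pred) -> count

fig = go.Figure(go.Sankey(
    node=dict(label=labels),
    link=dict(
        source=[classes.index(t) for t, p in flows],
        target=[len(classes) + classes.index(p) for t, p in flows],
        value=list(flows.values()),
    )))
fig.show()  # off-diagonal ribbons expose the confused class pairs
```

The same construction extends naturally to the other use cases: nodes can represent data partitions, feature buckets, or the outputs of two competing classifiers, with ribbon widths carrying the counts.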
Digital copiers and printers are widely used nowadays, and one of the things people care about most is copy and print quality. Because modern copiers and printers are equipped with processing pipelines designed specifically for different kinds of images, we previously proposed an SVM-based classification method to classify images containing only text, only pictures, or a mixture of both. In some other applications, however, we need to distinguish more than three classes. In this paper, we develop a more advanced SVM-based classification method that uses four new features to classify five types of images: text, picture, mixed, receipt, and highlight.
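The overall shape of such a system, global features extracted from a page image feeding a multi-class SVM, can be sketched as follows; the features and random stand-in "pages" below are illustrative, not the paper's feature set.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

CLASSES = ["text", "picture", "mixed", "receipt", "highlight"]

def page_features(img):
    """Toy global features for a grayscale page image in [0, 1];
    stand-ins for the paper's features."""
    gx = np.abs(np.diff(img, axis=1)).mean()   # horizontal edge energy
    gy = np.abs(np.diff(img, axis=0)).mean()   # vertical edge energy
    hist, _ = np.histogram(img, bins=8, range=(0, 1), density=True)
    return np.concatenate([[gx, gy, img.mean(), img.std()], hist])

rng = np.random.default_rng(0)
# Random stand-in pages; a real pipeline extracts features from scans.
X = np.stack([page_features(rng.random((64, 64))) for _ in range(100)])
y = rng.integers(0, len(CLASSES), 100)

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf")).fit(X, y)
print(CLASSES[clf.predict(X[:1])[0]])
```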
Various image editing tools make our pictures more attractive and, at the same time, evoke different emotional responses. With powerful and easy-to-use imaging applications, capturing, editing, and then sharing pictures have become part of daily life for many. This paper investigates the influence of several image manipulations on the emotions evoked by different types of images. To do so, various images, clustered into different categories, were collected from Instagram, and subjective evaluations were conducted via crowdsourcing to gather the emotional responses that subjects perceived under different manipulations. The evaluation results show that certain image manipulations can induce different evoked emotions in the transformed pictures compared with the originals; however, such changes in image emotion are highly content dependent. We then conducted a machine learning experiment that attempts to predict the emotions of a manipulated image given its original version and the desired manipulation method. Experimental results show promising performance for such a prediction model, which could pave the way toward automatic selection or recommendation of image editing tools that efficiently transform or emphasize desired emotions in pictures.
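A minimal sketch of the prediction setup, assuming emotion is summarized as valence/arousal scores and the manipulation is one-hot encoded, might look like the following; all data, the regressor choice, and the feature layout are invented for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

MANIPULATIONS = ["none", "saturation+", "contrast+", "blur", "sepia"]

rng = np.random.default_rng(0)
n = 200
# Synthetic stand-in data: emotion features of the ORIGINAL picture
# (e.g., crowdsourced valence/arousal ratings) plus a one-hot code of
# the requested edit; targets are the emotion ratings of the
# MANIPULATED picture.
orig_emotion = rng.random((n, 2))                     # valence, arousal
edit = np.eye(len(MANIPULATIONS))[rng.integers(0, len(MANIPULATIONS), n)]
X = np.hstack([orig_emotion, edit])
y = np.clip(orig_emotion + rng.normal(0, 0.1, (n, 2)), 0, 1)  # synthetic

model = RandomForestRegressor(n_estimators=100).fit(X, y)
probe = np.hstack([[0.7, 0.3], np.eye(len(MANIPULATIONS))[1]])
print("predicted (valence, arousal):", model.predict([probe])[0])
```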