Digital imaging, as an archival practice, is not a "solved problem" for the cultural heritage community. As Google, publishers, and other content providers digitize and deliver resources at scale, there is an increasingly pressing demand from users to digitize the rich resources held in library special collections, archival institutions, and the vast array of invaluable content in private collections. This paper introduces a research and learning initiative, Dig4E (Digitization for Everybody), designed to bridge the knowledge gap that presently exists between well-established or emergent international standards derived from imaging science, on the one hand, and local practices for the digital reformatting of archival resources, on the other. The paper describes the rationale for the education and training initiative and summarizes the intellectual structure and technical platform of an innovative sequence of self-paced online resources that can be adapted for a variety of audiences.
In scanning-microscopy-based imaging, there is a need for novel data acquisition schemes that reduce acquisition time and minimize sample exposure to the probing radiation. Sparse sampling schemes are ideally suited for such applications because the image can be reconstructed from a sparse set of measurements. In particular, dynamic sparse sampling based on supervised learning has shown promising results in practical applications. A drawback of such methods, however, is that they require training image sets with similar information content, which may not always be available. In this paper, we introduce a Supervised Learning Approach for Dynamic Sampling (SLADS) algorithm that uses a deep neural network for training; we call this algorithm SLADS-Net. We performed simulated dynamic sampling experiments with SLADS-Net in which the training images have either similar or completely different information content compared with the testing images. We compare performance across training methods, namely least-squares regression, support vector regression, and deep neural networks, and observe that deep neural network-based training yields superior performance when the training and testing images are dissimilar. We also discuss the development of a pre-trained SLADS-Net that uses generic images for training; the neural network parameters are pre-trained so that users can apply SLADS-Net directly in imaging experiments.
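To make the dynamic sampling loop concrete, the following is a minimal Python sketch of a SLADS-style acquisition: a regressor predicts, for every unmeasured location, the expected reduction in distortion (ERD), and the location with the largest predicted ERD is measured next. The feature extractor, network size, and training data below are illustrative stand-ins, not the authors' implementation.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def extract_features(mask, values, candidates):
    """Toy features per candidate pixel: distance to the nearest
    measured pixel and that pixel's value."""
    measured = np.argwhere(mask)
    feats = []
    for r, c in candidates:
        d = np.sqrt(((measured - (r, c)) ** 2).sum(axis=1))
        j = d.argmin()
        feats.append([d[j], values[tuple(measured[j])]])
    return np.array(feats)

def dynamic_sample(image, erd_net, budget, seed_frac=0.01, rng=None):
    """Measure pixels one at a time, always picking the location with
    the largest predicted ERD (expected reduction in distortion)."""
    rng = rng or np.random.default_rng(0)
    h, w = image.shape
    mask = np.zeros((h, w), bool)
    idx = rng.choice(h * w, int(seed_frac * h * w), replace=False)
    mask.flat[idx] = True                      # random seed measurements
    values = np.where(mask, image, 0.0)
    while mask.sum() < budget:
        cand = np.argwhere(~mask)
        erd = erd_net.predict(extract_features(mask, values, cand))
        r, c = cand[erd.argmax()]              # most informative next pixel
        mask[r, c], values[r, c] = True, image[r, c]
    return mask, values

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    img = rng.random((32, 32))
    # Stand-in ERD regressor fitted on random pairs so the loop runs end
    # to end; real SLADS training pairs features with measured reductions
    # in reconstruction distortion on training images.
    net = MLPRegressor((64, 64), max_iter=300).fit(
        rng.random((500, 2)), rng.random(500))
    mask, vals = dynamic_sample(img, net, budget=200, rng=rng)
    print("sampled", mask.sum(), "of", img.size, "pixels")
```

Swapping the MLPRegressor for least-squares or support vector regression reproduces the comparison the paper describes; the acquisition loop itself is unchanged.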
Recent work on predicting the overall quality of HDR and WCG displays has shown that machine learning approaches based on physical measurements perform on par with more advanced, perceptually transformed measurements. While combining machine learning with the perceptual transforms did improve over using each technique separately, the improvement was minor. That work, however, did not explore how well these models perform when applied to display capabilities outside of the training data set. This new work examines what happens when the machine learning approaches are used to predict quality outside the training set, in terms of both extrapolation and interpolation. We consider two models: one based on physical display characteristics, and a perceptual model that transforms the physical parameters using human visual system models. We found that the perceptual transforms particularly help with extrapolation; without their tempering effect, the machine learning-based models can produce wildly unrealistic quality predictions.
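The tempering effect can be illustrated with a small, entirely synthetic sketch: the same regressor is fit once on a raw physical parameter (peak luminance) and once on a compressive transform of it, then asked to extrapolate. The quality function, the log transform standing in for a proper HVS model such as PQ, and all numbers below are invented for illustration only.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline
from sklearn.linear_model import LinearRegression

# Made-up quality-vs-luminance curve that saturates, roughly mimicking
# diminishing perceptual returns at high peak luminance.
def true_quality(nits):
    return 10 * (1 - np.exp(-nits / 800.0))

train_nits = np.linspace(100, 1000, 50)      # training range: 100-1000 nits
q = true_quality(train_nits)

# Model A: cubic fit on the raw physical parameter (cd/m^2).
phys = make_pipeline(PolynomialFeatures(3), LinearRegression())
phys.fit(train_nits[:, None], q)

# Model B: the same regressor on a compressive "perceptual" feature;
# log luminance is only a stand-in for a real visual-system transform.
perc = make_pipeline(PolynomialFeatures(3), LinearRegression())
perc.fit(np.log10(train_nits)[:, None], q)

for test in (500.0, 4000.0, 10000.0):        # interpolation, then extrapolation
    a = phys.predict([[test]])[0]
    b = perc.predict([[np.log10(test)]])[0]
    print(f"{test:7.0f} nits  true={true_quality(test):5.2f}  "
          f"raw={a:7.2f}  perceptual={b:5.2f}")
```

The point of the sketch is qualitative: the compressive transform shrinks the distance between the training range and the extrapolation points in feature space, so the fitted model is pulled less far from the data it has seen.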
Online fashion marketplaces are experiencing a boost in popularity. People see the appeal of websites where they can sell their products by providing information such as title, price, description, and pictures. With this popular new model for buying and selling fashion products comes a new set of challenges. Focusing on the product titles provided by users, this paper covers the application of natural language processing techniques and two machine learning algorithms to an online fashion marketplace, with the goal of predicting an item's category or subcategory. The paper begins with an overview of popular preprocessing techniques in the context of analyzing titles; these preprocessing steps are vital to the next step, the actual training of the models. The paper covers the development and performance of two models: one that uses a Naïve Bayes learning approach, and one that uses Support Vector Machines as the prediction model. The results from each prediction model are compared and discussed. They show that the Support Vector Machine model was more accurate, and that natural language processing techniques can be effectively applied to an online fashion marketplace to predict an item's category or subcategory.
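A minimal sketch of this kind of title classifier, assuming tf-idf as the preprocessing step, is shown below; the toy titles, categories, and the specific vectorizer settings are illustrative, not the paper's dataset or pipeline.

```python
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC

# Toy stand-ins for user-supplied product titles and their categories.
titles = ["vintage leather handbag brown", "nike running shoes size 10",
          "silk summer dress floral", "mens denim jacket blue",
          "gold hoop earrings small", "leather ankle boots black"]
labels = ["bags", "shoes", "dresses", "outerwear", "jewelry", "shoes"]

# Both models share the same preprocessing: lowercasing, tokenization,
# and tf-idf weighting, standing in for the preprocessing techniques
# the paper surveys.
nb = make_pipeline(TfidfVectorizer(), MultinomialNB()).fit(titles, labels)
svm = make_pipeline(TfidfVectorizer(), LinearSVC()).fit(titles, labels)

for title in ["black leather boots", "floral maxi dress"]:
    print(title, "->", nb.predict([title])[0], "/", svm.predict([title])[0])
```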
Task requirements for image acquisition systems vary substantially between applications: requirements for consumer photography may be irrelevant to, or may even interfere with, requirements for automotive, medical, and other applications. The imaging industry's remarkable capability to create lens and sensor designs for specific applications has been demonstrated in the mobile computing market, and we might expect the industry to innovate further if we specify the requirements of other markets. This paper explains an approach to developing image system designs that meet the task requirements of autonomous vehicle applications. It is impractical to build a large number of image acquisition systems and evaluate each of them with real driving data; we therefore assembled a simulation environment to provide guidance at an early design stage. The open-source, freely available software (isetcam, iset3d, and isetauto) uses ray tracing to compute quantitatively how scene radiance propagates through a multi-element lens to form the sensor irradiance. The software then transforms the irradiance into sensor pixel responses, accounting for a large number of sensor parameters. This enables the user to apply different image processing pipelines to generate images that are used to train and test convolutional networks for autonomous driving. We use the simulation environment to assess performance for different cameras and networks.
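To give a flavor of the irradiance-to-pixel-response stage of such a pipeline, here is a schematic numpy sketch of a simple sensor model (photon shot noise, read noise, saturation, quantization). It is not the isetcam API, which is a MATLAB toolbox; every constant below is an illustrative placeholder.

```python
import numpy as np

def sensor_response(irradiance, exposure_s=0.01, pixel_area_m2=(1.4e-6)**2,
                    qe=0.6, gain_e_per_dn=2.0, read_noise_e=2.5,
                    full_well_e=10000, bits=10, rng=None):
    """Schematic pixel model: photons -> electrons -> digital numbers.
    `irradiance` is photon flux density (photons / m^2 / s); every
    constant here is illustrative, not tied to any real sensor."""
    rng = rng or np.random.default_rng(0)
    photons = irradiance * pixel_area_m2 * exposure_s
    electrons = rng.poisson(photons * qe).astype(float)    # shot noise
    electrons += rng.normal(0.0, read_noise_e, electrons.shape)
    electrons = np.clip(electrons, 0, full_well_e)         # saturation
    dn = np.clip(electrons / gain_e_per_dn, 0, 2**bits - 1)
    return np.round(dn).astype(np.uint16)                  # quantization

# A uniform field with a brighter patch, standing in for the lens-formed
# irradiance image that the ray-tracing stage would normally produce.
irr = np.full((64, 64), 5e16)
irr[16:48, 16:48] = 2e17
print(sensor_response(irr).mean())
```

In the full simulation environment, the irradiance input comes from ray tracing through a multi-element lens model, and the digital output feeds an image processing pipeline before reaching the network under test.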
In VP9, a 64×64 superblock can be recursively decomposed all the way down to blocks of size 4×4. The encoder performs the encoding process for each possible partitioning and selects the optimal one by minimizing the rate-distortion cost. This scheme ensures encoding quality, but it also brings large computational complexity and consumes substantial CPU resources. In this paper, to speed up the partition search without sacrificing quality, we propose a multi-level machine learning-based early termination scheme. One weighted Support Vector Machine classifier is trained for each block size. Given a block, the binary classifiers determine whether it is necessary to continue the search down to smaller blocks, or to terminate early and take the current block size as final. Moreover, the classifiers are trained with varying error tolerances for different block sizes: a stricter error tolerance is adopted for larger block sizes than for smaller ones, to control the drop in encoder performance. Extensive experimental results demonstrate that, for HD and 4K videos, the proposed framework achieves a remarkable speed-up (20-25%) with less than a 0.03% performance drop measured in the Bjøntegaard delta bit rate (BDBR) compared with the current VP9 codebase.
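The sketch below illustrates the general shape of such a multi-level early-termination search, with one classifier per block size deciding "stop here" versus "keep splitting". The features, the synthetic labels, and the class-weight choices are invented for illustration and are not the paper's features or the VP9 code.

```python
import numpy as np
from sklearn.svm import SVC

def block_features(block):
    # Toy features; a real encoder would use rate-distortion statistics,
    # residual variance, motion information, etc.
    return np.array([[block.var(),
                      np.abs(np.diff(block, axis=0)).mean(),
                      np.abs(np.diff(block, axis=1)).mean()]])

def partition(block, size, classifiers, min_size=4):
    """Return a nested partition for `block`: a leaf is a block size,
    an internal node is a list of four quadrant partitions."""
    clf = classifiers.get(size)
    if size == min_size or (clf is not None and
                            clf.predict(block_features(block))[0] == 1):
        return size                      # early termination at this size
    h = size // 2                        # otherwise recurse into quadrants
    return [partition(block[r:r + h, c:c + h], h, classifiers, min_size)
            for r in (0, h) for c in (0, h)]

rng = np.random.default_rng(0)
X = rng.random((100, 3))
y = (X[:, 0] < 0.5).astype(int)          # synthetic "stop here" labels
# One weighted SVM per block size; the class weights mimic the stricter
# error tolerance at larger sizes (wrongly terminating a large block is
# costlier, so class 0, "keep splitting", is weighted up there).
clfs = {s: SVC(class_weight={0: 4.0 if s >= 32 else 1.0, 1: 1.0}).fit(X, y)
        for s in (64, 32, 16, 8)}
print(partition(rng.random((64, 64)), 64, clfs))
```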
Machine learning (ML) algorithms and ML-based software systems implicitly or explicitly involve a complex flow of information between entities such as the training data, the feature space, the validation set, and the results. The statistical distribution of this information, and how it flows from one entity to another, influences the operation and correctness of such systems, especially in large-scale applications that perform classification or prediction in real time. In this paper, we propose a visual approach to understanding and analyzing the flow of information during the model training and serving phases. We build the visualizations using Sankey diagrams - conventionally used to understand data flow among sets - to address various use cases in a machine learning system. We demonstrate how the technique, adapted to suit a classification problem, can play a critical role in better understanding the training data, the features, and the classifier performance. We also discuss how it enables diagnostic analysis of model predictions and comparative analysis of predictions from multiple classifiers. The proposed concept is illustrated with the example of categorizing millions of products in the e-commerce domain - a multi-class hierarchical classification problem.
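One of the diagnostic use cases mentioned above, visualizing how true classes flow into predicted classes, can be sketched in a few lines with plotly's Sankey trace; the toy labels below are invented stand-ins for a product categorizer's output.

```python
import plotly.graph_objects as go
from collections import Counter

# Toy true/predicted labels standing in for a product categorizer.
true = ["shoes", "shoes", "bags", "bags", "dresses", "dresses", "shoes"]
pred = ["shoes", "bags", "bags", "bags", "dresses", "shoes", "shoes"]

classes = sorted(set(true))
# Left column of nodes: true classes; right column: predicted classes.
labels = [f"true:{c}" for c in classes] + [f"pred:{c}" for c in classes]
flows = Counter(zip(true, pred))         # (true, pred) -> count

fig = go.Figure(go.Sankey(
    node=dict(label=labels),
    link=dict(
        source=[classes.index(t) for t, p in flows],
        target=[len(classes) + classes.index(p) for t, p in flows],
        value=list(flows.values()),
    )))
fig.show()  # off-diagonal ribbons expose the confused class pairs
```

The same construction extends naturally to the other use cases: nodes can represent data partitions, feature buckets, or the outputs of two competing classifiers, with ribbon widths carrying the counts.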
Digital copiers and printers are widely used nowadays, and one of the things people care about most is copy and print quality. Because modern copiers and printers are equipped with processing pipelines designed specifically for different kinds of images, we previously proposed an SVM-based classification method to classify images containing only text, only pictures, or a mixture of both. In some other applications, however, we need to distinguish more than three classes. In this paper, we develop a more advanced SVM-based classification method that uses four new features to classify five types of images: text, picture, mixed, receipt, and highlight.
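The overall shape of such a system, global features extracted from a page image feeding a multi-class SVM, can be sketched as follows; the features and random stand-in "pages" below are illustrative, not the paper's feature set.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

CLASSES = ["text", "picture", "mixed", "receipt", "highlight"]

def page_features(img):
    """Toy global features for a grayscale page image in [0, 1];
    stand-ins for the paper's features."""
    gx = np.abs(np.diff(img, axis=1)).mean()   # horizontal edge energy
    gy = np.abs(np.diff(img, axis=0)).mean()   # vertical edge energy
    hist, _ = np.histogram(img, bins=8, range=(0, 1), density=True)
    return np.concatenate([[gx, gy, img.mean(), img.std()], hist])

rng = np.random.default_rng(0)
# Random stand-in pages; a real pipeline extracts features from scans.
X = np.stack([page_features(rng.random((64, 64))) for _ in range(100)])
y = rng.integers(0, len(CLASSES), 100)

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf")).fit(X, y)
print(CLASSES[clf.predict(X[:1])[0]])
```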
Various image editing tools make our pictures more attractive and, at the same time, evoke different emotional responses. With powerful and easy-to-use imaging applications, capturing, editing, and then sharing pictures have become part of daily life for many. This paper investigates the influence of several image manipulations on the emotions evoked by different types of images. To do so, various images, clustered into different categories, were collected from Instagram, and subjective evaluations were conducted via crowdsourcing to gather the emotional responses that subjects perceived under different manipulations. The evaluation results show that certain image manipulations can induce different evoked emotions in the transformed pictures compared with the originals; however, such changes in image emotion are highly content dependent. We then conducted a machine learning experiment that attempts to predict the emotions of a manipulated image given its original version and the desired manipulation method. Experimental results show promising performance for such a prediction model, which could pave the way toward automatic selection or recommendation of image editing tools that efficiently transform or emphasize desired emotions in pictures.
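A minimal sketch of the prediction setup, assuming emotion is summarized as valence/arousal scores and the manipulation is one-hot encoded, might look like the following; all data, the regressor choice, and the feature layout are invented for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

MANIPULATIONS = ["none", "saturation+", "contrast+", "blur", "sepia"]

rng = np.random.default_rng(0)
n = 200
# Synthetic stand-in data: emotion features of the ORIGINAL picture
# (e.g., crowdsourced valence/arousal ratings) plus a one-hot code of
# the requested edit; targets are the emotion ratings of the
# MANIPULATED picture.
orig_emotion = rng.random((n, 2))                     # valence, arousal
edit = np.eye(len(MANIPULATIONS))[rng.integers(0, len(MANIPULATIONS), n)]
X = np.hstack([orig_emotion, edit])
y = np.clip(orig_emotion + rng.normal(0, 0.1, (n, 2)), 0, 1)  # synthetic

model = RandomForestRegressor(n_estimators=100).fit(X, y)
probe = np.hstack([[0.7, 0.3], np.eye(len(MANIPULATIONS))[1]])
print("predicted (valence, arousal):", model.predict([probe])[0])
```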