The London Imaging Meeting (LIM) is a yearly topics-based conference organized by the Society for Imaging Science and Technology (IS&T), in collaboration with the Institute of Physics (IOP) and the Royal Photographic Society. This year's topic was "Imaging for Deep Learning". At the heart of our conference were five focal talks given by world-renowned experts in the field (who then also organised the related sessions). Focal speakers were Dr. Seyed Ali Amirshahi, NTNU, Norway (Image Quality); Prof. Jonas Unger, Linköping University, Sweden (Datasets for Deep Learning); Prof. Simone Bianco, Università degli Studi di Milano-Bicocca, Italy (Color Constancy); Dr. Valentina Donzella, University of Warwick, UK (Imaging Performance); and Dr. Ray Ptucha, Apple Inc., US (Characterization and Optimization). We also had two superb keynote speakers. Thanks to Dr. Robin Jenkin, Nvidia, for his talk on "Camera Metrics for Autonomous Vision" and to Dr. Joyce Farrell, Stanford University, for her talk on "Soft Prototyping Camera Designs for Autonomous Driving". As an innovation this year, and to support the remit of LIM to reach out to students in the field, we included an invited tutorial research lecture. Given by Prof. Stephen Westland, University of Leeds, the presentation titled "Using Imaging Data for Efficient Colour Design" looked at deep learning techniques in the field of design and demonstrated that simple applications of deep learning can deliver excellent results. There were many strong contenders for the LIM Best Paper Award. Noteworthy honourable mentions include "Portrait Quality Assessment using Multi-scale CNN", N. Chahine and S. Belkarfa, DXOMARK; "HDR4CV: High dynamic range dataset with adversarial illumination for testing computer vision methods", P. Hanjil et al., University of Cambridge; "Natural Scene Derived Camera Edge Spatial Frequency Response for Autonomous Vision Systems", O. van Zwanenberg et al., University of Westminster; and "Towards a Generic Neural Network Architecture for Approximating Tone Mapping Algorithms", J. McVey and G. Finlayson, University of East Anglia. But, by a unanimous vote, this year's Best Paper was awarded to "Impact of the Windshield's Optical Aberrations on Visual Range Camera-based Classification Tasks Performed by CNNs", C. Krebs, P. Müller, and A. Braun, Hochschule Düsseldorf (University of Applied Sciences Düsseldorf), Germany. We thank everyone who helped make LIM a success, including the IS&T office, the LIM presenters, reviewers, focal speakers, and keynote speakers, as well as the audience, whose participation made the event engaging and vibrant. This year, the conference was run by the IOP and we are extremely grateful for their help in hosting the event. Special thanks also go to the Engineering and Physical Sciences Research Council (EPSRC), which provided funding through grant EP/S028730/1. Finally, we are pleased to announce that next year's LIM conference will be in the area of "Displays"; the conference chair is Dr. Rafal Mantiuk, University of Cambridge.
Prof. Graham Finlayson, LIM series chair, and Prof. Sophie Triantaphillidou, LIM2021 conference chair
In this paper, we propose a novel and standardized approach to the problem of camera-quality assessment on portrait scenes. Our goal is to evaluate the capacity of smartphone front cameras to preserve texture details on faces. We introduce a new portrait setup and an automated texture measurement. The setup includes two custom-built lifelike mannequin heads, shot in a controlled lab environment. The automated texture measurement includes a region-of-interest (ROI) detection and a deep neural network. To this end, we create a realistic mannequin database, which contains images from different cameras, shot under several lighting conditions. The ground truth is based on a novel pairwise comparison technology in which the scores are expressed in just-noticeable differences (JND). In terms of methodology, we propose a Multi-Scale CNN architecture with random crop augmentation, to overcome overfitting and to extract low-level features. We validate our approach by comparing its performance with several baselines inspired by the Image Quality Assessment (IQA) literature.
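As an illustration of the random crop augmentation mentioned above, the following is a minimal sketch only. The abstract does not say how crops and scales are combined, so the helper names (`random_crop`, `crops_at_sizes`), the crop sizes and the toy 16x16 image are all hypothetical and are not taken from the paper.

```python
import random


def random_crop(image, crop_h, crop_w, rng):
    """Return a crop_h x crop_w window taken at a random position.

    `image` is a list of rows (a toy stand-in for a real image array).
    """
    height, width = len(image), len(image[0])
    if crop_h > height or crop_w > width:
        raise ValueError("crop is larger than the image")
    top = rng.randint(0, height - crop_h)    # randint is inclusive at both ends
    left = rng.randint(0, width - crop_w)
    return [row[left:left + crop_w] for row in image[top:top + crop_h]]


def crops_at_sizes(image, sizes, rng):
    """Draw one random square crop per requested size (sizes are hypothetical)."""
    return [random_crop(image, s, s, rng) for s in sizes]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
image = [[(r * 16 + c) % 256 for c in range(16)] for r in range(16)]
crops = crops_at_sizes(image, sizes=(4, 8, 12), rng=rng)
crop_shapes = [(len(c), len(c[0])) for c in crops]
```

Each call draws a new window, so a training loop that samples crops on the fly sees a different patch of the same face region at every epoch, which is the usual reason such augmentation reduces overfitting.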
Automatic assessment of image aesthetics is a challenging task for the computer vision community that has a wide range of applications. The most promising state-of-the-art approaches are based on deep learning methods that jointly predict aesthetics-related attributes and the aesthetics score. In this article, we propose a method that learns the aesthetics score on the basis of the prediction of aesthetics-related attributes. To this end, we extract a multi-level spatially pooled (MLSP) feature set from a network pretrained on ImageNet, and these features are then used to train a multi-layer perceptron (MLP) to predict image aesthetics-related attributes. A Support Vector Regression machine (SVR) is finally used to estimate the image aesthetics score from the aesthetics-related attributes. Experimental results on the "Aesthetics with Attributes Database" (AADB) demonstrate the effectiveness of our approach, which outperforms the state of the art by about 5.5% in terms of Spearman’s Rank-order Correlation Coefficient (SROCC).
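For readers unfamiliar with the SROCC figure quoted above, the following is a minimal sketch of how the coefficient can be computed: rank both score lists (averaging tied ranks) and take the Pearson correlation of the ranks. The function names and the example scores are illustrative assumptions, not data from the paper.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def srocc(predicted, ground_truth):
    """Spearman rank-order correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(predicted), average_ranks(ground_truth))


# Made-up scores for five images (not from AADB).
predicted = [0.2, 0.5, 0.4, 0.9, 0.7]
ground_truth = [0.1, 0.3, 0.6, 0.8, 0.9]
rho = srocc(predicted, ground_truth)  # 0.8 for these toy scores
```

Because only the ordering matters, SROCC rewards a model that ranks images correctly even when its scores sit on a different scale from the subjective ratings.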
This paper presents an evaluation of how data augmentation and inter-class transformations can be used to synthesize training data in low-data scenarios for single-image weather classification. In such scenarios, augmentation is a critical component, but there is a limit to how much improvement can be gained using classical augmentation strategies. Generative adversarial networks (GANs) have been demonstrated to generate impressive results, and have also been successful as a tool for data augmentation, but mostly for images of limited diversity, such as in medical applications. We investigate the possibility of using generative augmentations for balancing a small weather classification dataset, where one class has a reduced number of images. We compare intra-class augmentations, by means of classical transformations as well as noise-to-image GANs, to inter-class augmentations, where images from another class are transformed to the underrepresented class. The results show that it is possible to take advantage of GANs for inter-class augmentations to balance a small dataset for weather classification. This opens the way for future work on GAN-based augmentations in scenarios where data is both diverse and scarce.
360-degree image quality assessment (IQA) faces the major challenge of a lack of ground-truth databases. This problem is accentuated for deep learning-based approaches, whose performance is only as good as the available data. In this context, only two databases are used to train and validate deep learning-based IQA models. To compensate for this lack, a data-augmentation technique is investigated in this paper. We use visual scan-paths to increase the number of learning examples obtained from existing training data. Multiple scan-paths are predicted to account for the diversity of human observers. These scan-paths are then used to select viewports from the spherical representation. Training with the data-augmentation scheme showed an improvement over training without it. We also try to answer the question of whether the mean opinion score (MOS) obtained for the 360-degree image should be used as the quality anchor for the whole set of extracted viewports, in comparison to 2D blind quality metrics. The comparison showed the superiority of using the MOS when adopting patch-based learning.
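The idea of selecting viewports along scan-paths and anchoring them all to the image-level MOS can be sketched as follows. This is a simplified illustration under stated assumptions: the image is taken to be equirectangular, fixations are given as (longitude, latitude) in radians, and only the viewport centres are located (a real pipeline would also project each viewport to a flat patch). The function names and example values are hypothetical.

```python
import math


def fixation_to_pixel(lon, lat, width, height):
    """Map a fixation to a pixel of an equirectangular image.

    Assumes lon in [-pi, pi] (left to right) and lat in [-pi/2, pi/2]
    (bottom to top); the result is clamped to the image bounds.
    """
    x = (lon + math.pi) / (2.0 * math.pi) * width
    y = (math.pi / 2.0 - lat) / math.pi * height
    col = min(int(x), width - 1)
    row = min(int(y), height - 1)
    return col, row


def label_viewports(scan_paths, mos, width, height):
    """One training sample per fixation; each inherits the image-level MOS."""
    samples = []
    for path in scan_paths:
        for lon, lat in path:
            centre = fixation_to_pixel(lon, lat, width, height)
            samples.append((centre, mos))
    return samples


# Two made-up scan-paths standing in for two predicted observers.
scan_paths = [
    [(0.0, 0.0), (-math.pi, math.pi / 2.0)],
    [(math.pi, -math.pi / 2.0)],
]
samples = label_viewports(scan_paths, mos=3.7, width=2048, height=1024)
```

Every additional predicted scan-path yields further viewport samples from the same image, which is how the number of training examples grows without new subjective ratings.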
We present a system to perform joint registration and fusion for RGB and infrared (IR) video pairs. While RGB is related to human perception, IR is associated with heat. However, IR images often lack contour and texture information. The goal of fusing the visible and IR images is to obtain more information from them. This requires two completely matched images. However, classical methods that assume ideal imaging conditions fail to achieve satisfactory performance in practice. From the data-dependent modeling point of view, labeling the dataset is costly and impractical. In this context, we present a framework that tackles two challenging tasks. First, a video registration procedure aligns the IR and RGB videos. Second, a fusion method brings all the essential information from the two video modalities into a single video. We evaluate our approach on a challenging dataset of RGB and IR video pairs collected to help firefighters carry out their tasks effectively in challenging visibility conditions, such as heavy smoke after a fire; see our project page.
Exposure problems, due to standard camera sensor limitations, often lead to image quality degradations such as loss of details and change in color appearance. These quality degradations further hinder the performance of imaging and computer vision applications. Therefore, the reconstruction and enhancement of under- and over-exposed images is essential for various applications. Accordingly, an increasing number of conventional and deep learning reconstruction approaches have been introduced in recent years. Most conventional methods follow the color imaging pipeline, which strongly emphasizes the accuracy of reconstructed color and content. Deep learning (DL) approaches have, conversely, shown a stronger capability for recovering lost details. However, the design of most DL architectures and objective functions does not take color fidelity into consideration; hence, an analysis of existing DL methods with respect to color and content fidelity is pertinent. Accordingly, this work presents a performance evaluation of recent DL-based over-exposure reconstruction solutions. For the evaluation, various datasets from related research domains were merged, and two generative adversarial network (GAN) based models were additionally adopted for the tone-mapping application scenario. Overall, the results show various limitations, mainly for severely over-exposed content, and a promising potential for DL approaches, notably GANs, to reconstruct details and appearance.
Blind assessment of video quality is a widely covered topic in computer vision. In this work, we analyze how much the effectiveness of some current No-Reference Video Quality Assessment (NR-VQA) methods varies with respect to specific types of scenes. To this end, we automatically annotated the videos from two video quality datasets of user-generated videos, whose content is unknown, and then estimated the correlation for the different categories of scenes. The results of the analysis highlight that the prediction errors are not equally distributed among the different categories of scenes and indirectly suggest what next-generation NR-VQA methods should take into account and model.
We demonstrate that a deep neural network can achieve near-perfect colour correction for the RGB signals from the sensors in a camera under a wide range of daylight illumination spectra. The network employs a fourth input signal representing the correlated colour temperature of the illumination. The network was trained entirely on synthetic spectra and applied to a set of RGB images derived from a hyperspectral image dataset under a range of daylight illumination with CCT from 2500K to 12500K. It produced an invariant output image as XYZ referenced to D65, with a mean colour error of approximately 1.0 ΔE*ab.
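The error figure above is a CIELAB colour difference. As a minimal sketch of how such a figure can be obtained from XYZ values referenced to D65, the code below converts XYZ to CIELAB and takes the Euclidean distance (the CIE 1976 ΔE*ab formula). The white point is assumed to be D65 for the 2-degree observer scaled to Y = 100; the abstract does not state the observer or scaling, and the sample XYZ triplets are made up.

```python
import math

# Assumed reference white: D65, CIE 1931 2-degree observer, Y = 100.
D65_WHITE = (95.047, 100.0, 108.883)


def xyz_to_lab(xyz, white=D65_WHITE):
    """Convert CIE XYZ to CIELAB (L*, a*, b*) relative to the given white."""
    delta = 6.0 / 29.0

    def f(t):
        # Cube root above the threshold, linear segment below it.
        if t > delta ** 3:
            return t ** (1.0 / 3.0)
        return t / (3.0 * delta ** 2) + 4.0 / 29.0

    fx, fy, fz = (f(v / w) for v, w in zip(xyz, white))
    return (116.0 * fy - 16.0, 500.0 * (fx - fy), 200.0 * (fy - fz))


def delta_e_ab(lab1, lab2):
    """CIE 1976 colour difference: Euclidean distance in CIELAB."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(lab1, lab2)))


# Made-up example: a reference colour and a slightly different estimate.
reference_xyz = (41.24, 21.26, 1.93)
estimate_xyz = (41.00, 21.50, 2.10)
error = delta_e_ab(xyz_to_lab(reference_xyz), xyz_to_lab(estimate_xyz))

white_lab = xyz_to_lab(D65_WHITE)  # the white point maps to L* = 100, a* = b* = 0
```

Averaging `delta_e_ab` over all pixels or test colours gives a mean colour error of the kind reported; a value near 1.0 is commonly regarded as close to the threshold of a just-visible difference.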