Consumer cameras are indispensable tools for communication, content creation, and remote work, but image and video quality can be affected by various factors such as lighting, hardware, scene content, face detection, and automatic image processing algorithms. This paper investigates how web and phone camera systems perform in face-present scenes containing diverse skin tones, and how performance can be objectively measured using standard procedures and analyses. We closely examine image quality factors (IQFs) commonly impacted by scene content, emphasizing automatic white balance (AWB), automatic exposure (AE), and color reproduction according to Valued Camera Experience (VCX) standard procedures. Video tests are conducted for scenes containing standard-compliant mannequin heads, and across a novel set of AI-generated faces with 10 additional skin tones based on the Monk Skin Tone Scale. Findings indicate that color shifts, exposure errors, and reduced overall image fidelity are common for scenes containing darker skin tones. This reveals a major shortcoming in modern automatic image processing algorithms and highlights the need to test across a more diverse range of skin tones when developing automatic processing pipelines and the standards that evaluate them.
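The VCX procedures define the exact charts, illuminants, and tolerances used; purely as a minimal illustration of the kind of color-reproduction analysis involved, the CIE76 ΔE*ab distance between a reference patch and its captured rendering can be computed as below. The CIELAB values are hypothetical, not VCX data, and modern practice often prefers ΔE2000 over this simpler formula.

```python
import math

def delta_e_ab(lab_ref, lab_measured):
    """CIE76 colour difference between a reference patch and its capture."""
    return math.sqrt(sum((r - m) ** 2 for r, m in zip(lab_ref, lab_measured)))

# Hypothetical CIELAB values for a dark skin-tone patch: the reference,
# and a capture whose white balance has drifted (illustrative numbers only).
reference = (35.0, 18.0, 22.0)
captured = (32.0, 16.0, 14.0)
print(round(delta_e_ab(reference, captured), 2))  # 8.77
```

A per-patch ΔE of this size would typically be a visible color error; averaging ΔE over patches spanning the skin-tone range is one way to quantify the tone-dependent shifts the study reports.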
Imaging system performance measures and Image Quality Metrics (IQM) are reviewed from a systems engineering perspective, focusing on spatial quality of still image capture systems. We classify IQMs broadly as: Computational IQMs (CP-IQM), Multivariate Formalism IQMs (MF-IQM), Image Fidelity Metrics (IF-IQM), and Signal Transfer Visual IQMs (STV-IQM). Comparison of each category finds STV-IQMs well suited for capture system quality evaluation: they incorporate performance measures relevant to optical systems design, such as Modulation Transfer Function (MTF) and Noise Power Spectrum (NPS), and their bottom-up, modular approach enables system components to be optimized separately. We suggest that correlation between STV-IQMs and observer quality scores is limited by three factors: current MTF and NPS measures do not characterize scene-dependent performance introduced by imaging system non-linearities; the contrast sensitivity models employed do not account for contextual masking effects; and cognitive factors are not considered. We hypothesize that implementation of scene- and process-dependent MTF (SPD-MTF) and NPS (SPD-NPS) measures should mitigate errors originating from scene-dependent system performance. Further, we propose implementation of contextual contrast detection and discrimination models to better represent low-level visual performance in image quality analysis. Finally, we discuss image quality optimization functions that may close the gap between contrast detection/discrimination and quality.
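As a toy illustration of the signal-transfer measures named above, an MTF estimate can be sketched from a one-dimensional edge-spread function: differentiate to obtain the line-spread function, then normalize the magnitude of its discrete Fourier transform so that MTF(0) = 1. The edge profile below is synthetic, and a real measurement (e.g. the ISO 12233 slanted-edge method) adds oversampling and windowing steps omitted here.

```python
import cmath

def mtf_from_esf(esf):
    """Estimate the MTF from an edge-spread function (ESF):
    finite-difference the ESF to get the line-spread function (LSF),
    then take the DFT magnitude, normalised to the DC term."""
    lsf = [b - a for a, b in zip(esf, esf[1:])]  # derivative of the edge
    n = len(lsf)
    mags = []
    for k in range(n // 2 + 1):  # frequencies up to Nyquist
        s = sum(lsf[x] * cmath.exp(-2j * cmath.pi * k * x / n) for x in range(n))
        mags.append(abs(s))
    return [m / mags[0] for m in mags]  # MTF(0) = 1 by construction

# Synthetic soft edge: a step blurred by the capture system under test.
esf = [0, 0, 0.05, 0.2, 0.5, 0.8, 0.95, 1.0, 1.0, 1.0]
curve = mtf_from_esf(esf)
print(round(curve[0], 2), curve[0] > curve[1] > curve[2])  # 1.0 True
```

The scene-dependence argument in the abstract amounts to saying that, for non-linear pipelines, this curve is no longer a fixed property of the system: the ESF (and hence the MTF) measured from one scene content need not transfer to another.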
With the advent of computational photography, most cellphones include High Dynamic Range (HDR) modes or "apps" that capture and render high-contrast scenes in-camera using techniques such as multiple exposures and subsequent "addition" of those exposures to render a properly exposed image. The results from different cameras vary. Testing the image quality of different cameras involves field-testing under dynamic lighting conditions that may include moving objects; such testing is often cumbersome and time-consuming, and it would be more efficient to conduct it in a controlled laboratory environment. This study investigates the feasibility of such testing. Natural exterior scenes, at day and night, some of which include "motion", were captured with a range of cellphone cameras using their native HDR modes. The luminance ratios of these scenes were accurately measured using various spectroradiometers and luminance meters. Artificial scenes, which include characteristics of the natural exterior scenes and have similar luminance ratios, were created in a laboratory environment. These simulated scenes were captured using the same modes as the natural exterior scenes. A subjective image quality evaluation was conducted with approximately 20 observers to establish an observer preference scale separately for each scene. For each natural exterior scene, the correlation coefficients between its preference scale and the preference scale obtained for each laboratory scene were calculated, and the laboratory scene with the highest correlation was identified. It was determined that while it was difficult to accurately quantify the actual dynamic range of a natural exterior scene, especially at night, we could still simulate the luminance ratios of a wide range of natural exterior HDR scenes, from 266:1 to 15120:1, within a laboratory environment.
Preliminary results of the subjective study indicated that reasonably good correlation (0.8 or higher on average) was obtained between the natural exterior and laboratory-simulated scenes. However, such correlations were found to be specific to the type of scene studied, so the scope of future work should be narrowed to particular scene types. How moving objects in the scene affect the results also requires further investigation.
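The matching step described above (pairing each exterior scene with its best laboratory simulation) can be sketched with a plain Pearson correlation between two preference scales. The scores below are hypothetical per-camera preference means, not the study's data.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two preference scales
    (one score per camera under test)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical mean observer preference scores (one per camera) for a
# natural exterior scene and one candidate laboratory simulation of it.
exterior = [2.1, 3.4, 1.8, 4.0, 2.9]
lab = [2.0, 3.1, 2.2, 3.8, 3.0]
print(round(pearson_r(exterior, lab), 3))  # 0.966
```

In the study's terms, this coefficient would be computed for each candidate laboratory scene, and the laboratory scene achieving the highest value would be selected as the simulation of that exterior scene.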