Consumer cameras are indispensable tools for communication, content creation, and remote work, but image and video quality can be affected by factors such as lighting, hardware, scene content, face detection, and automatic image processing algorithms. This paper investigates how web and phone camera systems perform in face-present scenes containing diverse skin tones, and how performance can be objectively measured using standard procedures and analyses. We closely examine image quality factors (IQFs) commonly impacted by scene content, emphasizing automatic white balance (AWB), automatic exposure (AE), and color reproduction according to Valued Camera Experience (VCX) standard procedures. Video tests are conducted for scenes containing standard-compliant mannequin heads and across a novel set of AI-generated faces with 10 additional skin tones based on the Monk Skin Tone Scale. Findings indicate that color shifts, exposure errors, and reduced overall image fidelity are common in scenes containing darker skin tones. This reveals a major shortcoming in modern automatic image processing algorithms and highlights the need to test across a more diverse range of skin tones when developing automatic processing pipelines and the standards that evaluate them.
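The abstract does not spell out how color shifts are scored; a common way to quantify color-reproduction error of the kind measured here is the CIE76 ΔE*ab distance between a captured patch and its reference in CIELAB space. The sketch below illustrates that idea only; the patch names and Lab values are hypothetical, not taken from the paper or the VCX standard.

```python
import math

def delta_e76(lab1, lab2):
    """CIE76 color difference: Euclidean distance between two CIELAB triples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(lab1, lab2)))

# Hypothetical reference vs. captured Lab values for two skin-tone patches.
reference = {"tone_03": (65.0, 14.0, 18.0), "tone_09": (30.0, 10.0, 12.0)}
captured  = {"tone_03": (64.0, 15.0, 19.0), "tone_09": (36.0,  4.0,  6.0)}

for patch, ref_lab in reference.items():
    print(f"{patch}: dE76 = {delta_e76(ref_lab, captured[patch]):.2f}")
```

A larger ΔE for the darker patch (tone_09 above) would be consistent with the reported pattern of stronger color shifts in scenes containing darker skin tones.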
When enjoying video streaming services, users expect high video quality in various situations, including mobile phone connections with low bandwidth. Furthermore, users' interest in consuming new, large-size content, such as high-resolution/high-frame-rate material or 360-degree videos, is growing as well. To deal with such challenges, modern encoders adaptively reduce the size of the transmitted data. This in turn requires automated video quality monitoring solutions to ensure sufficient quality of the delivered material. We present a no-reference video quality model, i.e., a model that does not require the original reference material, which is convenient for application in the field. Our approach uses a pretrained classification DNN in combination with hierarchical sub-image creation, several state-of-the-art features, and a random forest model. Furthermore, the model can process UHD content and is trained on a large ground-truth data set generated using a state-of-the-art full-reference model. The proposed model achieves high quality-prediction accuracy, comparable to that of a number of full-reference metrics. Our model thus serves as a proof of concept for successful no-reference video quality estimation.
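The abstract mentions hierarchical sub-image creation but not its exact scheme. One plausible reading is a pyramid of crops: the full frame at level 0, then a 2×2 grid, then 4×4, each crop fed to the pretrained DNN for feature extraction. The function name and grid scheme below are assumptions for illustration, not the paper's specification.

```python
import numpy as np

def hierarchical_subimages(frame, levels=2):
    """Return a pyramid of crops: level 0 is the full frame, level k is a
    (2^k x 2^k) grid of equally sized, non-overlapping crops."""
    subs = []
    h, w = frame.shape[:2]
    for level in range(levels + 1):
        n = 2 ** level
        sh, sw = h // n, w // n
        for i in range(n):
            for j in range(n):
                subs.append(frame[i * sh:(i + 1) * sh, j * sw:(j + 1) * sw])
    return subs

frame = np.zeros((2160, 3840, 3), dtype=np.uint8)  # one UHD frame
subs = hierarchical_subimages(frame, levels=2)
print(len(subs))  # 1 + 4 + 16 = 21 sub-images
```

Such a split lets a classification DNN with a fixed, smaller input size still see full-resolution detail in UHD frames, since each crop can be processed at or near its native resolution.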
Various approaches exist for assessing quality of experience for video conferencing that try to quantify the user's level of satisfaction. For videoconferencing to truly succeed as a substitute for face-to-face meetings, it is instructive to explore the associated quality of communication. One of the most significant factors to ascertain in this regard is the level of user satisfaction with a videoconference relative to a face-to-face meeting. Several ITU-T recommendations in this field (such as ITU-T Recommendation P.920) deal with subjective experiments involving interactive tasks designed to quantify the impact of terminal and communication-link performance. In addition to looking at different quality paradigms, in this paper we review a number of subjective studies on videoconferences and investigate possibilities for eliciting richer user feedback in order to assess the perceived quality of communication experienced in a videoconference. Based on this review, we present a set of future work items that can help establish a comprehensive definition of quality of communication.