Several subjective methodologies exist for collecting data on human perception of distortions, ranging from single- or double-stimulus rating to ranking with pairwise comparisons. The Maximum Likelihood Difference Scaling (MLDS) method uses triplet- or quadruplet-based comparisons as a ranking task. Participants compare the intervals within two pairs of stimuli, (a,b) and (c,d), and judge whether they perceive a greater difference between a and b or between c and d. From these comparison judgments, a mathematical solver places the assessed stimuli on a perceptual scale (e.g., from low to high quality). One limitation, however, is that the perceptual scales retrieved for stimuli from different contents usually differ. We previously proposed a solution for measuring the inter-content scale across multiple contents. In this work, we compare multiple rating and ranking methodologies. We examine the subjective quality scores they yield in terms of precision, by analyzing the discriminability of the scores; efficiency, by comparing the methodologies at a fixed experimental effort cost; and robustness of the retrieved estimates to outliers and spammer behavior. We relate data quality, experimental cost, and resolving power, and show how discriminability in the data affects the resolving power of popular objective quality metrics. We find that higher-performing metrics require higher-quality data to reveal their full potential.
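To make the quadruplet task concrete, the following is a minimal sketch of how a perceptual scale can be fitted from such judgments. It is not the solver used in this work. It assumes the common MLDS decision model, in which the observer picks the pair with the larger perceived difference under Gaussian noise, and it assumes a fixed noise level `sigma` (in practice the noise level is usually estimated as well). The function names, the toy judgments, and the simple coordinate-ascent optimizer are all illustrative choices.

```python
import math

def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def choice_probability(scale, quad, sigma):
    """P(observer reports pair (a, b) as more different than pair (c, d))."""
    a, b, c, d = quad
    delta = abs(scale[a] - scale[b]) - abs(scale[c] - scale[d])
    return normal_cdf(delta / sigma)

def log_likelihood(scale, judgments, sigma=0.2):
    """Sum of Bernoulli log-likelihoods over all quadruplet judgments.

    judgments: list of ((a, b, c, d), chose_first), chose_first in {0, 1}.
    sigma is an assumed, fixed decision-noise level.
    """
    eps = 1e-9  # keeps log() finite when a probability saturates
    total = 0.0
    for quad, chose_first in judgments:
        p = min(max(choice_probability(scale, quad, sigma), eps), 1.0 - eps)
        total += math.log(p) if chose_first else math.log(1.0 - p)
    return total

def fit_scale(n_stimuli, judgments, sigma=0.2, step=0.05, sweeps=200):
    """Crude coordinate ascent; the two endpoints stay anchored at 0 and 1."""
    scale = [i / (n_stimuli - 1) for i in range(n_stimuli)]
    best = log_likelihood(scale, judgments, sigma)
    for _ in range(sweeps):
        improved = False
        for i in range(1, n_stimuli - 1):
            for move in (step, -step):
                trial = list(scale)
                trial[i] += move
                ll = log_likelihood(trial, judgments, sigma)
                if ll > best:  # accept only strict improvements
                    scale, best, improved = trial, ll, True
        if not improved:
            step /= 2.0  # refine the search once no move helps
    return scale

# Toy data (hypothetical): four stimuli, noise-free judgments derived
# from an assumed "true" scale.
true_scale = [0.0, 0.1, 0.45, 1.0]
quads = [(0, 1, 2, 3), (0, 1, 1, 2), (1, 2, 2, 3), (2, 3, 0, 1), (1, 3, 0, 2)]
judgments = [
    (q, int(abs(true_scale[q[0]] - true_scale[q[1]])
            > abs(true_scale[q[2]] - true_scale[q[3]])))
    for q in quads
]
fitted = fit_scale(len(true_scale), judgments)
```

Anchoring the endpoints at 0 and 1 removes the arbitrary offset and scale of the solution, which is also why scales fitted separately per content are not directly comparable without an additional inter-content step.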
Andréas Pastor, Lukáš Krasula, Xiaoqing Zhu, Zhi Li, Patrick Le Callet, "Comparison of Subjective Methodologies For Local Perception of Distortion in Videos and Impact on Objective Metrics Resolving Power," in Electronic Imaging, 2024, pp. 217-1 to 217-6, https://doi.org/10.2352/EI.2024.36.11.HVEI-217