Subjective quality assessment is considered a reliable way to measure the quality of distorted stimuli in many multimedia applications. The experimental methods can be broadly categorized into those that rate stimuli and those that rank them. Although ranking directly provides an order of stimuli rather than a continuous measure of quality, the experimental data can be converted with scaling methods into an interval scale, similar to that produced by rating methods. In this paper, we compare the results of a rating (mean opinion score) experiment to the scaled results of a pairwise comparison experiment, the most common ranking method. We find a strong linear relationship between the results of the two methods, which, however, differs between contents. To improve the relationship and unify the scale, we extend the experiment to include cross-content comparisons. We find that the cross-content comparisons not only reduce the confidence intervals of the pairwise comparison results, but also improve the relationship with mean opinion scores.
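For readers unfamiliar with the scaling step, the sketch below converts a pairwise-comparison count matrix into interval-scale scores using Thurstone Case V scaling, one common choice for this conversion. This is a minimal illustration: the count matrix and the probability clipping bounds are assumptions for the example, not data or parameters from the paper.

```python
import numpy as np
from scipy.stats import norm

def thurstone_case_v(counts):
    """Thurstone Case V scaling of pairwise-comparison data.

    counts[i, j] = number of times stimulus i was preferred over j.
    Returns one zero-mean score per stimulus on an interval scale
    (the unit and origin are arbitrary).
    """
    total = counts + counts.T                      # trials per pair
    with np.errstate(divide="ignore", invalid="ignore"):
        p = np.where(total > 0, counts / total, 0.5)  # empirical win rates
    # Clip to avoid infinite z-scores for unanimous preferences
    # (0.01/0.99 are illustrative bounds, not a standard).
    p = np.clip(p, 0.01, 0.99)
    z = norm.ppf(p)                                # probit transform
    np.fill_diagonal(z, 0.0)
    scores = z.mean(axis=1)                        # row means give the scale
    return scores - scores.mean()

# Hypothetical 3-stimulus example: A beats B 8/10, B beats C 7/10, A beats C 9/10.
counts = np.array([[0, 8, 9],
                   [2, 0, 7],
                   [1, 3, 0]], dtype=float)
print(thurstone_case_v(counts))
```

Bradley-Terry scaling via maximum likelihood is a common alternative; either way, the scaled scores are defined only up to an affine transform, which is why a linear relationship with MOS is the natural hypothesis to test.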
In this paper, we compare the Double-Stimulus Impairment Scale (DSIS) and Modified Absolute Category Rating (M-ACR) subjective quality evaluation methods for HEVC/H.265-encoded omnidirectional videos. The two methods differ in the type of rating scale and in the presentation of stimuli. The results of our test provide insight into the similarities and differences between the two methods, and we additionally investigate whether the results they produce are content-dependent. We evaluated subjective quality on an Oculus Rift at two resolutions (4K and FHD) and five bit-rates. Experimental results show that for 4K resolution, M-ACR provides slightly higher MOS than DSIS at the lower bit-rates of 1 and 2 Mbit/s, while DSIS provides slightly higher MOS at 4, 8, and 15 Mbit/s. Although the correlation coefficient between the two methods is very high, M-ACR offers higher statistical reliability than DSIS. We also compared simulator sickness scores and viewing behavior; the results show that subjects are more prone to simulator sickness when evaluating 360° videos with the DSIS method.
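As a point of reference for how the per-condition scores and their reliability are typically computed, here is a minimal sketch of MOS with a t-based 95% confidence interval. The rating vectors are hypothetical, and the CI computation is a standard approach rather than the paper's exact procedure; narrower intervals for the same number of subjects are one common reading of "higher statistical reliability".

```python
import numpy as np
from scipy import stats

def mos_with_ci(ratings, confidence=0.95):
    """Mean opinion score and a t-based confidence interval for one
    test condition, given raw per-subject ratings."""
    ratings = np.asarray(ratings, dtype=float)
    mos = ratings.mean()
    sem = stats.sem(ratings)              # standard error of the mean
    half = sem * stats.t.ppf((1 + confidence) / 2, len(ratings) - 1)
    return mos, (mos - half, mos + half)

# Hypothetical ratings from 15 subjects on a 5-point scale,
# one condition rated with each method.
dsis = [4, 5, 4, 3, 4, 5, 4, 4, 3, 4, 5, 4, 4, 3, 4]
macr = [4, 4, 5, 4, 4, 4, 5, 4, 4, 4, 4, 5, 4, 4, 4]
print(mos_with_ci(dsis))
print(mos_with_ci(macr))
# Agreement between methods across conditions could then be measured
# with e.g. scipy.stats.pearsonr on the per-condition MOS vectors.
```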
Psychovisual rate-distortion optimization (Psy-RD) has been used in industrial video coding practice as a tool to improve perceptual video quality. It has earned significant popularity through the widespread adoption of the open-source x264 video encoder, where the Psy-RD option is enabled by default. Nevertheless, little work has been dedicated to validating the impact of Psy-RD optimization on perceptual quality, so as to provide meaningful guidance on the practical usage and future development of the idea. In this work, we build a database that contains Psy-RD encoded video sequences at different strengths and bitrates. A subjective user study is then conducted to evaluate and compare the quality of the Psy-RD encoded videos. We observe considerable agreement between subjects' opinions on the test video sequences. Unfortunately, the impact of Psy-RD optimization on video quality does not appear to be encouraging: somewhat surprisingly, the perceptual quality gain of Psy-RD ON versus Psy-RD OFF is negative on average. Our results suggest that Psy-RD optimization should be used with caution. Further investigation shows that most state-of-the-art full-reference objective quality models correlate well with the subjective results overall, but for paired comparisons between Psy-RD ON and OFF cases, their false alarm rates are moderately high.
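To make the false-alarm notion concrete, one plausible formalization (an assumption for illustration, not necessarily the paper's exact criterion) counts the ON/OFF pairs where an objective model predicts a direction of preference that the subjective outcome contradicts:

```python
import numpy as np

def false_alarm_rate(obj_on, obj_off, subj_pref):
    """One illustrative definition of false alarms over ON/OFF pairs.

    obj_on, obj_off : objective model scores for matched Psy-RD ON
                      and OFF encodes (higher = better).
    subj_pref       : +1 if subjects significantly preferred ON,
                      -1 if they preferred OFF, 0 if no significant
                      difference was found.

    A false alarm is counted when the model predicts a winner but the
    subjective data shows a tie or the opposite preference.
    """
    obj_on, obj_off = np.asarray(obj_on), np.asarray(obj_off)
    subj_pref = np.asarray(subj_pref)
    model_pref = np.sign(obj_on - obj_off)   # model's predicted winner
    false_alarms = (model_pref != 0) & (model_pref != subj_pref)
    return false_alarms.mean()

# Hypothetical scores for five ON/OFF pairs.
print(false_alarm_rate([36.1, 35.0, 34.2, 33.8, 32.5],
                       [35.9, 35.2, 34.0, 34.1, 32.4],
                       [0, -1, +1, -1, 0]))
```

With continuous objective scores, a tolerance threshold on the score difference is often added before declaring a predicted winner; the sketch omits this for brevity.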