
Image quality assessment has been a longstanding area of research, with significant efforts dedicated to developing objective metrics that can reliably predict perceived image quality. While numerous image quality metrics have been proposed, ranging from traditional handcrafted approaches to modern machine learning-based models, their evaluation typically relies on statistical comparisons with subjective human ratings, with correlation as the primary measure of performance. In this study, we explore an additional dimension in image quality evaluation: the impact of image semantic complexity on metric performance. Specifically, we hypothesize that the number of distinct semantic categories within an image influences the accuracy of image quality metrics. We evaluate 8 state-of-the-art image quality metrics across 2 benchmark datasets, analyzing performance variations with respect to image semantic complexity (category count) as estimated by two vision-language models. Our findings reveal that the performance of some image quality metrics is affected by semantic complexity and by outliers. This suggests that content-aware evaluation could be crucial for future image quality research.
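The kind of analysis described above, correlating metric outputs with subjective ratings separately for each level of semantic complexity, could be sketched as follows. This is a minimal illustration, not the procedure used in the study: the record layout `(category_count, metric_score, subjective_score)`, the function names, the choice of Spearman rank correlation, and the `min_images` cut-off are all assumptions made here.

```python
from collections import defaultdict
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of the 1-based positions in the run
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; NaN when either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return float("nan")
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def srocc_by_complexity(records, min_images=3):
    """Group (category_count, metric_score, subjective_score) records by
    category count and compute one Spearman correlation per group.
    Groups with fewer than `min_images` images are skipped (hypothetical cut-off).
    """
    groups = defaultdict(list)
    for count, metric_score, subjective_score in records:
        groups[count].append((metric_score, subjective_score))
    result = {}
    for count, pairs in sorted(groups.items()):
        if len(pairs) < min_images:
            continue
        metric_scores, subjective_scores = zip(*pairs)
        result[count] = spearman(list(metric_scores), list(subjective_scores))
    return result
```

Comparing the per-group correlations against the correlation over the full dataset would show whether a metric's agreement with human ratings changes as category count grows. In practice the groups would likely need to be binned so that each contains enough images for a stable estimate.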

Most cameras use a single-sensor arrangement with a Color Filter Array (CFA). Color interpolation performed during image demosaicing is normally the cause of visual artifacts in a captured image. While the severity of the artifacts depends on the demosaicing method used, the artifacts themselves are mainly zipper artifacts (block artifacts across edges) and false-color distortions. In this study, to evaluate the performance of demosaicing methods, a subjective pair-comparison experiment with 15 observers was performed on six different methods (namely Nearest Neighbours, Bilinear interpolation, Laplacian, Adaptive Laplacian, Smooth hue transition, and Gradient-Based image interpolation) and nine different scenes. The subjective scores and scene images are then collected as a dataset and used to evaluate a set of no-reference image quality metrics. Assessing the performance of these image quality metrics in terms of correlation with the subjective scores shows that many of the evaluated no-reference metrics cannot predict perceived image quality.
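The abstract does not state how the pair-comparison judgments were converted into subjective scores. One common choice, shown here purely as an assumption, is Thurstone's Case V scaling, which maps preference proportions to z-scores. The function name, the win-matrix layout, and the clamping rule for unanimous preferences are illustrative, and the numbers in the usage example are invented rather than taken from the study.

```python
from statistics import NormalDist


def thurstone_case_v(wins):
    """Scale pair-comparison counts into z-scores (Thurstone Case V).

    wins[i][j] is the number of times stimulus i was preferred over j.
    Assumes every pair was judged at least once.
    """
    n = len(wins)
    inv_cdf = NormalDist().inv_cdf
    scores = []
    for i in range(n):
        z_sum = 0.0
        for j in range(n):
            if i == j:
                continue
            total = wins[i][j] + wins[j][i]
            p = wins[i][j] / total
            # Clamp unanimous preferences away from 0 and 1 so the inverse
            # normal CDF stays finite (an illustrative convention).
            eps = 1 / (2 * total)
            p = min(max(p, eps), 1 - eps)
            z_sum += inv_cdf(p)
        scores.append(z_sum / (n - 1))
    return scores


# Toy example: three hypothetical methods, each pair judged by 15 observers.
toy_wins = [
    [0, 12, 14],
    [3, 0, 10],
    [1, 5, 0],
]
toy_scores = thurstone_case_v(toy_wins)
```

A higher score means the method was preferred more often. Scores of this kind, computed per scene, are what a no-reference metric's outputs would then be correlated against.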

Over the years, a large number of objective image quality metrics have been proposed. While some image quality metrics show a high correlation with subjective scores provided in different datasets, there is still room for improvement. Several studies have identified evaluating the quality of images affected by geometrical distortions as a challenge for current image quality metrics. In this work, we introduce the Colourlab Image Database: Geometric Distortions (CID:GD) with 49 different reference images made specifically to evaluate image quality metrics. CID:GD is one of the first datasets to include three different types of geometrical distortions: seam carving, lens distortion, and image rotation. 35 state-of-the-art image quality metrics are tested on this dataset, showing that, apart from a handful of these objective metrics, most do not achieve high performance. The dataset is available at <ext-link ext-link-type="url" xlink:href="http://www.colourlab.no/cid">www.colourlab.no/cid</ext-link>.