Psychovisual Experimentation Using mLLMs as Observers

Robin Jenkin; Preeti S Pillai; Aruna S Nayak; Abhishek A Joshi; Vasudhaika S; Sinchana C; Abhishek Patil

doi:10.2352/EI.2026.38.12.GENAI-178

Abstract

With the rapid progress of multi-modal large language models (mLLMs), there is growing interest in whether such models can act as judges of image quality. A fundamental question remains, however, as to whether such models can distinguish between different levels of image quality attributes such as sharpness and noise. This work represents one of the first systematic investigations of mLLMs as evaluators in classical paired comparison image quality assessment (IQA) experiments. Prior work in mLLM-based vision has focused on captioning or recognition tasks, whereas our study explicitly frames Gemini 2.0 Flash as a proxy subject in psychovisual testing to establish just noticeable differences (JNDs) for sharpness and noise, using the Kodak image quality ruler dataset as the stimulus. For both sharpness and noise, the magnitudes of the JNDs were found to be proportional to the relative quality of the stimulus. Surprisingly, judgments of individual image pairs were found to be probabilistic rather than absolute, with more uncertainty observed for sharpness discrimination than for noise discrimination. The prompt engineering and the statistical analysis of the results are both detailed. Understanding the extent to which mLLMs can act as reliable perceptual proxies has potentially transformative implications for automated IQA, dataset labeling, and adaptive imaging pipelines.

Electronic Imaging

2470-1173

Society for Imaging Science and Technology

IS&T 7003 Kilworth Lane, Springfield, VA 22151 USA

GENAI-178

Proceedings Paper

Robin Jenkin

NVIDIA, US

Preeti S Pillai