3D CG Image Quality Assessment in Vision and Language based on Stable Diffusion

Norifumi Kawabata

doi:10.2352/EI.2025.37.9.IQSP-243

Abstract

GPT-4, a multimodal large language model, was released on March 14, 2023. GPT-4 is built on the Transformer, a machine learning model for natural language processing, and is trained as a large neural network through unsupervised learning followed by reinforcement learning from human feedback (RLHF). Although GPT-4 is a research achievement in natural language processing (NLP), the technology can be applied not only to natural language generation but also to image generation. However, the specifications of GPT-4 have not been made public, which makes it difficult to use for research purposes. In this study, we first generated an image database by adjusting parameters in Stable Diffusion, a deep learning model that generates images from text and image input. We then carried out experiments to evaluate 3D CG image quality using the generated database and discussed quality assessment of the image generation model.
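The database-generation step described in the abstract can be pictured as a sweep over generation parameters. The following is a minimal sketch only: the abstract does not state which parameters were adjusted, so the names `strength`, `guidance_scale`, and `seed`, the value ranges, the prompt, and the helper names `GenerationSetting` and `build_grid` are all assumptions chosen for illustration. The sketch only enumerates the settings and a file manifest; it does not call Stable Diffusion itself.

```python
from dataclasses import dataclass, asdict
from itertools import product


@dataclass(frozen=True)
class GenerationSetting:
    """One hypothetical image-to-image configuration (not from the paper)."""
    prompt: str
    strength: float        # assumed: how far the output may depart from the input image
    guidance_scale: float  # assumed: how strongly the text prompt steers generation
    seed: int              # assumed: fixed seed so each image can be regenerated

    def filename(self) -> str:
        # Encode the parameters in the file name so each image is traceable.
        return f"s{self.strength:.2f}_g{self.guidance_scale:.1f}_seed{self.seed}.png"


def build_grid(prompt, strengths, guidance_scales, seeds):
    """Return every combination of the supplied parameter values."""
    return [
        GenerationSetting(prompt, s, g, seed)
        for s, g, seed in product(strengths, guidance_scales, seeds)
    ]


# Placeholder values; the actual study settings are not given in the abstract.
grid = build_grid(
    prompt="a 3D CG rendering of a room interior",
    strengths=[0.25, 0.5, 0.75],
    guidance_scales=[5.0, 7.5],
    seeds=[0, 1],
)

# A manifest like this could later be joined with subjective or objective quality scores.
manifest = [{**asdict(setting), "file": setting.filename()} for setting in grid]
```

Keeping the settings in a manifest separate from the generation call is a design choice for this sketch: it lets each quality score be traced back to the exact parameters that produced the image.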

Electronic Imaging

2470-1173

Society for Imaging Science and Technology

IS&T 7003 Kilworth Lane, Springfield, VA 22151 USA


IQSP-243

Proceedings Paper


Norifumi Kawabata

Computational Imaging Lab, Japan


2 February 2025

IQSP

Image Quality and System Performance XXII

pp. 243-1 to 243-5

2025

Keywords: Image Generation AI, Diffusion Model, Vision and Language, image-to-image, Image Quality Assessment
