ACDM 2024 Digital Media Special Issue
Volume: 69 | Article ID: 010401
From Vision to Perception: Transforming Art Experience for the Blind with C-ArtQA
DOI: 10.2352/J.ImagingSci.Technol.2025.69.1.010401 | Published Online: January 2025
Abstract

Blind and low vision (BLV) individuals face unique challenges in appreciating visual art, owing to a lack of objective explanations and a shared artistic vocabulary. This study introduces Cultural ArtQA (C-ArtQA), a benchmark designed to assess whether current multimodal large language models (MLLMs; GPT-4V and Gemini) meet BLV needs by integrating structured visual art descriptions into the auditory and tactile domains. The approach categorizes art appreciation into Visual, Multimodal Extended, and Imagery Perceptions, distributed across 19 fine-grained categories. The study employs visual question answering with 361 questions generated from a dataset of modern artworks selected for their accessibility and cultural richness by BLV volunteers and art experts. Results indicate that GPT-4V excels in Visual and Imagery Perceptions, while both models underperform in Multimodal Extended Perceptions, highlighting areas where AI support for BLV individuals needs improvement. This study lays the foundation for developing MLLMs that meet the visual art appreciation needs of the BLV community.
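For readers curious how a visual-question-answering evaluation of this kind might be wired up in practice, the sketch below shows one way to pose a single C-ArtQA-style question about an artwork image to a vision-capable model through the OpenAI Python client. It is a minimal illustration only: the model name, question text, and image file are assumptions for the example, not details taken from the paper or its evaluation harness.

# Minimal sketch of posing one C-ArtQA-style visual question to a
# vision-capable model via the OpenAI chat completions API.
# The model name, question, and image path are illustrative placeholders.
import base64
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def ask_artwork_question(image_path: str, question: str) -> str:
    """Send one benchmark-style question about an artwork image and return the model's answer."""
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("utf-8")
    response = client.chat.completions.create(
        model="gpt-4o",  # assumed stand-in for the GPT-4V family evaluated in the paper
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": question},
                    {
                        "type": "image_url",
                        "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"},
                    },
                ],
            }
        ],
    )
    return response.choices[0].message.content

# Example: a hypothetical Multimodal Extended Perception question.
# print(ask_artwork_question("artwork.jpg",
#                            "What sounds or textures might this painting evoke?"))

In a full evaluation, a loop over the benchmark's questions and a scoring step against reference answers would sit on top of a helper like this; those details are specific to the authors' protocol and are not reproduced here.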

  Cite this article 

Jia Guo, Yung-Cheng Hsieh, "From Vision to Perception: Transforming Art Experience for the Blind with C-ArtQA," in Journal of Imaging Science and Technology, 2025, pp. 1-11, https://doi.org/10.2352/J.ImagingSci.Technol.2025.69.1.010401

  Copyright statement 
Copyright © Society for Imaging Science and Technology 2025
  Article timeline 
  • Received May 2024
  • Accepted September 2024
  • Published January 2025
