
Given an image classified as a Persian cat by an AI model, users may ask questions such as, “What are the contributions of the eyes and ears to the classification result?” or “Which features contribute the most?” While existing post-hoc XAI methods effectively explain model predictions at the pixel or patch level, they are limited in directly quantifying the contributions of human-interpretable semantic features. In this paper, we propose a visual analytics approach for feature-level interpretation of image classification results. Our contributions are twofold. First, we introduce a semantic contribution quantification method that builds upon existing pixel-level attribution techniques (e.g., Layer-wise Relevance Propagation, Grad-CAM). Specifically, we aggregate and normalize pixel-level relevance scores over predefined semantic regions (such as eyes, ears, and body) to compute comparable contribution scores for each semantic feature within an image. Second, we present an interactive visual interface that leverages these quantified semantic feature contributions to support exploration, comparison, and analysis of AI outputs across image collections. Through illustrative scenarios and expert feedback, we demonstrate that our approach provides an intuitive, scalable, and semantically meaningful way to interpret image classification results.
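The aggregation step described above can be sketched in a few lines. The sketch below is a minimal, hypothetical illustration (not the paper's implementation): it takes a pixel-level relevance map, as produced by LRP or Grad-CAM, together with boolean masks for each semantic region, sums the relevance inside each region, and normalizes the sums so the region scores are directly comparable within an image.

```python
import numpy as np

def semantic_contributions(relevance, masks):
    """Aggregate a pixel-level relevance map over semantic region masks.

    relevance: (H, W) array of pixel attribution scores (e.g. from LRP or Grad-CAM).
    masks: dict mapping region name -> (H, W) boolean mask.
    Returns normalized contribution scores that sum to 1 over the regions.
    """
    raw = {name: float(relevance[mask].sum()) for name, mask in masks.items()}
    total = sum(abs(v) for v in raw.values()) or 1.0  # avoid division by zero
    return {name: v / total for name, v in raw.items()}

# Toy example: a 4x4 relevance map with two hand-drawn regions.
relevance = np.array([
    [0.0, 0.0, 0.5, 0.5],
    [0.0, 0.0, 0.5, 0.5],
    [0.1, 0.1, 0.0, 0.0],
    [0.1, 0.1, 0.0, 0.0],
])
masks = {
    "eyes": np.zeros((4, 4), dtype=bool),
    "ears": np.zeros((4, 4), dtype=bool),
}
masks["eyes"][:2, 2:] = True   # top-right block
masks["ears"][2:, :2] = True   # bottom-left block

scores = semantic_contributions(relevance, masks)
# "eyes" accumulates 2.0 of the 2.4 total relevance, "ears" the remaining 0.4.
```

In a real pipeline the masks would come from a segmentation model or manual annotation rather than hard-coded blocks; normalizing by the total absolute relevance is one of several plausible choices for making scores comparable across images.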

For decades, image quality analysis pipelines have relied on filters derived from the human visual system. Although this paradigm captures basic aspects of human vision, it falls short of characterizing the complex human perception of different visual appearances and image quality. In this work, we propose a new framework that leverages the image recognition capabilities of convolutional neural networks to distinguish the visual differences between uniform halftone target samples printed on different media using the same printing technology. First, for each scanned target sample, a pre-trained Residual Neural Network is used to generate a 2,048-dimensional visual feature vector. Then, Principal Component Analysis is used to reduce the dimensionality to 48 components, which are then used to train a Support Vector Machine to classify the target images. Our model has been tested on various classification and regression tasks and performs well. Further analysis shows that our neural-network-based image quality model learns to make decisions based on the frequencies of color variations within the target image, and that it is capable of characterizing the visual differences under different printer settings.
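The feature-extraction-plus-classifier pipeline above can be sketched with scikit-learn. This is a hypothetical illustration of the described steps, not the authors' code: to keep it self-contained, synthetic 2,048-dimensional vectors stand in for the ResNet features, which in the actual pipeline would come from a pre-trained residual network (for example, torchvision's `resnet50` with its final classification layer removed) applied to the scanned target samples.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for ResNet features: three well-separated clusters
# of 2,048-dim vectors, one per hypothetical print medium.
rng = np.random.default_rng(0)
n_per_class, dim = 60, 2048
class_means = rng.normal(0.0, 1.0, size=(3, dim))
X = np.vstack([m + 0.5 * rng.normal(size=(n_per_class, dim)) for m in class_means])
y = np.repeat([0, 1, 2], n_per_class)

# PCA down to 48 components, then an SVM classifier, as in the abstract.
clf = make_pipeline(StandardScaler(), PCA(n_components=48), SVC(kernel="rbf"))
clf.fit(X, y)
acc = clf.score(X, y)  # training accuracy on the synthetic clusters
```

On real scanned targets one would of course report held-out accuracy (e.g. via `cross_val_score`) rather than training accuracy; the standard scaler and RBF kernel here are common defaults, not choices stated in the abstract.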