
One central challenge in modeling material appearance perception is the creation of an explainable and navigable representation space. In this study, we address this by training a StyleGAN2-ADA deep generative model on a large-scale, physically based rendered dataset containing translucent and glossy objects with varying intrinsic optical parameters. The resulting latent vectors are analyzed through dimensionality reduction, and their perceptual validity is assessed via psychophysical experiments. Furthermore, we evaluate the generalization capabilities of StyleGAN2-ADA on unseen materials. We also explore inverse mapping techniques from latent vectors reduced by principal component analysis back to original optical parameters, highlighting both the potential and the limitations of generative models for explicit, parameter-based image synthesis. A comprehensive analysis provides significant insights into the latent structure of gloss and translucency perception and advances the practical application of generative models for controlled material appearance generation.
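As an illustration of the analysis pipeline described above, the following is a minimal sketch, assuming the StyleGAN2-ADA latent vectors and their ground-truth optical parameters are available as NumPy arrays; the array names, component count, and the linear regressor used as the inverse mapping are assumptions for illustration, not the study's actual implementation:

```python
# Sketch: PCA on StyleGAN2-ADA latent vectors and a simple inverse mapping
# back to optical parameters. Array names and the regressor are illustrative.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
w_latents = rng.normal(size=(5000, 512))      # stand-in for W-space vectors
optical_params = rng.uniform(size=(5000, 3))  # e.g., scattering, absorption, roughness

# Reduce the latent space to a small, navigable number of components.
pca = PCA(n_components=8)
z = pca.fit_transform(w_latents)

# Inverse mapping: PCA-reduced latents -> original optical parameters.
z_tr, z_te, p_tr, p_te = train_test_split(z, optical_params, test_size=0.2, random_state=0)
reg = LinearRegression().fit(z_tr, p_tr)
print("explained variance:", pca.explained_variance_ratio_.sum())
print("R^2 of inverse mapping:", reg.score(z_te, p_te))
```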


The progressive fusion algorithm enhances image boundary smoothness, preserves details, and improves visual harmony. However, issues with multi-scale fusion and improper color space conversion can lead to blurred details and color distortion, which do not meet modern image processing standards for high-quality output. Therefore, a transparency-guided enhancement algorithm for progressively fused images, based on generative adversarial learning, is proposed. The method combines the wavelet transform with gradient field fusion to enhance image details, preserve spectral features, and generate high-resolution true-color fused images. It extracts the image mean, standard deviation, and smoothness features and feeds these, together with the original image, into a generative adversarial network. The optimization design introduces global context, transparency mask prediction, and a dual-discriminator structure to enhance the transparency of progressively fused images. In experiments, the proposed method achieved an information entropy of 7.638, a blind image quality index of 24.331, a natural image quality evaluator value of 3.611, and a processing time of 0.036 s. The overall evaluation indices were excellent, and the method effectively restored image detail and spatial color while avoiding artifacts. The processed images exhibited high quality with complete detail preservation.
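A minimal sketch of two ingredients named above, the per-image statistics (mean, standard deviation, smoothness) and a wavelet-domain fusion step, is given below; the Haar wavelet, the max/average fusion rules, and the smoothness formula are assumptions for illustration rather than the paper's exact design:

```python
# Sketch: per-image statistics and a one-level Haar wavelet fusion.
# Function and variable names are illustrative, not the paper's code.
import numpy as np
import pywt

def image_statistics(gray: np.ndarray) -> dict:
    # Common texture measure: smoothness R = 1 - 1 / (1 + sigma^2), with
    # intensities normalized to [0, 1]; assumed here, not stated in the abstract.
    var = (gray / 255.0).var()
    return {
        "mean": gray.mean(),
        "std": gray.std(),
        "smoothness": 1.0 - 1.0 / (1.0 + var),
    }

def wavelet_fuse(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    # One-level Haar DWT of each source, max-rule fusion of the detail bands,
    # averaging of the approximation bands, then inverse transform.
    cA_a, (cH_a, cV_a, cD_a) = pywt.dwt2(a, "haar")
    cA_b, (cH_b, cV_b, cD_b) = pywt.dwt2(b, "haar")
    fused = (
        (cA_a + cA_b) / 2,
        (
            np.where(np.abs(cH_a) > np.abs(cH_b), cH_a, cH_b),
            np.where(np.abs(cV_a) > np.abs(cV_b), cV_a, cV_b),
            np.where(np.abs(cD_a) > np.abs(cD_b), cD_a, cD_b),
        ),
    )
    return pywt.idwt2(fused, "haar")
```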

Seal-related tasks in document processing—such as seal segmentation, authenticity verification, seal removal, and text recognition under seals—hold substantial commercial importance. However, progress in these areas has been hindered by the scarcity of labeled document seal datasets, which are essential for supervised learning. To address this limitation, we propose Seal2Real, a novel generative framework designed to synthesize large-scale labeled document seal data. As part of this work, we also present Seal-DB, a comprehensive dataset containing 20,000 labeled images to support seal-related research. Seal2Real introduces a prompt prior learning architecture built upon a pretrained Stable Diffusion model, effectively transferring its generative capability to the unsupervised domain of seal image synthesis. By producing highly realistic synthetic seal images, Seal2Real significantly enhances the performance of downstream seal-related tasks on real-world data. Experimental evaluations on the Seal-DB dataset demonstrate the effectiveness and practical value of the proposed framework.
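The abstract does not spell out how the prompt prior is realized; one plausible, heavily simplified reading is a set of trainable prompt tokens prepended to a frozen text encoder's output before it conditions the pretrained diffusion model, sketched below (the module name, token count, and embedding width are assumptions, not the Seal2Real implementation):

```python
# Sketch: a learnable "prompt prior" -- trainable prompt tokens prepended to a
# frozen text encoder's output before conditioning a pretrained diffusion
# model. This is an assumed, minimal illustration only.
import torch
import torch.nn as nn

class PromptPrior(nn.Module):
    def __init__(self, num_tokens: int = 8, embed_dim: int = 768):
        super().__init__()
        # Trainable prompt embeddings; the frozen text encoder and U-Net of
        # the pretrained diffusion model are assumed to live elsewhere.
        self.prompt = nn.Parameter(torch.randn(num_tokens, embed_dim) * 0.02)

    def forward(self, text_embeddings: torch.Tensor) -> torch.Tensor:
        # text_embeddings: (batch, seq_len, embed_dim) from the text encoder.
        batch = text_embeddings.shape[0]
        prior = self.prompt.unsqueeze(0).expand(batch, -1, -1)
        return torch.cat([prior, text_embeddings], dim=1)

# Usage: cond = PromptPrior()(encoder_out); pass `cond` as cross-attention
# conditioning to the frozen denoising U-Net and optimize only the prompt.
```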

The tomato leaf is a significant organ that reflects the health and growth of tomato plants. Early detection of leaf diseases is crucial to both crop yield and farmers' income. However, the global distribution of diseases across tomato leaves, coupled with fine-grained differences among the various diseases, poses significant challenges for accurate disease detection. To tackle these obstacles, we propose an accurate tomato leaf disease identification method based on an improved Swin Transformer. The proposed method consists of three parts: the Swin Transformer backbone, a Local Feature Perception (LFP) module, and a Spatial Texture Attention (STA) module. The backbone models long-range dependencies of leaf diseases to obtain representative features, while the LFP module adopts a multi-scale aggregation strategy to strengthen the Swin Transformer's local feature extraction. Moreover, the STA module integrates hierarchical features from different stages of the Swin Transformer to capture fine-grained features for the classification head and boost overall performance. Extensive experiments are conducted on the public LBFtomato dataset, and the results demonstrate the superior performance of our proposed method. The proposed model achieves 99.28% Accuracy, 99.07% Precision, 99.36% Recall, and a 99.24% F1-score.
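A minimal sketch of a multi-scale local-aggregation block in the spirit of the LFP module is shown below; the kernel sizes, depthwise convolutions, and residual fusion are illustrative assumptions rather than the authors' exact design:

```python
# Sketch: a multi-scale local feature aggregation block -- parallel
# convolutions with different receptive fields fused into one local feature
# map and added back to the transformer features as a residual.
import torch
import torch.nn as nn

class LFPBlock(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # Depthwise convolutions at three scales (assumed kernel sizes).
        self.branches = nn.ModuleList(
            nn.Conv2d(channels, channels, kernel_size=k, padding=k // 2, groups=channels)
            for k in (3, 5, 7)
        )
        self.fuse = nn.Conv2d(3 * channels, channels, kernel_size=1)
        self.act = nn.GELU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Aggregate multi-scale local context and add it to the input features.
        multi = torch.cat([branch(x) for branch in self.branches], dim=1)
        return x + self.act(self.fuse(multi))

# Example: LFPBlock(96)(torch.randn(1, 96, 56, 56)).shape -> (1, 96, 56, 56)
```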

Recent advancements in artificial intelligence have significantly impacted many color imaging applications, including skin segmentation and enhancement. Although state-of-the-art methods emphasize geometric image augmentations to improve model performance, the role of color-based augmentations and color spaces in enhancing skin segmentation accuracy remains underexplored. This study addresses this gap by systematically evaluating the impact of various color-based image augmentations and color spaces on skin segmentation models based on convolutional neural networks (CNNs). We investigate the effects of color transformations, including brightness, contrast, and saturation adjustments, in three color spaces: sRGB, YCbCr, and CIELab. As a representative CNN, an existing semantic segmentation model is trained with these color augmentations on a custom dataset of 900 images with annotated skin masks, covering diverse skin tones and lighting conditions. Our findings reveal that current training practices, which rely primarily on single-color augmentation in the sRGB space and focus mainly on geometric augmentations, limit model generalization in color-related applications such as skin segmentation. Models trained with a greater variety of color augmentations show improved skin segmentation, particularly under over- and underexposure. Additionally, models trained in YCbCr outperform those trained in the sRGB color space when combined with color augmentation, while CIELab leads to performance comparable to sRGB. We also observe significant performance discrepancies across skin tones, highlighting challenges in achieving consistent segmentation under varying lighting. This study highlights gaps in existing image augmentation approaches and provides insights into the role of various color augmentations and color spaces in improving the accuracy and inclusivity of skin segmentation models.
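The color-space-specific augmentation can be illustrated with a short sketch such as the following, which converts an sRGB image to YCbCr or CIELab, jitters the luminance-like channel, and converts back; the jitter ranges and the helper name augment_in_color_space are assumptions, not the study's code:

```python
# Sketch: brightness/contrast jitter applied in a chosen color space.
import cv2
import numpy as np

def augment_in_color_space(img_rgb: np.ndarray, space: str = "YCbCr") -> np.ndarray:
    img = img_rgb.astype(np.float32) / 255.0
    spaces = {
        "YCbCr": (cv2.COLOR_RGB2YCrCb, cv2.COLOR_YCrCb2RGB),
        "CIELab": (cv2.COLOR_RGB2Lab, cv2.COLOR_Lab2RGB),
        "sRGB": (None, None),
    }
    conv, inv = spaces[space]
    work = cv2.cvtColor(img, conv) if conv is not None else img.copy()

    contrast = np.random.uniform(0.8, 1.2)    # illustrative ranges
    brightness = np.random.uniform(0.8, 1.2)
    if conv is None:
        # sRGB: jitter all three channels directly.
        work = (work - work.mean()) * contrast + work.mean() * brightness
    else:
        # YCbCr / CIELab: jitter only the luminance-like channel (Y or L);
        # a saturation jitter would scale the chroma channels analogously.
        lum = work[..., 0]
        work[..., 0] = (lum - lum.mean()) * contrast + lum.mean() * brightness

    out = cv2.cvtColor(work, inv) if inv is not None else work
    return np.clip(out * 255.0, 0, 255).astype(np.uint8)
```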

In modern life, with the explosive growth of video, images, and other data, using computers to automatically and efficiently classify and analyze human actions has become increasingly important. Action recognition, the problem of perceiving and understanding the behavioral state of objects in a dynamic scene, is a fundamental and key task in computer vision. However, analyzing a video with multiple objects or irregular shooting angles poses a significant challenge for existing action recognition algorithms. To address these problems, the authors propose a novel deep-learning-based method called SlowFast-Convolutional Block Attention Module (SlowFast-CBAM). Specifically, the training dataset is preprocessed using the YOLOX network, and individual frames of the action videos are fed separately into the slow and fast pathways. Then, CBAM is incorporated into both pathways to highlight features and dynamics in the surrounding environment. Subsequently, the authors establish a relationship between the convolutional attention mechanism and the SlowFast network, allowing the model to focus on the distinguishing features of objects and behaviors appearing before and after different actions, thereby enabling action detection and performer recognition. Experimental results demonstrate that this approach better emphasizes the features of action performers, leading to more accurate action labeling and improved action recognition accuracy.
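For reference, a standard CBAM block of the kind inserted into the slow and fast pathways looks roughly like the sketch below; 2D feature maps are assumed here (in a SlowFast network it would be applied per frame or with 3D operators), and the reduction ratio and kernel size follow the original CBAM paper rather than this work:

```python
# Sketch: CBAM -- channel attention followed by spatial attention.
import torch
import torch.nn as nn

class CBAM(nn.Module):
    def __init__(self, channels: int, reduction: int = 16, spatial_kernel: int = 7):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )
        self.spatial = nn.Conv2d(2, 1, spatial_kernel, padding=spatial_kernel // 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        # Channel attention from global average- and max-pooled descriptors.
        avg = self.mlp(x.mean(dim=(2, 3)))
        mx = self.mlp(x.amax(dim=(2, 3)))
        x = x * torch.sigmoid(avg + mx).view(b, c, 1, 1)
        # Spatial attention from channel-wise average and max maps.
        s = torch.cat([x.mean(dim=1, keepdim=True), x.amax(dim=1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial(s))

# Example: CBAM(256)(torch.randn(2, 256, 28, 28)).shape -> (2, 256, 28, 28)
```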

A recent work proposed a methodology to effectively enhance or suppress perceived bumpiness in digital images. We hypothesized that this manipulation may also affect perceived translucency, since the two rely on similar image cues. In this work, we test this hypothesis experimentally. We conducted psychophysical experiments and found a correlation between perceived bumpiness and perceived translucency in the processed images. This may not only have implications for digitally editing the bumpiness of a given material but could also be exploited as a translucency editing tool, provided the method does not introduce artifacts. To check this, we evaluated the naturalness and quality of the processed images using a subjective user study and objective image quality metrics, respectively.
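The correlation analysis and objective quality check can be illustrated with a small sketch like the one below; the rating arrays and images are placeholders, and Spearman correlation and SSIM are assumed stand-ins for the statistics and metrics actually used:

```python
# Sketch: correlating perceived bumpiness with perceived translucency ratings
# and scoring a processed image with an objective quality metric.
# All data below are placeholders, not experimental results.
import numpy as np
from scipy.stats import spearmanr
from skimage.metrics import structural_similarity as ssim

bumpiness_ratings = np.array([3.1, 4.2, 2.5, 4.8, 3.9])     # placeholder mean ratings
translucency_ratings = np.array([2.8, 4.0, 2.2, 4.5, 3.6])  # per processed image

rho, p_value = spearmanr(bumpiness_ratings, translucency_ratings)
print(f"Spearman rho = {rho:.2f}, p = {p_value:.3f}")

# Objective quality of a processed image relative to its unprocessed original.
original = np.random.rand(128, 128)                                   # placeholder image
processed = np.clip(original + 0.02 * np.random.randn(128, 128), 0, 1)
print("SSIM:", ssim(original, processed, data_range=1.0))
```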

A fully automated colorization model that integrates image segmentation features to enhance both the accuracy and diversity of colorization is proposed. The model employs a multipath architecture, with each path designed to address a specific objective when processing grayscale input images. The context path uses a pretrained ResNet50 model to identify object classes, while the spatial path determines the locations of these objects. ResNet50 is a 50-layer deep convolutional neural network (CNN) that uses skip connections to address the challenges of training deep models and is widely applied in image classification and feature extraction. The outputs from both paths are then fused and fed into the colorization network to ensure a precise representation of image structures and to prevent color spillover across object boundaries. The colorization network is designed to handle high-resolution inputs, enabling accurate colorization of small objects and enhancing overall color diversity. The proposed model demonstrates robust performance even when trained on small datasets. Comparative evaluations with CNN-based and diffusion-based colorization approaches show that the proposed model significantly improves colorization quality.
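A minimal sketch of the two-path idea, a pretrained ResNet50 context path fused with a shallow spatial path ahead of a chroma-predicting colorization head, is given below; all layer sizes, the channel-replication trick for grayscale input, and the ab-channel output are assumptions for illustration rather than the proposed architecture:

```python
# Sketch: context path (pretrained ResNet50) + spatial path, fused before a
# head that predicts the two ab chroma channels from a grayscale L input.
import torch
import torch.nn as nn
from torchvision.models import resnet50

class TwoPathColorizer(nn.Module):
    def __init__(self):
        super().__init__()
        backbone = resnet50(weights="IMAGENET1K_V2")  # pretrained context path
        self.context = nn.Sequential(*list(backbone.children())[:-2])  # (B, 2048, H/32, W/32)
        self.spatial = nn.Sequential(
            nn.Conv2d(1, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )  # (B, 128, H/4, W/4)
        self.head = nn.Sequential(
            nn.Conv2d(2048 + 128, 256, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(256, 2, 1),  # predict ab chroma channels
            nn.Upsample(scale_factor=4, mode="bilinear", align_corners=False),
        )

    def forward(self, gray: torch.Tensor) -> torch.Tensor:
        # Replicate the grayscale channel for the pretrained backbone; the
        # spatial path keeps a higher-resolution layout of the same input.
        ctx = self.context(gray.repeat(1, 3, 1, 1))
        ctx = nn.functional.interpolate(ctx, scale_factor=8, mode="bilinear", align_corners=False)
        spa = self.spatial(gray)
        return self.head(torch.cat([ctx, spa], dim=1))

# Example: TwoPathColorizer()(torch.randn(1, 1, 256, 256)).shape -> (1, 2, 256, 256)
```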

A collection of articles on remote research in cognition and perception using the Internet for the Journal of Perceptual Imaging is presented. Four original articles cover the topics of exact versus conceptual replication of cognitive effects (e.g., mental accounting), effects of facial cues on the perception of avatars, cultural influences on perceptual image and video quality assessment, and how Internet habits influence social cognition and social cognitive research. The essentials of these articles are summarized here, and their contributions are embedded within a wider view and historical perspective on remote research in cognition and perception using the Internet.