IS&T | Library

Tomato Leaf Disease Image Recognition based on WGAN and Enhanced Lightweight MobileNet

66 23

deep learning
residual networks
image recognition
crop disease
lightweight model

Haoxuan Li

Pages 1 - 7, March 2025, © Society for Imaging Science and Technology 2025

DOI

10.2352/J.ImagingSci.Technol.2025.69.2.020506

Volume 69

Issue 2

Abstract

Crop diseases have always been a major threat to agricultural production, significantly reducing both yield and quality of agricultural products. Traditional methods for disease recognition suffer from high costs and low efficiency, making them inadequate for modern agricultural requirements. With the continuous development of artificial intelligence technology, utilizing deep learning for crop disease image recognition has become a research hotspot. Convolutional neural networks can automatically extract features for end-to-end learning, resulting in better recognition performance. However, they also face challenges such as high computational costs and difficulties in deployment on mobile devices. In this study, we aim to improve the recognition accuracy of models, reduce computational costs, and scale down for deployment on mobile platforms. Specifically targeting the recognition of tomato leaf diseases, we propose an innovative image recognition method based on a lightweight MCA-MobileNet and WGAN. By incorporating an improved multiscale feature fusion module and coordinate attention mechanism into MobileNetV2, we developed the lightweight MCA-MobileNet model. This model focuses more on disease spot information in tomato leaves while significantly reducing the model’s parameter count. We employ WGAN for data augmentation to address issues such as insufficient and imbalanced original sample data. Experimental results demonstrate that using the augmented dataset effectively improves the model’s recognition accuracy and enhances its robustness. Compared to traditional networks, MCA-MobileNet shows significant improvements in parameters such as accuracy, precision, recall, and F1-score. With a training parameter count of only 2.75M, it exhibits outstanding performance in recognizing tomato leaf diseases and can be widely applied in mobile or embedded devices.

Digital Library: JIST

Published Online: March 2025

Waste Image Classification based on Convolutional Neural Network with Series-parallel Attention Module

40 23

waste classification
deep learning
convolutional block attention module
DenseNet-121

Jianyue Wang

Pages 1 - 13, March 2025, © Society for Imaging Science and Technology 2025

DOI

10.2352/J.ImagingSci.Technol.2025.69.2.020510

Volume 69

Issue 2

Abstract

The rapid evolution of modern society has triggered a surge in the production of diverse waste in daily life. Effective implementation of waste classification through intelligent methods is essential for promoting green and sustainable development. Traditional waste classification techniques suffer from inefficiencies and limited accuracy. To address these challenges, this study proposed a waste image classification model based on DenseNet-121 by adding an attention module. To enhance the efficiency and accuracy of waste classification techniques, publicly available waste datasets, TrashNet and Garbage classification, were utilized for their comprehensive coverage and balanced distribution of waste categories. 80% of the dataset was allocated for training, and the remaining 20% for testing. Within the architecture of DenseNet-121, an enhanced attention module, series-parallel attention module (SPAM), was integrated, building upon convolutional block attention module (CBAM), resulting in a new network model called dense series-parallel attention neural network (DSPA-Net). DSPA-Net was trained and evaluated alongside other CNN models on TrashNet and Garbage classification. DSPA-Net demonstrated superior performance and achieved accuracies of 90.2% and 92.5% on TrashNet and Garbage classification, respectively, surpassing DenseNet-121 and alternative image classification algorithms. These findings underscore the potential for executing efficient and accurate intelligent waste classification.

Digital Library: JIST

Published Online: March 2025

Exploiting Skin Melanin Network for Skin Pigmentation Classification

116 53

skin tone detection
skin type detection
skin melanin index
individual topology angle
skin melanin network
skin structures
Fitzpatrick skin type
deep learning

Shakith Fernando, Karl van Bree, Willem Verkruysse

Pages 256-1 - 256-4, February 2025, This work is licensed under the Creative Commons Attribution 4.0 International License. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/. 2025

DOI

10.2352/EI.2025.37.9.IQSP-256

Volume 37

Issue 9

Abstract

Accurate and precise classification/quantification of skin pigmentation is critical to address health inequities such as for example racial bias in pulse oximetry. Current skintone classification methods rely on measuring or estimating the color. These methods include a measurement device or subjective matching with skintone color scales. Robust detection of skin type and melanin index is challenging, as these methods require precise calibration. And recently acquired sun exposure may affect the measurements due to tanning or erythema. The proposed system differentiates and quantifies skin type and melanin index by exploiting the variance in skin structures and skin pigmentation network across skin types. Our result with a small study shows skin structure patterns are a robust, color independent method for skin tone classification. A real-time system demo shows the practical viability of the method.

Digital Library: EI

Published Online: February 2025

Identification of Cultural Artifacts using Deep Learning

16 7

cultural artifacts
deep learning
self-attention
artifact dataset

Huajian Liu, Xiaoying Yang, Raphael Antonius Frick, Martin Steinebach

DOI

10.2352/EI.2025.37.8.IMAGE-271

Volume 37

Issue 8

Abstract

This work addresses the challenge of identifying the provenance of illicit cultural artifacts, a task often hindered by the lack of specialized expertise among law enforcement and customs officials. To facilitate immediate assessments, we propose an improved deep learning model based on a pre-trained ResNet model, fine-tuned for archaeological artifact recognition through transfer learning. Our model uniquely integrates multi-level feature extraction, capturing both textural and structural features of artifacts, and incorporates self-attention mechanisms to enhance contextual understanding. In addition, we developed two different artifact datasets: a dataset with mixed types of earthenware and a dataset for coins. Both datasets are categorized according to the age and region of artifacts. Evaluations of the proposed model on these datasets demonstrate improved recognition accuracy thanks to the enhanced feature representation.

Digital Library: EI

Published Online: February 2025

Frontal View Synthesis for Immersive Video Conferencing using Dual-camera Capture and Frame Interpolation

13 3

frontal view synthesis
frame Interpolation
video conferencing
portrait segmentation
facial landmark detection
deep learning

Yezhi Shen, Md Adnan Faisal Hossain, Weichen Xu, Qian Lin, Fengqing Zhu

DOI

10.2352/EI.2025.37.8.IMAGE-273

Volume 37

Issue 8

Abstract

In this paper, we propose a new solution for synthesizing frontal human images in video conferencing, aimed at enhancing immersive communication. Traditional methods such as center staging, gaze correction, and background replacement improve the user experience, but they do not fully address the issue of off-center camera placement. We introduce a system that utilizes two arbitrary cameras positioned on the top bezel of a display monitor to capture left and right images of the video participant. A facial landmark detection algorithm identifies key points on the participant’s face, from which we estimate the head pose. A segmentation model is employed to remove the background, isolating the user. The core component of our method is a video frame interpolation technique that synthesizes a realistic frontal view of the participant by leveraging the two captured angles. This method not only enhances visual alignment between users but also maintains natural facial expressions and gaze direction, resulting in a more engaging and life-like video conferencing experience.

Digital Library: EI

Published Online: February 2025

Automated Monitoring of Stolen Cultural Artifacts on Online Marketplaces

26 10

automated monitoring
stolen cultural artifacts
web crawling
image feature extraction
image matching
deep learning

Huajian Liu, York Yannikos, Julian Heeger, Simon Bugert, Waldemar Berchtold, Martin Steinebach

DOI

10.2352/EI.2025.37.3.MOBMU-314

Volume 37

Issue 3

Abstract

Tracking and identifying stolen cultural artifacts on online marketplaces is a daunting task that has to be accomplished through manual search. In this paper, an automated monitoring tool is developed to track and identify stolen cultural goods on targeted online sales platforms. In case of theft, the original owner can upload descriptive keywords and photos of the stolen objects to start monitoring tasks to track and identify the stolen objects on targeted online marketplaces and get alerted when identical or highly similar objects appear on the monitored sales platforms. The technical challenges posed by automated monitoring are addressed by proposed advanced crawling and image feature extraction and matching solutions. With the support of proposed novel techniques, the developed monitoring tool can efficiently and effectively monitor stolen artifacts on online marketplaces, significantly reducing the manual inspection effort.

Digital Library: EI

Published Online: February 2025

Deep Transfer Learning from Constrained Source to Target Domains in Medical Image Segmentation

104 11

transfer learning
medical image segmentation
autosomal polycystic kidney disease
deep learning

Chetana Krishnan, Emma Schmidt, Ezinwanne Onuoha, Sean Mullen, Ronald Roye, Phillip Chumley, Michal Mrug, Carlos E. Cardenas, Harrison Kim, Consortium for Radiologic Imaging Studies of Polycystic Kidney Disease (CRISP) investigators

DOI

10.2352/J.ImagingSci.Technol.2024.68.6.060505

Volume 68

Issue 6

Abstract

The aim of this work is to transfer the model trained on magnetic resonance images of human autosomal dominant polycystic kidney disease (ADPKD) to rat and mouse ADPKD models. A dataset of 756 MRI images of ADPKD kidneys was employed to train a modified UNet3+ architecture, which incorporated residual layers, switchable normalization, and concatenated skip connections for kidney and cyst segmentation tasks. The trained model was then subjected to transfer learning (TL) using data from two commonly utilized animal PKD models: the Pkdh1pck (PCK) rat and the Pkd1^RC∕RC (RC) mouse. Transfer learning achieved Dice similarity coefficients of 0.93±0.04 and 0.63±0.16 (mean±SD) for a sample combination of PCK+RC kidneys and cysts, respectively, on the test datasets of animal images. We showcased the utilization of TL in situations involving constrained source and target datasets and have achieved good accuracy in the cases of class imbalance.

Digital Library: JIST

Published Online: November 2024

Real-time Facial State Recognition and Fatigue Analysis based on Deep Neural Networks

49 12

driving safety
driver facial feature recognition
fatigue analysis
deep learning
YOLOv5

Chunman Yan, Jiale Li

DOI

10.2352/J.ImagingSci.Technol.2024.68.6.060507

Volume 68

Issue 6

Abstract

In response to challenges such as large parameter count, difficulty in deployment, low accuracy, and slow speed of facial state recognition models in driver fatigue detection, the authors propose a lightweight real-time facial state recognition model called YOLOv5-fatigue based on YOLOv5n. First, a bilateral convolution is proposed, which can fully utilize the feature information in the channel. Then an innovative deep lightweight module is proposed, which reduces the number of network parameters as well as the computational effort by replacing the ordinary convolution in the neck network. Lastly, the normalization-based attention module is added to solve the problem of accuracy decline caused by lightweight models while keeping the number of parameters unchanged. In this paper, we first recognize the facial state by YOLOv5-fatigue and then use the proportion of eyes closed per unit of time and the proportion of mouth closed per unit of time to determine fatigue. In comparison experiments conducted on our self-built VIGP-fatigue dataset with other detection algorithms, our proposed method achieved an increase of 1% in AP50 compared to the baseline model YOLOv5n, reaching 92.6%. The inference time was reduced by 9% to 2.1 ms, and the parameter count decreased by 42.6% to 1.01 M.

Digital Library: JIST

Published Online: November 2024

Solder Ball Defect Detection in BGA-Packaged Chips

263 9

BGA packaged chips
deep learning
defect detection
YOLOv8

Qing Zhao, Honglei Wei, Shiji Zhang, Meng Huang, Xiaoyu Wang, Yan Lv

DOI

10.2352/J.ImagingSci.Technol.2024.68.4.040505

Volume 68

Issue 4

ARCH2023 - Cultural Heritage Archiving Special Issue

Abstract

In response to the current challenges in the detection of solder ball defects in ball grid array (BGA) packaged chips, which include slow detection speed, low efficiency, and poor accuracy, our research has addressed these issues. We have designed an algorithm for detecting solder ball defects in BGA-packaged chips by leveraging the specific characteristics of these defects and harnessing the advantages of deep learning. Building upon the YOLOv8 network model, we have made adaptive improvements to enhance the algorithm. First, we have introduced an adaptive weighted downsampling method to boost detection accuracy and make the model more lightweight. Second, to improve the extraction of image features, we have proposed an efficient multi-scale convolution method. Finally, to enhance convergence speed and regression accuracy, we have replaced the traditional Complete Intersection over Union loss function with Minimum Points Distance Intersection over Union (MPDIoU). Through a series of controlled experiments, our enhanced model has shown significant improvements when compared to the original network. Specifically, we have achieved a 1.7% increase in mean average precision, a 1.5% boost in precision, a 0.9% increase in recall, a reduction of 4.3 M parameters, and a decrease of 0.4 G floating-point operations per second. In comparative experiments, our algorithm has demonstrated superior overall performance when compared to other networks, thereby effectively achieving the goal of solder ball defect detection.

Digital Library: JIST

Published Online: July 2024

149 7

Toward Solving a Puzzle of Fragmented Archeological Textiles

puzzle solving
virtual reconstruction
clustering
deep learning
texture classification
archeological textiles

Davit Gigilashvili, Casper Fabian Gulbrandsen, Ha Thu Nguyen, Margrethe Havgar, Marianne Vedeler, Jon Yngve Hardeberg

DOI

10.2352/J.ImagingSci.Technol.2024.68.4.040407

Volume 68

Issue 4