Proceedings Paper
Volume: 37 | Article ID: IMAGE-266
Image
Optimization of Image Captioning Networks Using Targeted Component Pruning Method
DOI: 10.2352/EI.2025.37.8.IMAGE-266 | Published Online: February 2025
Abstract

Deep learning has advanced significantly over the past decade, leading to substantial improvements in image captioning performance. However, these improvements have come at the cost of increased model complexity and higher computational requirements. Contemporary captioning models typically consist of three components: a pre-trained CNN encoder, a transformer encoder, and a decoder. Although network pruning for captioning models has been explored extensively, prior research has not specifically addressed pruning these three components individually. As a result, existing methods lack the generalizability required for models that deviate from the traditional configuration of image captioning systems. In this study, we introduce a pruning technique designed to optimize each component of the captioning model individually, thus broadening its applicability to models that share similar components, such as encoder and decoder networks, even if their overall architectures differ from conventional captioning models. Additionally, during pruning of the decoder we implemented a novel modification through the cross-entropy loss, which significantly improved the performance of the image captioning model. We trained and validated our approach on the Flickr8k dataset and evaluated its performance using the CIDEr and ROUGE-L metrics.
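
The idea of pruning each component with its own budget can be illustrated with a minimal sketch. This is not the authors' method: it swaps in plain magnitude pruning, and the component names, toy weight lists, and sparsity targets below are all hypothetical, chosen only to show how an encoder and a decoder can be handled independently.

```python
from typing import Dict, List


def prune_by_magnitude(weights: List[float], sparsity: float) -> List[float]:
    """Zero out the smallest-magnitude fraction `sparsity` of the weights."""
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must lie in [0, 1]")
    n_prune = int(len(weights) * sparsity)
    # Indices sorted by ascending magnitude; the first n_prune are dropped.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(order[:n_prune])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]


def prune_components(
    model: Dict[str, List[float]], sparsity: Dict[str, float]
) -> Dict[str, List[float]]:
    """Prune each named component with its own sparsity target.

    Components without an entry in `sparsity` are left unpruned.
    """
    return {
        name: prune_by_magnitude(w, sparsity.get(name, 0.0))
        for name, w in model.items()
    }


# Hypothetical toy model: flat weight lists standing in for real tensors.
toy_model = {
    "cnn_encoder": [0.9, -0.05, 0.4, 0.01],
    "transformer_encoder": [0.3, -0.7, 0.02, 0.6],
    "decoder": [-0.1, 0.8, 0.25, -0.03],
}
# Hypothetical per-component sparsity targets, not values from the paper.
targets = {"cnn_encoder": 0.5, "transformer_encoder": 0.25, "decoder": 0.5}

pruned = prune_components(toy_model, targets)
```

Because each component is addressed by name, the same routine would apply to a model that has only some of these parts, which is the kind of generality the abstract describes.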

  Cite this article 

Jishu Sen Gupta, Yogendra Rao Musunuri, Ih-Man Seo, Oh-Seol Kwon, "Optimization of Image Captioning Networks Using Targeted Component Pruning Method," in Electronic Imaging, 2025, pp 266-1 - 266-5, https://doi.org/10.2352/EI.2025.37.8.IMAGE-266

  Copyright statement 
Copyright © 2025 Society for Imaging Science and Technology
Electronic Imaging
2470-1173
Society for Imaging Science and Technology
IS&T 7003 Kilworth Lane, Springfield, VA 22151 USA