CAPT 2024 Latest Innovations on Printing and Packaging Technologies FastTrack
Volume: 69 | Article ID: 030415
Class-Aware Visual Prompt Learning for Vision Language Models
Abstract

Vision-language pre-trained (VLP) models, such as CLIP, have exhibited remarkable performance and excellent generalization in downstream tasks. Meanwhile, textual and visual prompt learning have been widely adopted to enhance VLP model performance on such tasks. However, a challenging issue in visual prompt learning is its inferior performance on few-shot recognition tasks, stemming from an inability to capture class-specific information. We therefore propose a class-aware visual prompt learning method that enhances the perceptual abilities of VLP models with an independent class prompting module, which consists of trainable prompts for each class. As class-aware prompts tend to be inaccurate during training, we develop an intra-class compactness loss and an inter-class dispersion loss to enhance intra-class consistency. Finally, we introduce attention-based adapter layers to tackle the prompt selection issue. Extensive experiments demonstrate that our method achieves superior efficiency and effectiveness, surpassing previous visual prompting methods on a series of downstream datasets.
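The intra-class compactness and inter-class dispersion losses described above can be sketched as follows. This is a minimal, framework-free illustration of the general idea (features pulled toward their own class prototype, prototypes pushed apart), not the paper's exact formulation; the prototype-based definitions and function names here are assumptions.

```python
import numpy as np

def class_prototypes(features, labels, num_classes):
    # Mean feature vector per class (an assumed prototype definition).
    return np.stack([features[labels == c].mean(axis=0)
                     for c in range(num_classes)])

def compactness_loss(features, labels, prototypes):
    # Intra-class compactness: mean squared distance of each feature
    # to its own class prototype; minimizing it tightens clusters.
    diffs = features - prototypes[labels]
    return float(np.mean(np.sum(diffs ** 2, axis=1)))

def dispersion_loss(prototypes):
    # Inter-class dispersion: negative mean pairwise squared distance
    # between prototypes; minimizing it pushes classes apart.
    n = prototypes.shape[0]
    dists = [np.sum((prototypes[i] - prototypes[j]) ** 2)
             for i in range(n) for j in range(i + 1, n)]
    return float(-np.mean(dists))
```

In practice these two terms would be weighted and added to the task loss; the weights and the exact distance metric are design choices not specified by the abstract.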

  Cite this article 

Sihui Zhang, Zhijiang Li, "Class-Aware Visual Prompt Learning for Vision Language Models," in Journal of Imaging Science and Technology, 2025, pp. 1-7, https://doi.org/10.2352/J.ImagingSci.Technol.2025.69.3.030415

  Copyright statement 
Copyright © Society for Imaging Science and Technology 2025
  Article timeline 
  • received June 2024
  • accepted November 2024
