Vision-language pre-trained (VLP) models such as CLIP exhibit remarkable performance and excellent generalization on downstream tasks. Accordingly, textual and visual prompt learning have been widely adopted to adapt VLP models to downstream tasks. However, a challenging issue in visual prompt learning is its inferior performance on few-shot recognition tasks, which stems from an inability to capture class-specific information. We therefore propose a class-aware visual prompt learning method that enhances the perceptual ability of VLP models with an independent class prompting module consisting of trainable prompts for each class. Because class-aware prompts tend to be inaccurate during training, we develop an intra-class compactness loss and an inter-class dispersion loss to enhance intra-class consistency and inter-class separability. Finally, we introduce attention-based adapter layers to tackle the prompt selection issue. Extensive experiments demonstrate that our method achieves superior efficiency and effectiveness, surpassing previous visual prompting methods on a series of downstream datasets.
Sihui Zhang and Zhijiang Li, "Class-Aware Visual Prompt Learning for Vision Language Models," Journal of Imaging Science and Technology, vol. 69, no. 3, 2025, pp. 1–7, https://doi.org/10.2352/J.ImagingSci.Technol.2025.69.3.030415
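To make the two core ideas in the abstract concrete, below is a minimal PyTorch sketch of an independent bank of trainable per-class visual prompts and a centroid-based realization of intra-class compactness and inter-class dispersion losses. All names (ClassAwarePrompts, compactness_dispersion_loss), shapes, and the exact loss formulation are illustrative assumptions, not the authors' implementation; the paper's attention-based adapter layers that select prompts at inference are only indicated in comments.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ClassAwarePrompts(nn.Module):
    """Independent trainable prompt tokens for each class (illustrative).

    The prompts for the selected class are prepended to the patch-token
    sequence of a frozen image encoder (e.g. CLIP's ViT). At inference the
    paper selects prompts via attention-based adapter layers; here we
    simply index by label for a training-time illustration.
    """

    def __init__(self, num_classes: int, prompt_len: int, embed_dim: int):
        super().__init__()
        # One independent, trainable prompt bank per class.
        self.prompts = nn.Parameter(
            0.02 * torch.randn(num_classes, prompt_len, embed_dim)
        )

    def forward(self, patch_tokens: torch.Tensor, class_idx: torch.Tensor):
        # patch_tokens: (B, N, D), class_idx: (B,) -> (B, prompt_len + N, D)
        return torch.cat([self.prompts[class_idx], patch_tokens], dim=1)


def compactness_dispersion_loss(feats: torch.Tensor,
                                labels: torch.Tensor,
                                margin: float = 0.5) -> torch.Tensor:
    """One plausible centroid-based form of the two losses (assumption)."""
    feats = F.normalize(feats, dim=-1)
    classes = labels.unique()
    centroids = F.normalize(
        torch.stack([feats[labels == c].mean(dim=0) for c in classes]), dim=-1
    )
    k = centroids.size(0)

    # Intra-class compactness: pull each feature toward its class centroid.
    intra = torch.stack([
        (1.0 - feats[labels == c] @ centroids[i]).mean()
        for i, c in enumerate(classes)
    ]).mean()

    # Inter-class dispersion: penalize centroid pairs whose cosine
    # similarity exceeds the margin (self-similarity masked out).
    sim = centroids @ centroids.t()
    eye = torch.eye(k, device=sim.device, dtype=sim.dtype)
    inter = (F.relu(sim - margin) * (1.0 - eye)).sum() / max(k * (k - 1), 1)

    return intra + inter
```

In a training loop, feats would be the image features produced by the frozen encoder on prompt-augmented inputs, and this loss would presumably be added to the usual CLIP-style classification objective; the relative weighting of the terms and the adapter-based prompt selection are details of the paper not reproduced here.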