Proceedings Paper
Volume: 37 | Article ID: HPCI-172
Write Sentence with Images: Revisit the Large Vision Model with Visual Sentence
DOI: 10.2352/EI.2025.37.12.HPCI-172 | Published Online: February 2025
Abstract

This paper introduces a novel framework for generating high-quality images from “visual sentences” extracted from video sequences. By combining a lightweight autoregressive model with a Vector Quantized Generative Adversarial Network (VQGAN), our approach achieves a favorable trade-off between computational efficiency and image fidelity. Unlike conventional methods that require substantial resources, the proposed framework efficiently captures sequential patterns in partially annotated frames and synthesizes coherent, contextually accurate images. Empirical results demonstrate that our method not only attains state-of-the-art performance on various benchmarks but also reduces inference overhead, making it well-suited for real-time and resource-constrained environments. Furthermore, we explore its applicability to medical image analysis, showcasing robust denoising, brightness adjustment, and segmentation capabilities. Overall, our contributions highlight an effective balance between performance and efficiency, paving the way for scalable and adaptive image generation across diverse multimedia domains.
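The abstract's core idea, a lightweight autoregressive model predicting discrete VQGAN codebook tokens that form a "visual sentence", can be illustrated with a toy sketch. Everything below is an assumption for illustration only (a bigram counter standing in for the autoregressive model, a tiny 8-entry codebook standing in for a real VQGAN codebook); it is not the paper's implementation.

```python
import random

# Illustrative sketch: a "visual sentence" is modeled as a sequence of
# discrete VQGAN codebook indices; a lightweight autoregressive model
# predicts the next token from the previous one. Here a simple bigram
# frequency table plays the role of that model.

CODEBOOK_SIZE = 8  # toy codebook; real VQGANs use far larger codebooks


def fit_bigram(sequences):
    """Count next-token frequencies for each token in the codebook."""
    counts = [[0] * CODEBOOK_SIZE for _ in range(CODEBOOK_SIZE)]
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            counts[a][b] += 1
    return counts


def generate(counts, start, length, rng):
    """Autoregressively sample a token sequence from bigram counts."""
    seq = [start]
    for _ in range(length - 1):
        row = counts[seq[-1]]
        if sum(row) == 0:  # unseen context: fall back to uniform sampling
            nxt = rng.randrange(CODEBOOK_SIZE)
        else:
            nxt = rng.choices(range(CODEBOOK_SIZE), weights=row)[0]
        seq.append(nxt)
    return seq


# Toy "visual sentences": token sequences from consecutive video frames.
train = [[0, 1, 2, 3, 0, 1, 2, 3], [0, 1, 2, 3, 0, 1]]
model = fit_bigram(train)
sample = generate(model, start=0, length=6, rng=random.Random(0))
```

In a real pipeline, a VQGAN encoder would map each frame to its token grid, the autoregressive model would generate the next tokens, and the VQGAN decoder would render them back to pixels; this sketch only shows the sequential-pattern-capture step.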

  Cite this article 

Quan Liu, Can Cui, Ruining Deng, Tianyuan Yao, Yuechen Yang, Yucheng Tang, Yuankai Huo, "Write Sentence with Images: Revisit the Large Vision Model with Visual Sentence," in Electronic Imaging, 2025, pp. 172-1 - 172-5, https://doi.org/10.2352/EI.2025.37.12.HPCI-172

  Copyright statement 
Copyright © 2025 Society for Imaging Science and Technology
Electronic Imaging
ISSN: 2470-1173
Society for Imaging Science and Technology
IS&T 7003 Kilworth Lane, Springfield, VA 22151 USA