Proceedings Paper
Volume: 37 | Article ID: IMAGE-262
Image
Automating Product Image Analysis for Retail with Gemini
DOI: 10.2352/EI.2025.37.8.IMAGE-262 | Published Online: February 2025
Abstract

We present the application of a Multimodal Large Language Model, specifically Gemini, to automating product image analysis for the retail industry. We demonstrate how Gemini's ability to generate text from mixed image-text prompts enables two key applications: 1) Product Attribute Extraction, where attributes of a product in an image are extracted using open or closed vocabularies and used by retailers for downstream analytics, and 2) Product Recognition, where a product in a user-provided image is identified and its corresponding product information is retrieved from a retailer's search index and returned to the user. In both cases, Gemini acts as a powerful and easily customizable recognition engine, simplifying the processing pipeline for retailers' developer teams. Traditionally, these tasks required multiple models (object detection, OCR, attribute classification, embedding, etc.) working together, as well as extensive custom data collection and domain expertise. With Gemini, these tasks reduce to writing a set of prompts and straightforward logic to connect their outputs.
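
As an illustration of the "set of prompts and straightforward logic" idea for closed-vocabulary attribute extraction, the following is a minimal sketch, not the authors' implementation. The vocabulary, the prompt wording, the JSON reply format, and the `model` callable (a stand-in for whatever multimodal model call is actually used) are all assumptions made for this example; no real Gemini API is invoked.

```python
import json
from typing import Callable, Dict, List

# Hypothetical closed vocabulary; the paper does not specify one.
VOCAB: Dict[str, List[str]] = {
    "color": ["red", "blue", "black", "white"],
    "category": ["shoe", "shirt", "bag"],
}


def build_prompt(vocab: Dict[str, List[str]]) -> str:
    """Build a text prompt that restricts each attribute to its allowed values."""
    lines = [
        "Look at the product image and reply with a single JSON object.",
        'Use only the listed values, or "unknown" if none applies.',
    ]
    for name, values in vocab.items():
        lines.append(f"- {name}: one of {', '.join(values)}")
    return "\n".join(lines)


def parse_attributes(reply: str, vocab: Dict[str, List[str]]) -> Dict[str, str]:
    """Extract the JSON object from the reply and keep only in-vocabulary values."""
    start, end = reply.find("{"), reply.rfind("}")
    data = {}
    if start != -1 and end > start:
        try:
            data = json.loads(reply[start:end + 1])
        except json.JSONDecodeError:
            data = {}  # Treat an unparseable reply as "nothing extracted".
    result = {}
    for name, values in vocab.items():
        value = str(data.get(name, "")).strip().lower()
        result[name] = value if value in values else "unknown"
    return result


def extract_attributes(
    image_bytes: bytes,
    model: Callable[[str, bytes], str],  # Assumed interface: (prompt, image) -> text.
    vocab: Dict[str, List[str]] = VOCAB,
) -> Dict[str, str]:
    """Send the prompt and image to the model, then validate its reply."""
    return parse_attributes(model(build_prompt(vocab), image_bytes), vocab)


def fake_model(prompt: str, image: bytes) -> str:
    """Stub standing in for a real multimodal model call."""
    return 'Here you go: {"color": "Red", "category": "sandal"}'


attributes = extract_attributes(b"", fake_model)
print(attributes)
```

Validating the reply against the vocabulary, rather than trusting the model's text, keeps out-of-vocabulary answers from reaching downstream analytics; in this sketch they are mapped to "unknown".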

  Cite this article 

Tianli Yu, Daniel Vlasic, "Automating Product Image Analysis for Retail with Gemini" in Electronic Imaging, 2025, pp 262-1 - 262-7, https://doi.org/10.2352/EI.2025.37.8.IMAGE-262

  Copyright statement 
This work is licensed under the Creative Commons Attribution 4.0 International License. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
Electronic Imaging
2470-1173
Society for Imaging Science and Technology
IS&T 7003 Kilworth Lane, Springfield, VA 22151 USA