Proceedings Paper
Volume: 38 | Article ID: GENAI-177
Shelfie: A Vision-language System for Structured Product Understanding in Real-world Consumer Environments
DOI: 10.2352/EI.2026.38.12.GENAI-177 | Published Online: March 2026
Abstract

Personalized consumer experiences increasingly depend on understanding how products are actually used in everyday settings. We present Shelfie, a consumer-centric vision-language system that extracts structured product metadata from user-submitted images of lifestyle care products arranged in real-world contexts such as shelves, countertops, and vanity spaces. Unlike conventional systems, which are developed for controlled retail environments and depend on barcode scanning, Shelfie is designed to operate in cluttered, unconstrained home settings and is fully barcode-independent. Shelfie integrates object detection, instance segmentation, and large language model (LLM)-based reasoning to infer rich metadata for each visible product: brand, name, category, form, package type, key ingredients, benefits, and size. The system is trained and validated on a diverse user-sourced dataset spanning personal care and home lifestyle products, and it demonstrates strong generalization across packaging styles and product categories, producing accurate, highly structured output. Shelfie establishes a vision-language foundation for real-world, consumer-facing product understanding and discovery systems. It can enable downstream applications such as community-driven recommendation systems, ingredient sensitivity tracking, and in-depth consumer behavior analysis, all while keeping consumer habits, needs, and convenience at the center. By bridging visual input with structured metadata output, Shelfie can enable more informed, personalized decisions through peer-driven insights.
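
As an illustration only, the per-product metadata listed in the abstract could be held in a record like the one below. This is a minimal sketch, not the authors' implementation: the class name `ProductRecord`, the helper `parse_product_record`, the field types, and the assumption that the LLM stage returns one JSON object per product are all hypothetical. Only the eight field names come from the abstract.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import List, Optional


@dataclass
class ProductRecord:
    """One visible product. Field names mirror the metadata listed in the
    abstract; the types are assumptions made for this sketch."""
    brand: Optional[str] = None
    name: Optional[str] = None
    category: Optional[str] = None
    form: Optional[str] = None
    package_type: Optional[str] = None
    key_ingredients: List[str] = field(default_factory=list)
    benefits: List[str] = field(default_factory=list)
    size: Optional[str] = None


def parse_product_record(raw_json: str) -> ProductRecord:
    """Parse a JSON object (assumed LLM output) into a ProductRecord.

    Unknown keys are dropped and missing or null keys keep their defaults,
    so a partially filled response still yields a well-formed record.
    """
    data = json.loads(raw_json)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    allowed = ProductRecord.__dataclass_fields__.keys()
    kwargs = {k: v for k, v in data.items() if k in allowed and v is not None}
    # Accept a single string where a list is expected.
    for list_key in ("key_ingredients", "benefits"):
        if isinstance(kwargs.get(list_key), str):
            kwargs[list_key] = [kwargs[list_key]]
    return ProductRecord(**kwargs)
```

Calling `asdict(parse_product_record(raw))` then gives a plain dictionary with a fixed set of keys, which is one way a downstream recommendation or ingredient-tracking component could consume the output regardless of which fields were recoverable from the image.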

  Cite this article 

Ankur Purwar, Alex Hollingworth, Laveena Satwani, Eu Jack Tan, Nandita Mishra, Pranav Mishra, "Shelfie: A Vision-language System for Structured Product Understanding in Real-world Consumer Environments," in Electronic Imaging, 2026, pp. 177-1 - 177-9, https://doi.org/10.2352/EI.2026.38.12.GENAI-177

  Copyright statement 
Copyright © 2026 Society for Imaging Science and Technology
Electronic Imaging
2470-1173
Society for Imaging Science and Technology
IS&T 7003 Kilworth Lane, Springfield, VA 22151 USA