A Configurable Multi-agent System for Feature Extraction from Multimedia Documents

Wangda Zhang; Anthony Absher; Zhen Li; Yujian Xu; Bin Shen

doi:10.2352/EI.2026.38.7.IMAGE-270

Abstract

This paper presents an experimental multi-agent system developed for robust feature extraction from diverse multimedia documents, including images, PDFs, and technical drawings. To address the enterprise demand for structuring unstructured data, the system employs a flexible architecture that intelligently orchestrates specialized agents, ranging from Optical Character Recognition (OCR) and image processing to Large Language Models (LLMs), to achieve high-fidelity extraction. A key innovation is the system's high configurability, which keeps human experts in the loop to refine extraction logic via prompt engineering. Furthermore, the architecture supports hybrid edge-cloud deployment: raw documents are processed locally to satisfy strict data sovereignty requirements, and only non-sensitive data is ingested centrally. The experimental system has shown scalability and efficiency in real-world use cases.
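As a rough illustration of the ideas in the abstract (configurable agent orchestration, prompt-driven extraction, and an edge step that forwards only derived data), the following minimal Python sketch may help. It is not the authors' implementation: every name in it (`Step`, `REGISTRY`, `ocr_agent`, `llm_agent`, `run_on_edge`) is hypothetical, and the OCR and LLM agents are simple stand-ins that call no real engine or model.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical shared state passed between agents.
State = Dict[str, object]
Agent = Callable[[State, Dict[str, str]], State]


def ocr_agent(state: State, params: Dict[str, str]) -> State:
    # Stand-in for a real OCR engine: just decodes the raw bytes as text.
    raw = state["raw"]
    assert isinstance(raw, bytes)
    return {"text": raw.decode("utf-8", errors="replace")}


def llm_agent(state: State, params: Dict[str, str]) -> State:
    # Stand-in for an LLM call: picks "key: value" lines whose key is listed
    # in the configured fields. The expert-editable prompt is only recorded.
    wanted = {f.strip().lower() for f in params.get("fields", "").split(",") if f.strip()}
    features: Dict[str, str] = {}
    for line in str(state.get("text", "")).splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() in wanted:
            features[key.strip().lower()] = value.strip()
    return {"features": features, "prompt_used": params.get("prompt", "")}


# Assumed registry mapping configuration names to agent functions.
REGISTRY: Dict[str, Agent] = {"ocr": ocr_agent, "llm": llm_agent}


@dataclass
class Step:
    agent: str
    params: Dict[str, str] = field(default_factory=dict)


def run_on_edge(raw: bytes, steps: List[Step]) -> Dict[str, object]:
    # Run the configured agents in order on the local (edge) side.
    state: State = {"raw": raw}
    for step in steps:
        state.update(REGISTRY[step.agent](state, step.params))
    # Only derived features are returned for central ingestion;
    # the raw bytes and full text never leave this function.
    return {"features": state.get("features", {})}


if __name__ == "__main__":
    steps = [
        Step("ocr"),
        Step("llm", {"prompt": "Extract the part number and material.",
                     "fields": "part number, material"}),
    ]
    doc = b"Part Number: A-113\nMaterial: steel\nNotes: internal"
    print(run_on_edge(doc, steps))
```

In this sketch, changing the extraction behavior means editing the `steps` list (agent order, prompt, fields) rather than the agent code, which is one plausible reading of the configurability described above.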

Electronic Imaging

2470-1173

Society for Imaging Science and Technology

IS&T 7003 Kilworth Lane, Springfield, VA 22151 USA

IMAGE-270

Proceedings Paper

Wangda Zhang

Celonis, US

Anthony Absher

Celonis, US

Zhen Li

Celonis, US

Yujian Xu

Celonis, US

Bin Shen

Celonis, US

132026

IMAGE

Imaging and Multimedia Analytics at the Edge 2026

Pages 270-1 to 270-8

2026