
This paper presents an experimental multi-agent system for robust feature extraction from diverse multimedia documents, including images, PDFs, and technical drawings. To address the enterprise demand for structuring unstructured data, the system uses a flexible architecture that orchestrates specialized agents, ranging from Optical Character Recognition (OCR) and image processing to Large Language Models (LLMs), to achieve high-fidelity extraction. A key innovation is the system's high configurability, which keeps human experts in the loop so that they can refine the extraction logic through prompt engineering. Furthermore, the architecture supports hybrid edge-cloud deployment: raw documents can be processed locally to satisfy strict data sovereignty requirements, and only non-sensitive data is ingested centrally. The experimental system has shown scalability and efficiency in real-world use cases.
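As a rough illustration of the architecture summarized above, the following Python sketch shows one way a configurable agent pipeline with an edge-side privacy boundary might be organized. It is not the authors' implementation: every name in it (`Document`, `PipelineConfig`, `ocr_agent`, `image_agent`, `llm_agent`, `run_on_edge`) is hypothetical, and the OCR and LLM steps are replaced by trivial stand-ins so that the example runs without external services.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Document:
    doc_id: str
    raw_bytes: bytes  # raw content; assumed sensitive, so it stays on the edge node
    kind: str         # e.g. "image", "pdf", "drawing"


@dataclass
class PipelineConfig:
    # Expert-editable settings (hypothetical): agent order per document kind
    # and the prompt template handed to the LLM agent.
    routes: Dict[str, List[str]]
    prompt_template: str


Agent = Callable[[dict, PipelineConfig], dict]


def image_agent(state: dict, config: PipelineConfig) -> dict:
    # Stand-in for image preprocessing (deskewing, denoising, and so on).
    state["preprocessed"] = True
    return state


def ocr_agent(state: dict, config: PipelineConfig) -> dict:
    # Stand-in for a real OCR engine: simply decode the bytes as text.
    state["text"] = state["raw_bytes"].decode("utf-8", errors="ignore")
    return state


def llm_agent(state: dict, config: PipelineConfig) -> dict:
    # Stand-in for an LLM call: build the prompt from the configured template,
    # then "extract" features by parsing "key: value" lines.
    text = state.get("text", "")
    state["prompt"] = config.prompt_template.format(text=text)
    features = {}
    for line in text.splitlines():
        if ":" in line:
            key, value = line.split(":", 1)
            features[key.strip().lower()] = value.strip()
    state["features"] = features
    return state


AGENTS: Dict[str, Agent] = {
    "image": image_agent,
    "ocr": ocr_agent,
    "llm": llm_agent,
}


def run_on_edge(doc: Document, config: PipelineConfig) -> dict:
    """Run the configured agents locally and return only the extracted features."""
    state = {"raw_bytes": doc.raw_bytes}
    for name in config.routes[doc.kind]:
        state = AGENTS[name](state, config)
    # The raw bytes and intermediate text are deliberately left out of the payload.
    return {"doc_id": doc.doc_id, "features": state.get("features", {})}


if __name__ == "__main__":
    config = PipelineConfig(
        routes={"pdf": ["ocr", "llm"], "image": ["image", "ocr", "llm"]},
        prompt_template="Extract the part number and material from:\n{text}",
    )
    doc = Document("d1", b"Part Number: A-123\nMaterial: Steel", "pdf")
    print(run_on_edge(doc, config))
```

In this sketch, changing the extraction behavior means editing `routes` or `prompt_template` rather than the agent code, which loosely mirrors the human-in-the-loop configurability the abstract describes. Returning only `doc_id` and `features` from `run_on_edge` stands in for the edge-cloud split, under the assumption that extracted features are the non-sensitive data sent for central ingestion.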
Wangda Zhang, Anthony Absher, Zhen Li, Yujian Xu, Bin Shen, "A Configurable Multi-agent System for Feature Extraction from Multimedia Documents," in Electronic Imaging, 2026, pp. 270-1 - 270-8, https://doi.org/10.2352/EI.2026.38.7.IMAGE-270