Proceedings Paper
Volume: 38 | Article ID: IMAGE-269
A Survey of Mobile Agents: From Code Mobility to Large Multimodal Model-driven Autonomy
DOI: 10.2352/EI.2026.38.7.IMAGE-269 | Published Online: March 2026
Abstract

This paper surveys mobile agents, tracing their evolution from the early paradigm of autonomous, migrating code to the contemporary era of sophisticated agents driven by Large Multimodal Models (LMMs). We begin by establishing a taxonomy that distinguishes historical network-centric agent architectures from modern LMM-native, user-centric systems designed for mobile environments. We then analyze the operational workflows and architectural patterns that enable robust automation of complex tasks on Graphical User Interfaces (GUIs), with particular emphasis on the shift toward multi-agent frameworks, hierarchical control, and local-first execution. A critical review of representative state-of-the-art systems, including Mobile-Agent-v3.5, ClawMobile, Droidrun-appcard, and OpenClaw, is presented alongside an examination of benchmarks such as AndroidWorld that drive progress in the field. Furthermore, we discuss the transition toward edge-native multimodal models and address novel vulnerabilities unique to LMM-powered GUI automation before concluding with future research directions toward generalized mobile autonomy.

  Cite this article 

Jianhang Chen, "A Survey of Mobile Agents: From Code Mobility to Large Multimodal Model-driven Autonomy," in Electronic Imaging, 2026, pp. 269-1 - 269-7, https://doi.org/10.2352/EI.2026.38.7.IMAGE-269

  Copyright statement 
Copyright © 2026 Society for Imaging Science and Technology
Electronic Imaging
2470-1173
Society for Imaging Science and Technology
IS&T 7003 Kilworth Lane, Springfield, VA 22151 USA