A Survey of Mobile Agents: From Code Mobility to Large Multimodal Model-driven Autonomy

Jianhang Chen

doi:10.2352/EI.2026.38.7.IMAGE-269

Abstract

This paper surveys mobile agents, tracing their evolution from the early paradigm of autonomous, migrating code to contemporary agents driven by Large Multimodal Models (LMMs). We begin by establishing a taxonomy that distinguishes historical network-centric agent architectures from modern LMM-native, user-centric systems designed for mobile environments. We then analyze the operational workflows and architectural patterns that enable robust automation of complex tasks on Graphical User Interfaces (GUIs), with particular emphasis on the shift toward multi-agent frameworks, hierarchical control, and local-first execution. We critically review representative state-of-the-art systems, including Mobile-Agent-v3.5, ClawMobile, Droidrun-appcard, and OpenClaw, and examine benchmarks such as AndroidWorld that drive progress in the field. We also discuss the transition toward edge-native multimodal models and the novel vulnerabilities unique to LMM-powered GUI automation, and conclude with future research directions toward generalized mobile autonomy.