Robotics has traditionally relied on a multitude of sensors and extensive programming to interpret and navigate environments, yet such systems often struggle in dynamic and unpredictable settings. In this work, we explore the integration of large language models (LLMs) such as GPT-4 into robotic navigation systems to enhance decision-making and adaptability in complex environments. Unlike many existing robotics frameworks, our approach leverages the natural language and image processing capabilities of LLMs to enable robust navigation using only a single camera and an ultrasonic sensor, eliminating the need for multiple specialized sensors and extensive pre-programmed responses. By bridging the gap between perception and planning, this framework introduces a novel approach to robotic navigation and aims to create more intelligent and flexible robotic systems capable of handling a broader range of tasks and environments, representing a significant advance in autonomy and versatility for robotics. Experimental evaluations demonstrate promising improvements in the robot's effectiveness and efficiency across object recognition, motion planning, obstacle manipulation, and environmental adaptability, highlighting its potential for more advanced applications. Future work will focus on enabling LLMs to autonomously generate motion profiles and executable code from verbal instructions, allowing tasks to be carried out without human intervention and further improving the robot's autonomy and operational efficiency.
Xunyu Pan and Jeremy Perando, "Enhancing Robotic Navigation with Large Language Models," in Electronic Imaging, 2025, pp. 105-1 - 105-7. https://doi.org/10.2352/EI.2025.37.15.AVM-105
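To make the sensing-to-decision loop described in the abstract concrete, the sketch below shows one way a single camera frame and an ultrasonic distance reading might be passed to a multimodal LLM, which replies with a discrete motion command. This is a minimal, hypothetical illustration rather than the authors' implementation: the model name (gpt-4o), the prompt wording, the command vocabulary, and the frame/distance inputs are all assumptions introduced here for illustration.

```python
# Hypothetical sketch of one perception-to-decision step: a camera frame and
# an ultrasonic reading are sent to a multimodal LLM, which returns a motion
# command. Model name, prompt, and command set are illustrative assumptions,
# not the paper's actual code.
import base64
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

COMMANDS = {"FORWARD", "LEFT", "RIGHT", "STOP"}  # assumed command vocabulary


def encode_image(path: str) -> str:
    """Base64-encode a JPEG frame captured from the robot's single camera."""
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")


def decide_motion(frame_path: str, distance_cm: float) -> str:
    """Ask the LLM for a navigation command given the camera frame and the
    ultrasonic distance to the nearest obstacle."""
    image_b64 = encode_image(frame_path)
    response = client.chat.completions.create(
        model="gpt-4o",  # assumed multimodal model; the paper references GPT-4
        messages=[
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": (
                            "You control a small ground robot with one camera and a "
                            f"forward ultrasonic sensor reading {distance_cm:.0f} cm. "
                            "Reply with exactly one word: FORWARD, LEFT, RIGHT, or STOP."
                        ),
                    },
                    {
                        "type": "image_url",
                        "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"},
                    },
                ],
            }
        ],
        max_tokens=5,
    )
    command = response.choices[0].message.content.strip().upper()
    return command if command in COMMANDS else "STOP"  # fail safe on unexpected output


if __name__ == "__main__":
    # Example usage with a saved frame and a dummy ultrasonic reading.
    print(decide_motion("frame.jpg", distance_cm=85.0))
```

In a deployed loop, such a decision step would be called repeatedly and its output mapped to motor commands, with the ultrasonic reading serving as a low-level safety check independent of the LLM's response.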