Perception and Control with Large Language Models in Robotic Manipulation

Developing and assessing an integrated Large Language Model System on environmental and task complexity

Abstract

Large Language Models (LLMs) possess significant semantic knowledge about the world, making them valuable for high-level robot control through Vision-Language-Action (VLA) models. These models integrate an LLM to deduce semantic knowledge from a robot’s vision and natural-language inputs, enabling real-world actions. Despite their potential, VLA models are a relatively new research area, with applications mostly limited to simulations or household tasks and insufficient validation in broader contexts. This study aims to develop an LLM-Enhanced Robotic Affordance and Control System (LERACS) for robotic manipulation in applied cases such as the management and maintenance of electrical grid infrastructure. LERACS is designed to visually ground manipulable objects and to decompose tasks from user instructions within a human-robot interaction chat interface built on ChatGPT. A system validation and an AI user experiment were conducted to evaluate how well LERACS interprets its environment and performs actions based on pre-made and synthetically generated user instructions across a range of settings. Results indicate high success rates in environmental interpretation and task execution, with robust labeling accuracy, especially in complex settings. Feedback from the AI user experiment highlighted LERACS’ adaptability, identified areas for improvement, and demonstrated its practical utility across diverse settings and task complexities.
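
To make the task-decomposition step concrete, the sketch below shows one way a user instruction and a set of visually grounded object labels could be handed to a chat-completion LLM and parsed into an ordered list of primitive actions. This is a minimal illustration under stated assumptions (the OpenAI Python client, a hypothetical model name, prompt wording, and action set), not the decomposition pipeline used in LERACS.

# Illustrative sketch only (not the LERACS implementation): decompose a
# natural-language instruction into primitive robot actions with a chat LLM.
# Assumes the OpenAI Python client and an OPENAI_API_KEY in the environment;
# the model name, prompt wording, and action set are hypothetical.
import json
from openai import OpenAI

client = OpenAI()

SYSTEM_PROMPT = (
    "You are a robot task planner. Given a list of visible objects and a user "
    "instruction, reply with a JSON object {\"steps\": [...]} where each step "
    "has 'action' (one of: pick, place, open, close, inspect) and 'target' "
    "(one of the visible object labels)."
)

def decompose_task(instruction, visible_objects):
    """Ask the LLM to break an instruction into primitive actions over detected objects."""
    user_msg = (
        "Visible objects: " + ", ".join(visible_objects) + "\n"
        "Instruction: " + instruction
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",                       # hypothetical model choice
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": user_msg},
        ],
        response_format={"type": "json_object"},   # ask for machine-readable output
    )
    return json.loads(response.choices[0].message.content).get("steps", [])

if __name__ == "__main__":
    steps = decompose_task(
        "Open the fuse box and inspect the circuit breaker",
        ["fuse box", "circuit breaker", "cabinet door"],
    )
    for step in steps:
        print(step["action"], "->", step["target"])

Constraining the reply to JSON keeps the planner output machine-readable, so each returned step can be mapped directly onto a manipulation primitive.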
