Language-Guided Semantic Affordance Exploration for Efficient Reinforcement Learning


Abstract

Reinforcement Learning (RL) shows great potential for robotic manipulation tasks, yet it suffers from low sample efficiency and requires extensive exploration of large state-action spaces. Recent methods leverage the commonsense knowledge and reasoning abilities of Large Language Models (LLMs) to guide RL exploration toward more meaningful states. However, LLMs may generate semantically correct but physically infeasible plans, leading to unreliable solutions. In this paper, we propose Language-Guided Exploration for Reinforcement Learning (LGRL), a novel framework that uses the planning capability of LLMs to directly guide RL exploration. LGRL applies LLM planning at both the task and affordance levels, improving learning efficiency by steering the RL agent toward semantically meaningful actions. Unlike previous methods that rely on the optimality of LLM-generated plans or rewards, LGRL corrects sub-optimal plans and explores multimodal affordance-level plans without human intervention.
We evaluated LGRL on pick-and-place tasks within standard RL benchmark environments, demonstrating significant improvements in both sample efficiency and success rates.
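To make the guided-exploration idea concrete, the sketch below shows one simple way an LLM-proposed, affordance-level plan could bias an RL agent's action selection. This is an illustrative assumption, not the LGRL implementation: the function `query_llm_affordance_plan`, the mixing probabilities, and the discrete action encoding are all hypothetical.

```python
import random
from typing import Callable, List, Sequence

# Minimal sketch (not the authors' implementation): bias exploration toward
# actions suggested by an LLM-generated, affordance-level plan, while keeping
# ordinary random exploration and the learned policy in the mix.

def query_llm_affordance_plan(task_description: str) -> List[str]:
    """Hypothetical placeholder for an LLM call that returns an ordered list
    of affordance-level sub-goals, e.g. ['grasp(cube)', 'place(cube, tray)']."""
    return ["grasp(cube)", "place(cube, tray)"]

def guided_action(
    state,
    policy_action: Callable[[object], int],
    affordance_actions: Sequence[int],
    p_guide: float = 0.5,     # probability of following the LLM suggestion
    p_random: float = 0.1,    # probability of uniform random exploration
    num_actions: int = 8,     # size of the discrete action space (assumed)
) -> int:
    """Pick an action: with probability p_guide sample from LLM-suggested
    affordance actions, with probability p_random explore uniformly, and
    otherwise follow the learned policy."""
    r = random.random()
    if r < p_guide and affordance_actions:
        return random.choice(list(affordance_actions))
    if r < p_guide + p_random:
        return random.randrange(num_actions)
    return policy_action(state)
```

In a training loop, `guided_action` would replace the usual epsilon-greedy action selection, so early exploration concentrates on the sub-goals the LLM considers semantically meaningful while the policy and random exploration can still correct infeasible or sub-optimal suggestions.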
