Directed Increment Policy Search for Behavior Tree Task Performance Optimization

Crossing the Reality Gap

Abstract

Robotic behavior policies learned in simulation suffer a performance degradation once transferred to a real-world robotic platform. This degradation originates from discrepancies between the real-world and simulation environments, referred to as the reality gap. To cross the reality gap, this paper presents a simple reinforcement learning algorithm named Directed Increment Policy Search (DIPS). DIPS is a form of episodic model-free policy search that leverages the interpretable structure and the coupling of the Behavior Tree (BT) parameters to reduce the number of required real-world evaluations. Additionally, DIPS does not require reward function crafting and is robust to hyper-parameter settings. DIPS is evaluated on a simulated model of the DelFly Explorer, which is tasked with performing a window fly-through maneuver. It is demonstrated that DIPS efficiently and effectively improves the BT behavior policy performance for three simulated environments with increasingly large reality gaps. We believe DIPS can generalize to other behavior representation methods and tasks due to the inherent coupling between behavior and environment experienced by embodied robots.
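
To give a flavor of the kind of episodic, model-free policy search described above, the sketch below shows a hypothetical greedy loop that perturbs BT parameters by fixed increments and keeps only the changes that improve measured task performance. This is an illustration under assumed semantics, not the DIPS update rule from the paper; the parameter names, increment sizes, and the `evaluate_policy` stub are placeholders.

```python
# Hypothetical sketch of an episodic, model-free policy search over Behavior
# Tree (BT) parameters. Names, increments, and the evaluation stub are
# illustrative only; the actual DIPS update rule is defined in the paper body.
import random

def evaluate_policy(params):
    """Placeholder for one real-world (or simulated) episode returning a
    scalar task performance, e.g. window fly-through success."""
    # Illustrative stand-in objective: prefer parameters near a fixed target.
    target = {"turn_rate": 0.4, "approach_angle": 0.1}
    return -sum((params[k] - target[k]) ** 2 for k in params)

def directed_increment_search(params, increments, n_episodes=50):
    """Greedy episodic search: perturb one parameter per episode by a fixed
    increment, keep the change only if measured performance improves, and
    flip the perturbation direction for that parameter otherwise."""
    best_score = evaluate_policy(params)
    direction = {k: random.choice([-1.0, 1.0]) for k in params}
    for _ in range(n_episodes):
        key = random.choice(list(params))
        candidate = dict(params)
        candidate[key] += direction[key] * increments[key]
        score = evaluate_policy(candidate)
        if score > best_score:          # accept the increment
            params, best_score = candidate, score
        else:                           # reverse direction for this parameter
            direction[key] *= -1.0
    return params, best_score

if __name__ == "__main__":
    initial = {"turn_rate": 0.8, "approach_angle": -0.3}
    steps = {"turn_rate": 0.05, "approach_angle": 0.05}
    tuned, score = directed_increment_search(initial, steps)
    print(tuned, score)
```

Because each episode changes a single interpretable BT parameter and accepts the change only on measured improvement, a loop of this general shape needs no shaped reward signal, which is consistent with the properties claimed for DIPS above.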