Directed Increment Policy Search for Behavior Tree Task Performance Optimization

Crossing the Reality Gap

Abstract

Robotic behavior policies learned in simulation suffer a performance degradation once transferred to a real-world robotic platform. This degradation originates from discrepancies between the real-world and simulation environments, referred to as the reality gap. To cross the reality gap, this paper presents a simple reinforcement learning algorithm named Directed Increment Policy Search (DIPS). DIPS is a form of episodic model-free policy search that leverages the interpretable structure and the coupling of the Behavior Tree (BT) parameters to reduce the number of required real-world evaluations. Additionally, DIPS does not require reward function crafting and is robust to hyper-parameter settings. DIPS is evaluated on a simulated model of the DelFly Explorer, which is tasked with performing a window fly-through maneuver. It is demonstrated that DIPS efficiently and effectively improves the BT behavior policy performance for three simulated environments with increasingly large reality gaps. We believe DIPS can generalize to other behavior representation methods and tasks due to the inherent coupling between behavior and environment experienced by embodied robots.
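
To give a flavor of the kind of episodic, model-free policy search described above, the sketch below shows a hypothetical greedy loop that perturbs BT parameters by fixed increments and keeps only the changes that improve measured task performance. This is an illustration under assumed semantics, not the DIPS update rule from the paper; the parameter names, increment sizes, and the `evaluate_policy` stub are placeholders.

```python
# Hypothetical sketch of an episodic, model-free policy search over Behavior
# Tree (BT) parameters. Names, increments, and the evaluation stub are
# illustrative only; the actual DIPS update rule is defined in the paper body.
import random

def evaluate_policy(params):
    """Placeholder for one real-world (or simulated) episode returning a
    scalar task performance, e.g. window fly-through success."""
    # Illustrative stand-in objective: prefer parameters near a fixed target.
    target = {"turn_rate": 0.4, "approach_angle": 0.1}
    return -sum((params[k] - target[k]) ** 2 for k in params)

def directed_increment_search(params, increments, n_episodes=50):
    """Greedy episodic search: perturb one parameter per episode by a fixed
    increment, keep the change only if measured performance improves, and
    flip the perturbation direction for that parameter otherwise."""
    best_score = evaluate_policy(params)
    direction = {k: random.choice([-1.0, 1.0]) for k in params}
    for _ in range(n_episodes):
        key = random.choice(list(params))
        candidate = dict(params)
        candidate[key] += direction[key] * increments[key]
        score = evaluate_policy(candidate)
        if score > best_score:          # accept the increment
            params, best_score = candidate, score
        else:                           # reverse direction for this parameter
            direction[key] *= -1.0
    return params, best_score

if __name__ == "__main__":
    initial = {"turn_rate": 0.8, "approach_angle": -0.3}
    steps = {"turn_rate": 0.05, "approach_angle": 0.05}
    tuned, score = directed_increment_search(initial, steps)
    print(tuned, score)
```

Because each episode changes a single interpretable BT parameter and accepts the change only on measured improvement, a loop of this general shape needs no shaped reward signal, which is consistent with the properties claimed for DIPS above.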