Evaluation of physical damage associated with action selection strategies in reinforcement learning

Koryakovskiy, I.; Vallery, H.; Babuska, R.; Caarls, W.

doi:10.1016/j.ifacol.2017.08.1218

Journal article (2017)

Authors

I. Koryakovskiy (Biomechatronics & Human-Machine Control - Mechanical, Maritime and Materials Engineering)

H. Vallery (Biomechatronics & Human-Machine Control - Mechanical, Maritime and Materials Engineering)

R. Babuska (Learning & Autonomous Control - Mechanical, Maritime and Materials Engineering)

W. Caarls (Pontifical Catholic University of Rio de Janeiro)

Research Group

Biomechatronics & Human-Machine Control (Mechanical, Maritime and Materials Engineering) (TU Delft)

Safety, Adaptation, Diagnosis, Fault detection, Analysis of reliability, Autonomous robotic systems, Learning in physical agents, Reinforcement learning control

To reference this document use:

http://resolver.tudelft.nl/uuid:8df049df-64ce-430c-b4c5-e345c961d58d

Published Date

2017

Language

English

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Faculty

Mechanical, Maritime and Materials Engineering

Department

Biomechanical Engineering

Abstract

Reinforcement learning techniques enable robots to deal with their own dynamics and with unknown environments without using explicit models or preprogrammed behaviors. However, reinforcement learning relies on intrinsically risky exploration, which is often damaging for physical systems. In the case of the bipedal walking robot Leo, which is studied in this paper, two sources of damage can be identified: fatigue of gearboxes due to backlash re-engagements, and the overall system damage due to falls of the robot. We investigate several exploration techniques and compare them in terms of gearbox fatigue, cumulative number of falls and undiscounted return. The results show that exploration with the Ornstein-Uhlenbeck (OU) process noise leads to the highest return, but at the same time it causes the largest number of falls. The Previous Action-Dependent Action (PADA) method results in drastically reduced fatigue, but also a large number of falls. The results reveal a previously unknown trade-off between the two sources of damage. Inspired by the OU and PADA methods, we propose four new action-selection methods in a systematic way. One of the proposed methods with a time-correlated noise outperforms the well-known ε-greedy method in all three benchmarks. We provide guidance towards the choice of exploration strategy for reinforcement learning applications on real physical systems.
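
For readers unfamiliar with the two baseline strategies named in the abstract, the following is a minimal illustrative sketch of ε-greedy action selection and of exploration with Ornstein-Uhlenbeck noise. It is not the authors' implementation: the function and class names (`epsilon_greedy`, `OUNoise`, `ou_explore`), the Euler discretization, and the default values of `theta`, `sigma`, `mu` and `dt` are all assumptions chosen for illustration, and the PADA method and the four proposed methods are not reproduced here.

```python
import math
import random


def epsilon_greedy(q_values, epsilon, rng):
    """Return a uniformly random action index with probability epsilon,
    otherwise the index of the largest value estimate."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


class OUNoise:
    """Euler-discretized Ornstein-Uhlenbeck process (illustrative parameters).

    Update rule: x += theta * (mu - x) * dt + sigma * sqrt(dt) * N(0, 1).
    Successive samples are correlated in time, unlike epsilon-greedy draws.
    """

    def __init__(self, theta=0.15, sigma=0.2, mu=0.0, dt=1.0, x0=0.0, rng=None):
        self.theta, self.sigma, self.mu, self.dt = theta, sigma, mu, dt
        self.x = x0
        self.rng = rng or random.Random()

    def sample(self):
        # Mean-reverting drift plus Gaussian diffusion.
        drift = self.theta * (self.mu - self.x) * self.dt
        diffusion = self.sigma * math.sqrt(self.dt) * self.rng.gauss(0.0, 1.0)
        self.x += drift + diffusion
        return self.x


def ou_explore(greedy_action, noise, low, high):
    """Add OU noise to a continuous greedy action and clip to [low, high]."""
    return min(high, max(low, greedy_action + noise.sample()))
```

The sketch makes one contrast from the abstract concrete: ε-greedy draws each exploratory action independently of the previous one, whereas OU noise evolves smoothly from step to step, so consecutive perturbations are time-correlated.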

Files

1_s2.0_S240589631731724X_main.... (pdf)

(pdf | 2.15 Mb)

Unknown license