Human corrective advice in the policy search loop

Abstract

Machine learning methods applied to decision-making problems on real robots usually suffer from slow convergence due to the dimensionality of the search space and the difficulty of reward design. Interactive Machine Learning (IML) and Learning from Demonstrations (LfD) methods are usually simple and relatively fast at improving a policy, but they are sensitive to the occasional erroneous feedback that human teachers inevitably provide. Reinforcement Learning (RL) methods may converge to solutions that are optimal with respect to the encoded reward function, but they become inefficient as the dimensionality of the state-action space grows.
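As an illustration of the kind of corrective advice the abstract refers to, the sketch below shows a minimal, hypothetical policy-search step in which a teacher only signals the *direction* in which the executed action should change (+1 or -1), and the policy parameters are nudged accordingly. The feature map, learning rate, correction magnitude, and the simulated teacher are all illustrative assumptions, not the paper's actual method.

```python
import numpy as np

np.random.seed(0)

def features(state):
    """Illustrative polynomial features of a scalar state."""
    return np.array([1.0, state, state**2])

def corrective_advice_update(theta, state, advice, magnitude=0.5, lr=0.1):
    """Sketch of a corrective-advice update: the human only indicates the
    direction (+1/-1) in which the action should change, and the linear
    policy's parameters are moved so its output shifts that way."""
    phi = features(state)
    error = advice * magnitude  # assumed fixed correction size
    return theta + lr * error * phi

# Toy loop with a simulated teacher that pushes the policy output toward 1.0.
theta = np.zeros(3)
for _ in range(200):
    s = np.random.uniform(-1, 1)
    action = features(s) @ theta
    advice = np.sign(1.0 - action)  # stand-in for occasional human feedback
    theta = corrective_advice_update(theta, s, advice)

print(features(0.0) @ theta)  # policy output drifts toward the teacher's target
```

Because the teacher provides only a direction, the update behaves like a sign-based supervised step; this keeps the burden on the human low, at the cost of oscillation around the target once the policy is close.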

Files

Celemin_IROS_Workshop_2017_3.p... (pdf, 0.304 Mb)