Human corrective advice in the policy search loop

Abstract

Machine learning methods applied to decision-making problems on real robots usually suffer from slow convergence due to the dimensionality of the search space and the difficulty of reward design. Interactive Machine Learning (IML) and Learning from Demonstrations (LfD) methods are usually simple and relatively fast at improving a policy, but they are sensitive to the occasional erroneous feedback that human teachers inevitably provide. Reinforcement Learning (RL) methods may converge to solutions that are optimal with respect to the encoded reward function, but they become inefficient as the dimensionality of the state-action space grows.
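As an illustration of the kind of corrective advice the abstract refers to, the sketch below shows a minimal, hypothetical policy-search step in which a teacher only signals the *direction* in which the executed action should change (+1 or -1), and the policy parameters are nudged accordingly. The feature map, learning rate, correction magnitude, and the simulated teacher are all illustrative assumptions, not the paper's actual method.

```python
import numpy as np

np.random.seed(0)

def features(state):
    """Illustrative polynomial features of a scalar state."""
    return np.array([1.0, state, state**2])

def corrective_advice_update(theta, state, advice, magnitude=0.5, lr=0.1):
    """Sketch of a corrective-advice update: the human only indicates the
    direction (+1/-1) in which the action should change, and the linear
    policy's parameters are moved so its output shifts that way."""
    phi = features(state)
    error = advice * magnitude  # assumed fixed correction size
    return theta + lr * error * phi

# Toy loop with a simulated teacher that pushes the policy output toward 1.0.
theta = np.zeros(3)
for _ in range(200):
    s = np.random.uniform(-1, 1)
    action = features(s) @ theta
    advice = np.sign(1.0 - action)  # stand-in for occasional human feedback
    theta = corrective_advice_update(theta, s, advice)

print(features(0.0) @ theta)  # policy output drifts toward the teacher's target
```

Because the teacher provides only a direction, the update behaves like a sign-based supervised step; this keeps the burden on the human low, at the cost of oscillation around the target once the policy is close.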

Files

Celemin_IROS_Workshop_2017_3.p... (pdf, 0.304 Mb)