Off-policy experience retention for deep actor-critic learning

Abstract

When a limited number of experiences is kept in memory to train a reinforcement learning agent, the criterion that determines which experiences are retained can have a strong impact on learning performance. In this paper, we argue that for actor-critic learning in domains with significant momentum, it is important to retain experiences with off-policy actions when the amount of exploration is reduced over time. This claim is supported by simulation experiments on a pendulum swing-up problem and a magnetic manipulation task. Additionally, we compare our strategy to database overwriting policies based on obtaining experiences spread out over the state-action space, as well as to using the temporal difference error as a proxy for the value of experiences.
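To make the retention idea concrete, below is a minimal illustrative sketch (not the paper's exact algorithm) of a fixed-size experience buffer that, once full, overwrites the stored transition whose action lies closest to what the current policy would choose, so the most off-policy experiences are kept the longest. The class and method names, and the `policy` callable interface, are assumptions introduced for illustration.

```python
import numpy as np


class OffPolicyRetentionBuffer:
    """Illustrative buffer: when full, overwrite the transition whose stored
    action is closest to the current policy's action for that state, so that
    off-policy experiences are retained as exploration decreases."""

    def __init__(self, capacity, state_dim, action_dim):
        self.capacity = capacity
        self.states = np.zeros((capacity, state_dim))
        self.actions = np.zeros((capacity, action_dim))
        self.rewards = np.zeros(capacity)
        self.next_states = np.zeros((capacity, state_dim))
        self.size = 0

    def add(self, state, action, reward, next_state, policy):
        # `policy` is assumed to map a batch of states to deterministic actions.
        if self.size < self.capacity:
            idx = self.size
            self.size += 1
        else:
            # Distance between each stored action and the current policy's
            # action in the same state; overwrite the least off-policy one.
            policy_actions = policy(self.states[:self.size])
            distances = np.linalg.norm(
                self.actions[:self.size] - policy_actions, axis=1)
            idx = int(np.argmin(distances))
        self.states[idx] = state
        self.actions[idx] = action
        self.rewards[idx] = reward
        self.next_states[idx] = next_state

    def sample(self, batch_size, rng=np.random):
        # Uniform sampling over the retained transitions.
        idx = rng.randint(0, self.size, size=batch_size)
        return (self.states[idx], self.actions[idx],
                self.rewards[idx], self.next_states[idx])
```

The same skeleton could implement the alternative criteria compared in the paper by swapping the overwrite rule, e.g. overwriting the transition in the most densely covered region of the state-action space, or the one with the smallest temporal difference error.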

Files

Tim_Offpolicyexperienceretenti... (pdf, 0.703 MB)

Download not available