Bayesian Ensembles for Exploration in Deep Q-Learning
Abstract
Exploration in reinforcement learning remains a difficult challenge. To drive exploration, ensembles with randomized prior functions have recently become a popular way to quantify uncertainty in the value model. However, there is no theoretical guarantee that these ensembles resemble the actual posterior. In this work, we view the training of ensembles from the perspective of Sequential Monte Carlo, a Monte Carlo method that approximates a sequence of distributions with a set of particles. In particular, we propose an algorithm that exploits both the practical flexibility of ensembles and the theory of the Bayesian paradigm. We incorporate this method into a standard Deep Q-learning agent (DQN) and experimentally show qualitatively good uncertainty quantification and improved exploration capabilities over a regular ensemble.
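The paper's SMC-based training procedure is not specified in the abstract, but the baseline it builds on, an ensemble of Q-networks with randomized prior functions (Osband et al., 2018), can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: each ensemble member adds the output of a fixed, untrained "prior" network to a trainable network, and disagreement across members serves as an approximate uncertainty estimate over Q-values. The network sizes, the prior scale `beta`, and class names are assumptions chosen for clarity.

```python
import torch
import torch.nn as nn


class QNetwork(nn.Module):
    """Small feedforward Q-network (illustrative architecture)."""

    def __init__(self, obs_dim, n_actions, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, obs):
        return self.net(obs)


class RandomizedPriorMember(nn.Module):
    """One ensemble member: Q(s, .) = f_theta(s, .) + beta * p(s, .),
    where p is a randomly initialized network that is never trained."""

    def __init__(self, obs_dim, n_actions, beta=3.0):
        super().__init__()
        self.trainable = QNetwork(obs_dim, n_actions)
        self.prior = QNetwork(obs_dim, n_actions)
        for param in self.prior.parameters():
            param.requires_grad_(False)  # freeze the prior network
        self.beta = beta

    def forward(self, obs):
        with torch.no_grad():
            prior_q = self.prior(obs)
        return self.trainable(obs) + self.beta * prior_q


# Ensemble of K members; the spread of their predictions gives a crude
# uncertainty estimate that can drive exploration (e.g. via Thompson
# sampling over members). Dimensions here are placeholders.
ensemble = [RandomizedPriorMember(obs_dim=4, n_actions=2) for _ in range(10)]
obs = torch.randn(1, 4)
q_values = torch.stack([member(obs) for member in ensemble])  # (K, 1, A)
uncertainty = q_values.std(dim=0)  # per-action disagreement across members
```

In an SMC view, each such member plays the role of a particle, and the ensemble as a whole is meant to approximate a sequence of value posteriors; the abstract's contribution is a training scheme that makes this correspondence principled rather than heuristic.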