P.R. van der Vaart

3 records found

Value Improved Actor Critic Algorithms

Preprint (2024) - Y. Oren (author), M.A. Zanger (author), P.R. van der Vaart (author), M.T.J. Spaan (author), J.W. Böhmer (author)

Many modern reinforcement learning algorithms build on the actor-critic (AC) framework: iterative improvement of a policy (the actor) using policy improvement operators and iterative approximation of the policy's value (the critic). In contrast, the popular value-based algorithm ...

Bayesian Model-Free Deep Reinforcement Learning

Conference paper (2024) - P.R. van der Vaart (author)

Exploration in reinforcement learning remains a difficult challenge. In order to drive exploration, ensembles with randomized prior functions have recently been popularized to quantify uncertainty in the value model. However, these ensembles have no theoretical reason to resemble ...

Bayesian Ensembles for Exploration in Deep Q-Learning

Conference paper (2024) - P.R. van der Vaart (author), N. Yorke-Smith (author), M.T.J. Spaan (author)

Exploration in reinforcement learning remains a difficult challenge. In order to drive exploration, ensembles with randomized prior functions have recently been popularized to quantify uncertainty in the value model. There is no theoretical reason for these ensembles to resemble ...