M.T.J. Spaan | TU Delft Repository

Value Improved Actor Critic Algorithms

Preprint (2024) - Y. Oren (author), M.A. Zanger (author), P.R. van der Vaart (author), Matthijs Spaan (author), Matthijs T.J. Spaan (author), M.T.J. Spaan (author), Matthijs T. J. Spaan (author), J.W. Böhmer (author)

Many modern reinforcement learning algorithms build on the actor-critic (AC) framework: iterative improvement of a policy (the actor) using policy improvement operators and iterative approximation of the policy's value (the critic). In contrast, the popular value-based algorithm ...

Bayesian Ensembles for Exploration in Deep Q-Learning

Conference paper (2024) - P.R. van der Vaart (author), N. Yorke-Smith (author), Neil Yorke-Smith (author), Matthijs Spaan (author), Matthijs T.J. Spaan (author), Matthijs T. J. Spaan (author), M.T.J. Spaan (author)

Exploration in reinforcement learning remains a difficult challenge. In order to drive exploration, ensembles with randomized prior functions have recently been popularized to quantify uncertainty in the value model. There is no theoretical reason for these ensembles to resemble ...

Bayesian Meta-Reinforcement Learning with Laplace Variational Recurrent Networks

Scalable Safe Policy Improvement for Factored Multi-Agent MDPs

Conference paper (2024) - Federico Bianchi (author), Edoardo Zorzi (author), Alberto Castellini (author), Thiago D. Simão (author), Matthijs T. J. Spaan (author), Matthijs Spaan (author), M.T.J. Spaan (author), Matthijs T.J. Spaan (author), Alessandro Farinelli (author)

In this work, we focus on safe policy improvement in multi-agent domains where current state-of-the-art methods cannot be effectively applied because of large state and action spaces. We consider recent results using Monte Carlo Tree Search for Safe Policy Improvement with Baseli ...

Reinforcement Learning by Guided Safe Exploration

Conference paper (2023) - Q. Yang (author), T. D. Simão (author), Nils Jansen (author), Simon Tindemans (author), Simon H. Tindemans (author), Matthijs T. J. Spaan (author), Matthijs T.J. Spaan (author), M.T.J. Spaan (author), Matthijs Spaan (author)

Safety is critical to broadening the application of reinforcement learning (RL). Often, we train RL agents in a controlled environment, such as a laboratory, before deploying them in the real world. However, the real-world target task might be unknown prior to deployment. Reward- ...

Bad Habits: Policy Confounding and Out-of-Trajectory Generalization in RL

Conference paper (2023) - M. Suau (author), Matthijs Spaan (author), M.T.J. Spaan (author), Matthijs T.J. Spaan (author), Matthijs T. J. Spaan (author), F.A. Oliehoek (author), Frans A. Oliehoek (author)

Reinforcement learning agents may sometimes develop habits that are effective
only when specific policies are followed. After an initial exploration phase in which
agents try out different actions, they eventually converge toward a particular policy.
When this occurs, ...

The Role of Diverse Replay for Generalisation in Reinforcement Learning

Conference paper (2023) - M.R. Weltevrede (author), Matthijs T. J. Spaan (author), Matthijs Spaan (author), Matthijs T.J. Spaan (author), M.T.J. Spaan (author), J.W. Böhmer (author)

In reinforcement learning (RL), key components of many algorithms are the exploration strategy and replay buffer. These strategies regulate what environment data is collected and trained on and have been extensively studied in the RL literature. In this paper, we investigate the ...

E-MCTS: Deep Exploration in Model-Based Reinforcement Learning by Planning with Epistemic Uncertainty

Conference paper (2023) - Y. Oren (author), Matthijs T.J. Spaan (author), Matthijs T. J. Spaan (author), Matthijs Spaan (author), M.T.J. Spaan (author), J.W. Böhmer (author)

One of the most well-studied and highly performing planning approaches used in Model-Based Reinforcement Learning (MBRL) is Monte-Carlo Tree Search (MCTS). Key challenges of MCTS-based MBRL methods remain dedicated deep exploration and reliability in the face of the unknown, and ...

Diverse Projection Ensembles for Distributional Reinforcement Learning

Conference paper (2023) - M.A. Zanger (author), J.W. Böhmer (author), Matthijs T.J. Spaan (author), Matthijs T. J. Spaan (author), Matthijs Spaan (author), M.T.J. Spaan (author)

In contrast to classical reinforcement learning, distributional reinforcement learning algorithms aim to learn the distribution of returns rather than their expected value. Since the nature of the return distribution is generally unknown a priori or arbitrarily complex, a common ...

CEM: Constrained Entropy Maximization for Task-Agnostic Safe Exploration

Conference paper (2023) - Q. Yang (author), Matthijs T. J. Spaan (author), M.T.J. Spaan (author), Matthijs T.J. Spaan (author), Matthijs Spaan (author)

Without an assigned task, a suitable intrinsic objective for an agent is to explore the environment efficiently. However, the pursuit of exploration will inevitably bring more safety risks. An under-explored aspect of reinforcement learning is how to achieve safe efficient explor ...

Scalable Safe Policy Improvement via Monte Carlo Tree Search

Journal article (2023) - Alberto Castellini (author), Federico Bianchi (author), Edoardo Zorzi (author), Edoardo Zorzi (author), Thiago D. Simão (author), Alessandro Farinelli (author), Matthijs T. J. Spaan (author), Matthijs Spaan (author), M.T.J. Spaan (author), Matthijs T.J. Spaan (author)

Algorithms for safely improving policies are important to deploy reinforcement learning approaches in real-world scenarios. In this work, we propose an algorithm, called MCTS-SPIBB, that computes safe policy improvement online using a Monte Carlo Tree Search based strategy. We th ...

Abstraction-Refinement for Hierarchical Probabilistic Models

Conference paper (2022) - Sebastian Junges (author), Matthijs Spaan (author), Matthijs T.J. Spaan (author), Matthijs T. J. Spaan (author), M.T.J. Spaan (author)

Markov decision processes are a ubiquitous formalism for modelling systems with non-deterministic and probabilistic behavior. Verification of these models is subject to the famous state space explosion problem. We alleviate this problem by exploiting a hierarchical structure with ...

Back to the Future

Solving Hidden Parameter MDPs with Hindsight

Conference paper (2022) - C.T. Ponnambalam (author), Danial Kamran (author), T. D. Simão (author), F.A. Oliehoek (author), Frans A. Oliehoek (author), Matthijs T. J. Spaan (author), Matthijs T.J. Spaan (author), Matthijs Spaan (author), M.T.J. Spaan (author)

A Modern Perspective on Safe Automated Driving for Different Traffic Dynamics using Constrained Reinforcement Learning

Conference paper (2022) - Danial Kamran (author), T. D. Simão (author), Q. Yang (author), C.T. Ponnambalam (author), Johannes Fischer (author), Matthijs Spaan (author), Matthijs T. J. Spaan (author), M.T.J. Spaan (author), Matthijs T.J. Spaan (author), Martin Lauer (author)

The use of reinforcement learning (RL) in real-world domains often requires extensive effort to ensure safe behavior. While this compromises the autonomy of the system, it might still be too risky to allow a learning agent to freely explore its environment. These strict impositio ...

Distributed Influence-Augmented Local Simulators for Parallel MARL in Large Networked Systems

Conference paper (2022) - M. Suau (author), J. He (author), Mustafa Mert Çelikok (author), Matthijs T. J. Spaan (author), Matthijs T.J. Spaan (author), Matthijs Spaan (author), M.T.J. Spaan (author), F.A. Oliehoek (author), Frans A. Oliehoek (author)

Due to its high sample complexity, simulation is, as of today, critical for the successful application of reinforcement learning. Many real-world problems, however, exhibit overly complex dynamics, which makes their full-scale simulation computationally slow. In this paper, we sh ...

Training and Transferring Safe Policies in Reinforcement Learning

Conference paper (2022) - Q. Yang (author), T. D. Simão (author), Nils Jansen (author), Simon Tindemans (author), Simon H. Tindemans (author), Matthijs T. J. Spaan (author), Matthijs T.J. Spaan (author), M.T.J. Spaan (author), Matthijs Spaan (author)

Safety is critical to broadening the a lication of reinforcement learning (RL). Often, RL agents are trained in a controlled environment, such as a laboratory, before being de loyed in the real world. However, the target reward might be unknown rior to de loyment. Reward-free R ...

Influence-Augmented Local Simulators

A Scalable Solution for Fast Deep RL in Large Networked Systems

Conference paper (2022) - M. Suau (author), J. He (author), Matthijs Spaan (author), Matthijs T.J. Spaan (author), M.T.J. Spaan (author), Matthijs T. J. Spaan (author), Frans A. Oliehoek (author), F.A. Oliehoek (author)

Learning effective policies for real-world problems is still an open challenge for the field of reinforcement learning (RL). The main limitation being the amount of data needed and the pace at which that data can be obtained. In this paper, we study how to build lightweight simul ...

Refined Risk Management in Safe Reinforcement Learning with a Distributional Safety Critic

Conference paper (2022) - Q. Yang (author), T. D. Simão (author), Simon Tindemans (author), Simon H. Tindemans (author), M.T.J. Spaan (author), Matthijs T. J. Spaan (author), Matthijs Spaan (author), Matthijs T.J. Spaan (author)

Safety is critical to broadening the real-world use of reinforcement learning (RL). Modeling the safety aspects using a safety-cost signal separate from the reward is becoming standard practice, since it avoids the problem of finding a good balance between safety and performance. ...

Speeding up Deep Reinforcement Learning through Influence-Augmented Local Simulators

Conference paper (2022) - M. Suau (author), J. He (author), Matthijs Spaan (author), Matthijs T.J. Spaan (author), M.T.J. Spaan (author), Matthijs T. J. Spaan (author), Frans A. Oliehoek (author), F.A. Oliehoek (author)

Learning effective policies for real-world problems is still an open challenge for the field of reinforcement learning (RL). The main limitation being the amount of data needed and the pace at which that data can be obtained. In this paper, we study how to build lightweight simul ...

An Auction-Based Multi-Agent System for the Pickup and Delivery Problem with Autonomous Vehicles and Alternative Locations

Conference paper (2022) - J. Los (author), Frederik Schulte (author), F. Schulte (author), Matthijs Spaan (author), M.T.J. Spaan (author), Matthijs T.J. Spaan (author), Matthijs T. J. Spaan (author), R. R. Negenborn (author), Rudy R. Negenborn (author), R. Negenborn (author), Rudy Negenborn (author), R.R. Negenborn (author)

The trends of autonomous transportation and mobility on demand in line with large numbers of requests increasingly call for decentralized vehicle routing optimization. Multi-agent systems (MASs) allow to model fully autonomous decentralized decision making, but are rarely conside ...