F.A. Oliehoek
74 records found
Many methods for Model-Based Reinforcement Learning (MBRL) in Markov decision processes (MDPs) provide guarantees for both the accuracy of the model they can deliver and the learning efficiency. At the same time, state abstraction techniques allow for a reduction of the size of a
...
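The preview is cut off here. Purely as orientation on what a state abstraction does in model-based RL, below is a minimal Python sketch that aggregates sampled transitions into a smaller abstract model; the names (build_abstract_model, phi) and the toy data are illustrative assumptions, not the method of the record above.

from collections import defaultdict

def build_abstract_model(transitions, phi):
    """Aggregate (s, a, r, s') samples into an abstract tabular model.

    transitions: iterable of (state, action, reward, next_state) tuples
    phi: abstraction function mapping ground states to abstract states
    Returns empirical transition probabilities and mean rewards over
    abstract state-action pairs.
    """
    counts = defaultdict(lambda: defaultdict(int))   # (z, a) -> z' -> count
    reward_sums = defaultdict(float)                  # (z, a) -> sum of rewards
    totals = defaultdict(int)                         # (z, a) -> total count

    for s, a, r, s_next in transitions:
        z, z_next = phi(s), phi(s_next)
        counts[(z, a)][z_next] += 1
        reward_sums[(z, a)] += r
        totals[(z, a)] += 1

    P = {k: {z2: c / totals[k] for z2, c in nxt.items()} for k, nxt in counts.items()}
    R = {k: reward_sums[k] / totals[k] for k in totals}
    return P, R

# Toy usage: ground states 0-5 collapsed into two abstract states by parity.
data = [(0, "a", 1.0, 2), (2, "a", 1.0, 4), (1, "b", 0.0, 3), (3, "b", 0.0, 5)]
P, R = build_abstract_model(data, phi=lambda s: s % 2)
print(P, R)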
Policy Space Response Oracles
A Survey
Game theory provides a mathematical way to study the interaction between multiple decision makers. However, classical game-theoretic analysis is limited in scalability due to the large number of strategies, precluding direct application to more complex scenarios. This survey prov
...
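The preview breaks off here. As a rough illustration of the algorithmic loop behind Policy Space Response Oracles, here is a minimal double-oracle sketch on rock-paper-scissors, assuming a uniform meta-solver and exact best responses; the survey itself covers far more general meta-solvers and approximate oracles.

import numpy as np

# Row player's payoff in rock-paper-scissors (zero-sum; column payoff is the negative).
A = np.array([[ 0, -1,  1],
              [ 1,  0, -1],
              [-1,  1,  0]], dtype=float)

def best_response(payoff, opponent_dist):
    """Pure strategy maximizing expected payoff against the opponent's mixture."""
    return int(np.argmax(payoff @ opponent_dist))

def psro_uniform(payoff, iterations=10):
    row_pop, col_pop = [0], [0]          # start each population with strategy 0 (rock)
    for _ in range(iterations):
        # Uniform meta-solver: play a uniform mixture over the current population.
        row_meta = np.bincount(row_pop, minlength=3) / len(row_pop)
        col_meta = np.bincount(col_pop, minlength=3) / len(col_pop)
        # Oracle step: each player adds a best response to the other's meta-strategy.
        row_pop.append(best_response(payoff, col_meta))
        col_pop.append(best_response(-payoff.T, row_meta))
    return row_meta, col_meta

row_meta, col_meta = psro_uniform(A)
print("row meta-strategy:", row_meta)    # fictitious-play-like mixture over the population
print("col meta-strategy:", col_meta)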
One of the main challenges of multi-agent learning lies in establishing convergence of the algorithms, as, in general, a collection of individual, self-serving agents learning concurrently is not guaranteed to converge to a stable joint policy. This is in stark contrast to m
...
We present a review that unifies decision-support methods for exploring the solutions produced by multi-objective optimization (MOO) algorithms. As MOO is applied to solve diverse problems, approaches for analyzing the trade-offs offered by MOO algorithms are scattered across fie
...
Teacher-apprentices RL (TARL)
Leveraging complex policy distribution through generative adversarial hypernetwork in reinforcement learning
Typically, a Reinforcement Learning (RL) algorithm focuses on learning a single deployable policy as the end product. Depending on the initialization method and seed randomization, learning a single policy can lead to convergence to different local optima across diff
...
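The snippet is truncated. To give a concrete sense of the hypernetwork idea mentioned in the title above, here is a minimal, non-adversarial PyTorch sketch with arbitrary toy dimensions: a single generator maps a latent code to the parameters of a small policy network, so sampling different codes yields different policies. The class and dimensions are illustrative assumptions, not the paper's architecture.

import torch
import torch.nn as nn

class PolicyHypernetwork(nn.Module):
    """Maps a latent code z to the weights of a small one-hidden-layer policy network."""

    def __init__(self, latent_dim=8, obs_dim=4, hidden=16, n_actions=2):
        super().__init__()
        self.obs_dim, self.hidden, self.n_actions = obs_dim, hidden, n_actions
        n_params = hidden * obs_dim + hidden + n_actions * hidden + n_actions
        self.generator = nn.Sequential(
            nn.Linear(latent_dim, 64), nn.ReLU(), nn.Linear(64, n_params)
        )

    def forward(self, z, obs):
        w = self.generator(z)                          # flat vector of policy weights
        i = 0
        W1 = w[i:i + self.hidden * self.obs_dim].view(self.hidden, self.obs_dim)
        i += self.hidden * self.obs_dim
        b1 = w[i:i + self.hidden]; i += self.hidden
        W2 = w[i:i + self.n_actions * self.hidden].view(self.n_actions, self.hidden)
        i += self.n_actions * self.hidden
        b2 = w[i:i + self.n_actions]
        h = torch.relu(obs @ W1.T + b1)
        return torch.softmax(h @ W2.T + b2, dim=-1)    # action distribution

# Sampling different latent codes gives different policies from one generator.
hyper = PolicyHypernetwork()
obs = torch.randn(4)
for _ in range(2):
    z = torch.randn(8)
    print(hyper(z, obs))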
Reinforcement learning agents may sometimes develop habits that are effective
only when specific policies are followed. After an initial exploration phase in which
agents try out different actions, they eventually converge toward a particular policy.
When this occurs, ...
This work investigates formal generalization error bounds that apply to support vector machines (SVMs) in realizable and agnostic learning problems. We focus on recently observed parallels between probably approximately correct (PAC)-learning bounds, such as compression and compl
...
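The abstract is cut short. For orientation only, one textbook form of the kind of PAC generalization bound referred to (a VC-dimension bound for the agnostic setting; exact constants vary across references, hence the unspecified absolute constant c):

\[
\Pr\Big[\, \forall h \in \mathcal{H}:\; R(h) \;\le\; \hat{R}_m(h) + \sqrt{\tfrac{c\,\left(d \ln(m/d) + \ln(1/\delta)\right)}{m}} \,\Big] \;\ge\; 1 - \delta ,
\]

where d is the VC dimension of the hypothesis class \mathcal{H}, m the sample size, \hat{R}_m the empirical risk, R the true risk, and \delta the confidence parameter.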
Complex real-world systems pose a significant challenge to decision making: an agent needs to explore a large environment, deal with incomplete or noisy information, generalize the experience and learn from feedback to act optimally. These processes demand vast representation cap
...
Policy gradient methods have become one of the most popular classes of algorithms for multi-agent reinforcement learning. A key challenge, however, that is not addressed by many of these methods is multi-agent credit assignment: assessing an agent’s contribution to the overall pe
...
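The preview is truncated. As a pointer to what multi-agent credit assignment typically involves, below is a small Python sketch of a counterfactual baseline in the spirit of difference rewards / COMA, computed for a one-shot two-agent game with a known joint action-value table; the table, policies, and function name are toy assumptions, and this is not necessarily the approach of the record above.

import numpy as np

# Joint action values Q[a1, a2] for a one-shot cooperative game, 2 actions per agent.
Q = np.array([[1.0, 0.2],
              [0.3, 0.8]])

# Current policies of both agents (probabilities over their own actions).
pi1 = np.array([0.6, 0.4])
pi2 = np.array([0.5, 0.5])

def counterfactual_advantage(Q, pi_self, joint_action, agent):
    """Advantage of the chosen action vs. marginalizing out this agent's action,
    keeping the other agent's action fixed (the counterfactual baseline)."""
    a1, a2 = joint_action
    if agent == 0:
        baseline = pi_self @ Q[:, a2]        # E_{a1' ~ pi1} Q(a1', a2)
        return Q[a1, a2] - baseline
    else:
        baseline = Q[a1, :] @ pi_self        # E_{a2' ~ pi2} Q(a1, a2')
        return Q[a1, a2] - baseline

joint = (0, 1)                                # the agents happened to pick actions (0, 1)
print("agent 0 advantage:", counterfactual_advantage(Q, pi1, joint, agent=0))
print("agent 1 advantage:", counterfactual_advantage(Q, pi2, joint, agent=1))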
Model-based reinforcement learning methods are promising since they can increase sample efficiency while simultaneously improving generalizability. Learning can also be made more efficient through state abstraction, which delivers more compact models. Model-based reinforcement le
...
How can we plan efficiently in a large and complex environment when the time budget is limited? Given the original simulator of the environment, which may be computationally very demanding, we propose to learn online an approximate but much faster simulator that improves over tim
...
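The preview stops mid-sentence. As a generic illustration of replacing a slow simulator with a learned, faster surrogate for planning, here is a toy sketch: it fits a sample-based surrogate from a fixed budget of calls to a stand-in "slow" simulator, then plans with cheap Monte-Carlo rollouts in the surrogate. All names (slow_simulator, fast_simulator, rollout_value) and the tiny chain environment are illustrative assumptions, not the method proposed in the paper.

import random
from collections import defaultdict

N_STATES, ACTIONS = 5, [0, 1]

def slow_simulator(s, a):
    """Stands in for an expensive ground-truth simulator (e.g. a physics engine)."""
    s_next = min(N_STATES - 1, s + 1) if a == 1 else max(0, s - 1)
    if random.random() < 0.1:                       # small amount of noise
        s_next = random.randrange(N_STATES)
    reward = 1.0 if s_next == N_STATES - 1 else 0.0
    return s_next, reward

# 1) Learn an approximate model from a budget of expensive simulator calls.
counts = defaultdict(list)
for s in range(N_STATES):
    for a in ACTIONS:
        for _ in range(200):
            counts[(s, a)].append(slow_simulator(s, a))

def fast_simulator(s, a):
    """Approximate simulator: resample from previously observed outcomes."""
    return random.choice(counts[(s, a)])

# 2) Plan with cheap rollouts in the learned simulator.
def rollout_value(s, a, depth=10, n=200):
    total = 0.0
    for _ in range(n):
        state, action = s, a
        for _ in range(depth):
            state, r = fast_simulator(state, action)
            total += r
            action = random.choice(ACTIONS)
    return total / n

best = max(ACTIONS, key=lambda a: rollout_value(0, a))
print("preferred first action from state 0:", best)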
The constant growth of cities and their rapid urbanization contribute significantly to an increase in traffic congestion, leading to high costs in terms of both time and fuel consumption. Intelligent Transportation Systems (ITSs) play an important role in managing traffic in urban ar
...
Due to its perceptual limitations, an agent may have too little information about the environment to act optimally. In such cases, it is important to keep track of the action-observation history to uncover hidden state information. Recent deep reinforcement learning methods use r
...
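The abstract is cut off just as it names the architecture class. As a generic sketch of tracking the action-observation history with a recurrent network in a partially observable setting (PyTorch GRU, toy dimensions; an illustrative assumption, not this paper's specific architecture):

import torch
import torch.nn as nn

class RecurrentQNetwork(nn.Module):
    """Q-network over action-observation histories for a partially observable task."""

    def __init__(self, obs_dim=6, n_actions=3, hidden=32):
        super().__init__()
        self.n_actions = n_actions
        # Each step feeds the current observation plus the previous action (one-hot).
        self.gru = nn.GRU(obs_dim + n_actions, hidden, batch_first=True)
        self.q_head = nn.Linear(hidden, n_actions)

    def forward(self, observations, prev_actions, h0=None):
        # observations: (batch, time, obs_dim); prev_actions: (batch, time) int64
        a_onehot = nn.functional.one_hot(prev_actions, self.n_actions).float()
        x = torch.cat([observations, a_onehot], dim=-1)
        out, h_n = self.gru(x, h0)         # out summarizes the history at every step
        return self.q_head(out), h_n       # Q-values per timestep, plus hidden state

# Toy forward pass over an episode of length 5.
net = RecurrentQNetwork()
obs = torch.randn(1, 5, 6)
prev_a = torch.randint(0, 3, (1, 5))
q_values, hidden = net(obs, prev_a)
print(q_values.shape)                      # torch.Size([1, 5, 3])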
Influence-Augmented Local Simulators
A Scalable Solution for Fast Deep RL in Large Networked Systems
Learning effective policies for real-world problems is still an open challenge for the field of reinforcement learning (RL). The main limitation is the amount of data needed and the pace at which that data can be obtained. In this paper, we study how to build lightweight simul
...
Inferring reward functions from demonstrations and pairwise preferences is a promising approach for aligning Reinforcement Learning (RL) agents with human intentions. However, state-of-the-art methods typically focus on learning a single reward model, thus rendering it difficul
...
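The snippet breaks off here. For background on the standard single-reward-model setup that this abstract contrasts with, below is a minimal sketch of reward learning from pairwise trajectory preferences under a Bradley-Terry model (PyTorch, linear reward over toy features); the data and model are illustrative assumptions, not this paper's multi-model method.

import torch
import torch.nn as nn

# Linear reward model over state features: r(s) = w . phi(s).
feature_dim = 4
reward_model = nn.Linear(feature_dim, 1, bias=False)
optim = torch.optim.Adam(reward_model.parameters(), lr=0.05)

def segment_return(features):
    """Sum of predicted rewards over one trajectory segment of shape (T, feature_dim)."""
    return reward_model(features).sum()

# Toy preference data: each item is (preferred_segment, other_segment).
torch.manual_seed(0)
preferences = [(torch.randn(8, feature_dim) + 0.5, torch.randn(8, feature_dim))
               for _ in range(32)]

for epoch in range(100):
    loss = 0.0
    for better, worse in preferences:
        # Bradley-Terry: P(better > worse) = sigmoid(R(better) - R(worse)).
        logit = segment_return(better) - segment_return(worse)
        loss = loss - nn.functional.logsigmoid(logit)
    optim.zero_grad()
    loss.backward()
    optim.step()

print("learned reward weights:", reward_model.weight.data)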
Due to the high sample complexity of reinforcement learning, simulation is, as of today, critical for its successful application. Many real-world problems, however, exhibit overly complex dynamics, which makes their full-scale simulation computationally slow. In this paper, we sh
...
BADDr
Bayes-Adaptive Deep Dropout RL for POMDPs
While reinforcement learning (RL) has made great advances in scalability, exploration and partial observability are still active research topics. In contrast, Bayesian RL (BRL) provides a principled answer to both state estimation and the exploration-exploitation trade-off, but s
...
Centaurs are half-human, half-AI decision-makers where the AI's goal is to complement the human. To do so, the AI must be able to recognize the goals and constraints of the human and have the means to help them. We present a novel formulation of the interaction between the human
...