M.T.J. Spaan | TU Delft Repository

Gradient based adversarial domain randomization

Master thesis (2024) - G. Koning (author), M.T.J. Spaan (mentor), J.W. Böhmer (mentor), D.S. van der Heijden (mentor)

Recent advancements in differentiable simulators offer a promising approach to enhancing the sim2real transfer of reinforcement learning (RL) agents by enabling the computation of gradients of the simulator’s dynamics with respect to its parameters. However, the application of thes ...

Performance of Decision Transformer in multi-task offline reinforcement learning

How does the introduction of sub-optimal data affect the performance of the model?

Bachelor thesis (2024) - P.Z. Bieszczad (author), M.T.J. Spaan (mentor), M.R. Weltevrede (mentor), E. Congeduti (graduation committee member)

In the field of Artificial Intelligence (AI), techniques like Reinforcement Learning (RL) and Decision Transformer (DT) are utilized by machines to learn from experiences and solve problems. The distinction between offline and online learning determines whether the machine learns ...

Multi-Task Offline Reinforcement Learning

Experimental Evaluation of the Generalizability of the Soft Actor-Critic + Behavioral Cloning Algorithm

Bachelor thesis (2024) - A.O. Geist (author), M.T.J. Spaan (mentor), M.R. Weltevrede (mentor), E. Congeduti (graduation committee member)

This paper examines the generalization capabilities of the Soft Actor-Critic (SAC) algorithm when combined with Behavioral Cloning (BC) in a MiniGrid Four-Room Environment. Reinforcement learning (RL), particularly offline, is important for tasks where interactions with the envir ...

Generalization in Offline Reinforcement Learning: Comparing Implicit Q-Learning with Behavioral Cloning

Bachelor thesis (2024) - J.J. Tarazona Rodríguez (author), M.T.J. Spaan (mentor), M.R. Weltevrede (mentor), E. Congeduti (graduation committee member)

Offline Reinforcement Learning (Offline RL) involves learning policies from a static dataset without further interactions with the environment, making it suitable for high-stakes scenarios where data collection is costly or risky. This paper investigates the generalization capabi ...

One-Shot Generalization in Offline Reinforcement Learning with WSAC-N

Bachelor thesis (2024) - M.D.I. Museur (author), M.R. Weltevrede (mentor), M.T.J. Spaan (mentor), E. Congeduti (graduation committee member)

Recent work has shown that offline reinforcement learning (RL) does not generalize well to new environments compared to behavioral cloning (BC). We propose WSAC-N, an ensemble model of soft actor-critics with weights to de-emphasize actions with high variance. We compare the zero ...

Application of Self-Paced learning for noisy Meta-learning

Bachelor thesis (2024) - A. Aszalós (author), M.T.J. Spaan (mentor), J.A. de Vries (mentor), P.K. Murukannaiah (graduation committee member)

Meta-learning is an important emerging paradigm in machine learning, aimed at improving data-efficiency and generalization performance across learning tasks. Challenges caused by noisy data have been extensively researched in traditional learning settings. However, its impact in t ...

Teaching How to Learn to Learn

Teacher-Student Curriculum Learning for Efficient Meta-Learning

Bachelor thesis (2024) - B.B. Kovács (author), J.A. de Vries (mentor), M.T.J. Spaan (mentor), P.K. Murukannaiah (graduation committee member)

We investigate whether a teacher-student curriculum learning approach using a teacher network with a simpler structure than the student network can achieve better results at meta-learning. The goal of meta-learning is to learn from a set of tasks, and then perform well on a new, ...

Comparative Analysis of Curriculum Strategies in training Meta-Learning

Curriculum Strategies for Faster Meta-Learning

Bachelor thesis (2024) - M.T. Mihai (author), J.A. de Vries (mentor), M.T.J. Spaan (mentor), P.K. Murukannaiah (graduation committee member)

Meta-Learning is an emerging field where the main challenge is to develop models capable of distilling previous experiences to efficiently learn new tasks. Curriculum Learning, a group of optimization strategies, structures data in a meaningful order which aids learning. However, ...

Exploration When Everything Looks New

Effect of the Local Uncertainty Source on Exploration

Bachelor thesis (2024) - V. Vadocz (author), Y. Oren (mentor), M.T.J. Spaan (mentor), N. Yorke-Smith (graduation committee member)

Agents improve by interacting with an environment and planning. By leveraging information about what they don't know, they can learn better and faster, at least in environments that benefit from exploring. They do this by estimating the uncertainty in their predictions. There are ...

Multi-task Offline Reinforcement Learning with CQL

A study on how dataset size and diversity increase generalization performance

Bachelor thesis (2024) - L. Lipinskas (author), M.T.J. Spaan (mentor), M.R. Weltevrede (mentor), E. Congeduti (graduation committee member)

Reinforcement learning (RL) is a type of machine learning where a model learns by making an observation of the current state it is in, picking out an action to execute, and observing the reward of said action, after which it receives the next state and repeats the ...

Exploring an Evolutionary Approach for Task Generation in Meta-Learning with Neural Processes

Bachelor thesis (2024) - K. Yoner (author), J.A. de Vries (mentor), M.T.J. Spaan (mentor), P.K. Murukannaiah (coach)

This paper explores the application of evolutionary algorithms to enhance task generation for Neural Processes (NPs) in meta-learning. Meta-learning aims to develop models capable of rapid adaptation to new tasks with minimal data, a necessity in fields where data collection is c ...

Policy Distillation in Offline Multi-task Reinforcement Learning

Master thesis (2024) - J.A.E. van Lith (author), D. Mambelli (mentor), M.T.J. Spaan (mentor), N.M. Gürel (graduation committee member)

In Reinforcement Learning (RL), an agent learns to make decisions by interacting with an environment and receiving feedback in the form of rewards. Multi-Task Reinforcement Learning (MTRL) extends this concept by training a single agent to perform multiple tasks simultaneously, allowing for more efficient use of resources and behavior sharing between tasks. Policy Distillation (PD) is a technique commonly used in MTRL, where policies from multiple single-task agents (teachers) are distilled into a single multi-task agent (student). This is done by merging common structure across tasks, while separating task-specific properties.

However, existing PD approaches require interactions with the environment during training. In this work, we investigate the effectiveness of PD in the offline setting, where the agent has no interaction with the environment before deployment and can only learn from previously collected data. Through a series of experiments, we demonstrate that a straightforward approach yields the highest performance. This approach involves first learning teacher policies using an existing offline RL algorithm, then distilling these policies into a student by sampling states from the offline data and applying a Mean Squared Error (MSE) loss between the teachers’ and student’s best actions. Moreover, we investigate the effect of a state distribution shift—a major challenge in offline RL—on our approach. We find that such shifts impact performance only slightly in cases of relatively small neural networks or substantial distribution shifts.
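
The distillation step described above can be sketched as follows. This is only an illustration under stated assumptions, not the thesis implementation: the teachers are fixed linear maps standing in for policies trained by an offline RL algorithm, the student is a linear model on task-conditioned features rather than a neural network, and all names (`teacher_action`, `student_features`, `distill`) and sizes are hypothetical. It shows states being sampled from offline data and an MSE loss being applied between the teachers' and the student's actions.

```python
import numpy as np

rng = np.random.default_rng(0)
STATE_DIM, ACTION_DIM, NUM_TASKS = 4, 2, 3  # hypothetical sizes

# Assumption: stand-ins for teacher policies already trained with offline RL.
teacher_weights = [rng.normal(size=(STATE_DIM, ACTION_DIM)) for _ in range(NUM_TASKS)]
# Assumption: one fixed offline dataset of states per task.
offline_states = [rng.normal(size=(256, STATE_DIM)) for _ in range(NUM_TASKS)]


def teacher_action(task, states):
    """Best action of the single-task teacher for each state."""
    return states @ teacher_weights[task]


def student_features(task, states):
    """Task-conditioned input: the state is placed in the slot of its task."""
    feats = np.zeros((states.shape[0], NUM_TASKS * STATE_DIM))
    feats[:, task * STATE_DIM:(task + 1) * STATE_DIM] = states
    return feats


def distill(steps=600, batch_size=64, lr=0.1):
    """Fit one multi-task student to all teachers with an MSE loss on actions."""
    student_w = np.zeros((NUM_TASKS * STATE_DIM, ACTION_DIM))
    losses = []
    for _ in range(steps):
        task = int(rng.integers(NUM_TASKS))
        # Sample states from the offline data only; no environment interaction.
        idx = rng.integers(0, len(offline_states[task]), size=batch_size)
        states = offline_states[task][idx]
        feats = student_features(task, states)
        error = feats @ student_w - teacher_action(task, states)
        losses.append(float(np.mean(error ** 2)))
        # Gradient of the mean squared error with respect to the student weights.
        student_w -= lr * (2.0 * feats.T @ error / error.size)
    return student_w, losses


student_w, losses = distill()
print(f"MSE at first step: {losses[0]:.4f}, at last step: {losses[-1]:.6f}")
```

In this toy setting the student can represent every teacher exactly, so the loss shrinks towards zero; with neural networks and real offline data, the gap that remains is what the experiments in the thesis measure.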

We also explore how PD can be enhanced to better capture common structure across related tasks, a key to improving efficiency in MTRL. To this end, we formally define common structure at two levels: the trajectory level and the computational level. To the best of our knowledge, we present the first attempt to quantify the amount of common structure shared across tasks. This measurement reveals that task commonalities are not fully exploited automatically. At the computational level, we attempt to improve sharing of common structure by reducing the network size and adding a regularization term to the loss function. To capture more common structure at the trajectory level, we argue that multi-task exploration is required, meaning that behaviors from one task must be evaluated in the context of another task. We propose two extensions to our approach that introduce multi-task exploration: Data Sharing (DS) and Offline Q-Switch (OQS). While these extensions are capable of improving performance, they also have clear limitations.

Overall, we propose a new, high-performing offline MTRL method and provide valuable insights into the fundamental capabilities and limitations of PD in capturing common structure across tasks, specifically within the offline MTRL setting.

Evaluating Robustness of Deep Reinforcement Learning for Autonomous Driving

How does entropy maximization affect the training and robustness of final policies under various testing conditions?

Bachelor thesis (2023) - B.M. Ortal (author), M.A. Zanger (mentor), M.T.J. Spaan (mentor), E. Congeduti (graduation committee member)

This research paper investigates the effect that entropy during training has on the agent's robustness. This is important because robustness is defined as the agent's adaptability to different environments. A self-driving car should adapt to every environment that ...

An empirical analysis of entropy search in batch bayesian optimisation

A comprehensive study of function shape, batch size, noise level, and dimensionality impact on information-theoretic methods

Bachelor thesis (2023) - P.A. Hautelman (author), J.A. de Vries (mentor), M.T.J. Spaan (mentor), C. Lofi (graduation committee member)

Bayesian optimisation is a rapidly growing area of research that aims to identify the optimum of a black-box function by strategically directing the optimisation process towards promising regions. This paper provides an overview of the theoretical background used by the Entro ...

Effects of action space discretization and DQN extensions on algorithm robustness and efficiency

How do the discretization of the action space and various extensions to the well-known DQN algorithm influence training and the robustness of final policies under various testing conditions?

Bachelor thesis (2023) - M.A. Sözüdüz (author), M.T.J. Spaan (mentor), M.A. Zanger (mentor), E. Congeduti (graduation committee member)

Reinforcement Learning (RL) has gained attention as a way of creating autonomous agents for self-driving cars. This paper explores the adaptation of the Deep Q Network (DQN), a popular deep RL algorithm, in the Carla traffic simulator for autonomous driving. It investigates th ...

Comparative Analysis of Exploration Algorithms in Deep Reinforcement Learning for Autonomous Driving

How does epsilon-greedy, random network distillation, bootstrapped DQN affect training and the robustness of final policies under various testing conditions in autonomous driving?

Bachelor thesis (2023) - E. Sozen (author), M.A. Zanger (mentor), M.T.J. Spaan (mentor), E. Congeduti (graduation committee member)

Autonomous driving is a rapidly evolving field that aims to enhance road safety and reduce accidents through the use of advanced software and hardware technologies. Reinforcement learning (RL) combined with deep neural networks has emerged as a promising approach for training aut ...

Evaluating robustness of deep reinforcement learning for autonomous driving

Effects of domain randomization on training and robustness

Bachelor thesis (2023) - E. Bayram (author), M.T.J. Spaan (mentor), M.A. Zanger (mentor), E. Congeduti (graduation committee member)

Deep reinforcement learning has been a topic of research in recent years and has been expanding into the domain of autonomous driving. As autonomous driving is likely to involve people, such as daily commuters, it is necessary to ensure the machine will perform well enough in rea ...

Replacing the acquisition function in Bayesian optimization by a neural network

How effectively do meta-learned acquisition functions in Bayesian optimization perform when optimizing for control variates of unknown functions, as compared to BO with standard acquisition functions

Bachelor thesis (2023) - S. Ramezani (author), M.T.J. Spaan (mentor), J.A. de Vries (mentor), C. Lofi (graduation committee member)

Bayesian Optimization (BO) has demonstrated significant utility across numerous applications. However, due to it being designed as a universal optimizer, its performance can often be suboptimal in specialized environments. To overcome this issue, research has been conducted into ...

Effects of Partial Observability Solver Methods on Training and Final Policies in Autonomous Driver RL

How do different methods for dealing with partial observability in the environment influence training and the robustness of final policies under various testing conditions?

Bachelor thesis (2023) - A.E. Çil (author), M.A. Zanger (mentor), M.T.J. Spaan (mentor), E. Congeduti (graduation committee member)

Autonomous driving is a complex problem that can potentially be solved using artificial intelligence. The complexity stems from the system's need to understand the surroundings and make appropriate decisions. However, there are various challenges in constructing such a sophistica ...

Parallel cost-aware optimization of multidimensional black-box functions

Bachelor thesis (2023) - O. Sihlovec (author), M.T.J. Spaan (mentor), J.A. de Vries (mentor), C. Lofi (graduation committee member)

Scientific problems are often concerned with optimization of control variables of complex systems, for instance hyperparameters of machine learning models. A popular solution for such intractable environments is Bayesian optimization. However, many implementations disregard dynam ...