L. Cavalcante Siebert | TU Delft Repository

Uncovering Sequential Social Dilemmas in Multi-Agent Reinforcement Learning

Challenges and Strategies for Local Energy Communities

Master thesis (2025) - M.T. Okoń (author) , Luciano Cavalcante Siebert (mentor) , Jochen L. Cremer (mentor) , J. Yang (graduation committee member)

This thesis investigates the occurrence and mitigation of Sequential Social Dilemmas (SSDs) in Local Energy Communities (LECs) managed through Multi-agent Reinforcement Learning (MARL). LECs have great potential as pivotal elements in the green energy transition, yet the inherent ...

Interactive Reinforcement Learning for Adaptive Thermal Comfort

Master thesis (2024) - A. Korkusuz (author) , Luciano Siebert (mentor) , P. Rutgers (mentor)

Designing and implementing effective systems for thermal comfort management in buildings is a complex task due to the need to account for subjective preference parameters influenced by human physiology, bias and tendencies. This research introduces a novel approach to simulating ...

Decreasing the number of demonstrations required for Inverse Reinforcement Learning by integrating human feedback

Bachelor thesis (2024) - Z. Oğurlu (author) , Luciano Cavalcante Siebert (mentor) , A. Mone (mentor) , Wendelin Böhmer (graduation committee member)

The main concept behind reinforcement learning is that an agent takes certain actions and is rewarded or punished for these actions. However, the rewards involved in performing a task can be quite complicated in real life, and the contribution of different factors to the reward function is often unknown. From this problem emerges reward learning, the process of learning the reward function of an environment. Techniques for reward learning fall into two high-level categories: learning from demonstrations and learning from feedback. Inverse Reinforcement Learning (IRL) learns from demonstrations, whereas Reinforcement Learning from Human Feedback (RLHF) learns from feedback.

In this paper, we propose training a reward learning agent first with IRL and then with RLHF. IRL learns a reward function quite quickly, but it can suffer from sub-optimal expert demonstrations. RLHF, by contrast, is slower at learning the reward function from scratch. Hence, we integrate RLHF as a way to fine-tune the initial reward function calculated by IRL, aiming to alleviate the negative effect of sub-optimal expert demonstrations on IRL.

We test and evaluate our methodology on the cart pole environment from the seals library. We compare the results of our approach to reward learning from expert demonstrations alone, without integrating human feedback (i.e. only IRL). The results suggest that RLHF might not in fact be a good complement to IRL, specifically when the expert demonstrations are sub-optimal. We found that applying RLHF on top of IRL can even reduce the performance of the resulting reward function, which challenges our initial hypothesis regarding the complementarity of these two methods.
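The two-stage idea described in this abstract (initialise a reward function from demonstrations, then fine-tune it with preference feedback) can be illustrated with a toy sketch. It is not the thesis implementation: the linear reward model, the feature-mean initialisation standing in for IRL, the Bradley-Terry preference loss, and all function names (`irl_init`, `preference_finetune`, `preference_log_likelihood`) are assumptions made here for illustration, and the data is synthetic rather than taken from the cart pole environment.

```python
import numpy as np


def irl_init(expert_feats, baseline_feats):
    """Crude stand-in for IRL (an assumption, not a real IRL algorithm):
    point the linear reward weights from baseline behaviour toward the expert."""
    return expert_feats.mean(axis=0) - baseline_feats.mean(axis=0)


def preference_log_likelihood(w, feats_a, feats_b, prefs):
    """Mean Bradley-Terry log-likelihood; prefs[i] is 1.0 if a_i is preferred to b_i."""
    z = (feats_a - feats_b) @ w
    # log(sigmoid(z)) = -logaddexp(0, -z), which stays stable for large |z|
    return float(np.mean(-prefs * np.logaddexp(0.0, -z)
                         - (1.0 - prefs) * np.logaddexp(0.0, z)))


def preference_finetune(w, feats_a, feats_b, prefs, lr=0.1, steps=200):
    """Fine-tune reward weights by gradient ascent on the preference likelihood."""
    w = np.array(w, dtype=float)  # copy so the initial weights are untouched
    diff = feats_a - feats_b
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(diff @ w)))  # P(a preferred to b) under w
        w += lr * diff.T @ (prefs - p) / len(prefs)
    return w


# Synthetic usage: trajectory features are random vectors, and the
# "human" prefers whichever trajectory scores higher under hidden weights.
rng = np.random.default_rng(0)
hidden_w = np.array([1.0, -1.0])
expert = rng.normal(size=(20, 2)) + hidden_w   # demonstrations, noisy
baseline = rng.normal(size=(20, 2))
w_irl = irl_init(expert, baseline)

feats_a = rng.normal(size=(50, 2))
feats_b = rng.normal(size=(50, 2))
prefs = ((feats_a - feats_b) @ hidden_w > 0).astype(float)
w_tuned = preference_finetune(w_irl, feats_a, feats_b, prefs)

print("log-likelihood before:", preference_log_likelihood(w_irl, feats_a, feats_b, prefs))
print("log-likelihood after: ", preference_log_likelihood(w_tuned, feats_a, feats_b, prefs))
```

Fitting the preferences more closely does not by itself guarantee a better reward function for the task, which is consistent with the abstract's finding that RLHF applied on top of IRL can reduce performance.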

The Human Factor: Addressing Diversity in Reinforcement Learning from Human Feedback

How can RLHF deal with possibly conflicting feedback?

Bachelor thesis (2024) - J. PAEZ FRANCO (author) , A. Mone (mentor) , Luciano Cavalcante Siebert (mentor) , Wendelin Böhmer (graduation committee member)

Reinforcement Learning from Human Feedback (RLHF) is a promising approach to training agents to perform complex tasks by incorporating human feedback. However, the quality and diversity of this feedback can significantly impact the learning process. Humans are highly diverse in t ...

Decoding Sentiment with Large Language Models

Comparing Prompting Strategies Across Hard, Soft, and Subjective Label Scenarios

Bachelor thesis (2024) - T. Oberhuber (author) , Luciano Cavalcante Siebert (mentor) , Amir Homayounirad (mentor) , E. Liscio (mentor) , Jie Yang (graduation committee member)

This study evaluates the performance of different sentiment analysis methods in the context of public deliberation, focusing on hard-, soft-, and subjective-label scenarios to answer the research question: "can a Large Language Model detect subjective sentiment of statements wit ...

The Role of Feedback Variety in Reinforcement Learning from Human Feedback

Bachelor thesis (2024) - I. Makarov (author) , Luciano C. Siebert (mentor) , A. Mone (mentor) , J.W. Böhmer (graduation committee member)

Reinforcement Learning from Human Feedback (RLHF) offers a powerful approach to training agents in environments where defining an explicit reward function is challenging by learning from human feedback provided in various forms. This research evaluates three common feedback types ...

Leveraging Large Language Models for Classifying Subjective Arguments in Public Discourse

Bachelor thesis (2024) - A. Dobrinoiu (author) , Luciano Cavalcante Siebert (mentor) , A. Homayounirad (mentor) , E. Liscio (mentor) , Jie Yang (graduation committee member)

This study investigates the effectiveness of Large Language Models (LLMs) in identifying and classifying subjective arguments within deliberative discourse. Using data from a Participatory Value Evaluation (PVE) conducted in the Netherlands, this research introduces an annotation ...

Exploring the Synergy between Inverse Reinforcement Learning and Reinforcement Learning From Human Feedback for Query Reduction

Bachelor thesis (2024) - A. Batrineanu (author) , Luciano Cavalcante Siebert (mentor) , A. Mone (mentor) , Wendelin Böhmer (graduation committee member)

Reinforcement Learning is a powerful tool for problems that require sequential decision-making. However, it often faces challenges due to the extensive need for reward engineering. Reinforcement Learning from Human Feedback (RLHF) and Inverse Reinforcement Learning (IRL) hold the ...

Using Large Language Models to Detect Deliberative Elements in Public Discourse

Detecting Subjective Emotions in Public Discourse

Bachelor thesis (2024) - B.C.P. Zuurbier (author) , Luciano Cavalcante Siebert (mentor) , A. Homayounirad (mentor) , E. Liscio (mentor) , Jie Yang (graduation committee member)

In order to tackle topics such as climate change together with the population, public discourse should be scaled up. This discourse should be mediated as it makes it more likely that people understand each other and change their point of view. To help the mediator with this task, ...

Leveraging LLMs for subjective value detection in argument statements

Bachelor thesis (2024) - J.C.E. Gorter (author) , Luciano Cavalcante Siebert (mentor) , A. Homayounirad (mentor) , E. Liscio (mentor)

This paper investigates the use of Large Language Models (LLMs) for automatic detection of subjective values in argument statements in public discourse. Understanding the underlying values of argument statements could enhance public discussions and potentially lead to better outc ...

Leveraging LLMs for Classifying Subjective Topics Behind Public Discourse

Bachelor thesis (2024) - A. Marcu (author) , Luciano Cavalcante Siebert (mentor) , A. Homayounirad (mentor) , E. Liscio (mentor) , Jie Yang (graduation committee member)

Public deliberations play a crucial role in democratic systems. However, the unstructured nature of deliberations leads to challenges for moderators to analyze the large volume of data produced. This paper aims to solve this challenge by automatically identifying subjective topic ...

Conflict in the World of Inverse Reinforcement Learning

Investigating Inverse Reinforcement Learning with Conflicting Demonstrations

Bachelor thesis (2024) - P. Koev (author) , A. Mone (mentor) , Luciano Cavalcante Siebert (mentor) , Wendelin Böhmer (graduation committee member)

Inverse Reinforcement Learning (IRL) algorithms are closely related to Reinforcement Learning (RL) but instead try to model the reward function from a given set of expert demonstrations. In IRL, many algorithms have been proposed, but most assume consistent demonstrations. Consis ...

Detecting Long-term Behavioral Adaptations in Assisted Driving

An Automated Approach Using Neural Networks and Novelty Detection

Master thesis (2024) - R.G. Oude Elferink (author) , Luciano Siebert (mentor) , A. Lukina (mentor) , C.A. Raman (graduation committee member)

The autonomous vehicle industry has the potential to revolutionize the future of driving, making the understanding of vehicle-driver interactions crucial as we progress towards fully autonomous systems. Advanced Driver Assistance Systems (ADAS) are integral in this evolution, bri ...

Multi-expert Preference Alignment in Reinforcement Learning

Master thesis (2024) - L. Li (author) , Luciano Cavalcante Siebert (mentor)

This project explores adaptation to preference shifts in Multi-objective Reinforcement Learning (MORL), with a focus on how Reinforcement Learning (RL) agents can align with the preferences of multiple experts. This alignment can occur across various scenarios featuring distinct ...

Preference-Based Reinforcement Learning in Demand Response Programs

Master thesis (2024) - P. Piccini (author) , Luciano Cavalcante Siebert (mentor)

Incentive-based demand response (iDR) programs serve as important tools for distributed system operators (DSOs) to achieve a reduction in electricity demand during periods of grid overload. During these programs, participants can decide to curtail their consumption in exchange for ...

Inverse Reinforcement Learning (IRL) in Presence of Risk and Uncertainty Related Cognitive Biases

To what extent can IRL learn rewards from expert demonstrations with loss and risk aversion?

Bachelor thesis (2023) - M. Ikiz (author) , A. Caregnato Neto (mentor) , Luciano Cavalcante Siebert (mentor) , J. Weber (graduation committee member)

A key issue in Reinforcement Learning (RL) research is the difficulty of defining rewards. Inverse Reinforcement Learning (IRL) is a technique that addresses this challenge by learning the rewards from expert demonstrations. In a realistic setting, expert demonstrations are colle ...

Conflicting demonstrations in Inverse Reinforcement Learning

Bachelor thesis (2023) - R.M. Labbé (author) , Luciano Siebert (mentor) , A. Caregnato Neto (mentor) , Jana M. Weber (graduation committee member)

This paper aims to investigate the effect of conflicting demonstrations on Inverse Reinforcement Learning (IRL). IRL is a method to understand the intent of an expert, by only feeding it demonstrations of that expert, which may be a promising approach for areas such as self drivi ...

What are the implications of Curriculum Learning strategy on IRL methods?

Investigating Inverse Reinforcement Learning from Human Behavior

Bachelor thesis (2023) - M. Vlasenko (author) , Luciano Cavalcante Siebert (mentor) , A. Caregnato Neto (mentor) , J. Weber (graduation committee member)

Inverse Reinforcement Learning (IRL) is a subfield of Reinforcement Learning (RL) that focuses on recovering the reward function using expert demonstrations. In the field of IRL, Adversarial IRL (AIRL) is a promising algorithm that is postulated to recover non-linear rewards in e ...

Investigating the extent to which inverse reinforcement learning can learn rewards from noisy demonstrations

Bachelor thesis (2023) - C. Perdikis (author) , Luciano Cavalcante Siebert (mentor) , A. Caregnato Neto (mentor) , J. Weber (graduation committee member)

Inverse Reinforcement Learning (IRL) aims to recover a reward function from expert demonstrations in a Markov Decision Process (MDP). The objective is to understand the underlying intentions and behaviors of experts and derive a reward function based on their reasoning, rather th ...

Aggregation and Prediction of Energy Consumption Data

What is the Aggregation Level at which a Graph Neural Network Performs Optimally?

Bachelor thesis (2023) - L.J.K. Timp (author) , Luciano Siebert (mentor) , S.K. Kuilman (mentor) , Mathijs Weerdt (graduation committee member)

Electrical load forecasting, namely short-term load forecasting, is essential to power grids’ safe and efficient operations. The need for accurate short-term load forecasting becomes increasingly pressing with increased renewable energy sources, which are stochastic in their powe ...