Event-Based Communication in Distributed Q-Learning
Abstract
We present an approach, inspired by Event-Triggered Control (ETC) techniques, to reduce the amount of communication required in a Distributed Q-Learning system. We consider a baseline scenario of Distributed Q-Learning on a Markov Decision Process (MDP). Following an event-based approach, N agents sharing a value function explore the MDP and compute a trajectory-dependent triggering signal, which each agent uses in a distributed fashion to decide when to communicate information to a central learner in charge of updating the action-value function. These triggering conditions form an Event-Based distributed Q-learning system (EBd-Q), for which we derive convergence guarantees that account for the reduced communication. We then apply the proposed algorithm to a cooperative path-planning problem and show that the agents learn optimal trajectories while communicating only a fraction of the information. Finally, we discuss the effects (desired and undesired) that such event-based approaches have on the learning processes studied, and how they can be applied to more complex multi-agent systems.
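To make the communication mechanism concrete, the following is a minimal single-agent sketch of event-triggered Q-learning updates, not the paper's EBd-Q algorithm: it accumulates the magnitude of the temporal-difference error as a surrogate trajectory-dependent triggering signal and sends an update to the central learner only when that signal crosses a threshold. The toy MDP, the specific form of the triggering signal, and all names and constants here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy chain MDP and hyperparameters (all values assumed).
N_STATES, N_ACTIONS = 10, 2
ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.1
THRESHOLD = 0.5  # event-triggering threshold (assumed form)

Q = np.zeros((N_STATES, N_ACTIONS))  # shared action-value table held by the central learner


def step(s, a):
    """Toy chain: action 1 moves right, action 0 moves left; reward at the right end."""
    s2 = min(s + 1, N_STATES - 1) if a == 1 else max(s - 1, 0)
    return s2, (1.0 if s2 == N_STATES - 1 else 0.0)


def run_agent(episodes=200, horizon=50):
    sent = total = 0
    for _ in range(episodes):
        s, signal = 0, 0.0
        for _ in range(horizon):
            # epsilon-greedy exploration against the shared value function
            a = int(rng.integers(N_ACTIONS)) if rng.random() < EPSILON else int(np.argmax(Q[s]))
            s2, r = step(s, a)
            td_error = r + GAMMA * Q[s2].max() - Q[s, a]
            # Trajectory-dependent triggering signal: accumulated TD-error
            # magnitude (one plausible choice; the paper's exact signal may differ).
            signal += abs(td_error)
            total += 1
            if signal > THRESHOLD:            # event: communicate to the central learner
                Q[s, a] += ALPHA * td_error   # learner applies the Q-update
                signal = 0.0                  # reset the triggering signal
                sent += 1
            s = s2
            if s == N_STATES - 1:
                break
    return sent, total


sent, total = run_agent()
print(f"communicated {sent}/{total} transitions ({100 * sent / total:.1f}%)")
```

Raising THRESHOLD trades communication for learning speed: fewer transitions are transmitted, at the cost of delayed value-function updates, which is the trade-off the abstract's convergence guarantees address.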