MAMBPO

Willemsen, Daniel; Coppola, Mario; de Croon, Guido C.H.E.

MAMBPO: Sample-efficient multi-robot reinforcement learning using learned world models

Conference paper (2021)

Authors

Daniel Willemsen (Student)

Mario Coppola (Control & Simulation)

Guido C.H.E. de Croon (Control & Simulation)

Research Group

Control & Simulation

To reference this document use:

http://resolver.tudelft.nl/uuid:d3f8712a-7068-4f63-812d-b25c62c65604

Published Date

2021

Language

English

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Multi-robot systems can benefit from reinforcement learning (RL) algorithms that learn behaviours in a small number of trials, a property known as sample efficiency. This research investigates the use of learned world models to improve sample efficiency. We present a novel multi-agent model-based RL algorithm, Multi-Agent Model-Based Policy Optimization (MAMBPO), which uses the Centralized Learning for Decentralized Execution (CLDE) framework. CLDE algorithms allow a group of agents to act in a fully decentralized manner after training, a desirable property for many systems comprising multiple robots. MAMBPO uses a learned world model to improve sample efficiency compared to model-free Multi-Agent Soft Actor-Critic (MASAC). We demonstrate this on two simulated multi-robot tasks, where MAMBPO achieves performance similar to MASAC but requires far fewer samples to do so. Through this, we take an important step towards making real-life learning for multi-robot systems possible.
