Alternating Maximization with Behavioral Cloning

Czechowski, A.T.; Oliehoek, F.A.

Conference paper (2020)

Authors

A.T. Czechowski (Interactive Intelligence)

F.A. Oliehoek (Interactive Intelligence)

Research Group

Interactive Intelligence (TU Delft)

To reference this document use:

http://resolver.tudelft.nl/uuid:8b2e3368-bfbd-495e-9f00-c7419d75e60b

Published Date

2020

Language

English

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Faculty

Electrical Engineering, Mathematics and Computer Science

Department

Intelligent Systems

Abstract

The key difficulty of cooperative, decentralized planning lies in making accurate predictions about the behavior of one’s teammates. In this paper we introduce Alternating maximization with Behavioral Cloning (ABC), a trainable online decentralized planning algorithm based on Monte Carlo Tree Search (MCTS), combined with models of teammates learned from previous episodic runs. Our algorithm relies on the idea of alternating maximization, where agents adapt their models one at a time in a round-robin manner. Under the assumption of perfect policy cloning, and with a sufficient number of Monte Carlo samples, successive iterations of our method are guaranteed to improve joint policies and eventually converge.
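
The alternating maximization idea mentioned in the abstract can be illustrated on a toy problem. The sketch below is not the ABC algorithm from the paper: it contains no MCTS and no learned teammate models. It assumes a hypothetical two-agent game with a shared payoff matrix, exact best responses, and perfect knowledge of the teammate's current action (a stand-in for the perfect policy cloning assumption). The function name, the payoff matrix, and the stopping rule are all illustrative choices. Agents update one at a time while the other is held fixed, so the joint value never decreases from one update to the next.

```python
import numpy as np


def alternating_maximization(payoff, a0=0, b0=0, max_rounds=100):
    """Round-robin best responses in a two-agent common-payoff matrix game.

    payoff[a, b] is the shared reward when agent 1 plays a and agent 2 plays b.
    This is an illustrative toy, not the planner described in the paper.
    """
    payoff = np.asarray(payoff, dtype=float)
    a, b = a0, b0
    values = [float(payoff[a, b])]  # joint value after each single-agent update

    for _ in range(max_rounds):
        # Agent 1 adapts while agent 2's action is held fixed.
        best_a = int(np.argmax(payoff[:, b]))
        if payoff[best_a, b] > payoff[a, b]:
            a = best_a
        values.append(float(payoff[a, b]))

        # Agent 2 adapts while agent 1's (possibly updated) action is held fixed.
        best_b = int(np.argmax(payoff[a, :]))
        if payoff[a, best_b] > payoff[a, b]:
            b = best_b
        values.append(float(payoff[a, b]))

        # Stop once a full round of updates brings no improvement.
        if values[-1] == values[-3]:
            break

    return a, b, values


# Hypothetical shared payoff matrix, chosen only for illustration.
PAYOFF = [
    [1, 3, 0],
    [0, 4, 0],
    [2, 0, 5],
]

a, b, values = alternating_maximization(PAYOFF)
print("joint action:", (a, b))
print("joint value after each update:", values)
```

Because each agent only switches to a strictly better response, the recorded sequence of joint values is non-decreasing and the loop stops at a joint action from which neither agent can improve alone. In general such a fixed point need not be the globally best joint action; it depends on the starting point.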

Files

Bnaic2020proceedings02.pdf

(pdf | 0.622 Mb)

Unknown license