Bei Peng | TU Delft Repository

UneVEn: Universal Value Exploration for Multi-Agent Reinforcement Learning

Conference paper (2021) - Tarun Gupta (author), Anuj Mahajan (author), Bei Peng (author), J.W. Böhmer (author), Shimon Whiteson (author)

VDN and QMIX are two popular value-based algorithms for cooperative MARL that learn a centralized action value function as a monotonic mixing of per-agent utilities. While this enables easy decentralization of the learned policy, the restricted joint action value function can pre ...
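The monotonic mixing described in this abstract is what makes decentralization easy. If the joint value never decreases when any single agent's utility increases, then every agent greedily maximizing its own utility also maximizes the joint value, so no search over joint actions is needed at execution time. Below is a minimal sketch that assumes VDN's additive mixing (QMIX generalizes this with a learned mixer whose weights are constrained to be non-negative). The utility values and function names are hypothetical illustrations and are not taken from the paper.

```python
from itertools import product

# Hypothetical per-agent utilities Q_i(a_i): two agents, three actions each.
# Illustrative numbers only, not from the paper.
q_agent = [
    [1.0, 3.0, 2.0],  # agent 0
    [0.5, 0.2, 4.0],  # agent 1
]


def vdn_mix(utilities):
    """VDN-style mixing: the joint value is the sum of per-agent utilities,
    which is monotonically increasing in each individual utility."""
    return sum(utilities)


def joint_argmax(q):
    """Centralized brute-force search over all joint actions.
    Its cost grows exponentially with the number of agents."""
    best, best_val = None, float("-inf")
    for joint in product(*(range(len(qi)) for qi in q)):
        val = vdn_mix(q[i][a] for i, a in enumerate(joint))
        if val > best_val:
            best, best_val = joint, val
    return best


def decentralized_argmax(q):
    """Each agent independently picks the action with its highest utility."""
    return tuple(max(range(len(qi)), key=qi.__getitem__) for qi in q)


print(joint_argmax(q_agent))          # (1, 2)
print(decentralized_argmax(q_agent))  # (1, 2)
```

Both searches pick the same joint action, but the decentralized version scales linearly in the number of agents. The abstract's caveat is the flip side: a monotonic mixer cannot represent joint value functions in which one agent's best action depends on what the others do.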

Randomized Entity-wise Factorization for Multi-Agent Reinforcement Learning

Conference paper (2021) - Shariq Iqbal (author), Christian A. Schroeder de Witt (author), Bei Peng (author), Wendelin Böhmer (author), Shimon Whiteson (author), Fei Sha (author)

Real world multi-agent tasks often involve varying types and quantities of agents and non-agent entities; however, agents within these tasks rarely need to consider all others at all times in order to act effectively. Factored value function approaches have historically leveraged ...

FACMAC: Factored Multi-Agent Centralised Policy Gradients

Conference paper (2021) - Bei Peng (author), Tabish Rashid (author), Christian A. Schroeder de Witt (author), Pierre-Alexandre Kamienny (author), Philip H.S. Torr (author), J.W. Böhmer (author), Shimon Whiteson (author)

We propose FACtored Multi-Agent Centralised policy gradients (FACMAC), a new method for cooperative multi-agent reinforcement learning in both discrete and continuous action spaces. Like MADDPG, a popular multi-agent actor-critic method, our approach uses deep deterministic polic ...

Optimistic Exploration even with a Pessimistic Initialisation

Conference paper (2020) - Tabish Rashid (author), Bei Peng (author), J.W. Böhmer (author), Shimon Whiteson (author)