The cross-entropy method for policy search in decentralized POMDPs

Abstract

Decentralized POMDPs (Dec-POMDPs) are becoming increasingly popular as models for multiagent planning under uncertainty, but solving a Dec-POMDP exactly is known to be an intractable combinatorial optimization problem. In this paper we apply the Cross-Entropy (CE) method, a recently introduced method for combinatorial optimization, to Dec-POMDPs, resulting in a randomized (sampling-based) algorithm for approximately solving Dec-POMDPs. The algorithm operates by sampling pure policies from an appropriately parametrized stochastic policy, evaluating these policies either exactly or approximately, and using the evaluations to define the next stochastic policy to sample from, iterating until convergence. Experimental results demonstrate that the CE method can search huge policy spaces efficiently, supporting our claim that combinatorial optimization methods can bring leverage to the approximate solution of Dec-POMDPs.
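
To illustrate the sample-evaluate-update loop described in the abstract, the following is a minimal sketch of CE policy search, not the paper's exact algorithm. It assumes a simple tabular parametrization in which each agent's pure policy maps a fixed set of observation histories to actions, and `evaluate` stands in for a user-supplied (exact or approximate) evaluation of a sampled joint policy; all names and parameters here are illustrative assumptions.

```python
import numpy as np

def ce_policy_search(evaluate, num_agents, num_histories, num_actions,
                     num_samples=100, num_elites=10, alpha=0.2,
                     num_iterations=50, rng=None):
    """Sketch of CE policy search for a Dec-POMDP-style problem.

    Maintains, per agent and per observation history, a categorical
    distribution over actions; samples pure (deterministic) joint policies,
    evaluates them, and re-fits the distribution to the best samples.
    """
    rng = np.random.default_rng() if rng is None else rng
    # xi[i, h, a]: probability that agent i takes action a at history h.
    xi = np.full((num_agents, num_histories, num_actions), 1.0 / num_actions)

    best_policy, best_value = None, -np.inf
    for _ in range(num_iterations):
        # Sample pure joint policies: one action per (agent, history) pair.
        policies, values = [], []
        for _ in range(num_samples):
            policy = np.array([[rng.choice(num_actions, p=xi[i, h])
                                for h in range(num_histories)]
                               for i in range(num_agents)])
            policies.append(policy)
            values.append(evaluate(policy))  # exact or approximate value
        values = np.asarray(values)

        # Keep the elite (highest-value) samples.
        elite_idx = np.argsort(values)[-num_elites:]
        if values[elite_idx[-1]] > best_value:
            best_value = values[elite_idx[-1]]
            best_policy = policies[elite_idx[-1]]

        # Re-estimate xi from the elites, smoothed with learning rate alpha,
        # to obtain the next stochastic policy to sample from.
        counts = np.zeros_like(xi)
        for idx in elite_idx:
            for i in range(num_agents):
                for h in range(num_histories):
                    counts[i, h, policies[idx][i, h]] += 1
        xi = (1 - alpha) * xi + alpha * counts / num_elites

    return best_policy, best_value
```

The sketch omits details the paper addresses, such as how observation histories are enumerated over a finite horizon and how evaluation noise from approximate evaluation interacts with elite selection; it is meant only to make the iterative structure concrete.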