A Finite Horizon DEC-POMDP Approach to Multi-robot Task Learning

Decision making under uncertainty is one of the key problems in robotics, and it is even harder in the multi-agent setting. The Decentralized Partially Observable Markov Decision Process (DEC-POMDP) is a framework for modeling multi-agent decision-making problems under uncertainty. No efficient exact algorithm exists for solving these problems, since the worst-case complexity of the general case has been shown to be NEXP-complete. This paper demonstrates the application of our proposed approximate solution algorithm, which uses evolution strategies, to various DEC-POMDP problems. We show that high-level policies can be learned in simplified simulated environments and readily transferred to real robots, despite the observation and transition models differing between the training and application domains.
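The abstract does not spell out the evolution-strategies update, so the following is a minimal sketch of a generic ES policy-search loop, not the paper's algorithm. Everything here is an assumption for illustration: the parameter vector size, the hyperparameters, and the `episode_reward` function, which stands in for rolling out one finite-horizon episode in the simplified simulator.

```python
import numpy as np

POLICY_DIM = 32       # assumed size of the policy parameter vector
POPULATION = 50       # perturbations sampled per generation
SIGMA = 0.1           # standard deviation of the Gaussian perturbations
LEARNING_RATE = 0.02
GENERATIONS = 200

def episode_reward(theta: np.ndarray) -> float:
    """Stand-in for one finite-horizon episode in the simplified simulator;
    a toy objective (negative squared distance to a target) keeps the
    sketch runnable end to end."""
    target = np.ones_like(theta)
    return -float(np.sum((theta - target) ** 2))

def evolve(theta: np.ndarray) -> np.ndarray:
    """Basic evolution-strategies loop: perturb the policy parameters,
    score each perturbation by episodic reward, and step along the
    reward-weighted average of the perturbations."""
    for _ in range(GENERATIONS):
        eps = np.random.randn(POPULATION, POLICY_DIM)
        rewards = np.array([episode_reward(theta + SIGMA * e) for e in eps])
        # Normalize rewards so the update is invariant to reward scale.
        rewards = (rewards - rewards.mean()) / (rewards.std() + 1e-8)
        theta = theta + LEARNING_RATE / (POPULATION * SIGMA) * (eps.T @ rewards)
    return theta

if __name__ == "__main__":
    learned = evolve(np.zeros(POLICY_DIM))
    print("final reward:", episode_reward(learned))
```

Because the update needs only episodic returns, the same loop works with any black-box simulator in place of the toy objective, which is what makes ES-style search a natural fit for the simplified training environments described above.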