Anytime planning for decentralized POMDPs using expectation maximization

Planning with Macro-Actions in Decentralized POMDPs

International Conference on Autonomous Agents & Multiagent Systems

Decentralized partially observable Markov decision processes (Dec-POMDPs) are general models for decentralized decision making under uncertainty. However, they typically model a problem at a low level of granularity, where each agent's actions are primitive operations lasting exactly one time step. We address the case where each agent has macro-actions: temporally extended actions which may require different amounts of time to execute. We model macro-actions as options in a factored Dec-POMDP model, focusing on options which depend only on information available to an individual agent while executing. This enables us to model systems where coordination decisions only occur at the level of deciding which macro-actions to execute, and the macro-actions themselves can then be executed to completion. The core technical difficulty when using options in a Dec-POMDP is that the options chosen by the agents no longer terminate at the same time. We present extensions of two leading Dec-POMDP algorithms for generating a policy with options and discuss the resulting form of optimality. Our results show that these algorithms retain agent coordination while allowing near-optimal solutions to be generated for significantly longer horizons and larger state-spaces than previous Dec-POMDP methods.
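As a rough illustration of the option construct described in this abstract (the class and function names below are hypothetical, not taken from the paper), a macro-action for one agent can be packaged as a local policy plus an initiation set and a termination condition, all defined over that agent's own action-observation history:

```python
import random
from dataclasses import dataclass
from typing import Callable, Tuple

# An agent-level history: the sequence of that agent's own actions and observations.
LocalHistory = Tuple[str, ...]

@dataclass
class Option:
    """A macro-action for one agent (hypothetical names, not the paper's API)."""
    name: str
    init_set: Callable[[LocalHistory], bool]    # local histories where the option may start
    policy: Callable[[LocalHistory], str]       # local history -> primitive action
    terminate: Callable[[LocalHistory], float]  # probability of stopping after this history

def run_option(option: Option, history: LocalHistory,
               observe: Callable[[str], str], max_steps: int = 100) -> LocalHistory:
    """Execute an option to completion using only the agent's own information."""
    assert option.init_set(history), "option not applicable for this local history"
    for _ in range(max_steps):
        action = option.policy(history)
        obs = observe(action)                   # environment step, supplied by the caller
        history = history + (action, obs)
        if random.random() < option.terminate(history):
            break
    return history
```

Because every field depends only on the agent's local history, an option like this can run to completion without any communication, which is exactly what lets coordination be pushed up to the level of choosing which option to start.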

Heuristic Policy Iteration for Infinite-Horizon Decentralized POMDPs

2008

Decentralized POMDPs (DEC-POMDPs) offer a rich model for planning under uncertainty in multiagent settings. Improving the scalability of solution techniques is an important challenge. While an optimal algorithm has been developed for infinite-horizon DEC-POMDPs, it often requires an intractable amount of time and memory. To address this problem, we present a heuristic version of this algorithm. Our approach is able to use initial state information to decrease solution size and often increases solution quality over what is achievable by the optimal algorithm before resources are exhausted. Experimental results demonstrate that this heuristic approach is effective, producing higher values and more concise solutions in all three test domains.
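A minimal sketch of how initial-state information can shrink a controller-based solution, assuming point-based pruning over beliefs sampled forward from the known start state (the function and variable names are illustrative, not the paper's exact test):

```python
import numpy as np

def prune_with_start_info(node_values: np.ndarray, reachable_beliefs: np.ndarray) -> list:
    """Keep only controller nodes that are best at some belief reachable from the start state.

    node_values: (num_nodes, num_states) value vectors, one per controller node.
    reachable_beliefs: (num_beliefs, num_states) beliefs sampled forward from the initial state.
    Returns the indices of retained nodes.
    """
    winners = set()
    for b in reachable_beliefs:
        winners.add(int(np.argmax(node_values @ b)))
    return sorted(winners)

# Example: 3 candidate nodes over 2 states, with beliefs reachable from a known start.
values = np.array([[1.0, 0.0], [0.0, 1.0], [0.4, 0.4]])
beliefs = np.array([[0.9, 0.1], [0.2, 0.8]])
print(prune_with_start_info(values, beliefs))   # node 2 is never best here -> [0, 1]
```

Restricting attention to beliefs reachable from the initial state is what allows nodes that an optimal, start-agnostic algorithm would keep to be dropped, trading completeness for smaller controllers.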

Incremental Clustering and Expansion for Faster Optimal Planning in Decentralized POMDPs

2013

This article presents the state-of-the-art in optimal solution methods for decentralized partially observable Markov decision processes (Dec-POMDPs), which are general models for collaborative multiagent planning under uncertainty. Building off the generalized multiagent A* (GMAA*) algorithm, which reduces the problem to a tree of one-shot collaborative Bayesian games (CBGs), we describe several advances that greatly expand the range of Dec-POMDPs that can be solved optimally. First, we introduce lossless incremental clustering of the CBGs solved by GMAA*, which achieves exponential speedups without sacrificing optimality. Second, we introduce incremental expansion of nodes in the GMAA* search tree, which avoids the need to expand all children, the number of which is in the worst case doubly exponential in the node's depth. This is particularly beneficial when little clustering is possible. In addition, we introduce new hybrid heuristic representations that are more compact and thereby enable the solution of larger Dec-POMDPs. We provide theoretical guarantees that, when a suitable heuristic is used, both incremental clustering and incremental expansion yield algorithms that are both complete and search equivalent. Finally, we present extensive empirical results demonstrating that GMAA*-ICE, an algorithm that synthesizes these advances, can optimally solve Dec-POMDPs of unprecedented size.
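As a loose sketch of the lossless-clustering criterion (probabilistic equivalence of an agent's individual histories), assuming a small joint distribution over states and joint histories is available in dictionary form; the function below is illustrative only and is not the incremental, CBG-level procedure used inside GMAA*-ICE:

```python
from collections import defaultdict

def cluster_histories(joint_dist, agent, atol=1e-12):
    """Group one agent's histories that induce the same conditional distribution over
    (state, other agents' histories) -- the lossless criterion behind history clustering.

    joint_dist maps (state, joint_histories_tuple) -> probability.
    Returns a list of clusters, each a list of this agent's histories.
    """
    cond = defaultdict(lambda: defaultdict(float))  # own history -> {(state, others): prob}
    marg = defaultdict(float)
    for (state, hists), p in joint_dist.items():
        own = hists[agent]
        others = tuple(h for i, h in enumerate(hists) if i != agent)
        cond[own][(state, others)] += p
        marg[own] += p

    clusters = []  # each: {"members": [...], "dist": normalized conditional}
    for own, table in cond.items():
        dist = {k: v / marg[own] for k, v in table.items()}
        for c in clusters:
            keys = set(c["dist"]) | set(dist)
            if all(abs(c["dist"].get(k, 0.0) - dist.get(k, 0.0)) <= atol for k in keys):
                c["members"].append(own)
                break
        else:
            clusters.append({"members": [own], "dist": dist})
    return [c["members"] for c in clusters]
```

Histories in the same cluster can share a single policy choice without any loss of value, which is why the number of distinct decisions per CBG, and hence the search effort, can shrink exponentially.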

Modeling and Planning with Macro-Actions in Decentralized POMDPs

Journal of Artificial Intelligence Research, 2019

Decentralized partially observable Markov decision processes (Dec-POMDPs) are general models for decentralized multi-agent decision making under uncertainty. However, they typically model a problem at a low level of granularity, where each agent's actions are primitive operations lasting exactly one time step. We address the case where each agent has macro-actions: temporally extended actions that may require different amounts of time to execute. We model macro-actions as options in a Dec-POMDP, focusing on actions that depend only on information directly available to the agent during execution. Therefore, we model systems where coordination decisions only occur at the level of deciding which macro-actions to execute. The core technical difficulty in this setting is that the options chosen by each agent no longer terminate at the same time. We extend three leading Dec-POMDP algorithms for policy generation to the macro-action case, and demonstrate their effectiveness in both sta...
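The core difficulty named in this abstract, that macro-actions do not terminate in lockstep, can be pictured with a small execution loop (a self-contained sketch with hypothetical names, not the paper's planner): the system advances one primitive step at a time, and only the agent whose macro-action just ended selects a new one.

```python
import random
from typing import Callable, List, Tuple

# A running macro-action, reduced to two callables over the agent's own history:
# pick the next primitive action, and give the probability of terminating.
MacroAction = Tuple[Callable[[tuple], str], Callable[[tuple], float]]

def run_asynchronous(num_agents: int,
                     choose_macro: Callable[[int, tuple], MacroAction],
                     observe: Callable[[int, str], str],
                     horizon: int) -> List[tuple]:
    """Advance all agents one primitive step at a time; an agent re-decides only when
    its own macro-action terminates, so high-level decisions are asynchronous."""
    histories = [() for _ in range(num_agents)]
    current = [choose_macro(i, histories[i]) for i in range(num_agents)]
    for _ in range(horizon):
        for i in range(num_agents):
            act_fn, term_fn = current[i]
            action = act_fn(histories[i])
            obs = observe(i, action)                     # environment step, caller-supplied
            histories[i] = histories[i] + (action, obs)
            if random.random() < term_fn(histories[i]):  # this agent's option just ended
                current[i] = choose_macro(i, histories[i])
    return histories
```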

Optimally solving Dec-POMDPs as Continuous-State MDPs: Theory and Algorithms

HAL (Le Centre pour la Communication Scientifique Directe), 2014

Decentralized partially observable Markov decision processes (Dec-POMDPs) provide a general model for decision-making under uncertainty in cooperative decentralized settings, but are difficult to solve optimally (NEXP-Complete). As a new way of solving these problems, we introduce the idea of transforming a Dec-POMDP into a continuous-state deterministic MDP with a piecewise-linear and convex value function. This approach makes use of the fact that planning can be accomplished in a centralized offline manner, while execution can still be distributed. This new Dec-POMDP formulation, which we call an occupancy MDP, allows powerful POMDP and continuous-state MDP methods to be used for the first time. When the curse of dimensionality becomes too prohibitive, we refine this basic approach and present ways to combine heuristic search and compact representations that exploit the structure present in multiagent domains, without losing the ability to eventually converge to an optimal solution. In particular, we introduce feature-based heuristic search that relies on feature-based compact representations, point-based updates, and efficient action selection. A theoretical analysis demonstrates that our feature-based heuristic search algorithms terminate in finite time with an optimal solution. We include an extensive empirical analysis using well-known benchmarks, thereby demonstrating that our approach provides significant scalability improvements compared to the state of the art.
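To make the "continuous-state deterministic MDP" view concrete, here is a minimal sketch of an occupancy-state update, assuming dictionary-based transition and observation models (the variable names and dict layout are mine, not the paper's): given a decentralized decision rule, the next occupancy state is a deterministic function of the current one.

```python
from collections import defaultdict

def next_occupancy(occupancy, decision_rule, T, O):
    """One step of the (deterministic) occupancy-state update, for illustration.

    occupancy: dict {(state, joint_history): prob} -- the occupancy state.
    decision_rule: joint_history -> joint_action; decentralized by construction,
                   since each agent's part depends only on its own history.
    T[s][a]: dict {next_state: prob};  O[a][s2]: dict {joint_obs: prob}.
    """
    nxt = defaultdict(float)
    for (s, hist), p in occupancy.items():
        a = decision_rule(hist)
        for s2, pt in T[s][a].items():
            for z, po in O[a][s2].items():
                nxt[(s2, hist + ((a, z),))] += p * pt * po
    return dict(nxt)
```

The randomness of the underlying Dec-POMDP is folded into the probabilities inside the occupancy state, which is why standard POMDP and continuous-state MDP machinery becomes applicable.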

Towards Computing Optimal Policies for Decentralized POMDPs

2002

The problem of deriving joint policies for a group of agents that maximize some joint reward function can be modelled as a decentralized partially observable Markov decision process (DEC-POMDP). Significant algorithms have been developed for single-agent POMDPs; however, with a few exceptions, effective algorithms for deriving policies for decentralized POMDPs have not been developed. As a first step, we present new algorithms for solving decentralized POMDPs. In particular, we describe an exhaustive search algorithm for a globally optimal solution and analyze the complexity of this algorithm, which we find to be doubly exponential in the number of agents and time, highlighting the importance of more feasible approximations. We define a class of algorithms which we refer to as "Joint Equilibrium-based Search for Policies" (JESP) and describe an exhaustive algorithm and a dynamic programming algorithm for JESP. Finally, we empirically compare the exhaustive JESP algorithm with the globally optimal exhaustive algorithm.
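A compact sketch of the alternating best-response idea behind JESP, written for the exhaustive variant (the function names and tolerance are mine): hold all but one agent's policy fixed, replace that agent's policy with its best response, and repeat until no single agent can improve.

```python
def jesp(num_agents, candidate_policies, joint_value):
    """Exhaustive JESP-style alternating maximization (illustrative sketch).

    candidate_policies[i]: the finite set of policies available to agent i.
    joint_value(policies): expected joint reward of a complete policy list.
    Returns a joint policy no agent can improve unilaterally (a local optimum).
    """
    joint = [candidate_policies[i][0] for i in range(num_agents)]  # arbitrary start
    improved = True
    while improved:
        improved = False
        for i in range(num_agents):
            def value_with(pi, i=i):
                return joint_value(joint[:i] + [pi] + joint[i + 1:])
            best = max(candidate_policies[i], key=value_with)
            if value_with(best) > joint_value(joint) + 1e-12:
                joint[i] = best
                improved = True
    return joint
```

The dynamic-programming variant described in the paper computes the best response by solving an augmented single-agent problem instead of enumerating candidate policies, but the fixed point reached is the same kind of joint equilibrium.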

Point-based policy generation for decentralized POMDPs

2010

Memory-bounded techniques have shown great promise in solving complex multi-agent planning problems modeled as DEC-POMDPs. Much of the performance gain can be attributed to pruning techniques that alleviate the complexity of the exhaustive backup step of the original MBDP algorithm. Despite these improvements, state-of-the-art algorithms can still handle only a relatively small pool of candidate policies, which limits the quality of the solution in some benchmark problems. We present a new algorithm, Point-Based Policy Generation, which avoids searching the entire joint policy space altogether. The key observation is that the best joint policy for each reachable belief state can be constructed directly, instead of first producing a large set of candidates. We also provide an efficient approximate implementation of this operation. The experimental results show that our solution technique improves performance significantly in terms of both runtime and solution quality.
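The "construct the best policy at each belief directly" idea can be sketched for the single-agent analogue of the backup (all names below are mine; in the Dec-POMDP case the same construction is carried out per sampled joint belief rather than over a pre-enumerated candidate set):

```python
import numpy as np

def point_based_backup(belief, actions, observations, T, O, R, subpolicy_values, gamma=0.95):
    """Build the best one-step policy at a single belief directly, with no exhaustive
    enumeration of candidate policies.

    subpolicy_values: dict {subpolicy_id: np.ndarray over states} from the previous stage.
    Returns (best_action, {observation: chosen subpolicy id}, value at this belief).
    """
    best = None
    for a in actions:
        value = float(belief @ R[a])
        choice = {}
        for z in observations:
            # Unnormalized successor "belief" for (a, z); its scale folds in Pr(z | b, a).
            succ = np.array([sum(belief[s] * T[a][s][s2] * O[a][s2][z] for s in range(len(belief)))
                             for s2 in range(len(belief))])
            q, v = max(subpolicy_values.items(), key=lambda kv: float(succ @ kv[1]))
            choice[z] = q
            value += gamma * float(succ @ v)
        if best is None or value > best[2]:
            best = (a, choice, value)
    return best
```

Picking the best successor subpolicy per observation at the sampled belief replaces the exhaustive backup's enumeration of every observation-to-subpolicy mapping, which is where the savings come from.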

Optimizing fixed-size stochastic controllers for POMDPs and decentralized POMDPs

Autonomous Agents and Multi-Agent Systems, 2010

POMDPs and their decentralized multiagent counterparts, DEC-POMDPs, offer a rich framework for sequential decision making under uncertainty. Their computational complexity, however, presents an important research challenge. One approach that effectively addresses the intractable memory requirements of current algorithms is based on representing agent policies as finite-state controllers. In this paper, we propose a new approach that uses this representation and formulates the problem as a nonlinear program (NLP). The NLP defines an optimal policy of a desired size for each agent. This new representation allows a wide range of powerful nonlinear programming algorithms to be used to solve POMDPs and DEC-POMDPs. Although solving the NLP optimally is often intractable, the results we obtain using an off-the-shelf optimization method are competitive with state-of-the-art POMDP algorithms and outperform state-of-the-art DEC-POMDP algorithms. Our approach is easy to implement and it opens up promising research directions for solving POMDPs and DEC-POMDPs using nonlinear programming methods.
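For orientation, the program for a single agent's fixed-size stochastic controller has roughly the following shape (a sketch in my own notation, not the paper's exact formulation): the variables are the node values V(q, s), the action-selection probabilities x(a | q), and the node-transition probabilities x(q' | q, a, z).

```latex
\begin{aligned}
\max_{x,\,V}\quad & \sum_{s} b_0(s)\, V(q_0, s) \\
\text{s.t.}\quad & V(q,s) = \sum_{a} x(a \mid q) \Big[ R(s,a)
  + \gamma \sum_{s'} P(s' \mid s, a) \sum_{z} O(z \mid s', a)
    \sum_{q'} x(q' \mid q, a, z)\, V(q', s') \Big] \quad \forall q, s, \\
& \textstyle\sum_{a} x(a \mid q) = 1, \qquad \sum_{q'} x(q' \mid q, a, z) = 1, \qquad x \ge 0 .
\end{aligned}
```

The products between the probability variables x and the value variables V are what make the program nonlinear (and nonconvex); in the DEC-POMDP case one such controller per agent is coupled through a joint value function.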

Constraint-based dynamic programming for decentralized POMDPs with structured interactions

2009

Decentralized partially observable MDPs (DEC-POMDPs) provide a rich framework for modeling decision making by a team of agents. Despite rapid progress in this area, the limited scalability of solution techniques has restricted the applicability of the model. To overcome this computational barrier, research has focused on restricted classes of DEC-POMDPs, which are easier to solve yet rich enough to capture many practical problems. We present CBDP, an efficient and scalable point-based dynamic programming algorithm for one such model, the ND-POMDP (Networked Distributed POMDP). Specifically, CBDP provides orders-of-magnitude speedups in policy computation and generates better-quality solutions for all test instances. It has linear complexity in the number of agents and horizon length. Furthermore, the complexity per horizon step for the examined class of problems is exponential only in a small parameter that depends upon the interaction among the agents, achieving significant scalability for large, loosely coupled multi-agent systems. The efficiency of CBDP lies in exploiting the structure of interactions using constraint networks. These results significantly extend the effectiveness of decision-theoretic planning in multi-agent settings.
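As a rough picture of the structure being exploited (all names below are illustrative, and the brute-force search is only a stand-in), an ND-POMDP's joint value decomposes over an interaction graph, so choosing the best combination of local policies is a constraint-optimization problem rather than a search over an unstructured joint space:

```python
from itertools import product

def best_joint_policy(local_policies, edges, edge_value):
    """Maximize a value that factors over interaction edges:
    V(pi) = sum over (i, j) in edges of f_ij(pi_i, pi_j).

    local_policies: dict {agent: list of candidate local policies}.
    edge_value(i, j, pi_i, pi_j): local value contributed by the pair (i, j).
    This sketch enumerates all joint assignments; CBDP instead performs the same
    maximization with constraint-network techniques, so its cost depends on the
    interaction graph's structure rather than on the full joint policy space.
    """
    agents = sorted(local_policies)
    best, best_val = None, float("-inf")
    for combo in product(*(local_policies[i] for i in agents)):
        assignment = dict(zip(agents, combo))
        val = sum(edge_value(i, j, assignment[i], assignment[j]) for i, j in edges)
        if val > best_val:
            best, best_val = assignment, val
    return best, best_val
```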