An improved lower bound for Order-Regular matrices

Improved bound on the worst case complexity of Policy Iteration

Solving Markov Decision Processes (MDPs) is a recurrent task in engineering. Even though it is known that solutions for minimizing the infinite horizon expected reward can be found in polynomial time using Linear Programming techniques, iterative methods like the Policy Iteration algorithm (PI) usually remain the most efficient in practice. This method is guaranteed to converge in a finite number of steps. Unfortunately, it is known that it may require a number of steps that is exponential in the size of the problem to converge. On the other hand, many open questions remain concerning the actual worst case complexity. In this work, we provide the first improvement over the fifteen-year-old upper bound from Mansour & Singh (1999) by showing that PI requires at most k/(k-1) * k^n/n + o(k^n/n) iterations to converge, where n is the number of states of the MDP and k is the maximum number of actions per state. Perhaps more importantly, we also show that this bound is optimal for an important re...
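
As a rough illustration of the Policy Iteration scheme whose iteration count the abstract above bounds, here is a minimal sketch in Python. It is not taken from the paper. It assumes a discounted, reward-maximizing MDP, evaluates each policy by repeated sweeps rather than by solving a linear system, and switches every improvable state at once (Howard's rule). The names `policy_iteration`, `P`, `R` and the two-state example are hypothetical.

```python
def q_value(P, R, v, s, a, gamma):
    """One-step lookahead value of taking action a in state s."""
    return R[s][a] + gamma * sum(p * v[t] for t, p in P[s][a])


def policy_iteration(P, R, gamma=0.9, tol=1e-10):
    """Howard-style policy iteration on a small discounted MDP (sketch).

    P[s][a] is a list of (next_state, probability) pairs and R[s][a] is
    the immediate reward; both are hypothetical inputs for this example.
    Returns (policy, values, number_of_improvement_steps).
    """
    n = len(P)
    policy = [0] * n          # arbitrary initial policy: action 0 everywhere
    steps = 0
    while True:
        # Policy evaluation: iterate the fixed-policy Bellman operator
        # until successive value vectors differ by less than tol.
        v = [0.0] * n
        while True:
            new_v = [q_value(P, R, v, s, policy[s], gamma) for s in range(n)]
            delta = max(abs(x - y) for x, y in zip(new_v, v))
            v = new_v
            if delta < tol:
                break
        # Policy improvement: switch every state whose best action is
        # strictly better (up to a small numerical margin).
        changed = False
        for s in range(n):
            best = max(range(len(P[s])),
                       key=lambda a: q_value(P, R, v, s, a, gamma))
            if q_value(P, R, v, s, best, gamma) > \
                    q_value(P, R, v, s, policy[s], gamma) + 1e-8:
                policy[s] = best
                changed = True
        if not changed:
            return policy, v, steps
        steps += 1


# Hypothetical two-state, two-action deterministic example.
P = [
    [[(0, 1.0)], [(1, 1.0)]],   # state 0: stay, or move to state 1
    [[(1, 1.0)], [(0, 1.0)]],   # state 1: stay, or move to state 0
]
R = [
    [0.0, 1.0],                 # rewards for the two actions in state 0
    [2.0, 0.0],                 # rewards for the two actions in state 1
]
policy, values, steps = policy_iteration(P, R)
print(policy, [round(x, 3) for x in values], steps)
```

The counter `steps` tracks the number of improvement rounds, which is the quantity that worst-case bounds of the kind quoted above refer to; on an instance this small it says nothing about worst-case behaviour.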

Strategy Iteration Is Strongly Polynomial for 2-Player Turn-Based Stochastic Games with a Constant Discount Factor

Journal of the ACM, 2013

Ye [2011] showed recently that the simplex method with Dantzig’s pivoting rule, as well as Howard’s policy iteration algorithm, solve discounted Markov decision processes (MDPs), with a constant discount factor, in strongly polynomial time. More precisely, Ye showed that both algorithms terminate after at most O((mn/(1−γ)) log(n/(1−γ))) iterations, where n is the number of states, m is the total number of actions in the MDP, and 0 < γ < 1 is the discount factor. We improve Ye’s analysis in two respects. First, we improve the bound given by Ye and show that Howard’s policy iteration algorithm actually terminates after at most O((m/(1−γ)) log(n/(1−γ))) iterations. Second, and more importantly, we show that the same bound applies to the number of iterations performed by the strategy iteration (or strategy improvement) algorithm, a generalization of Howard’s policy iteration algorithm used for solving 2-player turn-based stochastic games with discounted zero-sum rewards. This provide...

Settling the complexity of computing two-player Nash equilibria

Journal of The ACM, 2009

We prove that Bimatrix, the problem of finding a Nash equilibrium in a two-player game, is complete for the complexity class PPAD (Polynomial Parity Argument, Directed version) introduced by Papadimitriou in 1991. Our result, building upon the work of Daskalakis et al. [2006a] on the complexity of four-player Nash equilibria, settles a long standing open problem in algorithmic game theory.

A Sums-of-Squares Extension of Policy Iterations

In order to address the imprecision often introduced by widening operators, policy iteration based on min-computations amounts to considering the characterization of reachable states of a program as an iterative computation of policies, starting from a post-fixpoint. Computing each policy and the associated invariant relies on a sequence of numerical optimizations. While the early papers rely on LP to address linear properties of linear programs, the current state of the art is still limited to the analysis of linear programs with at most quadratic invariants, relying on Semi-Definite Programming (SDP) solvers to compute the next policy, and LP solvers to solve the selected policy. We propose here to extend the class of programs considered through the use of Sums-of-Squares (SOS) optimizations. Our approach enables the precise analysis of switched systems with polynomial assigns and guards. The analysis presented has been implemented in Matlab and applied to existing programs, improving...

Polynomial Value Iteration Algorithms for Deterministic MDPs

Value iteration is a commonly used and empirically competitive method in solving many Markov decision process problems. However, it is known that value iteration has only pseudo-polynomial complexity in general. We establish a somewhat surprising polynomial bound for value iteration on deterministic Markov decision (DMDP) problems. We show that the basic value iteration procedure converges to the highest average reward cycle on a DMDP problem in Θ(n^2) iterations, or Θ(mn^2) total time, where n denotes the number of states, and m the number of edges.
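
To make the basic recurrence concrete, the following Python sketch runs plain value iteration on a small deterministic MDP given as a weighted graph. It is only an illustration under stated assumptions: it does not reproduce the paper's analysis or any stopping rule, it assumes every state has at least one outgoing edge, and the function name `dmdp_value_iteration` and the three-state graph are hypothetical.

```python
def dmdp_value_iteration(edges, n, steps):
    """Basic value iteration on a deterministic MDP (sketch).

    edges is a list of (source, target, reward) triples over states
    0..n-1; every state is assumed to have at least one outgoing edge.
    Returns the vector of best total rewards over `steps` transitions.
    """
    v = [0.0] * n
    for _ in range(steps):
        new_v = [float("-inf")] * n
        for s, t, r in edges:
            # Best total reward from s: take edge (s, t), then continue
            # optimally from t for the remaining steps.
            new_v[s] = max(new_v[s], r + v[t])
        v = new_v
    return v


# Hypothetical graph: a self-loop at state 0 with mean reward 1, and a
# two-state cycle 1 -> 2 -> 1 with mean reward (4 + 2) / 2 = 3.
edges = [
    (0, 0, 1.0),
    (0, 1, 0.0),
    (1, 2, 4.0),
    (2, 1, 2.0),
    (2, 0, 0.0),
]
T = 1000
v = dmdp_value_iteration(edges, n=3, steps=T)
gains = [x / T for x in v]   # per-step averages approach the best cycle mean
print(gains)
```

In this example every state can reach the cycle between states 1 and 2, so the per-step averages in `gains` all approach that cycle's mean reward of 3 as `T` grows.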

On Policy Iteration as a Newton's Method and Polynomial Policy Iteration Algorithms

Policy iteration is a popular technique for solving Markov decision processes (MDPs). It is easy to describe and implement, and has excellent performance in practice. But not much is known about its complexity. The best upper bound remains exponential, and the best lower bound is a trivial Ω(n) on the number of iterations, where n is the number of states. This paper improves the upper bounds to a polynomial for policy iteration on MDP problems with special graph structure. Our analysis is based on the connection between policy iteration and Newton's method for finding the zero of a convex function. The analysis offers an explanation as to why policy iteration is fast. It also leads to polynomial bounds on several variants of policy iteration for MDPs for which the linear programming formulation requires at most two variables per inequality (MDP(2)). The MDP(2) class includes deterministic MDPs under discounted and average reward criteria. The bounds on the run times include O(mn^2 log m log W) on MDP(2) and O(mn^2 log m) for deterministic MDPs, where m denotes the number of actions and W denotes the magnitude of the largest number in the problem description.

New results on the complexity of uniformly mixed Nash equilibria

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2005

We are interested in the complexity of finding Nash equilibria with one uniformly mixed strategy (that is, equilibria in which at least one played strategy is a uniform probability function). We show that, even in imitation bimatrix games, where one player has a positive payoff if he plays the same pure strategy as the opponent, deciding the existence of such an equilibrium is an NP-complete problem. This main result follows from the NP-completeness of graph-theoretical problems strictly related to this class of equilibria.

The Complexity of Stochastic Rabin and Streett Games

Lecture Notes in Computer Science, 2005

The theory of graph games with ω-regular winning conditions is the foundation for modeling and synthesizing reactive processes. In the case of stochastic reactive processes, the corresponding stochastic graph games have three players, two of them (System and Environment) behaving adversarially, and the third (Uncertainty) behaving probabilistically. We consider two problems for stochastic graph games: the qualitative problem asks for the set of states from which a player can win with probability 1 (almost-sure winning); the quantitative problem asks for the maximal probability of winning (optimal winning) from each state. We show that for Rabin winning conditions, both problems are in NP. As these problems were known to be NP-hard, it follows that they are NP-complete for Rabin conditions, and dually, coNP-complete for Streett conditions. The proof proceeds by showing that pure memoryless strategies suffice for qualitatively and quantitatively winning stochastic graph games with Rabin conditions. This insight is of interest in its own right, as it implies that controllers for Rabin objectives have simple implementations. We also prove that for every ω-regular condition, optimal winning strategies are no more complex than almost-sure winning strategies.

The complexity of computing a Nash equilibrium

Communications of the ACM, 2009

How long does it take until economic agents converge to an equilibrium? By studying the complexity of the problem of computing a mixed Nash equilibrium in a game, we provide evidence that there are games in which convergence to such an equilibrium takes prohibitively long. Traditionally, computational problems fall into two classes: those that have a polynomial-time algorithm and those that are NP-hard. However, the concept of NP-hardness cannot be applied to the rare problems where "every instance has a solution"---for example, in the case of games Nash's theorem asserts that every game has a mixed equilibrium (now known as the Nash equilibrium, in honor of that result). We show that finding a Nash equilibrium is complete for a class of problems called PPAD, containing several other known hard problems; all problems in PPAD share the same style of proof that every instance has a solution.