On the Agenda(s) of Research on Multi-Agent Learning

Multi-agent reinforcement learning: a critical survey

… on Artificial Multi-Agent Learning, 2004

We survey the recent work in AI on multi-agent reinforcement learning (that is, learning in stochastic games). We then argue that, while exciting, this work is flawed. The fundamental flaw is unclarity about the problem or problems being addressed. After tracing a representative sample of the recent literature, we identify four well-defined problems in multi-agent reinforcement learning, single out the problem that in our view is most suitable for AI, and make some remarks about how we believe progress is to be made on this problem.

A general criterion and an algorithmic framework for learning in multi-agent systems

Machine Learning, 2007

We offer a new formal criterion for agent-centric learning in multi-agent systems, that is, learning that maximizes one's rewards in the presence of other agents who might also be learning (using the same or other learning algorithms). This new criterion takes the class of opponents as a parameter. We then provide a modular approach for achieving effective agent-centric learning; the approach consists of a number of basic algorithmic building blocks, which can be instantiated and composed differently depending on the environment setting (for example, 2- versus n-player games) as well as the target class of opponents. We then provide several specific instances of the approach: an algorithm for stationary opponents, and two algorithms for adaptive opponents with bounded memory, one for the n-player case and another optimized for the 2-player case. We prove our algorithms correct with respect to the formal criterion, and furthermore show the algorithms to be experimentally effective via comprehensive computer testing.

New criteria and a new algorithm for learning in multi-agent systems

… in neural information processing systems, 2005

We propose a new set of criteria for learning algorithms in multi-agent systems, one that is more stringent and (we argue) better justified than previously proposed criteria. Our criteria, which apply most straightforwardly in repeated games with average rewards, consist of three requirements: (a) against a specified class of opponents (this class is a parameter of the criterion) the algorithm yield a payoff that approaches the payoff of the best response, (b) against other opponents the algorithm's payoff at least approach (and possibly exceed) the security level payoff (or maximin value), and (c) subject to these requirements, the algorithm achieve a close to optimal payoff in self-play. We furthermore require that these average payoffs be achieved quickly. We then present a novel algorithm, and show that it meets these new criteria for a particular parameter class, the class of stationary opponents. Finally, we show that the algorithm is effective not only in theory, but also empirically. Using a recently introduced comprehensive game theoretic test suite, we show that the algorithm almost universally outperforms previous learning algorithms.
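To make requirements (a) and (b) concrete, the following is a minimal sketch (not the paper's algorithm) of the two benchmark quantities in a two-player matrix game: the best-response payoff against a known stationary opponent, and a security level. The payoff matrix, the opponent's mixed strategy, and the function names are illustrative assumptions. For simplicity the security level is computed over pure strategies only; the true maximin value is taken over mixed strategies and can be higher.

```python
def best_response_value(payoffs, opponent_mix):
    """Return (action index, expected payoff) of a best response to a
    stationary opponent playing the mixed strategy `opponent_mix`."""
    expected = [sum(p * q for p, q in zip(row, opponent_mix)) for row in payoffs]
    best = max(range(len(expected)), key=expected.__getitem__)
    return best, expected[best]


def pure_security_level(payoffs):
    """Best worst-case payoff over pure strategies (a lower bound on the
    mixed-strategy maximin value)."""
    return max(min(row) for row in payoffs)


# Hypothetical row-player payoffs: rows are own actions, columns are the
# opponent's actions (a Prisoner's Dilemma-style example).
PAYOFFS = [[3.0, 0.0],
           [5.0, 1.0]]

if __name__ == "__main__":
    print(best_response_value(PAYOFFS, [0.5, 0.5]))
    print(pure_security_level(PAYOFFS))
```

In this example the second action is a best response to a uniformly mixing opponent, and the same action also guarantees the pure-strategy security payoff.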

An Analysis of Stochastic Game Theory for Multiagent Reinforcement Learning

2000

Learning behaviors in a multiagent environment is crucial for developing and adapting multiagent systems. Reinforcement learning techniques have addressed this problem for a single agent acting in a stationary environment, which is modeled as a Markov decision process (MDP). But multiagent environments are inherently non-stationary since the other agents are free to change their behavior as they also learn
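The single-agent baseline described here can be illustrated with a tabular Q-learning update, a standard reinforcement learning rule for MDPs. This is a minimal sketch; the state and action labels, the learning rate `alpha`, and the discount factor `gamma` are assumed values chosen for illustration. In a multiagent setting the reward and next state also depend on the other agents' changing choices, so the stationarity this update relies on no longer holds.

```python
from collections import defaultdict


def q_update(q, state, action, reward, next_state, actions, alpha=0.5, gamma=0.9):
    """Apply one tabular Q-learning update in place and return the new value."""
    # Value of the greedy action in the next state under the current estimates.
    best_next = max(q[(next_state, a)] for a in actions)
    td_target = reward + gamma * best_next
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


# Hypothetical two-action problem; unseen (state, action) pairs start at 0.
ACTIONS = ["left", "right"]

if __name__ == "__main__":
    q_table = defaultdict(float)
    print(q_update(q_table, "s0", "right", 1.0, "s1", ACTIONS))
    print(q_update(q_table, "s1", "left", 0.0, "s0", ACTIONS))
```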

Multi-agent learning dynamics: A survey

2007

In this paper we compare state-of-the-art multi-agent reinforcement learning algorithms in a wide variety of games. We consider two types of algorithms: value iteration and policy iteration. Four characteristics are studied: initial conditions, parameter settings, convergence speed, and local versus global convergence. Global convergence is still difficult to achieve in practice, despite existing theoretical guarantees. Multiple visualizations are included to provide a comprehensive insight into the learning dynamics.

A Comprehensive Survey of Multiagent Reinforcement Learning

IEEE Transactions on Systems, Man, and Cybernetics, 2008

Multiagent systems are rapidly finding applications in a variety of domains, including robotics, distributed control, telecommunications, and economics. The complexity of many tasks arising in these domains makes them difficult to solve with preprogrammed agent behaviors. The agents must, instead, discover a solution on their own, using learning. A significant part of the research on multiagent learning concerns reinforcement learning techniques. This paper provides a comprehensive survey of multiagent reinforcement learning (MARL). A central issue in the field is the formal statement of the multiagent learning goal. Different viewpoints on this issue have led to the proposal of many different goals, among which two focal points can be distinguished: stability of the agents' learning dynamics, and adaptation to the changing behavior of the other agents. The MARL algorithms described in the literature aim, either explicitly or implicitly, at one of these two goals or at a combination of both, in a fully cooperative, fully competitive, or more general setting. A representative selection of these algorithms is discussed in detail in this paper, together with the specific issues that arise in each category. Additionally, the benefits and challenges of MARL are described along with some of the problem domains where the MARL techniques have been applied. Finally, an outlook for the field is provided.

RESQ-learning in stochastic games

2010

This paper introduces a new multi-agent learning algorithm for stochastic games based on replicator dynamics from evolutionary game theory. We identify and transfer desired convergence behavior of these dynamical systems by leveraging the link between evolutionary game theory and multiagent reinforcement learning.
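For readers unfamiliar with replicator dynamics, the sketch below shows one Euler step of the standard single-population replicator equation in a symmetric matrix game. It illustrates the underlying dynamical system only and is not RESQ-learning itself; the payoff matrix, the step size `dt`, and the function name are assumptions made for the example.

```python
def replicator_step(x, payoffs, dt=0.1):
    """One Euler step of dx_i/dt = x_i * (f_i(x) - average fitness)."""
    # Fitness of each pure strategy against the current population mix.
    fitness = [sum(a * xj for a, xj in zip(row, x)) for row in payoffs]
    average = sum(xi * fi for xi, fi in zip(x, fitness))
    new_x = [xi + dt * xi * (fi - average) for xi, fi in zip(x, fitness)]
    # Renormalize to guard against floating-point drift off the simplex.
    total = sum(new_x)
    return [v / total for v in new_x]


# Hypothetical symmetric game in which the second strategy strictly dominates.
PAYOFFS = [[3.0, 0.0],
           [5.0, 1.0]]

if __name__ == "__main__":
    x = [0.5, 0.5]
    for _ in range(500):
        x = replicator_step(x, PAYOFFS)
    print(x)
```

Strategies that earn more than the population average grow in share and the rest shrink, so in this example the population moves toward the dominant strategy.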

If multi-agent learning is the answer, what is the question?

Artificial Intelligence, 2007

The area of learning in multi-agent systems is today one of the most fertile grounds for interaction between game theory and artificial intelligence. We focus on the foundational questions in this interdisciplinary area, and identify several distinct agendas that ought to, we argue, be separated. The goal of this article is to start a discussion in the research community that will result in firmer foundations for the area. Over time it has gradually evolved into the current form, as a result of our own work in the area as well as the feedback of many colleagues. We thank them all collectively, with special thanks to members of the multi-agent group at Stanford in the past three years. Rakesh Vohra and Michael Wellman provided detailed comments on the latest draft which resulted in substantive improvements, although we alone are responsible for the views put forward. This work was supported by NSF ITR grant IIS-0205633 and DARPA grant HR0011-05-1.

Satisficing Multi-Agent Learning: A Simple But Powerful Algorithm

2013

Learning in the presence of adaptive, possibly antagonistic, agents presents special challenges to algorithm designers, especially in environments with limited information. We consider situations in which an agent knows its own set of actions and observes its own payoffs, but does not know or observe the actions and payoffs of the other agents. Despite this limited information, a robust learning algorithm must have two properties: security, which requires the algorithm to avoid exploitation by antagonistic agents, and efficiency, which requires the algorithm to find nearly Pareto efficient solutions when associating with agents who are inclined to cooperate. However, no learning algorithm in the literature has both of these properties when playing repeated general-sum games in these limited-information environments. In this paper, we present and analyze a variation of Karandikar et al.'s learning algorithm [19]. The algorithm is conceptually very simple, but has surprising power given this simplicity. It is provably secure in all matrix games, regardless of the play of its associates, and it is efficient in self-play in a very large set of matrix games. Additionally, the algorithm performs well when associating with representative, state-of-the-art learning algorithms with similar representational capabilities in general-sum games. These properties make the algorithm highly robust, more so than representative best-response and regret-minimizing algorithms with similar reasoning capabilities.
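The general idea behind aspiration-based (satisficing) learning can be sketched as follows. This is a simplified illustration in the spirit of such algorithms, not the specific variant analyzed in the paper: the agent repeats its action when the payoff meets its aspiration level, otherwise draws a new action at random, and then moves its aspiration toward the observed payoff. The class name, the rate `lam`, and the initial aspiration are assumptions. Note that the learner uses only its own actions and payoffs, matching the limited-information setting described above.

```python
import random


class SatisficingLearner:
    """Illustrative aspiration-based learner (hypothetical, simplified)."""

    def __init__(self, n_actions, aspiration, lam=0.9, rng=None):
        self.n_actions = n_actions
        self.aspiration = aspiration  # payoff level the agent is content with
        self.lam = lam                # weight on the old aspiration, in [0, 1]
        self.rng = rng or random.Random(0)
        self.action = self.rng.randrange(n_actions)

    def act(self):
        return self.action

    def observe(self, payoff):
        # Dissatisfied: try some action chosen uniformly at random.
        if payoff < self.aspiration:
            self.action = self.rng.randrange(self.n_actions)
        # Move the aspiration level toward the payoff just received.
        self.aspiration = self.lam * self.aspiration + (1 - self.lam) * payoff


if __name__ == "__main__":
    learner = SatisficingLearner(n_actions=3, aspiration=2.0, lam=0.5)
    first = learner.act()
    learner.observe(4.0)  # payoff meets the aspiration, so the action is kept
    print(first == learner.act(), learner.aspiration)
```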

Convergence, targeted optimality, and safety in multiagent learning

2010

This paper introduces a novel multiagent learning algorithm which achieves convergence, targeted optimality against memory-bounded adversaries, and safety, in arbitrary repeated games. Called CMLeS, its most novel aspect is the manner in which it guarantees (in a PAC sense) targeted optimality against memory-bounded adversaries, via efficient exploration and exploitation. CMLeS is fully implemented, and we present empirical results demonstrating its effectiveness.