Sequential targeted optimality as a new criterion for teaching and following in repeated games
Related papers
Learning to commit in repeated games
Proceedings of the fifth international joint conference on Autonomous agents and multiagent systems - AAMAS '06, 2006
Learning to converge to an efficient, i.e., Pareto-optimal, Nash equilibrium of the repeated game is an open problem in multiagent learning. Our goal is to facilitate the learning of efficient outcomes in repeated plays of incomplete-information games where only the opponent's actions, but not its payoffs, are observable. We use a two-stage protocol that allows a player to unilaterally commit to an action, letting the other player choose its action knowing the action chosen by the committed player. The motivation behind commitment is to promote trust between the players and prevent mutually harmful choices made to preclude worst-case outcomes. Our agents learn whether or not commitment is beneficial. Interestingly, the decision to commit can be thought of as expanding the action space, and the proposed protocol can be incorporated into any learning strategy used for playing repeated games. We show the improvement in outcome efficiency of standard learning algorithms when they use our proposed commitment protocol. We propose convergence to a Pareto-optimal Nash equilibrium of the repeated game as a desirable learning outcome. The performance evaluation in this paper uses a similarly motivated metric that measures the percentage of Nash equilibria of the repeated game that dominate the observed outcome.
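As a minimal sketch of why commitment can improve on worst-case play, the following compares the security (maximin) payoff of a Battle-of-the-Sexes-style game with the payoff the committing player obtains when the opponent best-responds to the committed action. The payoff matrices and the Stackelberg-style best-response assumption are illustrative, not taken from the paper:

```python
import numpy as np

# Illustrative payoff matrices (assumed): a Battle-of-the-Sexes-style game.
# A[i, j] = row player's payoff, B[i, j] = column player's payoff.
A = np.array([[2, 0],
              [0, 1]])
B = np.array([[1, 0],
              [0, 2]])

def security_level(payoffs):
    """Maximin value: the best payoff the row player can guarantee."""
    return payoffs.min(axis=1).max()

def commit_outcome(A, B):
    """Row player commits to each action in turn; the column player
    best-responds; the row player keeps the best commitment."""
    best = None
    for a in range(A.shape[0]):
        br = int(np.argmax(B[a]))  # column's best response to the commitment
        if best is None or A[a, br] > best[0]:
            best = (A[a, br], a, br)
    return best

print("security level (no commitment):", security_level(A))   # 0
print("best commitment (payoff, row action, response):", commit_outcome(A, B))  # (2, 0, 0)
```

Here commitment lifts the row player from the worst-case-avoiding payoff of 0 to the coordinated payoff of 2, which is the intuition behind treating the commitment decision as an extra action.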
Learning Cooperation In Repeated Games
1999
Abstract. In the field of multi-agent systems, the study of coordination, cooperation and collaboration assumes a prominent position. Most of the research concerned with these issues concentrates on explicit negotiation between agents, on the investigation of settings in which global system goals have to be balanced with agents' individual goals, or on the exploitation of real-world knowledge to determine efficient coordination strategies.
A learning-based model of repeated games with incomplete information
Games and Economic Behavior, 2006
This paper tests a learning-based model of strategic teaching in repeated games with incomplete information. The repeated game has a long-run player whose type is unknown to a group of short-run players. The proposed model assumes a fraction of 'short-run' players follow a one-parameter learning model (self-tuning EWA). In addition, some 'long-run' players are myopic while others are sophisticated and rationally anticipate how short-run players adjust their actions over time, and "teach" the short-run players so as to maximize their own long-run payoffs. All players optimize noisily. The proposed model nests an agent-based quantal-response equilibrium (AQRE) and the standard equilibrium models as special cases. Using data from 28 experimental sessions of repeated trust and entry games, including 8 previously unpublished sessions, the model fits substantially better than chance and much better than standard equilibrium models. Estimates show that most of the long-run players are sophisticated, and short-run players become more sophisticated with experience.
Multiagent Learning and Optimality Criteria in Repeated Game Self-play
Abstract: We present a multiagent learning approach to satisfy any given optimality criterion in repeated-game self-play. We contrast our approach with classical learning approaches for repeated games, namely learning of equilibrium, Pareto-efficient learning, and their variants. The comparison is given from a practical (or engineering) standpoint, i.e., from the point of view of a multiagent system designer whose goal is to maximize the system's overall performance according to a given optimality criterion.
Learning with repeated-game strategies
Frontiers in Neuroscience, 2014
We use the self-tuning Experience Weighted Attraction model with repeated-game strategies as a computer testbed to examine the relative frequency, speed of convergence and progression of a set of repeated-game strategies in four symmetric 2 × 2 games: Prisoner's Dilemma, Battle of the Sexes, Stag-Hunt, and Chicken. In the Prisoner's Dilemma game, we find that the strategy with the most occurrences is the "Grim-Trigger." In the Battle of the Sexes game, a cooperative pair that alternates between the two pure-strategy Nash equilibria emerges as the one with the most occurrences. In the Stag-Hunt and Chicken games, the "Win-Stay, Lose-Shift" and "Grim-Trigger" strategies are the ones with the most occurrences. Overall, the pairs that converged quickly ended up at the cooperative outcomes, whereas the ones that were extremely slow to reach convergence ended up at non-cooperative outcomes.
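The Grim-Trigger and Win-Stay, Lose-Shift strategies discussed above are easy to sketch directly; here they are paired in an iterated Prisoner's Dilemma with conventional payoff values (T=5, R=3, P=1, S=0 — an assumption, as the paper's exact parameterization is not reproduced here):

```python
# Conventional Prisoner's Dilemma payoffs (assumed): T > R > P > S.
R, S, T, P = 3, 0, 5, 1
C, D = 0, 1
PAYOFF = {(C, C): (R, R), (C, D): (S, T), (D, C): (T, S), (D, D): (P, P)}

def grim_trigger(history_other):
    """Cooperate until the opponent defects once, then defect forever."""
    return D if D in history_other else C

def win_stay_lose_shift(history_self, history_other):
    """Repeat the last action after a good payoff (R or T); switch otherwise."""
    if not history_self:
        return C
    last_self, last_other = history_self[-1], history_other[-1]
    payoff = PAYOFF[(last_self, last_other)][0]
    return last_self if payoff in (R, T) else 1 - last_self

def play(rounds=50):
    h1, h2, total1, total2 = [], [], 0, 0
    for _ in range(rounds):
        a1 = grim_trigger(h2)
        a2 = win_stay_lose_shift(h2, h1)
        p1, p2 = PAYOFF[(a1, a2)]
        total1 += p1; total2 += p2
        h1.append(a1); h2.append(a2)
    return total1, total2

print(play())  # mutual cooperation throughout: (150, 150)
```

Paired against each other, both strategies sustain cooperation from the first round, which matches the paper's observation that fast-converging pairs end up at cooperative outcomes.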
Learning in games with bounded memory
2006
The paper studies infinite repetition of finite strategic-form games. Players use a backward-looking learning behavior and face bounds on their cognitive capacities. We show that, for any given belief-probability over the set of possible outcomes where players have no experience, games can be payoff-classified and there always exists a stationary state in the space of action profiles. In particular, if the belief-probability assumes all possible outcomes without experience to be equally likely, then in one class of Prisoners' Dilemmas, where the uniformly weighted average defecting payoff is higher than the cooperative payoff and the uniformly weighted average cooperative payoff is lower than the defecting payoff, play converges in the long run to the static Nash equilibrium; in the other class, where the reverse holds, play converges to cooperation. The results are applied to a large class of 2 × 2 games.
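Reading the classification above in standard Prisoner's Dilemma notation (T > R > P > S, with R the mutual-cooperation payoff and P the mutual-defection payoff — an interpretation of the abstract's wording, not the paper's own notation), the two classes can be sketched as:

```python
# Hedged sketch of the payoff classification described above.
# "Average defecting payoff" is read as the mean of the defector's two
# possible payoffs (T and P); "average cooperative payoff" as the mean
# of the cooperator's two possible payoffs (R and S).
def classify_pd(T, R, P, S):
    assert T > R > P > S, "not a Prisoner's Dilemma"
    avg_defect = (T + P) / 2   # uniformly weighted average defecting payoff
    avg_coop = (R + S) / 2     # uniformly weighted average cooperative payoff
    if avg_defect > R and avg_coop < P:
        return "converges to static Nash equilibrium (defection)"
    if avg_defect < R and avg_coop > P:
        return "converges to cooperation"
    return "mixed case"

print(classify_pd(T=6, R=3, P=2, S=0))  # converges to static Nash equilibrium (defection)
print(classify_pd(T=4, R=3, P=1, S=0))  # converges to cooperation
```

The classification depends only on how the midpoints of each player's possible payoffs compare with the mutual-cooperation and mutual-defection payoffs, so not every Prisoner's Dilemma falls into one of the two convergence classes.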
2006
We consider the problem of learning strategy selection in games. The theoretical solution to this problem is a distribution over strategies that responds to a Nash equilibrium of the game. When the payoff function of the game is not known to the participants, such a ...
New criteria and a new algorithm for learning in multi-agent systems
… in neural information processing systems, 2005
We propose a new set of criteria for learning algorithms in multi-agent systems, one that is more stringent and (we argue) better justified than previously proposed criteria. Our criteria, which apply most straightforwardly in repeated games with average rewards, consist of three requirements: (a) against a specified class of opponents (this class is a parameter of the criterion), the algorithm yields a payoff that approaches the payoff of the best response; (b) against other opponents, the algorithm's payoff at least approaches (and possibly exceeds) the security-level payoff (or maximin value); and (c) subject to these requirements, the algorithm achieves a close-to-optimal payoff in self-play. We furthermore require that these average payoffs be achieved quickly. We then present a novel algorithm and show that it meets these new criteria for a particular parameter class, the class of stationary opponents. Finally, we show that the algorithm is effective not only in theory but also empirically. Using a recently introduced comprehensive game-theoretic test suite, we show that the algorithm almost universally outperforms previous learning algorithms.
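The benchmarks in requirements (a) and (b) — the best-response payoff against a stationary opponent and the security-level (maximin) payoff — can be sketched for a matrix game; the payoff matrix and the opponent's mixed strategy below are illustrative assumptions:

```python
import numpy as np

# Row player's payoff matrix (assumed: Prisoner's Dilemma values).
A = np.array([[3.0, 0.0],
              [5.0, 1.0]])

def best_response_payoff(A, opp_mixed):
    """(a) Payoff of the best response to a known stationary mixed strategy."""
    return float(np.max(A @ opp_mixed))

def security_level(A):
    """(b) Maximin value: the payoff the row player can guarantee
    with a pure action, whatever the opponent does."""
    return float(np.max(np.min(A, axis=1)))

opp = np.array([0.7, 0.3])  # stationary opponent: cooperates 70% of the time
print(best_response_payoff(A, opp))  # 3.8
print(security_level(A))             # 1.0
```

The criteria require a learner's long-run average payoff to approach the first benchmark against the specified opponent class and to stay at or above the second against everyone else.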
Repeated games for multiagent systems: a survey
The Knowledge Engineering Review, 2013
Repeated games are an important mathematical formalism for modeling and studying long-term economic interactions between multiple self-interested parties (individuals or groups of individuals). They open attractive perspectives for modeling long-term multiagent interactions. This overview paper discusses the most important existing results for repeated games, which arise from both economics and computer science. Contrary to a number of existing surveys of repeated games, most of which originated from the economic research community, we are the first to pay special attention to a number of important distinctive features proper to artificial agents. More precisely, artificial agents, as opposed to the human agents that economic research mainly targets, are usually bounded, whether in terms of memory or performance. Therefore, their decisions have to be based on strategies defined using finite representations. Furthermore, these strategies have to be efficiently computed or...