A Learning Framework for Distribution-Based Game-Theoretic Solution Concepts

Learning payoff functions in infinite games

Machine Learning, 2007

We consider a class of games with real-valued strategies and payoff information available only in the form of data from a given sample of strategy profiles. Solving such games with respect to the underlying strategy space requires generalizing from the data to a complete payoff-function representation. We address payoff-function learning as a standard regression problem, with provision for capturing known structure (symmetry) in the multiagent environment. To measure learning performance, we consider the relative utility of prescribed strategies, rather than the accuracy of payoff functions per se. We demonstrate our approach and evaluate its effectiveness on two examples: a two-player version of the first-price sealed-bid auction (with known analytical form), and a five-player market-based scheduling game (with no known solution).
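
The pipeline is easy to prototype. Below is a minimal sketch, not the paper's implementation: the ground-truth payoff function, sample sizes, and quadratic feature map are all illustrative assumptions. A single regressor is fit over (own strategy, opponent strategy) pairs, symmetry is exploited by pooling both players' observations, and the learned model is judged by the regret of the strategy it prescribes rather than by payoff-prediction error.

```python
import numpy as np

# Hypothetical symmetric 2-player payoff u(own, opp); a stand-in for the
# paper's first-price auction example, chosen only for illustration.
def u_true(own, opp):
    return own * (1.0 - own) + 0.25 * own * opp

rng = np.random.default_rng(0)
S = rng.uniform(0.0, 1.0, size=(200, 2))          # sampled strategy profiles
y1 = u_true(S[:, 0], S[:, 1]) + 0.05 * rng.normal(size=len(S))
y2 = u_true(S[:, 1], S[:, 0]) + 0.05 * rng.normal(size=len(S))

# Symmetry: both players share one payoff function of (own, opp),
# so both players' observations can train a single regressor.
X = np.vstack([S, S[:, ::-1]])
t = np.concatenate([y1, y2])

def phi(X):                                        # quadratic feature map
    a, b = X[:, 0], X[:, 1]
    return np.column_stack([np.ones_like(a), a, b, a * a, a * b, b * b])

w, *_ = np.linalg.lstsq(phi(X), t, rcond=None)
u_hat = lambda own, opp: phi(np.column_stack([own, opp])) @ w

# Prescribe a symmetric equilibrium candidate from the learned payoffs:
# the grid point closest to being a best response to itself.
g = np.linspace(0.0, 1.0, 201)
br = np.array([g[np.argmax(u_hat(g, np.full_like(g, s)))] for s in g])
s_star = g[np.argmin(np.abs(br - g))]

# Evaluate by relative utility (regret), not payoff-function accuracy.
regret = u_true(g, s_star).max() - u_true(s_star, s_star)
print(f"prescribed strategy {s_star:.3f}, true regret {regret:.4f}")
```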

Efficient learning in games

2006

We consider the problem of learning strategy selection in games. The theoretical solution to this problem is a distribution over strategies that corresponds to a Nash equilibrium of the game. When the payoff function of the game is not known to the participants, such a ...

The Price is (Probably) Right: Learning Market Equilibria from Samples

2021

Equilibrium computation in markets usually considers settings where player valuation functions are known. We consider the setting where player valuations are unknown; using a PAC learning-theoretic framework, we analyze some classes of common valuation functions, and provide algorithms which output direct PAC equilibrium allocations, not estimates based on attempting to learn valuation functions. Since there exist trivial PAC market outcomes with an unbounded worst-case efficiency loss, we lower-bound the efficiency of our algorithms. While the efficiency loss under general distributions is rather high, we show that in some cases (e.g., unit-demand valuations), it is possible to find a PAC market equilibrium with significantly better utility.
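
The headline distinction, direct PAC allocations versus estimate-then-optimize, is easiest to see against the naive baseline. The sketch below is that baseline, not the paper's algorithm: it estimates unit-demand valuations from samples and then computes a welfare-maximizing assignment (a maximum-weight matching). The valuation matrix and sampling model are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(0)

# Hypothetical market: 3 unit-demand buyers, 3 goods. True valuations are
# unknown to the algorithm; it only sees noisy samples (illustrative model).
V_true = np.array([[8.0, 3.0, 1.0],
                   [4.0, 7.0, 2.0],
                   [2.0, 5.0, 6.0]])
samples = V_true[None, :, :] + rng.normal(scale=1.0, size=(50, 3, 3))

# Estimate-then-optimize baseline (the approach the paper's direct-PAC
# algorithms avoid): average the samples, then maximize estimated welfare.
# For unit-demand buyers, a welfare-maximizing allocation is a max-weight
# matching of buyers to goods.
V_hat = samples.mean(axis=0)
buyers, goods = linear_sum_assignment(V_hat, maximize=True)

r_opt, c_opt = linear_sum_assignment(V_true, maximize=True)
print("allocation:", dict(zip(buyers.tolist(), goods.tolist())))
print(f"welfare achieved {V_true[buyers, goods].sum():.2f} "
      f"vs optimum {V_true[r_opt, c_opt].sum():.2f}")
```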

Convergence to Pareto Optimality in General Sum Games Via Learning Opponent Preferences

2005

We consider the learning problem faced by two self-interested agents playing any general-sum game repeatedly, where the opponent's payoff is unknown. The concept of Nash equilibrium in repeated games provides an individually rational solution for playing such games and can be achieved by playing the Nash equilibrium strategy of the single-shot game in every iteration. However, such a strategy can sometimes lead to a Pareto-dominated outcome for the repeated game. Our goal is to design learning strategies that converge to a Pareto-efficient outcome that also produces a Nash equilibrium payoff for repeated two-player n-action general-sum games. We present a learning algorithm, POSNEL, which learns the opponent's preference structure and produces, under self-play, Nash equilibrium payoffs in the limit in all such games. We also show that such learning will generate Pareto-optimal payoffs in a large majority of games. We derive a probability bound for convergence to a Nash equilibrium payoff and experimentally demonstrate convergence to Pareto optimality for all structurally distinct 2-player 2-action conflict games. We also compare our algorithm with the existing algorithms WOLF-IGA and JAL and show that POSNEL, on average, outperforms both.
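
The motivating tension is concrete in the Prisoner's Dilemma and easy to verify numerically. The payoff matrix below is the standard textbook one, used only to illustrate the motivation; POSNEL itself is not reproduced here. Repeating the stage-game Nash equilibrium (Defect, Defect) yields a Pareto-dominated outcome, while the Pareto-efficient (Cooperate, Cooperate) is not a stage-game equilibrium.

```python
import numpy as np

# Prisoner's Dilemma, actions 0 = Cooperate, 1 = Defect.
# R[a1, a2] = (row player's payoff, column player's payoff).
R = np.array([[[3, 3], [0, 5]],
              [[5, 0], [1, 1]]])

def is_nash(a1, a2):
    # No player gains by a unilateral deviation.
    return (R[a1, a2, 0] >= R[1 - a1, a2, 0] and
            R[a1, a2, 1] >= R[a1, 1 - a2, 1])

def pareto_dominated(a1, a2):
    # Some other outcome is at least as good for both and better for one.
    cells = [(i, j) for i in range(2) for j in range(2) if (i, j) != (a1, a2)]
    return any(all(R[i, j, k] >= R[a1, a2, k] for k in range(2)) and
               any(R[i, j, k] > R[a1, a2, k] for k in range(2))
               for i, j in cells)

print(is_nash(1, 1), pareto_dominated(1, 1))  # True True: (D, D) is Nash but dominated
print(is_nash(0, 0), pareto_dominated(0, 0))  # False False: (C, C) is efficient, not Nash
```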

Learning in Non-convex Games with an Optimization Oracle

arXiv, 2018

We consider online learning in an adversarial, non-convex setting under the assumption that the learner has access to an offline optimization oracle. In the general setting of prediction with expert advice, [11] established that in the optimization-oracle model, online learning requires exponentially more computation than statistical learning. In this paper we show that by slightly strengthening the oracle model, the online and statistical learning models become computationally equivalent. Our result holds for any Lipschitz and bounded (but not necessarily convex) function. As an application, we demonstrate how the offline oracle enables efficient computation of an equilibrium in non-convex games, which include GANs (generative adversarial networks) as a special case.
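
The reduction rests on follow-the-perturbed-leader style algorithms: each round, the offline oracle is asked to minimize the cumulative past loss plus a random perturbation. Here is a minimal 1-D sketch under stated assumptions; the exhaustive-search stand-in for the oracle and the particular non-convex Lipschitz loss sequence are both illustrative, not the paper's construction.

```python
import numpy as np

rng = np.random.default_rng(1)
grid = np.linspace(0.0, 1.0, 1001)            # decision set [0, 1], discretized

def oracle(loss_fns, sigma):
    # Offline optimization oracle: returns a minimizer of cumulative loss
    # plus a random linear perturbation (exhaustive search stands in for it).
    total = sum((f(grid) for f in loss_fns), np.zeros_like(grid))
    total += sigma * rng.normal() * grid
    return grid[np.argmin(total)]

# Adversarial, non-convex but Lipschitz and bounded losses (illustrative).
losses = [lambda x, t=t: 0.5 * (1.0 + np.sin(5.0 * x + 0.3 * t))
          for t in range(200)]

past, alg_loss = [], 0.0
for f in losses:
    x = oracle(past, sigma=5.0)               # follow the perturbed leader
    alg_loss += f(x)
    past.append(f)

best_fixed = sum(f(grid) for f in losses).min()
print(f"FTPL loss {alg_loss:.1f} vs best fixed action {best_fixed:.1f}")
```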

Learning with minimal information in continuous games

Theoretical Economics, 2020

While payoff-based learning models are almost exclusively devised for finite-action games, where players can test every action, it is harder to design such learning processes for continuous games. We construct a stochastic learning rule, designed for games with continuous action sets, which requires no sophistication from the players and is simple to implement: players update their actions according to variations in their own payoff between the current and previous action. We then analyze its behavior in several classes of continuous games and show that convergence to a stable Nash equilibrium is guaranteed in all games with strategic complements as well as in concave games, while convergence to Nash equilibrium occurs in all locally ordinal potential games as soon as Nash equilibria are isolated.
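
The update rule fits in a few lines. The sketch below is one plausible instantiation, not the paper's exact process: each player keeps moving in the same direction if its own payoff rose between the last two rounds and reverses otherwise, with a vanishing step size. The test game is a symmetric Cournot duopoly (a concave game, so within the class the paper covers); its parameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)

def payoffs(q):
    # Cournot duopoly with inverse demand 1 - q1 - q2 and zero cost;
    # a concave game whose symmetric Nash equilibrium is q_i = 1/3.
    price = 1.0 - q.sum()
    return q * price

q = rng.uniform(0.1, 0.9, size=2)            # current actions
q_prev = q + rng.normal(scale=0.05, size=2)  # previous actions
u_prev = payoffs(q_prev)

for t in range(1, 5001):
    u = payoffs(q)
    step = 0.5 / t**0.7                      # vanishing step size
    # Payoff-based rule: continue along the last direction if own payoff
    # improved, reverse it otherwise (plus a little exploration noise).
    direction = np.sign((u - u_prev) * (q - q_prev))
    q_new = np.clip(q + step * direction + 0.01 * step * rng.normal(size=2),
                    0.0, 1.0)
    q_prev, u_prev, q = q, u, q_new

print("final quantities:", q, "(Nash equilibrium is [1/3, 1/3])")
```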

Convergent learning algorithms for potential games with unknown noisy rewards

2011

In this paper, we address the problem of convergence to Nash equilibria in games with rewards that are initially unknown and which must be estimated over time from noisy observations. These games arise in many real-world applications, whenever rewards for actions cannot be prespecified and must be learned on-line. Standard results in game theory, however, do not consider such settings. Specifically, using results from stochastic approximation and differential inclusions, we prove the convergence of variants of fictitious play and adaptive play to Nash equilibria in potential games and weakly acyclic games, respectively. These variants all use a multi-agent version of Q-learning to estimate the reward functions and a novel form of the ε-greedy decision rule to select an action. Furthermore, we derive ε-greedy decision rules that exploit the sparse interaction structure encoded in two compact graphical representations of games, known as graphical and hypergraphical normal form, to improve the convergence rate of the learning algorithms. The structure captured in these representations naturally occurs in many distributed optimisation and control applications. Finally, we demonstrate the efficacy of the algorithms in a simulated ad hoc wireless sensor network management problem.
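
The basic ingredients, Q-learning reward estimates, fictitious-play beliefs, and decaying ε-greedy exploration, combine as in the minimal sketch below. It is not the paper's exact variant: the coordination game, noise model, and ε schedule are illustrative assumptions, but any identical-interest game like this one is a potential game.

```python
import numpy as np

rng = np.random.default_rng(2)

# Two-player, two-action potential game with noisy rewards: R[a1, a2] gives
# the (player 1, player 2) mean rewards; players only see noisy samples.
R = np.array([[[4.0, 4.0], [0.0, 0.0]],
              [[0.0, 0.0], [2.0, 2.0]]])

n_actions, T = 2, 3000
counts = np.zeros((2, n_actions))        # each player's opponent-action counts
Q = np.zeros((2, n_actions, n_actions))  # Q[i, own, opp]: reward estimates
N = np.zeros((2, n_actions, n_actions))  # sample counts for running averages

for t in range(1, T + 1):
    eps = 1.0 / np.sqrt(t)               # decaying epsilon-greedy exploration
    acts = []
    for i in range(2):
        if rng.random() < eps or counts[i].sum() == 0:
            acts.append(int(rng.integers(n_actions)))
        else:
            belief = counts[i] / counts[i].sum()       # fictitious-play belief
            acts.append(int(np.argmax(Q[i] @ belief))) # estimated best reply
    a1, a2 = acts
    for i, (own, opp) in enumerate([(a1, a2), (a2, a1)]):
        r = R[a1, a2, i] + rng.normal()  # noisy reward observation
        N[i, own, opp] += 1.0
        Q[i, own, opp] += (r - Q[i, own, opp]) / N[i, own, opp]  # Q-update
        counts[i, opp] += 1.0

# Empirical play typically concentrates on the payoff-dominant (0, 0) equilibrium.
print("beliefs:", counts / counts.sum(axis=1, keepdims=True))
```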

Learning and Efficiency in Games with Dynamic Population

Proceedings of the Twenty-Seventh Annual ACM-SIAM Symposium on Discrete Algorithms, 2016

We study the quality of outcomes in repeated games when the population of players is dynamically changing and participants use learning algorithms to adapt to the changing environment. Game theory classically considers Nash equilibria of one-shot games, while in practice many games are played repeatedly, and in such games players often use algorithmic tools to learn to play in the given environment. Learning in repeated games, however, has previously been studied only when the population playing the game is stable over time.

Distributionally Robust Games: f-Divergence and Learning

Proceedings of the 11th EAI International Conference on Performance Evaluation Methodologies and Tools, 2017

In this paper we introduce the novel framework of distributionally robust games. These are multi-player games where each player models the state of nature using a worst-case distribution, also called an adversarial distribution. Thus each player's payoff depends on the other players' decisions and on the decision of a virtual player (nature) who selects an adversarial distribution of scenarios. This paper provides three main contributions. Firstly, the distributionally robust game is formulated using the statistical notion of f-divergence between two distributions, here the adversarial distribution and the exact distribution. Secondly, the complexity of the problem is significantly reduced by means of triality theory. Thirdly, stochastic Bregman learning algorithms are proposed to speed up the computation of robust equilibria. Finally, the theoretical findings are illustrated in a convex setting, and their limitations are tested with a non-convex non-concave function.
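
For one concrete member of the f-divergence family, the Kullback-Leibler divergence, nature's inner worst-case problem has a classical one-dimensional dual, which makes a player's robust payoff evaluation easy to compute. The scenario distribution, losses, and radius below are illustrative assumptions, and neither the triality reduction nor the Bregman learning algorithms are reproduced here.

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Nominal scenario distribution p and one player's scenario losses
# (both illustrative).
p = np.array([0.5, 0.3, 0.2])
loss = np.array([1.0, 2.0, 4.0])
rho = 0.1                                 # radius of the KL ambiguity ball

def dual(lam):
    # Dual of sup { E_q[loss] : KL(q || p) <= rho }:
    #   inf_{lam > 0}  lam * rho + lam * log E_p[exp(loss / lam)]
    return lam * rho + lam * np.log(np.dot(p, np.exp(loss / lam)))

res = minimize_scalar(dual, bounds=(1e-3, 100.0), method="bounded")
lam = res.x

# Nature's worst-case distribution is an exponential tilting of p.
q = p * np.exp(loss / lam)
q /= q.sum()

print(f"nominal expected loss {np.dot(p, loss):.3f}, "
      f"robust value {res.fun:.3f}")
print("adversarial distribution:", np.round(q, 3))
```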