Opponent Modeling Research Papers - Academia.edu
Research in the field of incomplete information games (IIG) requires the development of strategies that focus on optimizing the decision-making process, as there is no unequivocal best choice for a particular play. As such, this paper describes the development process and testing of an agent able to compete against human players at Poker, one of the most popular IIGs. The methodology used combines pre-defined opponent models with a reinforcement learning approach. The decision-making algorithm creates a different strategy against each type of opponent by identifying the opponent's type and adjusting the rewards of the actions of the corresponding strategy. The opponent models are simple classifications used by Poker experts. Thus, each strategy is constantly adapted throughout the games, continuously improving the agent's performance. In light of this, two agents with the same structure but different rewarding conditions were developed and tested against other agents and each other. The test results indicated that, after a training phase, the developed strategy is capable of outperforming basic/intermediate playing strategies, thus validating this approach.
Real-time strategy games present an environment in which game AI is expected to behave realistically. One feature of realistic behaviour in game AI is the ability to recognise the strategy of the opponent player. This is known as opponent modeling. In this paper, we propose an approach to opponent modeling based on hierarchically structured models. The top level of the hierarchy can classify the general play style of the opponent. The bottom level of the hierarchy can classify specific strategies that further define the opponent’s behaviour. Experiments that test the approach are performed in the RTS game Spring. From our results we may conclude that the approach can be successfully used to classify the strategy of an opponent in the Spring game.
Poker is an interesting test-bed for artificial intelligence research. It is a game of imperfect knowledge, where multiple competing agents must deal with risk management, opponent modeling, unreliable information, and deception, much like decision-making applications in the real world. Opponent modeling is one of the most difficult problems in decision-making applications and in poker it is essential to achieving high performance. This paper describes and evaluates the implicit and explicit learning in the poker program Loki. Loki implicitly "learns" sophisticated strategies by selectively sampling likely cards for the opponents and then simulating the remainder of the game. The program has explicit learning for observing its opponents, constructing opponent models and dynamically adapting its play to exploit patterns in the opponents' play. The result is a program capable of playing reasonably strong poker, but there remains considerable research to be done to play at a world-class level.
Since Emile Borel's study in 1938, the game of poker has resurfaced every decade as a test bed for research in mathematics, economics, game theory, and now a variety of computer science subfields. Poker is an excellent domain for AI research because it is a game of imperfect information and a game where opponent modeling can yield virtually unlimited complexity. Recent strides in poker research have produced computer programs that can outplay most intermediate players, but there is still a significant gap between computer programs and human experts due to the lack of accurate, purposeful opponent models. We present a method for constructing models of strategic deficiency, that is, an opponent model with an inherent roadmap for exploitation. In our model, a player using this method is able to outperform even the best static player when playing against a wide variety of opponents.
Poker is an interesting test-bed for artificial intelligence research. It is a game of imperfect knowledge, where multiple competing agents must deal with risk management, agent modeling, unreliable information and deception, much like decision-making applications in the real world. Agent modeling is one of the most difficult problems in decision-making applications and in poker it is essential to achieving high performance. This paper describes and evaluates Loki, a poker program capable of observing its opponents, constructing opponent models and dynamically adapting its play to best exploit patterns in the opponents' play.
Bridge bidding is considered to be one of the most difficult problems for game-playing programs. It involves four agents rather than two, including a cooperative agent. In addition, the partial observability of the game makes it impossible to predict the outcome of each action. In this paper we present a new decision-making algorithm that is capable of overcoming these problems. The algorithm allows models to be used for both opponent agents and partners, while utilizing a novel model-based Monte Carlo sampling method to overcome the problem of hidden information. The paper also presents a learning framework that uses the above decision-making algorithm for co-training of partners. The agents refine their selection strategies during training and continuously exchange their refined strategies. The refinement is based on inductive learning applied to examples accumulated for classes of states with conflicting actions. The algorithm was empirically evaluated on a set of bridge deals. The pair of agents that co-trained significantly improved their bidding performance to a level surpassing that of the current state-of-the-art bidding algorithm.
In dynamic multiagent domains with adversary agents, an agent has to adapt its behavior to the opponent's actions in order to increase its ability to compete. A frequently used opponent modeling approach in these domains is to rely on an omniscient agent (e.g., a coach in a soccer environment) to classify the opponent and to communicate the opponent's model (or a counter-strategy for that model) to other agents. In this paper, we propose an alternative opponent modeling approach where each agent observes and classifies online the adversaries it encounters into automatically learned models. Thus, our approach requires neither an omniscient agent nor pre-defined models. Empirical results obtained in a simulated robotic soccer environment promise a high suitability of this approach for real-time, dynamic, multiagent domains.
Most game programs have a large number of parameters that are crucial for their performance. While tuning these parameters by hand is rather difficult, efficient and easy-to-use generic automatic parameter optimisation algorithms are known only for special problems such as the adjustment of the parameters of an evaluation function. The SPSA algorithm (Simultaneous Perturbation Stochastic Approximation) is a generic stochastic gradient method for optimising an objective function when an analytic expression of the gradient is not available, a frequent case in game programs. Further, SPSA in its canonical form is very easy to implement. As such, it is an attractive choice for parameter optimisation in game programs, due to both its generality and its simplicity. The goal of this paper is twofold: (i) to introduce SPSA to the game programming community by putting it into a game-programming perspective, and (ii) to propose and discuss several methods that can be used to enhance the performance of SPSA. These methods include using common random numbers and antithetic variables, a combination of SPSA with RPROP, and the reuse of samples of previous performance evaluations. SPSA with the proposed enhancements was tested in some large-scale experiments on tuning the parameters of an opponent model, a policy and an evaluation function in our poker program, MCRAISE. Whilst SPSA with no enhancements failed to make progress using the allocated resources, SPSA with the enhancements proved to be competitive with other methods, including TD-learning, increasing the average payoff per game by as much as 0.19 times the size of the small bet. From the experimental study, we conclude that the use of an appropriately enhanced variant of SPSA for the optimisation of game program parameters is a viable approach, especially if no good alternative exists for the types of parameters considered.
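To make the canonical form of SPSA mentioned in this abstract concrete, here is a minimal Python sketch. It is not taken from the paper: the function name `spsa_minimize`, the gain constants, and the noiseless quadratic test loss are all illustrative assumptions, and none of the paper's enhancements (common random numbers, antithetic variables, RPROP, sample reuse) are included. In a game program the loss would typically be a noisy estimate such as the negative average payoff.

```python
import random


def spsa_minimize(loss, theta, iterations=500, a=0.5, c=0.1, A=10.0,
                  alpha=0.602, gamma=0.101, seed=0):
    """Canonical SPSA sketch: minimise `loss` starting from `theta`.

    All gain constants are illustrative assumptions, not tuned values.
    """
    rng = random.Random(seed)
    theta = list(theta)
    for k in range(iterations):
        a_k = a / (k + 1 + A) ** alpha   # decaying step size
        c_k = c / (k + 1) ** gamma       # decaying perturbation size
        # Perturb every parameter simultaneously by a random +/-1 direction.
        delta = [rng.choice((-1.0, 1.0)) for _ in theta]
        plus = [t + c_k * d for t, d in zip(theta, delta)]
        minus = [t - c_k * d for t, d in zip(theta, delta)]
        # Two loss evaluations give a gradient estimate for all coordinates.
        diff = loss(plus) - loss(minus)
        theta = [t - a_k * diff / (2.0 * c_k * d)
                 for t, d in zip(theta, delta)]
    return theta


# Hypothetical usage: recover the minimiser of a simple quadratic.
target = [1.0, -2.0]
estimate = spsa_minimize(
    lambda th: sum((t - g) ** 2 for t, g in zip(th, target)),
    [0.0, 0.0],
)
```

The point of the design is that each iteration needs only two evaluations of the objective regardless of the number of parameters, which is why the method suits cases where no analytic gradient is available.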
Developing computer programs that play Poker at human level is considered to be a challenge to the A.I. research community, due to the game's incomplete information and stochastic nature. Because of these characteristics, a competitive agent must manage luck and use opponent modeling to be successful in the short term and therefore be profitable. In this paper we propose the creation of No Limit Hold'em Poker agents by copying strategies of the best human players, by analyzing past games between them. To accomplish this goal, first we determine the best players in a set of game logs by determining which ones have the highest winning expectation. Next, we define a classification problem to represent the player strategy, by associating a game state with the performed action. To validate and test the defined player model, the HoldemML framework was created. This framework generates agents by classifying the data present in the game logs with the goal of copying the best human player tactics. Th...
The game of Poker is an excellent test bed for studying opponent modeling methodologies applied to non-deterministic games with incomplete information. The best-known Poker variant, Texas Hold'em Poker, combines simple rules with a huge number of possible playing strategies. This paper focuses on developing algorithms for performing simple online opponent modeling in Texas Hold'em. The opponent modeling approach developed makes it possible to select the best strategy to play against each given opponent. Several autonomous agents were developed in order to simulate typical Poker players' behavior, and one further agent was developed that is capable of using simple opponent modeling techniques to select the best playing strategy against each of the other opponents. Results achieved in realistic experiments using eight distinct poker-playing agents showed the usefulness of the approach. The observer agent developed is clearly capable of outperforming all its counterparts in all the experiments performed.
This work presents a generalized theoretical framework that allows incorporation of opponent models into adversary search. We present the M* algorithm, a generalization of minimax that uses an arbitrary opponent model to simulate the opponent's search. The opponent model is a recursive structure consisting of the opponent's evaluation function and its model of the player. We demonstrate experimentally the potential benefit of using an opponent model. Pruning in M* is impossible in the general case. We prove a sufficient condition for pruning and present the αβ* algorithm, which returns the M* value of a tree while searching only necessary branches.
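The idea of searching with an opponent model can be illustrated with a small Python sketch. This is only the simplest one-level case, under stated assumptions: the opponent is modelled as a plain minimax player with its own evaluation function and no further model of the player, so it is not the full recursive M* algorithm and includes no pruning. The tree encoding (nested lists with `(player_eval, opponent_eval)` leaf tuples) and the function names are hypothetical.

```python
def minimax(node, player_to_move):
    """Plain minimax on the player's own evaluation (leaf[0])."""
    if not isinstance(node, list):
        return node[0]
    values = [minimax(child, not player_to_move) for child in node]
    return max(values) if player_to_move else min(values)


def om_search(node, player_to_move):
    """One-level opponent-model search.

    Returns (value for the player, value the modelled opponent expects).
    Leaves are (player_eval, opponent_eval); higher opponent_eval is
    better for the opponent.
    """
    if not isinstance(node, list):
        return node
    results = [om_search(child, not player_to_move) for child in node]
    if player_to_move:
        # The player maximises its own value; the modelled opponent
        # assumes the player minimises the opponent's value.
        return (max(r[0] for r in results), min(r[1] for r in results))
    # The opponent picks the child that its own model rates highest,
    # and the player receives whatever value that child has for it.
    return max(results, key=lambda r: r[1])


# Hypothetical two-ply tree: player moves at the root, opponent replies.
tree = [
    [(8, 2), (3, 1)],   # opponent model prefers (8, 2) here
    [(5, 5), (4, 1)],   # opponent model prefers (5, 5) here
]
minimax_value = minimax(tree, True)   # assumes a worst-case opponent
om_value = om_search(tree, True)      # trusts the opponent model
```

In this toy tree, minimax settles for the right branch because it assumes the opponent will punish the left one, while the model-based search takes the left branch because the modelled opponent is predicted to reply differently. The same example also shows the risk: if the model is wrong, the left branch can yield 3 instead of 8.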
Standard models in bio-evolutionary game theory involve repetitions of a single stage game (e.g., the Prisoner's Dilemma or the Stag Hunt); but it is clear that repeatedly playing the same stage game is not an accurate model of most individuals' lives. Rather, individuals' interactions with others correspond to many different kinds of stage games. In this work, we concentrate on discovering behavioral strategies that are successful for the life game, in which the stage game is chosen stochastically at each iteration. We present a cognitive agent model based on Social Value Orientation (SVO) theory. We provide extensive evaluations of our model's performance, both against standard agents from the game theory literature and against a large set of life-game agents written by students in two different countries. Our empirical results suggest that for life-game strategies to be successful in environments with such agents, it is important (i) to be unforgiving with respect to trust behavior and (ii) to use adaptive, fine-grained opponent models of the other agents.
Opponent-model (OM) search comes with two types of risk. The first type is caused by a player's imperfect knowledge of the opponent; the second type arises from low-quality evaluation functions. In this paper, we investigate the desirability of a precondition, called admissibility, that may prevent the second type of risk. We examine the results of two sets of experiments: the first set is taken from the game of LOA, and the second set from the KQKR chess endgame. The LOA experiments show that when admissibility happens to be absent, the OM results are not positive. The chess experiments demonstrate that when an admissible pair of evaluation functions is available, OM search performs better than minimax, provided that there is sufficient room to make errors. Furthermore, we conclude that the expectation 'the better the quality of the prediction of the opponent's move, the more successful OM search is' is only true if the quality of both evaluation functions is sufficiently high.
During the last decades, opponent modeling techniques, utilized to improve the negotiation outcome, have sparked interest in the negotiation research community. In this study, we first investigate the applicability of the nearest-neighbor method with different distance functions in modeling the opponent's preferences. Then, we introduce a new distance-based model to extract the opponent's preferences in a bilateral multi-issue negotiation session. We devise an experiment to evaluate the efficiency of our proposed model in a real negotiation setting in terms of a number of performance measures.
Multiagent environments are often neither cooperative nor collaborative; in many cases, agents have conflicting interests, leading to adversarial interactions. This paper presents a formal Adversarial Environment model for bounded rational agents operating in a zero-sum environment. In such environments, attempts to use classical utility-based search methods can raise a variety of difficulties (e.g., implicitly modeling the opponent as an omniscient utility maximizer, rather than leveraging a more nuanced, explicit opponent model). We define an Adversarial Environment by describing the mental states of an agent in such an environment. We then present behavioral axioms that are intended to serve as design principles for building such adversarial agents. We explore the application of our approach by analyzing log files of completed Connect-Four games, and present an empirical analysis of the axioms' appropriateness.
Agents that operate in a multi-agent system need an efficient strategy to handle their encounters with other agents involved. Searching for an optimal interactive strategy is a hard problem because it depends mostly on the behavior of the others. In this work, interaction among agents is represented as a repeated two-player game, where the agents' objective is to look for a strategy that maximizes their expected sum of rewards in the game. We assume that agents' strategies can be modeled as finite automata. A model-based approach is presented as a possible method for learning an effective interactive strategy. First, we describe how an agent should find an optimal strategy against a given model. Second, we present a heuristic algorithm that infers a model of the opponent's automaton from its input/output behavior. A set of experiments that show the potential merit of the algorithm is reported as well.
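The first step described in this abstract, finding an optimal strategy against a given automaton model, can be sketched in Python as follows. Everything specific here is an illustrative assumption rather than the paper's method: the Prisoner's Dilemma payoffs, the tit-for-tat opponent encoded as a two-state machine, the discounted-sum objective with factor 0.9, and the use of value iteration. The model-inference step is not shown.

```python
# Hypothetical payoffs: (my action, opponent action) -> my reward.
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

# Tit-for-tat as a finite automaton: each state fixes the opponent's
# next action, and the transition depends on the action we just played.
TIT_FOR_TAT = {
    "output": {"c": "C", "d": "D"},
    "next": {("c", "C"): "c", ("c", "D"): "d",
             ("d", "C"): "c", ("d", "D"): "d"},
}


def best_response(automaton, payoff, discount=0.9, sweeps=500):
    """Value iteration over the opponent automaton's states."""
    states = list(automaton["output"])
    actions = sorted({mine for mine, _ in payoff})
    value = {s: 0.0 for s in states}

    def q(state, action):
        # Immediate reward plus the discounted value of the state the
        # opponent's automaton moves to after seeing our action.
        reward = payoff[(action, automaton["output"][state])]
        return reward + discount * value[automaton["next"][(state, action)]]

    for _ in range(sweeps):
        value = {s: max(q(s, a) for a in actions) for s in states}
    policy = {s: max(actions, key=lambda a: q(s, a)) for s in states}
    return policy, value


policy, value = best_response(TIT_FOR_TAT, PAYOFF)
```

Because the opponent's behaviour is fully determined by its automaton state, the repeated game reduces to a small deterministic decision problem over those states. With these assumed payoffs and discount factor, the best response to tit-for-tat is to cooperate in both states.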
While human players adjust their playing strategy according to their opponent, computer programs based on the minimax algorithm use the same playing strategy against a novice as against an expert. This is due to the assumption of minimax that the opponent uses the same strategy as the player. This work studies the problem of opponent modelling in game playing. We recursively define a player as a pair of a strategy and an opponent model, which is also a player. A strategy can be determined by the static evaluation function and the depth of search. M*, an algorithm for searching game trees using an n-level modelling player that uses such a strategy, is described and analyzed. We demonstrate experimentally the benefit of using an opponent model and show the potential harm caused by the use of an inaccurate model. We then describe an algorithm, M*ε, for using uncertain models when a bound on the model error is known. Pruning in M* is impossible in the general case. We prove a sufficient condition for pruning and present a pruning algorithm, αβ*, that returns the M* value of a tree, searching only necessary subtrees. Finally, we present a method for acquiring a model for an unknown player. First, we describe a learning algorithm that acquires a model of the opponent's depth of search by using its past moves as examples. Then, an algorithm for acquiring a model of the player's strategy, both depth and function, is described and evaluated. Experiments with this algorithm show that when a superset of the set of features used by a fixed opponent is available to the learner, few examples are sufficient for learning a model that agrees almost perfectly with the opponent.
Wireless networks and mobile computing applications are rapidly changing the landscape of network security. These technologies create new vulnerabilities that do not exist in wired networks. Some network security techniques and methods are ineffective. The traditional way of protecting networks with firewalls and encryption software is not sufficient for detecting new types of attack in a wireless environment. We therefore need to develop new architectures and mechanisms to protect wireless networks and mobile computing applications. Many network security systems available on the market are capable of securing networks against various kinds of attacks. Some of these techniques are rule-dependent and some are rule-independent, and they play an important role in information security. Modern network security systems are too complex and time-consuming. They are not affordable in terms of either cost or performance. Many network security systems are not platform independent. In this paper, we demonstrate and revisit experimental standalone methodologies that detect message modification and replay attacks and identify unauthorized users in ad hoc networks. The proposed system is simple, economical, and platform independent.
The development of competitive artificial Poker players is a challenge to Artificial Intelligence (AI) because the agent must deal with unreliable information and deception, which make it essential to model the opponents to achieve good results. In this paper we propose the creation of an artificial Poker player through the analysis of past games between human players, with money involved. To accomplish this goal, we defined a classification problem that associates a given game state with the action that was performed by the player. To validate and test the defined player model, an agent that follows the learned tactic was created. The agent approximately follows the tactics of the human players, thus validating this model. However, this approach alone is insufficient to create a competitive agent, as the generated strategies are static, meaning that they cannot adapt to different situations. To solve this problem, we created an agent that uses a strategy that combines several tactics from different players. By using the combined strategy, the agent greatly improved its performance against adversaries capable of modeling opponents.
In competitive domains, some knowledge about the opponent can give players a clear advantage. This idea has led many people to propose approaches that automatically acquire models of opponents, based only on the observation of their input-output behavior. If the opponent's inputs and outputs could be accessed directly, a model could be constructed by feeding a machine learning method with traces of the opponent's behavior. However, that is not the case in the RoboCup domain, where an agent does not have direct access to the opponent's inputs and outputs. Rather, the agent sees the opponent's behavior from its own point of view, and inputs and outputs (actions) have to be inferred from observation. In this paper, we present an approach to model the low-level behavior of individual opponent agents. First, we build a classifier to infer and label opponent actions based on observation. Second, our agent observes an opponent and labels its actions using the previous classifier. From these observations, machine learning techniques generate a model that predicts the opponent's actions. Finally, the agent uses the model to anticipate opponent actions. In order to test our ideas, we created an architecture, OMBO (Opponent Modeling Based on Observation). Using OMBO, a striker agent can anticipate goalie actions. Results show that in this striker-goalie scenario, scores are significantly higher when using the acquired model of the opponent's actions.
Stochastic Opponent Modeling Agents (SOMA) have been proposed as a paradigm for reasoning about cultural groups, terror groups, and other socioeconomic-political-military organizations worldwide. In this paper, we describe a case study that shows how SOMA was used to model the behavior of the terrorist organization Hezbollah. Our team, consisting of a mix of computer scientists, policy experts, and political scientists, was able to understand new facts about Hezbollah of which even seasoned Hezbollah experts may not have been aware. This paper briefly overviews SOMA rules, explains how more than 14,000 SOMA rules for Hezbollah were automatically derived, and then describes a few key findings about Hezbollah enabled by this framework.
We develop an upper bound for the potential performance improvement of an agent using a best response to a model of an opponent instead of an uninformed game-theoretic equilibrium strategy. We show that the bound is a function of only the domain structure of an adversarial environment and does not depend on the actual actors in the environment. This bounds-finding technique will enable system designers to determine if and what type of opponent models would be profitable in a given adversarial environment. It also gives them a baseline value with which to compare performance of instantiated opponent models. We study this method in two domains: selecting intelligence collection priorities for convoy defense and determining the value of predicting enemy decisions in a simplified war game.
A new approach for heuristic game-tree search, probabilistic opponent-model search (PrOM search), is proposed. It is based on standard opponent-model search (OM search). The new approach takes into account a multiple-opponent model. It incorporates uncertainty which mimics the uncertainty of a player about the behaviour of the opponent. Some theoretical results on PrOM search are derived. Implementations of both OM and PrOM search (both with β-passing) are presented and best-case analyses are given. To investigate the computational efficiency, experiments are performed on random game trees. PrOM search appears to lead to serious computational costs when the search depth increases. To test the effectiveness of OM and PrOM search in practice, three tournaments in the game of LOA are performed. The tournaments suggest that PrOM search is more effective than αβ search when search trees of the same depth are used. The tournaments also show that OM search performs poorly, sometimes even disastrously. In spite of the computational costs, the encouraging results of the tournaments and the opportunity that PrOM search offers for actual opponent modelling during the search make PrOM search a viable alternative to minimax-based search algorithms.
Most game programs have a large number of parameters that are crucial for their performance. While tuning these parameters by hand is rather difficult, efficient and easy-to-use generic automatic parameter optimisation algorithms are known only for special problems such as the adjustment of the parameters of an evaluation function. The SPSA algorithm (Simultaneous Perturbation Stochastic Approximation) is a generic stochastic gradient method for optimising an objective function when an analytic expression of the gradient is not available, a frequent case in game programs. Further, SPSA in its canonical form is very easy to implement. As such, it is an attractive choice for parameter optimisation in game programs, both due to its generality and simplicity. The goal of this paper is twofold: (i) to introduce SPSA for the game programming community by putting it into a game-programming perspective, and (ii) to propose and discuss several methods that can be used to enhance the performance of SPSA. These methods include using common random numbers and antithetic variables, a combination of SPSA with RPROP, and the reuse of samples of previous performance evaluations. SPSA with the proposed enhancements was tested in some large-scale experiments on tuning the parameters of an opponent model, a policy and an evaluation function in our poker program, MCRAISE. Whilst SPSA with no enhancements failed to make progress using the allocated resources, SPSA with the enhancements proved to be competitive with other methods, including TD-learning, increasing the average payoff per game by as much as 0.19 times the size of the small bet. From the experimental study, we conclude that the use of an appropriately enhanced variant of SPSA for the optimisation of game program parameters is a viable approach, especially if no good alternative exists for the types of parameters considered.
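To make the canonical form of SPSA mentioned above concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the tuning code used for MCRAISE: the function name spsa_minimize, the gain constants, and the quadratic test objective are all hypothetical, and none of the enhancements listed in the abstract (common random numbers, antithetic variables, RPROP, sample reuse) are included. The exponents 0.602 and 0.101 are commonly quoted defaults for the gain sequences.

```python
import random

def spsa_minimize(loss, theta, iterations=200, a=0.1, c=0.1, A=10.0,
                  alpha=0.602, gamma=0.101, seed=0):
    """Basic SPSA sketch: estimate the gradient from two loss evaluations.

    All gain constants here are illustrative assumptions, not values
    taken from the paper.
    """
    rng = random.Random(seed)
    theta = list(theta)
    for k in range(iterations):
        ak = a / (k + 1 + A) ** alpha   # decaying step size
        ck = c / (k + 1) ** gamma       # decaying perturbation size
        # Perturb every parameter simultaneously by +ck or -ck.
        delta = [rng.choice((-1.0, 1.0)) for _ in theta]
        plus = [t + ck * d for t, d in zip(theta, delta)]
        minus = [t - ck * d for t, d in zip(theta, delta)]
        diff = loss(plus) - loss(minus)
        # Per-coordinate gradient estimate: diff / (2 * ck * delta_i).
        theta = [t - ak * diff / (2.0 * ck * d)
                 for t, d in zip(theta, delta)]
    return theta

if __name__ == "__main__":
    # Hypothetical objective standing in for a (negated) game payoff.
    target = [1.0, -2.0]
    quad = lambda x: sum((xi - ti) ** 2 for xi, ti in zip(x, target))
    print(spsa_minimize(quad, [0.0, 0.0], iterations=500, a=0.2))
```

The point of the design is that each iteration needs only two evaluations of the objective regardless of the number of parameters, which is why the method suits game programs where each evaluation means playing noisy games.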
Research in the field of incomplete information games (IIG) requires the development of strategies which focus on optimizing the decision-making process, as there is no unequivocal best choice for a particular play. As such, this paper describes the development process and testing of an agent able to compete against human players at Poker, one of the most popular IIG. The methodology used combines pre-defined opponent models with a reinforcement learning approach. The decision-making algorithm creates a different strategy against each type of opponent by identifying the opponent's type and adjusting the rewards of the actions of the corresponding strategy. The opponent models are simple classifications used by Poker experts. Thus, each strategy is constantly adapted throughout the games, continuously improving the agent's performance. In light of this, two agents with the same structure but different rewarding conditions were developed and tested against other agents and each other. The test results indicated that after a training phase the developed strategy is capable of outperforming basic/intermediate playing strategies, thus validating this approach.
Multiagent research provides an extensive literature on formal Beliefs-Desires-Intentions (BDI) based models describing the notion of teamwork and cooperation. However, multiagent environments are often neither cooperative nor collaborative; in many cases, agents have conflicting interests, leading to adversarial interactions. This form of interaction has not yet been formally defined in terms of the agents' mental states: beliefs, desires and intentions.
Multiagent environments are often neither cooperative nor collaborative; in many cases, agents have conflicting interests, leading to adversarial interactions. This paper presents a formal Adversarial Environment model for bounded rational agents operating in a zero-sum environment. In such environments, attempts to use classical utility-based search methods can raise a variety of difficulties (e.g., implicitly modeling the opponent as an omniscient utility maximizer, rather than leveraging a more nuanced, explicit opponent model).
- by Inon Zuckerman and +1
- Modal Logic, Bounded Rationality, Log Files, Empirical Analysis
The prediction of future states in Multi-Agent Systems (MAS) has been a challenging problem since the beginning of MAS. Robotic soccer is a MAS environment in which the prediction of the opponent's strategy, or opponent modeling, plays an important role. In this paper, a novel case-based architecture is applied in the soccer coach that learns and predicts opponent movements.
While human players adjust their playing strategy according to their opponent, computer programs, which are based on the minimax algorithm, use the same playing strategy against a novice as against an expert. This is due to the assumption of minimax that the opponent uses the same strategy as the player. This work studies the problem of opponent modeling in game playing. M*, an algorithm for searching game trees using an opponent model, is described and analyzed. We demonstrate experimentally the benefit of using an opponent model and show the potential harm caused by the use of an inaccurate model. We then describe an algorithm, M*ε, for using uncertain models when a bound on the model error is known. Pruning in M* is impossible in the general case. We prove a sufficient condition for pruning and present a pruning algorithm, αβ*, that returns the M* value of a tree, searching only necessary subtrees. Finally, we present a method for acquiring a model for an unknown player. First, we describe a learning algorithm that acquires a model of the opponent's depth of search by using its past moves as examples. Then, an algorithm for acquiring a model of the player's strategy, both depth and function, is described and evaluated. Experiments with this algorithm show that when a superset of the set of features used by a fixed opponent is available to the learner, few examples are sufficient for learning a model that agrees almost perfectly with the opponent.
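The contrast between minimax and search with an opponent model can be illustrated with a small Python sketch. This is a simplified opponent-model search on a hand-made tree, not the authors' algorithm: the tree encoding, the function names, and the leaf values are assumptions for illustration. Each leaf holds two numbers, the player's own evaluation and the evaluation the modelled opponent is assumed to use, both from the player's point of view.

```python
# Internal nodes are lists of children; leaves are tuples
# (player_eval, modelled_opponent_eval). Hypothetical example values.
EXAMPLE_TREE = [
    [(8, 3), (2, 5)],   # opponent to move in each subtree
    [(4, 4), (5, 6)],
]

def minimax(node, maximizing):
    """Plain minimax using only the player's own evaluation."""
    if isinstance(node, tuple):
        return node[0]
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)

def om_search(node, maximizing):
    """Return (player_value, opponent_model_value) for a node.

    At opponent nodes the move is predicted with the opponent's own
    evaluation; the player then scores that predicted move with its own.
    """
    if isinstance(node, tuple):
        return node
    results = [om_search(child, not maximizing) for child in node]
    if maximizing:
        # The player maximizes its own value; the modelled opponent
        # believes the player maximizes the opponent's evaluation.
        return (max(r[0] for r in results), max(r[1] for r in results))
    # The modelled opponent picks the child that minimizes its evaluation.
    return min(results, key=lambda r: r[1])

if __name__ == "__main__":
    print(minimax(EXAMPLE_TREE, True))    # 4
    print(om_search(EXAMPLE_TREE, True))  # (8, 4)
```

On this tree minimax settles for the second subtree (value 4), whereas the model-based search expects the opponent to misjudge the first subtree and predicts a value of 8. If the model is wrong, the opponent can instead reach the leaf worth 2, which is the kind of harm from an inaccurate model that the abstract warns about.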
During the last decades, opponent modeling techniques, utilized to improve the negotiation outcome, have sparked interest in the negotiation research community. In this study, we first investigate the applicability of the nearest neighbor method with different distance functions in modeling the opponent's preferences. Then, we introduce a new distance-based model to extract the opponent's preferences in a bilateral multi-issue negotiation session. We devise an experiment to evaluate the efficiency of our proposed model in a real negotiation setting in terms of a number of performance measures.
This document introduces COACH UNILANG, a standard language for coaching (Robo)Soccer teams. This language was developed with two main objectives: to coach the FC Portugal 2001 team and as a proposal to be used in the Fukuoka 2002 RoboCup coach competition. This language enables high-level and low-level coaching through coach instructions. High-level coaching includes changing tactics, formations used in each situation and
The development of competitive artificial Poker players is a challenge to Artificial Intelligence (AI) because the agent must deal with unreliable information and deception, which make it essential to model the opponents to achieve good results. In this paper we propose the creation of an artificial Poker player through the analysis of past games between human players, with money involved. To accomplish this goal, we defined a classification problem that associates a given game state with the action that was performed by the player. To validate and test the defined player model, an agent that follows the learned tactic was created. The agent approximately follows the tactics of the human players, thus validating this model. However, this approach alone is insufficient to create a competitive agent, as the generated strategies are static, meaning that they cannot adapt to different situations. To solve this problem, we created an agent that uses a strategy that combines several tactics from different players. By using the combined strategy, the agent greatly improved its performance against adversaries capable of modeling opponents.
The Multi-model search framework generalizes minimax to allow exploitation of recursive opponent models. In this work we consider adding pruning to the multi-model search. We prove a sufficient condition that enables pruning and describe two pruning algorithms, αβ* and αβ*1p. We prove correctness and optimality of the algorithms and provide an experimental study of their pruning power. We show that for opponent models that are not radically different from the player's strategy, the pruning power of these algorithms is significant.
An agent that interacts with other agents in multi-agent systems can benefit significantly from adapting to the others. When performing active learning, every agent's action affects the interaction process in two ways: The effect on the expected reward according to the ...
Agents that operate in a multi-agent system need an efficient strategy to handle their encounters with other agents involved. Searching for an optimal interactive strategy is a hard problem because it depends mostly on the behavior of the others. In this work, interaction among agents is ...
Real-time strategy games make up a large part of the modern game industry. Simulating human behavior is a crucial aspect of the artificial opponents in these games, in order to provide a good experience for the user. To achieve this, the AI should ideally be able to recognize the strategy of the human player and adapt to it. This concept is known as opponent modeling. This paper describes an approach to opponent modeling using hierarchically structured models. Two different classifiers have been used for the different hierarchical levels, in order to test the effectiveness of this approach. The first classifier uses fuzzy models, whereas the second classifier is a modification of the concept of discounted rewards from game theory. It is observed that the fuzzy classifier shows a stable convergence to a correct classification with a high confidence rating. The experiments with the discounted reward approach revealed that for some sub-models this approach converged about as well as the fuzzy classifier. However, for other sub-models only mediocre results were achieved, with late and unstable convergence. We conclude that this approach is suitable for real-time strategy games and, with further research, can become reliable for each sub-model.
The efficiency of automated multi-issue negotiation depends on the availability and quality of knowledge about an opponent. We present a generic framework based on Bayesian learning to learn an opponent model, i.e. the issue preferences as well as the issue priorities of an opponent. The proposed algorithm is able to effectively learn opponent preferences from bid exchanges by making some assumptions about the preference structure and rationality of the bidding process. The assumptions used are general and include, among others, assumptions about the independence of issue preferences and the topology of functions that are used to model such preferences. Additionally, a rationality assumption is introduced that assumes that agents use a concession-based strategy. It thus extends and generalizes previous work on learning in negotiation by introducing a technique to learn an opponent model for multi-issue negotiations. We present experimental results demonstrating the effectiveness of our approach and discuss an approximation algorithm to ensure scalability of the learning algorithm.
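The Bayesian learning idea can be sketched as a posterior update over a small set of candidate opponent profiles. This is a rough illustration, not the framework from the paper: the function name bayes_update, the linear concession rule, the Gaussian likelihood, and all numeric values are assumptions. Each hypothesis is a vector of issue weights, and a bid is given as per-issue values between 0 and 1 as the opponent is assumed to rate them.

```python
import math

def bayes_update(prior, hypotheses, bid, t, sigma=0.1, concession=0.3):
    """One Bayesian update of beliefs about the opponent's issue weights.

    Assumptions (not from the paper): the opponent concedes linearly, so at
    normalized time t its bids should be worth about 1 - concession * t to
    itself, with Gaussian noise of width sigma.
    """
    expected = 1.0 - concession * t
    posterior = []
    for p, weights in zip(prior, hypotheses):
        # Utility of the observed bid if this hypothesis were true.
        utility = sum(w * v for w, v in zip(weights, bid))
        likelihood = math.exp(-((utility - expected) ** 2) / (2 * sigma ** 2))
        posterior.append(p * likelihood)
    total = sum(posterior)
    return [p / total for p in posterior]

if __name__ == "__main__":
    # Two hypothetical profiles: issue 1 matters most, or issue 2 does.
    hypotheses = [(0.8, 0.2), (0.2, 0.8)]
    prior = [0.5, 0.5]
    # An opening bid that is maximal on issue 1 and minimal on issue 2.
    print(bayes_update(prior, hypotheses, bid=(1.0, 0.0), t=0.0))
```

An opening bid that insists on the first issue is far more plausible under the first profile, so nearly all posterior mass moves to it. Repeating the update over successive bids is what lets the beliefs track the opponent's priorities.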
Opponent modeling is one of the most attractive and practical areas in Multi-Agent Systems (MAS) for predicting and identifying the future behaviors of an opponent. This paper introduces a novel approach to opponent modeling in RoboCup Soccer Coach Simulation using a rule-based expert system. In this setting, an autonomous coach agent is able to identify the patterns of the opponent by analyzing the opponent's past games and to advise its own players. For this purpose, the main goal of our research comprises two complementary parts: (a) developing a 3-tier learning architecture for classifying opponent behaviors. To achieve this objective, sequential events of the game are identified using environmental data. Then the patterns of the opponent are predicted using statistical calculations. Eventually, by comparing the opponent patterns with the rest of the team's behavior, a model of the opponent is constructed. (b) designing a rule-based expert system containing provocation strategies to expedite detection of opponent patterns. These two components are used by the coach to model the opponent and generate an appropriate strategy to play against it. This structure was tested in RoboCup Soccer Coach Simulation, and MRLCoach was the champion at RoboCup 2006 in Germany.