A Game Theoretic Analysis of LQG Control under Adversarial Attack

Dynamic Cheap Talk for Robust Adversarial Learning

Lecture Notes in Computer Science, 2019

This paper considers robust adversarial learning in the context of closed-loop control with adversarial signaling. Because the control agent has incomplete information about the environment, a belief-dependent signaling game formulation is introduced for the dynamic system, and a dynamic cheap talk game is formulated with belief-dependent strategies for both players. We show that the dynamic cheap talk game can be further reformulated as a particular stochastic game, in which the states are beliefs about the environment and the actions are the adversarial manipulation strategies and the control strategies. Furthermore, a bisimulation metric is proposed and studied for the dynamic cheap talk game; it provides an upper bound on the difference between the values of different initial beliefs in the zero-sum equilibrium.

Partial Adversarial Behavior Deception in Security Games

Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, 2020

Learning attacker behavior is an important research topic in security games, as security agencies are often uncertain about attackers' decision making. Previous work has focused on developing various behavioral models of attackers based on historical attack data. However, a clever attacker can manipulate its attacks to make such attack-driven learning fail, leading to ineffective defense strategies. We study attacker behavior deception with three main contributions. First, we propose a new model, named partial behavior deception model, in which there is a deceptive attacker (among multiple attackers) who controls a portion of attacks. Our model captures real-world security scenarios such as wildlife protection in which multiple poachers are present. Second, we introduce a new scalable algorithm, GAMBO, to compute an optimal deception strategy of the deceptive attacker. Our algorithm employs the projected gradient descent and uses the implicit function theorem for the computation of gr...

Expected payoff analysis of dynamic mixed strategies in an adversarial domain

2011 IEEE Symposium on Intelligent Agent (IA), 2011

Adversarial decision making aims to determine optimal decision strategies for dealing with an adversarial and adaptive opponent. One defense against such an adversary is to make decisions intended to confuse him, even though this may reduce our rewards. In this contribution, we describe ongoing research on the design of time-varying decision strategies for a simple adversarial model. The resulting strategies are compared with static strategies from both a theoretical and an empirical point of view. The results show encouraging improvements that open new avenues for research.
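For readers less familiar with mixed strategies, the quantity being compared above is an expected payoff. The sketch below is a minimal illustration, not the paper's model (the abstract does not specify it): it assumes a two-player matrix game with a hypothetical payoff matrix `A`, computes the expected payoff $x^\top A y$ of a mixed strategy $x$ against an opponent strategy $y$, and averages it over a time-varying schedule of strategies. The function names are illustrative only.

```python
from typing import Sequence

Matrix = Sequence[Sequence[float]]


def expected_payoff(x: Sequence[float], payoff: Matrix, y: Sequence[float]) -> float:
    """Expected payoff x^T A y for the row player (x, y are mixed strategies)."""
    return sum(x[i] * payoff[i][j] * y[j]
               for i in range(len(x)) for j in range(len(y)))


def average_payoff(schedule: Sequence[Sequence[float]], payoff: Matrix,
                   y: Sequence[float]) -> float:
    """Average expected payoff of a time-varying sequence of mixed strategies
    played against a fixed opponent strategy y."""
    if not schedule:
        raise ValueError("schedule must contain at least one strategy")
    return sum(expected_payoff(x, payoff, y) for x in schedule) / len(schedule)


# Hypothetical matching-pennies-style payoffs for the defender.
A = [[1.0, -1.0], [-1.0, 1.0]]
static_value = expected_payoff([1.0, 0.0], A, [0.8, 0.2])
dynamic_value = average_payoff([[1.0, 0.0], [0.0, 1.0]], A, [0.8, 0.2])
```

Against a fixed opponent the average over a schedule is linear in the strategies, so any advantage of time-varying strategies must come from how the opponent adapts to them; this sketch does not model that adaptation.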

A Game Theoretical Framework for Adversarial Learning

Many data mining applications, ranging from spam filtering to intrusion detection, are faced with active adversaries. In all these applications, initially successful classifiers will degrade easily. This becomes a game between the adversary and the data miner: The adversary modifies its strategy to avoid being detected by the current classifier; the data miner then updates its classifier based on the new threats. In this paper, we investigate the possibility of an equilibrium in this seemingly never-ending game, where neither party has an incentive to change. Modifying the classifier causes too many false positives with too little increase in true positives; changes by the adversary decrease the utility of the false negative items that aren't detected. We develop a game theoretic framework where the equilibrium behavior of adversarial learning applications can be analyzed, and provide a solution for finding the equilibrium point. A classifier's equilibrium performance indicates its eventual success or failure. The data miner could then select attributes based on their equilibrium performance, and construct an effective classifier.

Bayesian Games for Adversarial Regression Problems

2013

We study regression problems in which an adversary can exercise some control over the data generation process. Learner and adversary have conflicting but not necessarily perfectly antagonistic objectives. We study the case in which the learner is not fully informed about the adversary's objective; instead, any knowledge of the learner about parameters of the adversary's goal may be reflected in a Bayesian prior. We model this problem as a Bayesian game, and characterize conditions under which a unique Bayesian equilibrium point exists. We experimentally compare the Bayesian equilibrium strategy to the Nash equilibrium strategy, the minimax strategy, and regular linear regression.

Computing Approximate Equilibria in Sequential Adversarial Games by Exploitability Descent

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, 2019

In this paper, we present exploitability descent, a new algorithm to compute approximate equilibria in two-player zero-sum extensive-form games with imperfect information, by direct policy optimization against worst-case opponents. We prove that when following this optimization, the exploitability of a player's strategy converges asymptotically to zero, and hence when both players employ this optimization, the joint policies converge to a Nash equilibrium. Unlike fictitious play (XFP) and counterfactual regret minimization (CFR), our convergence result pertains to the policies being optimized rather than the average policies. Our experiments demonstrate convergence rates comparable to XFP and CFR in four benchmark games in the tabular case. Using function approximation, we find that our algorithm outperforms the tabular version in two of the games, which, to the best of our knowledge, is the first such result in imperfect information games among this class of algorithms.
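To make "exploitability" concrete, the sketch below works in the simplest setting, a two-player zero-sum normal-form (matrix) game, rather than the extensive-form games the paper targets. It assumes the row player maximizes $x^\top A y$ and the column player minimizes it, computes NashConv $= \max_i (Ay)_i - \min_j (x^\top A)_j$ (zero exactly at a Nash equilibrium; some authors call half of this the exploitability), and takes one row-player step that ascends the value against a best-responding column player. The step uses projected subgradient ascent on the probability simplex, a swapped-in simplification of the paper's policy-optimization update; all names are hypothetical.

```python
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def row_values(payoff: Matrix, y: Sequence[float]) -> List[float]:
    """(A y)_i: row player's expected payoff for each pure row against y."""
    return [sum(a * yj for a, yj in zip(row, y)) for row in payoff]


def col_values(payoff: Matrix, x: Sequence[float]) -> List[float]:
    """(x^T A)_j: row player's expected payoff against each pure column."""
    return [sum(x[i] * payoff[i][j] for i in range(len(x)))
            for j in range(len(payoff[0]))]


def exploitability(payoff: Matrix, x: Sequence[float], y: Sequence[float]) -> float:
    """NashConv of (x, y): total gain both players could get by best-responding."""
    return max(row_values(payoff, y)) - min(col_values(payoff, x))


def project_to_simplex(v: Sequence[float]) -> List[float]:
    """Euclidean projection onto the probability simplex (sort-based method)."""
    u = sorted(v, reverse=True)
    cumulative, theta = 0.0, 0.0
    for k, uk in enumerate(u, start=1):
        cumulative += uk
        t = (cumulative - 1.0) / k
        if uk - t > 0:
            theta = t  # last k satisfying the condition gives the threshold
    return [max(vi - theta, 0.0) for vi in v]


def row_step(payoff: Matrix, x: Sequence[float], lr: float) -> List[float]:
    """One exploitability-descent-style step for the row player: ascend its
    value against the column player's pure best response to x."""
    cols = col_values(payoff, x)
    j_star = min(range(len(cols)), key=cols.__getitem__)  # best response column
    grad = [payoff[i][j_star] for i in range(len(x))]  # gradient of x^T A e_{j*}
    return project_to_simplex([xi + lr * g for xi, g in zip(x, grad)])


# Rock-paper-scissors payoffs for the row player.
RPS = [[0.0, -1.0, 1.0], [1.0, 0.0, -1.0], [-1.0, 1.0, 0.0]]
x_next = row_step(RPS, [1.0, 0.0, 0.0], lr=0.1)
```

The column player's step is symmetric (descend $x^\top A y$ against the row player's best response). Note that the step acts on the current policy itself, which matches the abstract's point that convergence concerns the optimized policies rather than averages.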

On the Stability and Convergence of Robust Adversarial Reinforcement Learning: A Case Study on Linear Quadratic Systems

2020

Reinforcement learning (RL) algorithms can fail to generalize due to the gap between the simulation and the real world. One standard remedy is to use robust adversarial RL (RARL) that accounts for this gap during the policy training, by modeling the gap as an adversary against the training agent. In this work, we reexamine the effectiveness of RARL under a fundamental robust control setting: the linear quadratic (LQ) case. We first observe that the popular RARL scheme that greedily alternates agents’ updates can easily destabilize the system. Motivated by this, we propose several other policy-based RARL algorithms whose convergence behaviors are then studied both empirically and theoretically. We find: i) the conventional RARL framework (Pinto et al., 2017) can learn a destabilizing policy if the initial policy does not enjoy the robust stability property against the adversary; and ii) with robustly stabilizing initializations, our proposed double-loop RARL algorithm provably conver...
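The notion of robust stability used above can be illustrated with a scalar linear system. The sketch below is not the paper's algorithm; it assumes hypothetical dynamics $x_{t+1} = a x_t + b u_t + d w_t$ with linear feedback $u_t = -k x_t$ for the controller and $w_t = l x_t$ for the adversary, so the closed loop is stable iff $|a - bk + dl| < 1$. A gain $k$ is robustly stable against all adversaries with $|l| \le l_{\max}$ iff the worst case $|a - bk| + |d|\, l_{\max} < 1$.

```python
def closed_loop_coeff(a: float, b: float, d: float, k: float, l: float) -> float:
    """Coefficient of x_{t+1} = (a - b k + d l) x_t under both feedback gains."""
    return a - b * k + d * l


def is_stable(a: float, b: float, d: float, k: float, l: float) -> bool:
    """Scalar closed loop is stable iff its coefficient has magnitude < 1."""
    return abs(closed_loop_coeff(a, b, d, k, l)) < 1.0


def is_robustly_stable(a: float, b: float, d: float, k: float, l_max: float) -> bool:
    """Stable for every adversary gain with |l| <= l_max.
    The worst case adds |d| * l_max in the direction of a - b k."""
    return abs(a - b * k) + abs(d) * l_max < 1.0


# Hypothetical open-loop-unstable plant (a > 1) with controller gain k = 0.5.
a, b, d, k = 1.2, 1.0, 1.0, 0.5
nominal_ok = is_stable(a, b, d, k, 0.0)             # adversary inactive
attacked_ok = is_stable(a, b, d, k, 0.4)            # adversary gain 0.4
robust_small = is_robustly_stable(a, b, d, k, 0.2)
robust_large = is_robustly_stable(a, b, d, k, 0.4)
```

Here $k = 0.5$ stabilizes the nominal system but not against an adversary gain of 0.4, which mirrors, in a toy setting, why an initialization without robust stability can let an alternating scheme drift into a destabilizing policy.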

Reward-Free Attacks in Multi-Agent Reinforcement Learning

ArXiv, 2021

We investigate how effective an attacker can be when it only learns from its victim’s actions, without access to the victim’s reward. In this work, we are motivated by the scenario where the attacker wants to behave strategically when the victim’s motivations are unknown. We argue that one heuristic approach an attacker can use is to maximize the entropy of the victim’s policy. The policy is generally not obfuscated, which implies it may be extracted simply by passively observing the victim. We provide such a strategy in the form of a reward-free exploration algorithm that maximizes the attacker’s entropy during the exploration phase, and then maximizes the victim’s empirical entropy during the planning phase. In our experiments, the victim agents are subverted through policy entropy maximization, implying an attacker might not need access to the victim’s reward to succeed. Hence, reward-free attacks, which are based only on observing behavior, show the feasibility of an attacker to...
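The attacker's objective above relies on an entropy estimate built purely from observed behavior. As a minimal sketch, assuming the attacker logs discrete `(state, action)` pairs of the victim and averages the per-state entropy uniformly over visited states (an illustrative choice, not necessarily the paper's estimator), the victim's empirical policy entropy could be computed as follows; the function name is hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Hashable, Iterable, Tuple


def empirical_policy_entropy(observations: Iterable[Tuple[Hashable, Hashable]]) -> float:
    """Mean over visited states of the Shannon entropy (in nats) of the
    victim's empirical action distribution, estimated from (state, action) pairs."""
    by_state: Dict[Hashable, Counter] = defaultdict(Counter)
    for state, action in observations:
        by_state[state][action] += 1
    if not by_state:
        raise ValueError("need at least one observation")
    entropies = []
    for counts in by_state.values():
        n = sum(counts.values())
        entropies.append(-sum((c / n) * math.log(c / n) for c in counts.values()))
    return sum(entropies) / len(entropies)


# Hypothetical log: the victim randomizes in s0 but acts deterministically in s1.
log = [("s0", "left"), ("s0", "right"), ("s1", "left"), ("s1", "left")]
h = empirical_policy_entropy(log)
```

An attacker maximizing this quantity would choose its own actions so that the victim's observed action frequencies become as spread out as possible, without ever consulting the victim's reward.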

Robust Reinforcement Learning on State Observations with Learned Optimal Adversary

2021

We study the robustness of reinforcement learning (RL) with adversarially perturbed state observations, which aligns with the setting of many adversarial attacks on deep reinforcement learning (DRL) and is also important for rolling out real-world RL agents under unpredictable sensing noise. With a fixed agent policy, we demonstrate that an optimal adversary to perturb state observations can be found, which is guaranteed to obtain the worst-case agent reward. For DRL settings, this leads to a novel empirical adversarial attack on RL agents via a learned adversary that is much stronger than previous ones. To enhance the robustness of an agent, we propose a framework of alternating training with learned adversaries (ATLA), which trains an adversary online together with the agent using policy gradient following the optimal adversarial attack framework. Additionally, inspired by the analysis of state-adversarial Markov decision process (SA-MDP), we show that past states and actions (hist...

The Impact of Complex and Informed Adversarial Behavior in Graphical Coordination Games

IEEE Transactions on Control of Network Systems, 2021

How does system-level information impact the ability of an adversary to degrade performance in a networked control system? How does the complexity of an adversary's strategy affect its ability to degrade performance? This paper focuses on these questions in the context of graphical coordination games where an adversary can influence a given fraction of the agents in the system, and the agents follow log-linear learning, a well-known distributed learning algorithm. Focusing on a class of homogeneous ring graphs of varying connectivity, we begin by demonstrating that minimally connected ring graphs are the most susceptible to adversarial influence. We then proceed to characterize how both (i) the sophistication of the attack strategies (static vs. dynamic) and (ii) the informational awareness about the network structure can be leveraged by an adversary to degrade system performance. Focusing on the set of adversarial policies that induce stochastically stable states, our findings demonstrate that the relative importance of sophistication and information changes depending on the influencing power of the adversary. In particular, sophistication far outweighs informational awareness with regard to inflicting system-level damage when the adversary's influence power is relatively weak. However, the opposite is true when the adversary's influence power is more substantial.
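Log-linear learning, mentioned above, is a simple revision rule: at each step one agent is picked at random and chooses action $a$ with probability proportional to $\exp(U_i(a, a_{-i})/\tau)$, where $\tau > 0$ is a temperature. The sketch below assumes a ring graph with two actions `"x"` and `"y"` and a commonly used edge payoff ($1+\alpha$ if both neighbors play `"x"`, $1$ if both play `"y"`, $0$ otherwise); these payoffs, the parameter names, and the omission of the adversary are assumptions for illustration, not the paper's exact model.

```python
import math
import random
from typing import Dict, List, Sequence


def edge_payoff(a: str, b: str, alpha: float) -> float:
    """Assumed coordination payoff on one edge of the graph."""
    if a == b == "x":
        return 1.0 + alpha
    if a == b == "y":
        return 1.0
    return 0.0


def local_utility(action: str, left: str, right: str, alpha: float) -> float:
    """Agent's utility: sum of edge payoffs with its two ring neighbors."""
    return edge_payoff(action, left, alpha) + edge_payoff(action, right, alpha)


def log_linear_probs(profile: Sequence[str], i: int, alpha: float,
                     tau: float) -> Dict[str, float]:
    """Log-linear (Boltzmann) choice probabilities for agent i on a ring."""
    n = len(profile)
    left, right = profile[(i - 1) % n], profile[(i + 1) % n]
    weights = {a: math.exp(local_utility(a, left, right, alpha) / tau)
               for a in ("x", "y")}
    total = sum(weights.values())
    return {a: w / total for a, w in weights.items()}


def log_linear_step(profile: Sequence[str], alpha: float, tau: float,
                    rng: random.Random) -> List[str]:
    """One asynchronous revision: a uniformly random agent resamples its action."""
    i = rng.randrange(len(profile))
    probs = log_linear_probs(profile, i, alpha, tau)
    new_profile = list(profile)
    new_profile[i] = "x" if rng.random() < probs["x"] else "y"
    return new_profile


state = log_linear_step(["x", "x", "y", "y"], alpha=0.5, tau=1.0, rng=random.Random(0))
```

As $\tau \to 0$ the rule approaches a best response, and the stochastically stable states referred to in the abstract are those that retain positive probability in the stationary distribution in this low-temperature limit.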