Threat Modeling AI/ML Systems and Dependencies (original) (raw)

By Andrew Marshall, Jugal Parikh, Emre Kiciman and Ram Shankar Siva Kumar

Special Thanks to Raul Rojas and the AETHER Security Engineering Workstream

November 2019

This document is a deliverable of the AETHER Engineering Practices for AI Working Group and supplements existing SDL threat modeling practices by providing new guidance on threat enumeration and mitigation specific to the AI and Machine Learning space. It is intended to be used as a reference during security design reviews of the following:

  1. Products/services interacting with or taking dependencies on AI/ML-based services
  2. Products/services being built with AI/ML at their core

Traditional security threat mitigation is more important than ever. The requirements established by the Security Development Lifecycle are essential to establishing a product security foundation that this guidance builds upon. Failure to address traditional security threats helps enable the AI/ML-specific attacks covered in this document in both the software and physical domains, as well as making compromise trivial lower down the software stack. For an introduction to net-new security threats in this space seeSecuring the Future of AI and ML at Microsoft.

The skillsets of security engineers and data scientists typically do not overlap. This guidance provides a way for both disciplines to have structured conversations on these net-new threats/mitigations without requiring security engineers to become data scientists or vice versa.

This document is divided into two sections:

  1. “Key New Considerations in Threat Modeling” focuses on new ways of thinking and new questions to ask when threat modeling AI/ML systems. Both data scientists and security engineers should review this as it will be their playbook for threat modeling discussions and mitigation prioritization.
  2. “AI/ML-specific Threats and their Mitigations” provides details on specific attacks as well as specific mitigation steps in use today to protect Microsoft products and services against these threats. This section is primarily targeted at data scientists who may need to implement specific threat mitigations as an output of the threat modeling/security review process.

This guidance is organized around an Adversarial Machine Learning Threat Taxonomy created by Ram Shankar Siva Kumar, David O’Brien, Kendra Albert, Salome Viljoen, and Jeffrey Snover entitled “Failure Modes in Machine Learning.” For incident management guidance on triaging security threats detailed in this document, refer to the SDL Bug Bar for AI/ML Threats. All of these are living documents which will evolve over time with the threat landscape.

Key New Considerations in Threat Modeling: Changing the way you view Trust Boundaries

Assume compromise/poisoning of the data you train from as well as the data provider. Learn to detect anomalous and malicious data entries as well as being able to distinguish between and recover from them

Summary

Training Data stores and the systems that host them are part of your Threat Modeling scope. The greatest security threat in machine learning today is data poisoning because of the lack of standard detections and mitigations in this space, combined with dependence on untrusted/uncurated public datasets as sources of training data. Tracking the provenance and lineage of your data is essential to ensuring its trustworthiness and avoiding a “garbage in, garbage out” training cycle.

Questions to Ask in a Security Review

Example Attacks

Identify actions your model(s) or product/service could take which can cause customer harm online or in the physical domain

Summary

Left unmitigated, attacks on AI/ML systems can find their way over to the physical world. Any scenario which can be twisted to psychologically or physically harm users is a catastrophic risk to your product/service. This extends to any sensitive data about your customers used for training and design choices that can leak those private data points.

Questions to Ask in a Security Review

Example Attacks

Identify all sources of AI/ML dependencies as well as frontend presentation layers in your data/model supply chain

Summary

Many attacks in AI and Machine Learning begin with legitimate access to APIs which are surfaced to provide query access to a model. Because of the rich sources of data and rich user experiences involved here, authenticated but “inappropriate” (there’s a gray area here) 3rd-party access to your models is a risk because of the ability to act as a presentation layer above a Microsoft-provided service.

Questions to Ask in a Security Review

Example Attacks

AI/ML-specific Threats and their Mitigations

#1: Adversarial Perturbation

Description

In perturbation-style attacks, the attacker stealthily modifies the query to get a desired response from a production-deployed model[1]. This is a breach of model input integrity which leads to fuzzing-style attacks where the end result isn’t necessarily an access violation or EOP, but instead compromises the model’s classification performance. This can also be manifested by trolls using certain target words in a way that the AI will ban them, effectively denying service to legitimate users with a name matching a “banned” word.

Diagram that shows increasing attack difficulty when complexity is increasing and capability is decreasing.[24]

Variant #1a: Targeted misclassification

In this case attackers generate a sample that is not in the input class of the target classifier but gets classified by the model as that particular input class. The adversarial sample can appear like random noise to human eyes but attackers have some knowledge of the target machine learning system to generate a white noise that is not random but is exploiting some specific aspects of the target model. The adversary gives an input sample that is not a legitimate sample, but the target system classifies it as a legitimate class.

Examples

A diagram showing that a photo of targeted noise is incorrectly classified by an image classifier resulting in a photo of a bus.[6]

Mitigations

These approaches can make machine learning models more resilient to adversarial attacks because fooling this two-layer cognition system requires not only attacking the original model but also ensuring that the attribution generated for the adversarial example is similar to the original examples. Both the systems must be simultaneously compromised for a successful adversarial attack.

Traditional Parallels

Remote Elevation of Privilege since attacker is now in control of your model

Severity

Critical

Variant #1b: Source/Target misclassification

This is characterized as an attempt by an attacker to get a model to return their desired label for a given input. This usually forces a model to return a false positive or false negative. The end result is a subtle takeover of the model’s classification accuracy, whereby an attacker can induce specific bypasses at will.

While this attack has a significant detrimental impact to classification accuracy, it can also be more time-intensive to carry out given that an adversary must not only manipulate the source data so that it is no longer labeled correctly, but also labeled specifically with the desired fraudulent label. These attacks often involve multiple steps/attempts to force misclassification [3]. If the model is susceptible to transfer learning attacks which force targeted misclassification, there may be no discernable attacker traffic footprint as the probing attacks can be carried out offline.

Examples

Forcing benign emails to be classified as spam or causing a malicious example to go undetected. These are also known as model evasion or mimicry attacks.

Mitigations

Reactive/Defensive Detection Actions

Proactive/Protective Actions

A graph showing the change in the slope of the prediction function with adversarial training.

Invest in developing monotonic classification with selection of monotonic features. This ensures that the adversary will not be able to evade the classifier by simply padding features from the negative class [13].

Response Actions

Traditional Parallels

Remote Elevation of Privilege

Severity

Critical

Variant #1c: Random misclassification

This is a special variation where the attacker’s target classification can be anything other than the legitimate source classification. The attack generally involves injection of noise randomly into the source data being classified to reduce the likelihood of the correct classification being used in the future [3].

Examples

Two photos of a cat. One photo is classified as a tabby cat. After adversarial perturbation, the other photo is classified as guacamole.

Mitigations

Same as Variant 1a.

Traditional Parallels

Non-persistent denial of service

Severity

Important

Variant #1d: Confidence Reduction

An attacker can craft inputs to reduce the confidence level of correct classification, especially in high-consequence scenarios. This can also take the form of a large number of false positives meant to overwhelm administrators or monitoring systems with fraudulent alerts indistinguishable from legitimate alerts [3].

Examples

Two photos of a stop sign. The photo on the left shows a confidence level of 96 percent. After adversarial perturbation, the photo on the right shows a confidence level of 13 percent.

Mitigations
Traditional Parallels

Non-persistent denial of service

Severity

Important

#2a Targeted Data Poisoning

Description

The goal of the attacker is to contaminate the machine model generated_in the training phase_, so that predictions on new data will be modified in the testing phase[1]. In targeted poisoning attacks, the attacker wants to misclassify specific examples to cause specific actions to be taken or omitted.

Examples

Submitting AV software as malware to force its misclassification as malicious and eliminate the use of targeted AV software on client systems.

Mitigations
Traditional Parallels

Trojaned host whereby attacker persists on the network. Training or config data is compromised and being ingested/trusted for model creation.

Severity

Critical

#2b Indiscriminate Data Poisoning

Description

Goal is to ruin the quality/integrity of the data set being attacked. Many datasets are public/untrusted/uncurated, so this creates additional concerns around the ability to spot such data integrity violations in the first place. Training on unknowingly compromised data is a garbage-in/garbage-out situation. Once detected, triage needs to determine the extent of data that has been breached and quarantine/retrain.

Examples

A company scrapes a well-known and trusted website for oil futures data to train their models. The data provider’s website is subsequently compromised via SQL Injection attack. The attacker can poison the dataset at will and the model being trained has no notion that the data is tainted.

Mitigations

Same as variant 2a.

Traditional Parallels

Authenticated Denial of service against a high-value asset

Severity

Important

#3 Model Inversion Attacks

Description

The private features used in machine learning models can be recovered [1]. This includes reconstructing private training data that the attacker does not have access to. Also known as hill climbing attacks in the biometric community [16, 17] This is accomplished by finding the input which maximizes the confidence level returned, subject to the classification matching the target [4].

Examples

Two images of a person. One image is blurry and the other image is clear.[4]

Mitigations
Traditional Parallels

Targeted, covert Information Disclosure

Severity

This defaults to important per the standard SDL bug bar, but sensitive or personally identifiable data being extracted would raise this to critical.

#4 Membership Inference Attack

Description

The attacker can determine whether a given data record was part of the model’s training dataset or not[1]. Researchers were able to predict a patient’s main procedure (e.g: Surgery the patient went through) based on the attributes (e.g: age, gender, hospital) [1].

An illustration showing the complexity of a membership inference attack. Arrows show the flow and relationship between training data prediction data.[12]

Mitigations

Research papers demonstrating the viability of this attack indicate Differential Privacy [4, 9] would be an effective mitigation. This is still a nascent field at Microsoft and AETHER Security Engineering recommends building expertise with research investments in this space. This research would need to enumerate Differential Privacy capabilities and evaluate their practical effectiveness as mitigations, then design ways for these defenses to be inherited transparently on our online services platforms, similar to how compiling code in Visual Studio gives you on-by-default security protections which are transparent to the developer and users.

The usage of neuron dropout and model stacking can be effective mitigations to an extent. Using neuron dropout not only increases resilience of a neural net to this attack, but also increases model performance [4].

Traditional Parallels

Data Privacy. Inferences are being made about a data point’s inclusion in the training set but the training data itself is not being disclosed

Severity

This is a privacy issue, not a security issue. It is addressed in threat modeling guidance because the domains overlap, but any response here would be driven by Privacy, not Security.

#5 Model Stealing

Description

The attackers recreate the underlying model by legitimately querying the model. The functionality of the new model is same as that of the underlying model[1]. Once the model is recreated, it can be inverted to recover feature information or make inferences on training data.

Examples

In settings where an ML model serves to detect adversarial behavior, such as identification of spam, malware classification, and network anomaly detection, model extraction can facilitate evasion attacks [7].

Mitigations

Proactive/Protective Actions

Traditional Parallels

Unauthenticated, read-only tampering of system data, targeted high-value information disclosure?

Severity

Important in security-sensitive models, Moderate otherwise

#6 Neural Net Reprogramming

Description

By means of a specially crafted query from an adversary, Machine learning systems can be reprogrammed to a task that deviates from the creator’s original intent [1].

Examples

Weak access controls on a facial recognition API enabling 3rdparties to incorporate into apps designed to harm Microsoft customers, such as a deep fakes generator.

Mitigations
Traditional Parallels

This is an abuse scenario. You're less likely to open a security incident on this than you are to simply disable the offender’s account.

Severity

Important to Critical

#7 Adversarial Example in the Physical domain (bits->atoms)

Description

An adversarial example is an input/query from a malicious entity sent with the sole aim of misleading the machine learning system [1]

Examples

These examples can manifest in the physical domain, like a self-driving car being tricked into running a stop sign because of a certain color of light (the adversarial input) being shone on the stop sign, forcing the image recognition system to no longer see the stop sign as a stop sign.

Traditional Parallels

Elevation of Privilege, remote code execution

Mitigations

These attacks manifest themselves because issues in the machine learning layer (the data & algorithm layer below AI-driven decisionmaking) were not mitigated. As with any other software *or* physical system, the layer below the target can always be attacked through traditional vectors. Because of this, traditional security practices are more important than ever, especially with the layer of unmitigated vulnerabilities (the data/algo layer) being used between AI and traditional software.

Severity

Critical

#8 Malicious ML providers who can recover training data

Description

A malicious provider presents a backdoored algorithm, wherein the private training data is recovered. They were able to reconstruct faces and texts, given the model alone.

Traditional Parallels

Targeted information disclosure

Mitigations

Research papers demonstrating the viability of this attack indicate Homomorphic Encryption would be an effective mitigation. This is an area with little current investment at Microsoft and AETHER Security Engineering recommends building expertise with research investments in this space. This research would need to enumerate Homomorphic Encryption tenets and evaluate their practical effectiveness as mitigations in the face of malicious ML-as-a-Service providers.

Severity

Important if data is PII, Moderate otherwise

#9 Attacking the ML Supply Chain

Description

Owing to large resources (data + computation) required to train algorithms, the current practice is to reuse models trained by large corporations and modify them slightly for task at hand (e.g: ResNet is a popular image recognition model from Microsoft). These models are curated in a Model Zoo (Caffe hosts popular image recognition models). In this attack, the adversary attacks the models hosted in Caffe, thereby poisoning the well for anyone else. [1]

Traditional Parallels
Mitigations
Severity

Critical

#10 Backdoor Machine Learning

Description

The training process is outsourced to a malicious 3rd party who tampers with training data and delivered a trojaned model which forces targeted mis-classifications, such as classifying a certain virus as non-malicious[1]. This is a risk in ML-as-a-Service model-generation scenarios.

An example showing how mis-classifications can adversely affect training data. One photo is a correctly classified stop sign. After poisoning, the second photo is labeled as a speed limit sign.[12]

Traditional Parallels
Mitigations
Reactive/Defensive Detection Actions
Proactive/Protective Actions
Response Actions
Severity

Critical

#11 Exploit software dependencies of the ML system

Description

In this attack, the attacker does NOT manipulate the algorithms. Instead, exploits software vulnerabilities such as buffer overflows or cross-site scripting[1]. It is still easier to compromise software layers beneath AI/ML than attack the learning layer directly, so traditional security threat mitigation practices detailed in the Security Development Lifecycle are essential.

Traditional Parallels
Mitigations

Work with your security team to follow applicable Security Development Lifecycle/Operational Security Assurance best practices.

Severity

Variable; Up to Critical depending on the type of traditional software vulnerability.

Bibliography

[1] Failure Modes in Machine Learning, Ram Shankar Siva Kumar, David O’Brien, Kendra Albert, Salome Viljoen, and Jeffrey Snover,https://learn.microsoft.com/security/failure-modes-in-machine-learning

[2] AETHER Security Engineering Workstream, Data Provenance/Lineage v-team

[3] Adversarial Examples in Deep Learning: Characterization and Divergence, Wei, et al, https://arxiv.org/pdf/1807.00051.pdf

[4] ML-Leaks: Model and Data Independent Membership Inference Attacks and Defenses on Machine Learning Models, Salem, et al,https://arxiv.org/pdf/1806.01246v2.pdf

[5] M. Fredrikson, S. Jha, and T. Ristenpart, “Model Inversion Attacks that Exploit Confidence Information and Basic Countermeasures,” in Proceedings of the 2015 ACM SIGSAC Conference on Computer and Communications Security (CCS).

[6] Nicolas Papernot & Patrick McDaniel- Adversarial Examples in Machine Learning AIWTB 2017

[7] Stealing Machine Learning Models via Prediction APIs, Florian Tramèr, École Polytechnique Fédérale de Lausanne (EPFL); Fan Zhang, Cornell University; Ari Juels, Cornell Tech; Michael K. Reiter, The University of North Carolina at Chapel Hill; Thomas Ristenpart, Cornell Tech

[8] The Space of Transferable Adversarial Examples, Florian Tramèr , Nicolas Papernot , Ian Goodfellow , Dan Boneh , and Patrick McDaniel

[9] Understanding Membership Inferences on Well-Generalized Learning Models Yunhui Long1 , Vincent Bindschaedler1 , Lei Wang2 , Diyue Bu2 , Xiaofeng Wang2 , Haixu Tang2 , Carl A. Gunter1 , and Kai Chen3,4

[10] Simon-Gabriel et al., Adversarial vulnerability of neural networks increases with input dimension, ArXiv 2018;

[11] Lyu et al., A unified gradient regularization family for adversarial examples, ICDM 2015

[12] Wild Patterns: Ten Years After the Rise of Adversarial Machine Learning - NeCS 2019 Battista Biggioa, Fabio Roli

[13] Adversarially Robust Malware Detection UsingMonotonic Classification Inigo Incer et al.

[14] Battista Biggio, Igino Corona, Giorgio Fumera, Giorgio Giacinto, and Fabio Roli. Bagging Classifiers for Fighting Poisoning Attacks in Adversarial Classification Tasks

[15] An Improved Reject on Negative Impact Defense Hongjiang Li and Patrick P.K. Chan

[16] Adler. Vulnerabilities in biometric encryption systems. 5th Int’l Conf. AVBPA, 2005

[17] Galbally, McCool, Fierrez, Marcel, Ortega-Garcia. On the vulnerability of face verification systems to hill-climbing attacks. Patt. Rec., 2010

[18] Weilin Xu, David Evans, Yanjun Qi. Feature Squeezing: Detecting Adversarial Examples in Deep Neural Networks. 2018 Network and Distributed System Security Symposium. 18-21 February.

[19] Reinforcing Adversarial Robustness using Model Confidence Induced by Adversarial Training - Xi Wu, Uyeong Jang, Jiefeng Chen, Lingjiao Chen, Somesh Jha

[20] Attribution-driven Causal Analysis for Detection of Adversarial Examples, Susmit Jha, Sunny Raj, Steven Fernandes, Sumit Kumar Jha, Somesh Jha, Gunjan Verma, Brian Jalaian, Ananthram Swami

[21] Robust Linear Regression Against Training Data Poisoning – Chang Liu et al.

[22] Feature Denoising for Improving Adversarial Robustness, Cihang Xie, Yuxin Wu, Laurens van der Maaten, Alan Yuille, Kaiming He

[23] Certified Defenses against Adversarial Examples - Aditi Raghunathan, Jacob Steinhardt, Percy Liang