What is data poisoning (AI poisoning) and how does it work?

Data or AI poisoning attacks are deliberate attempts to manipulate the training data of artificial intelligence and machine learning (ML) models to corrupt their behavior and elicit skewed, biased or harmful outputs.

AI tools have seen increasingly widespread adoption since the public release of ChatGPT. Many of these systems rely on ML models to function properly. Knowing this, threat actors employ various attack techniques to infiltrate AI systems through their ML models. One of the most significant threats to ML models is data poisoning.

Data poisoning attacks pose a significant threat to the integrity and reliability of AI and ML systems. A successful data poisoning attack can cause undesirable behavior, biased outputs or complete model failure. As the adoption of AI systems continues to grow across all industries, it is critical to implement mitigation strategies and countermeasures to safeguard these models from malicious data manipulation.

The role of data in model training

During training, ML models need to access large volumes of data from different sources, known as training data. Common sources of training data include publicly available web content, open source and government data sets, user-generated content, third-party data providers and an organization's own internal records.

A data poisoning attack occurs when threat actors inject malicious or corrupted data into these training data sets, aiming to cause the AI model to produce inaccurate results or degrade its overall performance.

Data poisoning attack types

Malicious actors use a variety of methods to execute data poisoning attacks. The most common approaches include the following.

Mislabeling attack

In this type of attack, a threat actor deliberately mislabels portions of the AI model's training data set, leading the model to learn incorrect patterns and thus give inaccurate results after deployment. For example, feeding a model numerous images of horses incorrectly labeled as cars during the training phase might teach the AI system to mistakenly recognize horses as cars after deployment.
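
To make this concrete, the following is a minimal sketch of a label-flipping attack using scikit-learn. The synthetic data set, the 30% flip rate and the binary classes are hypothetical choices for illustration only.

```python
# Minimal sketch of a mislabeling (label-flipping) attack, assuming a
# scikit-learn workflow; the data set, flip rate and classes are hypothetical.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

rng = np.random.default_rng(0)
flip_rate = 0.30  # fraction of training labels the attacker flips
n_flips = int(flip_rate * len(y_train))
flip_idx = rng.choice(len(y_train), size=n_flips, replace=False)

y_poisoned = y_train.copy()
y_poisoned[flip_idx] = 1 - y_poisoned[flip_idx]  # flip 0 <-> 1 ("horse" <-> "car")

clean_model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
poisoned_model = LogisticRegression(max_iter=1000).fit(X_train, y_poisoned)

print("clean accuracy:   ", clean_model.score(X_test, y_test))
print("poisoned accuracy:", poisoned_model.score(X_test, y_test))
```

Comparing the two accuracy figures gives a rough sense of how much damage even unsophisticated label flipping can do to what the model learns.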

Data injection

In a data injection attack, threat actors inject malicious data samples into ML training data sets to make the AI system behave according to the attacker's objectives. For example, introducing specially crafted data samples into a banking system's training data could bias it against specific demographics during loan processing.
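
As a hedged illustration of that loan-processing scenario, the sketch below appends crafted rows to a synthetic approval data set so that a scikit-learn model learns to penalize a group-membership column. Every feature, sample count and threshold here is hypothetical.

```python
# Minimal sketch of a data injection attack on a toy loan-approval model;
# the features, group indicator and sample counts are entirely hypothetical.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)

# Legitimate data: approval depends only on the "income" feature (column 0),
# not on the group-membership indicator (column 1).
n = 1000
income = rng.normal(0.0, 1.0, n)
group = rng.integers(0, 2, n)
X_clean = np.column_stack([income, group])
y_clean = (income > 0).astype(int)  # 1 = approve, 0 = deny

# Attacker-injected rows: group-1 applicants with good income, labeled "deny",
# nudging the model to associate the group column with rejection.
n_poison = 300
X_poison = np.column_stack([rng.normal(1.0, 0.5, n_poison), np.ones(n_poison)])
y_poison = np.zeros(n_poison, dtype=int)

X_train = np.vstack([X_clean, X_poison])
y_train = np.concatenate([y_clean, y_poison])

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("learned weight on group column:", model.coef_[0][1])  # typically negative after poisoning
```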

Data manipulation

Data manipulation involves altering data within an ML model's training set to cause the model to misclassify data or behave in a predefined malicious manner in response to specific inputs. Techniques for manipulating training data include injecting incorrect or misleading information, modifying existing data points and deleting portions of the data set; a sketch of one such technique follows below.

The end goal of a data manipulation attack is to exploit ML security vulnerabilities, resulting in biased or harmful outputs.
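
The sketch below is a hypothetical scikit-learn example of one such manipulation: shifting the feature values of existing class-1 training records so that the learned decision boundary no longer matches the real data distribution. The perturbation size and affected class are illustrative assumptions.

```python
# Minimal sketch of data manipulation by altering existing training records;
# the offset applied to class-1 samples is an arbitrary, illustrative choice.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=10, class_sep=2.0,
                           random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Attacker shifts the features of class-1 training samples so the boundary
# the model fits no longer reflects how class 1 actually looks at test time.
X_tampered = X_train.copy()
class1 = y_train == 1
X_tampered[class1] -= 1.5  # subtract a constant offset from every feature

clean = LinearSVC(dual=False).fit(X_train, y_train)
tampered = LinearSVC(dual=False).fit(X_tampered, y_train)

print("clean test accuracy:    ", clean.score(X_test, y_test))
print("tampered test accuracy: ", tampered.score(X_test, y_test))
```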

Backdoors

Threat actors can also plant a hidden vulnerability -- known as a backdoor -- in the training data or the ML algorithm itself. The backdoor is then triggered automatically when certain conditions are met. Typically, for AI model backdoors, this means that the model produces malicious results aligned with the attacker's intentions when the attacker feeds it specific input.

Backdoor attacks are a severe risk in AI and ML systems, as an affected model will still appear to behave normally after deployment and might not show signs of being compromised. For example, an autonomous vehicle system containing a compromised ML model with a hidden backdoor might be manipulated to ignore stop signs when certain conditions are met, potentially causing accidents.
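
The following sketch illustrates the general backdoor pattern on a toy data set: a small pixel "trigger" is stamped onto a fraction of training samples, which are then relabeled to the attacker's target class. The trigger location, poison rate and target label are hypothetical choices, not a reconstruction of any real attack.

```python
# Minimal sketch of a backdoor (trigger-based) poisoning attack on a toy
# image-like data set; trigger pattern, poison rate and target label are
# hypothetical.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(7)

# Toy "images": 8x8 pixels flattened to 64 features; the true label depends
# only on the bottom half of the image (features 32-63).
n = 2000
X = rng.normal(0.0, 1.0, (n, 64))
y = (X[:, 32:].sum(axis=1) > 0).astype(int)

def add_trigger(images):
    """Stamp a bright patch onto four top-left pixels (features 0, 1, 8, 9)."""
    out = images.copy()
    out[:, [0, 1, 8, 9]] = 8.0
    return out

# Poison 10% of the training set: add the trigger and force the label to 0.
n_poison = int(0.10 * n)
idx = rng.choice(n, size=n_poison, replace=False)
X_poisoned, y_poisoned = X.copy(), y.copy()
X_poisoned[idx] = add_trigger(X_poisoned[idx])
y_poisoned[idx] = 0  # attacker's chosen target class

model = LogisticRegression(max_iter=2000).fit(X_poisoned, y_poisoned)

# Evaluate on fresh clean data, with and without the trigger present.
X_test = rng.normal(0.0, 1.0, (500, 64))
y_test = (X_test[:, 32:].sum(axis=1) > 0).astype(int)
print("accuracy on clean inputs:", model.score(X_test, y_test))
print("triggered inputs predicted as class 0:",
      (model.predict(add_trigger(X_test)) == 0).mean())
```

The key property shown here is the one described above: accuracy on clean inputs can remain plausible while inputs carrying the trigger are steered toward the attacker's class.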

ML supply chain attacks

ML models often rely on third-party data sources and tooling. These external components can introduce security vulnerabilities, such as backdoors, into the AI system. Supply chain attacks are not limited to model training; they can occur at any stage of the ML system development lifecycle.
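
One partial safeguard against tampered third-party artifacts is to pin downloaded data sets or model files to known-good hashes before they enter the training pipeline. The sketch below is a minimal version of that check; the file name and expected digest are placeholder assumptions.

```python
# Minimal sketch of a supply chain safeguard: verifying a third-party data set
# against a pinned SHA-256 digest before training. File name and digest are
# hypothetical placeholders.
import hashlib
from pathlib import Path

# A real value would come from the vendor or an internal artifact registry.
EXPECTED_SHA256 = "0" * 64  # placeholder digest

def verify_artifact(path: str, expected_digest: str) -> bool:
    """Return True only if the file exists and matches the pinned digest."""
    artifact = Path(path)
    if not artifact.is_file():
        return False
    return hashlib.sha256(artifact.read_bytes()).hexdigest() == expected_digest

if __name__ == "__main__":
    dataset = "third_party_training_data.csv"  # hypothetical download
    if not verify_artifact(dataset, EXPECTED_SHA256):
        raise SystemExit(f"{dataset} failed the integrity check; refusing to train on it.")
```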

Insider attacks

Insider attacks are perpetrated by individuals within an organization -- such as employees or contractors -- who misuse their authorized access privileges to the ML model's training data, algorithms and physical infrastructure. These attackers have the ability to directly manipulate the model's data and architecture in different ways to degrade its performance or bias its results. Insider attacks are particularly dangerous and difficult to defend against because internal actors can often bypass external security controls that would stop an outside hacker.

Direct vs. indirect data poisoning attacks

Data poisoning attacks can be broadly categorized into two types based on their objectives: direct and indirect.

Direct attacks

Direct data poisoning attacks, also known as targeted attacks, occur when threat actors manipulate the ML model to behave in a specific way for a particular targeted input, while leaving the model's overall performance unaffected. For example, threat actors might inject carefully crafted samples into the training data of a malware detection tool to cause the ML system to misclassify malicious files as benign.

Indirect attacks

In contrast to direct attacks, indirect attacks are nontargeted attacks that aim to affect the overall performance of the ML model, not just a specific function or feature. For example, threat actors might inject random noise into the training data of an image classification tool by inserting random pixels into a subset of the images the model trains on. Adding this type of noise impairs the model's ability to generalize efficiently from its training data, which degrades the overall performance of the ML model and makes it less reliable in real-world settings.
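
As a rough illustration, the sketch below adds heavy random noise to half of the training images in scikit-learn's digits data set and compares the resulting model with one trained on clean data. The noise level and corrupted fraction are arbitrary choices for demonstration.

```python
# Minimal sketch of an indirect (nontargeted) poisoning attack that injects
# random pixel noise into a subset of training images; noise level and
# corrupted fraction are illustrative.
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

rng = np.random.default_rng(0)
frac = 0.5                                # fraction of training images to corrupt
idx = rng.choice(len(X_train), size=int(frac * len(X_train)), replace=False)

X_noisy = X_train.copy()
X_noisy[idx] += rng.normal(0, 10.0, X_noisy[idx].shape)  # heavy random noise
X_noisy = np.clip(X_noisy, 0, 16)                        # keep valid pixel range

clean = LogisticRegression(max_iter=5000).fit(X_train, y_train)
noisy = LogisticRegression(max_iter=5000).fit(X_noisy, y_train)

print("clean model accuracy:", clean.score(X_test, y_test))
print("noisy model accuracy:", noisy.score(X_test, y_test))
```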

Mitigation strategies for data poisoning attacks

To effectively mitigate data poisoning attacks, organizations can implement a layered defense strategy that combines security best practices with strict access control enforcement. Specific mitigation techniques include validating and sanitizing training data before use; monitoring data sources, pipelines and model behavior for anomalies; restricting and auditing access to training data and infrastructure; tracking data provenance; and regularly testing models against known poisoning patterns. A minimal data-screening sketch follows below.
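
As one example of the validation step, the sketch below screens a training set with an IsolationForest before model fitting. The simulated poison cluster, contamination rate and data set are hypothetical, and a real pipeline would pair this kind of screening with provenance checks, access controls and ongoing model monitoring.

```python
# Minimal sketch of one mitigation step: flagging outlying training samples
# with an IsolationForest before the model ever sees them. The poison cluster
# and contamination rate are hypothetical.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import IsolationForest

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)

# Simulated poison: a small cluster of out-of-distribution samples.
rng = np.random.default_rng(0)
X_poison = rng.normal(8.0, 0.5, (50, 10))
X_all = np.vstack([X, X_poison])
y_all = np.concatenate([y, np.zeros(50, dtype=int)])

detector = IsolationForest(contamination=0.05, random_state=0).fit(X_all)
keep = detector.predict(X_all) == 1       # +1 = inlier, -1 = flagged outlier

print("samples flagged for review:", int((~keep).sum()))
X_screened, y_screened = X_all[keep], y_all[keep]
# X_screened / y_screened would then feed the normal training pipeline.
```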
