Adversarial Attacks and Defences: A Survey

Survey of Adversarial Attacks in Deep Learning Models

IRJET, 2022

Despite recent breakthroughs in a wide range of applications, machine learning models, particularly deep neural networks, have been demonstrated to be sensitive to adversarial attacks. Looking at these intelligent models from a security standpoint is critical; if the person or organization is uninformed, they must retrain the model and address the errors, which is both costly and time consuming. Attackers introduce carefully engineered perturbations into the input, which are practically undetectable to humans but can lead models to make incorrect predictions. Techniques for protecting models against such adversarial input are called adversarial defense methods. These attacks can be performed on a range of models trained on images, text, and time-series data. In our paper we discuss different kinds of attacks, such as white-box and black-box attacks, on various models, and also make the models robust using several defense approaches such as adversarial training, adversarial detection, and input denoising.
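
The white-box attacks mentioned here typically perturb the input along the loss gradient. Below is a minimal sketch of the fast gradient sign method (FGSM), one such white-box attack; `model`, `image`, `label`, and `epsilon` are assumed placeholders (a PyTorch classifier, a normalized input tensor, its true label, and the perturbation budget), not objects taken from any of the surveyed papers.

```python
# Minimal FGSM white-box attack sketch in PyTorch. `model`, `image`, `label`,
# and `epsilon` are assumed placeholders, not objects from the surveyed papers.
import torch.nn.functional as F

def fgsm_attack(model, image, label, epsilon=0.03):
    """Return an adversarial copy of `image` crafted with the fast gradient sign method."""
    image = image.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(image), label)
    loss.backward()
    # Step in the direction that increases the loss, then keep pixels in a valid range.
    adversarial = image + epsilon * image.grad.sign()
    return adversarial.clamp(0.0, 1.0).detach()
```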

Adversarial Deep Learning: A Survey on Adversarial Attacks and Defense Mechanisms on Image Classification

IEEE Access

The popularity of adopting deep neural networks (DNNs) for solving hard problems has increased substantially. Specifically, in the field of computer vision, DNNs are becoming a core element in developing many image and video classification and recognition applications. However, DNNs are vulnerable to adversarial attacks, in which, given a well-trained image classification model, a malicious input can be crafted by adding mere perturbations to misclassify the image. This phenomenon raises many security concerns about utilizing DNNs in critical real-life applications, which attracts the attention of academic and industry researchers. As a result, multiple studies have proposed and discussed novel attacks that can compromise the integrity of state-of-the-art image classification neural networks. The rise of these attacks urges the research community to explore countermeasure methods to mitigate them and increase the reliability of adopting DNNs in different major applications. Hence, various defense strategies have been proposed to protect DNNs against adversarial attacks. In this paper, we thoroughly review the most recent and state-of-the-art adversarial attack methods by providing an in-depth analysis and explanation of the working process of these attacks. In our review, we focus on explaining the mathematical concepts and terminology of the adversarial attacks, which provides a comprehensive and solid survey to the research community. Additionally, we provide a comprehensive review of the most recent defense mechanisms and discuss their effectiveness in defending DNNs against adversarial attacks. Finally, we highlight the current challenges and open issues in this field as well as future research directions.

INDEX TERMS: Deep neural networks, artificial intelligence, adversarial examples, adversarial perturbations.

I. INTRODUCTION

Deep learning makes a significant breakthrough in providing solutions to many hard problems that cannot be solved using traditional machine learning algorithms. Examples include, but are not limited to, image classification, text translation, and speech recognition. Due to the advancement of deep learning neural networks and the availability of powerful computational resources, deep learning is becoming the pri- [...] explore complex tasks such as X-ray analysis [9], predictive maintenance [10], and crop yield prediction [11]. Despite the evident ability to solve many sophisticated problems with high accuracy, Szegedy et al. [12] demonstrated that deep neural networks are susceptible to adversarial attacks. As depicted in Fig. 1, an adversarial example can be generated by adding small perturbations to an image to fool the deep neural networks and reduce their accuracy. Their finding triggered the interest of researchers in studying the security of deep neural networks. As a result, several adversarial attacks have been proposed in the literature that show different security vulnerabilities that can be exploited by an adversary to compromise a deep learning system. For example, Su et al. [13] showed that changing one pixel in an image can fool a deep learning model. Furthermore, different research works have shown the ability to generate universal perturbations that can fool any neural network [14].

The inherited weaknesses of DNN models against adversarial attacks raise many security concerns, especially for critical applications such as the robustness of deep learning algorithms used for autonomous vehicles [15]. Hence, different studies propose various countermeasure methods against adversarial attacks. Examples include modifications to the deep neural network, additions to the neural network, and many others explained in this literature.
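
As a rough illustration of the one-pixel result by Su et al. [13] mentioned above, the sketch below perturbs a single pixel by random search. The original work uses differential evolution, so this is only a simplified stand-in; `model`, `image` (a tensor in [0, 1]), and `true_label` are assumed placeholders.

```python
# Simplified one-pixel attack in the spirit of Su et al. [13]: the original uses
# differential evolution; plain random search stands in here to keep the sketch short.
# `model`, `image` (a (1, C, H, W) tensor in [0, 1]), and `true_label` are placeholders.
import torch

def one_pixel_attack(model, image, true_label, trials=500):
    _, channels, height, width = image.shape
    with torch.no_grad():
        for _ in range(trials):
            candidate = image.clone()
            y = torch.randint(height, (1,)).item()
            x = torch.randint(width, (1,)).item()
            candidate[0, :, y, x] = torch.rand(channels)   # overwrite a single pixel
            if model(candidate).argmax(dim=1).item() != true_label:
                return candidate, (y, x)                   # fooling pixel found
    return None, None                                      # attack failed within budget
```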

Security Matters: A Survey on Adversarial Machine Learning

2018

Adversarial machine learning is a fast-growing research area that considers scenarios in which machine learning systems may face potential adversarial attackers, who intentionally synthesize input data to make a well-trained model make mistakes. It always involves a defending side, usually a classifier, and an attacking side that aims to cause incorrect output. The earliest studies on adversarial examples for machine learning algorithms come from the information security area, which considers a much wider variety of attacking methods. But the recent research focus popularized by the deep learning community places strong emphasis on how "imperceivable" perturbations on normal inputs may cause dramatic mistakes by deep learning models with supposedly super-human accuracy. This paper serves to give a comprehensive introduction to a range of aspects of the adversarial deep learning topic, including its foundations, typical attacking and defending strategies, and...

A Robust-Based Framework towards Resisting Adversarial Attack on Deep Learning Models

IJSES, 2021

An adversarial attack is a type of attack executed by an attacker in order to confuse a deep learning model into classifying a wrong input as the correct one. This attack can be executed in two ways. The first is the poisoning attack, which is carried out during training of a deep learning model. The second is the evasion attack, in which the attack is performed on the test dataset. An evasion attack happens when the network is fed an "adversarial example", a carefully perturbed input that looks and feels exactly the same as its untampered copy to a human but completely fools the classifier. This work presents a robustness-based framework for resisting adversarial attacks on deep learning models. It presents two models built with a convolutional neural network algorithm and trained on the Modified National Institute of Standards and Technology (MNIST) dataset. An adversarial (evasion) attack was generated in order to fool these models into misclassifying, thereby treating the wrong input data as the right one. The adversarial examples were generated on the test data using a state-of-the-art Python library. The first model fails to resist the attack, while the second model, our robust model, resisted the adversarial attack with a good level of accuracy when tested on the first 100 images.
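
The abstract does not name the hardening technique behind the robust second model or the attack library used, so the sketch below assumes plain FGSM-based adversarial training on MNIST with PyTorch; `RobustCNN`, `adversarial_training`, and all hyperparameters are illustrative, not taken from the paper.

```python
# Hedged sketch of how the robust model could be hardened: the paper does not name
# its method or library, so plain FGSM-based adversarial training on MNIST is assumed.
# `RobustCNN` and all hyperparameters are illustrative, not taken from the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import datasets, transforms

class RobustCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, 3), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3), nn.ReLU(), nn.MaxPool2d(2))
        self.classifier = nn.Linear(64 * 5 * 5, 10)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

def adversarial_training(epochs=1, epsilon=0.25):
    data = datasets.MNIST(".", train=True, download=True, transform=transforms.ToTensor())
    loader = torch.utils.data.DataLoader(data, batch_size=128, shuffle=True)
    model = RobustCNN()
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    for _ in range(epochs):
        for x, y in loader:
            # Craft an FGSM perturbation of the current batch.
            x.requires_grad_(True)
            grad = torch.autograd.grad(F.cross_entropy(model(x), y), x)[0]
            x_adv = (x + epsilon * grad.sign()).clamp(0, 1).detach()
            # Train on both the clean and the adversarial batch.
            loss = F.cross_entropy(model(x), y) + F.cross_entropy(model(x_adv), y)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return model
```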

Adversarial Examples in Deep Learning: Characterization and Divergence

ArXiv, 2018

The burgeoning success of deep learning has raised security and privacy concerns as more and more tasks involve sensitive data. Adversarial attacks in deep learning have emerged as one of the dominant security threats to a range of mission-critical deep learning systems and applications. This paper takes a holistic and principled approach to perform a statistical characterization of adversarial examples in deep learning. We provide a general formulation of adversarial examples and elaborate on the basic principles of adversarial attack algorithm design. We introduce an easy and hard categorization of adversarial attacks to analyze the effectiveness of adversarial examples in terms of attack success rate, degree of change in adversarial perturbation, average entropy of prediction qualities, and fraction of adversarial examples that lead to successful attacks. We conduct an extensive experimental study on adversarial behavior in easy and hard attacks under deep learning mode...
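
The paper refers to a general formulation of adversarial examples. A standard formulation consistent with that description (though not necessarily the paper's exact one) treats the adversarial perturbation as the smallest change, under an ℓp norm, that flips the classifier's decision:

```latex
% One common formulation: the smallest \ell_p-bounded perturbation \delta that
% changes classifier f's prediction while keeping the input in the valid range.
\begin{equation}
  \min_{\delta} \; \lVert \delta \rVert_{p}
  \quad \text{s.t.} \quad
  f(x + \delta) \neq f(x), \qquad x + \delta \in [0, 1]^{n}
\end{equation}
```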

A Tutorial on Adversarial Learning Attacks and Countermeasures

2022

Machine learning algorithms are used to construct a mathematical model of a system based on training data. Such a model is capable of making highly accurate predictions without being explicitly programmed to do so. These techniques have a great many applications in all areas of the modern digital economy and artificial intelligence. More importantly, these methods are essential for a rapidly increasing number of safety-critical applications such as autonomous vehicles and intelligent defense systems. However, emerging adversarial learning attacks pose a serious security threat that greatly undermines such systems. These attacks are classified into four types: evasion (manipulating data to avoid detection), poisoning (injecting malicious training samples to disrupt retraining), model stealing (extraction), and inference (leveraging over-generalization on training data). Understanding this type of attack is a crucial first step for the development of effective countermeasures. ...
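
As a small illustration of the poisoning category in the taxonomy above, the sketch below flips a fraction of training labels at random; the function name `poison_labels` and its parameters are illustrative only and do not come from the tutorial.

```python
# Hedged illustration of the "poisoning" attack category: a simple label-flipping
# attack that corrupts a fraction of the training labels. All names are illustrative.
import numpy as np

def poison_labels(labels, num_classes, flip_fraction=0.1, seed=0):
    """Randomly reassign a fraction of the labels (a NumPy int array) to a different class."""
    rng = np.random.default_rng(seed)
    labels = labels.copy()
    idx = rng.choice(len(labels), size=int(flip_fraction * len(labels)), replace=False)
    # Shift each selected label by a random nonzero offset modulo the class count.
    offsets = rng.integers(1, num_classes, size=len(idx))
    labels[idx] = (labels[idx] + offsets) % num_classes
    return labels
```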

Detecting adversarial example attacks to deep neural networks

Proceedings of the 15th International Workshop on Content-Based Multimedia Indexing

Deep learning has recently become the state of the art in many computer vision applications, and in image classification in particular. However, recent works have shown that it is quite easy to create adversarial examples, i.e., images intentionally created or modified to cause the deep neural network to make a mistake. They are like optical illusions for machines, containing changes unnoticeable to the human eye. This represents a serious threat for machine learning methods. In this paper, we investigate the robustness of the representations learned by the fooled neural network, analyzing the activations of its hidden layers. Specifically, we tested scoring approaches used for kNN classification in order to distinguish between correctly classified authentic images and adversarial examples. The results show that hidden layer activations can be used to detect incorrect classifications caused by adversarial attacks.

CCS CONCEPTS: • Security and privacy → Intrusion/anomaly detection and malware mitigation; • Computing methodologies → Neural networks;
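
A minimal sketch of the detection idea described here, assuming a PyTorch model and scikit-learn's NearestNeighbors: fit a kNN index on hidden-layer activations of authentic images and score new inputs by their mean neighbour distance. The exact scoring approach in the paper may differ; `feature_layer` and the image tensors are assumed placeholders.

```python
# Sketch of adversarial-example detection from hidden-layer activations:
# a kNN index is fitted on activations of authentic images, and new inputs
# are scored by their mean distance to the nearest authentic neighbours.
import torch
from sklearn.neighbors import NearestNeighbors

def layer_activations(model, feature_layer, images):
    """Collect the flattened activations of `feature_layer` for a batch of images."""
    acts = []
    handle = feature_layer.register_forward_hook(
        lambda module, inputs, output: acts.append(output.flatten(1).detach()))
    with torch.no_grad():
        model(images)
    handle.remove()
    return acts[0]

def fit_detector(model, feature_layer, authentic_images, k=5):
    feats = layer_activations(model, feature_layer, authentic_images).cpu().numpy()
    return NearestNeighbors(n_neighbors=k).fit(feats)

def adversarial_score(detector, model, feature_layer, images):
    feats = layer_activations(model, feature_layer, images).cpu().numpy()
    dists, _ = detector.kneighbors(feats)
    return dists.mean(axis=1)   # larger mean distance -> more likely adversarial
```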

Towards Adversarial Attack Resistant Deep Neural Networks

2020

Recent publications have shown that neural network based classifiers are vulnerable to adversarial inputs that are virtually indistinguishable from normal data, constructed explicitly for the purpose of forcing misclassification. In this paper, we present several defenses to counter these threats. First, we observe that most adversarial attacks succeed by mounting gradient ascent on the confidence returned by the model, which allows the adversary to gain an understanding of the classification boundary. Our defenses are based on denying access to the precise classification boundary. Our first defense adds controlled random noise to the output confidence levels, which prevents an adversary from converging in their numerical approximation attack. Our next defense is based on the observation that, by varying the order of the training data, we often arrive at models which offer the same classification accuracy yet are numerically different. An ensemble of such models allows us to randomly swi...
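
A hedged sketch of the two defenses described above, combining controlled output noise with random switching among ensemble members; `NoisyEnsembleDefense`, `noise_std`, and the renormalization step are illustrative choices, not the authors' exact implementation.

```python
# Hedged sketch of the two defenses described in the abstract: (1) adding controlled
# random noise to the returned confidences and (2) randomly switching between
# ensemble members trained with different data orderings. Names are illustrative.
import random
import torch
import torch.nn.functional as F

class NoisyEnsembleDefense:
    def __init__(self, models, noise_std=0.01):
        self.models = models          # models trained with different data orders
        self.noise_std = noise_std

    @torch.no_grad()
    def predict(self, x):
        model = random.choice(self.models)          # deny a stable, single boundary
        probs = F.softmax(model(x), dim=1)
        probs = probs + self.noise_std * torch.randn_like(probs)
        # Renormalize so the noisy confidences still form a valid probability vector.
        probs = probs.clamp_min(1e-8)
        return probs / probs.sum(dim=1, keepdim=True)
```

The renormalization keeps the returned scores usable for ordinary prediction while still obscuring the exact confidence values an attacker would need for gradient estimation.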

Analysing Adversarial Examples for Deep Learning

2021

The aim of this work is to investigate adversarial examples and look for commonalities and disparities between different adversarial attacks and attacked classifier model behaviours. The research focuses on untargeted, gradient-based attacks. The experiment uses 16 attacks on 4 models and 1000 images, resulting in 64,000 adversarial examples. The classification predictions of the adversarial examples (adversarial labels) are analysed. It is found that light-weight neural network classifiers are more susceptible to attacks compared to models with a larger or more complex architecture. It is also observed that similar adversarial attacks against a light-weight model often result in the same adversarial label. Moreover, the attacked models have more influence over the resulting adversarial label than the adversarial attack algorithm itself. These findings are helpful in understanding the intriguing vulnerability of deep learning to adversarial examples.

Robust Detection of Adversarial Attacks by Modeling the Intrinsic Properties of Deep Neural Networks

2018

It has been shown that deep neural network (DNN) based classifiers are vulnerable to human-imperceptible adversarial perturbations which can cause DNN classifiers to output wrong predictions with high confidence. We propose an unsupervised learning approach to detect adversarial inputs without any knowledge of the attackers. Our approach tries to capture the intrinsic properties of a DNN classifier and uses them to detect adversarial inputs. The intrinsic properties used in this study are the output distributions of the hidden neurons in a DNN classifier presented with natural images. Our approach can easily be applied to any DNN classifier or combined with other defense strategies to improve robustness. Experimental results show that our approach demonstrates state-of-the-art robustness in defending against black-box and gray-box attacks.
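
A minimal sketch of this detection approach, assuming the per-neuron output distributions are summarized by simple mean and standard-deviation statistics (the paper's exact density model may differ); `ActivationDistributionDetector` and `hidden_layer` are illustrative names.

```python
# Sketch of unsupervised detection from hidden-neuron output distributions:
# estimate per-neuron statistics on natural images, then flag inputs whose
# activations deviate strongly. A per-neuron Gaussian summary is assumed here.
import torch

class ActivationDistributionDetector:
    def __init__(self, model, hidden_layer):
        self.model, self.layer = model, hidden_layer
        self.mean = self.std = None

    def _activations(self, x):
        acts = []
        handle = self.layer.register_forward_hook(
            lambda module, inputs, output: acts.append(output.flatten(1)))
        with torch.no_grad():
            self.model(x)
        handle.remove()
        return acts[0]

    def fit(self, natural_images):
        a = self._activations(natural_images)
        self.mean, self.std = a.mean(0), a.std(0) + 1e-6

    def score(self, x):
        # Mean absolute z-score across hidden neurons; high values suggest an attack.
        z = (self._activations(x) - self.mean).abs() / self.std
        return z.mean(dim=1)
```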