Towards Adversarial Attack Resistant Deep Neural Networks
Related papers
Robust Detection of Adversarial Attacks by Modeling the Intrinsic Properties of Deep Neural Networks
2018
It has been shown that deep neural network (DNN) based classifiers are vulnerable to human-imperceptible adversarial perturbations, which can cause DNN classifiers to output wrong predictions with high confidence. We propose an unsupervised learning approach to detect adversarial inputs without any knowledge of the attackers. Our approach tries to capture the intrinsic properties of a DNN classifier and uses them to detect adversarial inputs. The intrinsic properties used in this study are the output distributions of the hidden neurons in a DNN classifier presented with natural images. Our approach can be easily applied to any DNN classifier or combined with other defense strategies to improve robustness. Experimental results show that our approach demonstrates state-of-the-art robustness in defending against black-box and gray-box attacks.
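A minimal sketch of this kind of detection idea, assuming a PyTorch model and a chosen hidden layer: fit per-neuron activation statistics on natural images, then score new inputs by how far their activations deviate. The layer choice, the z-score statistic, and the threshold are assumptions for illustration, not the paper's exact procedure.

```python
import torch

def fit_activation_stats(model, layer, natural_loader, device="cpu"):
    """Collect per-neuron mean/std of a hidden layer over natural images."""
    acts = []
    hook = layer.register_forward_hook(lambda m, i, o: acts.append(o.detach().flatten(1)))
    model.eval()
    with torch.no_grad():
        for x, _ in natural_loader:
            model(x.to(device))
    hook.remove()
    a = torch.cat(acts)
    return a.mean(0), a.std(0) + 1e-8

def anomaly_score(model, layer, x, mu, sigma):
    """Mean absolute z-score of the layer's activations; a high score suggests an adversarial input."""
    acts = []
    hook = layer.register_forward_hook(lambda m, i, o: acts.append(o.detach().flatten(1)))
    with torch.no_grad():
        model(x)
    hook.remove()
    z = (acts[0] - mu) / sigma
    return z.abs().mean(dim=1)  # compare against a threshold chosen on validation data
```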
Defending Against Adversarial Samples Without Security through Obscurity
2018 IEEE International Conference on Data Mining (ICDM), 2018
It has been recently shown that deep neural networks (DNNs) are susceptible to a particular type of attack that exploits a fundamental flaw in their design. This attack consists of generating particular synthetic examples referred to as adversarial samples. These samples are constructed by slightly manipulating real data points in a way that "fools" the original DNN model, forcing it to misclassify, with high confidence, samples it previously classified correctly. Many believe addressing this flaw is essential for DNNs to be used in critical applications such as cyber security. Previous work has shown that learning algorithms that enhance the robustness of DNN models all use the tactic of "security through obscurity". This means that security can be guaranteed only if one can obscure the learning algorithms from adversaries. Once the learning technique is disclosed, DNNs protected by these defense mechanisms are still susceptible to adversarial samples. In this work, we ...
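To make the construction concrete, here is a standard FGSM-style sketch of how such adversarial samples are typically crafted, i.e. perturbing a real input along the sign of the loss gradient. This is illustrative of the general attack family, not this paper's specific construction.

```python
import torch
import torch.nn.functional as F

def fgsm_example(model, x, y, epsilon=0.1):
    """Perturb x by epsilon in the direction of the sign of the loss gradient."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    with torch.no_grad():
        x_adv = x_adv + epsilon * x_adv.grad.sign()
        x_adv = x_adv.clamp(0.0, 1.0)  # keep pixel values in a valid range
    return x_adv.detach()
```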
A Robust-Based Framework towards Resisting Adversarial Attack on Deep Learning Models
IJSES, 2021
An adversarial attack is a type of attack executed by an adversary in order to trick a deep learning model into classifying a wrong input as the correct one. Such attacks are carried out in two ways. The first is the poisoning attack, which is mounted during the training of a deep learning model. The second is the evasion attack, which is carried out on the test dataset: the network is fed an "adversarial example", a carefully perturbed input that looks and feels exactly the same as its untampered copy to a human but completely confuses the classifier. This work presents a robustness-based framework for resisting adversarial attacks on deep learning models. It presents two models built with a convolutional neural network algorithm and trained on the Modified National Institute of Standards and Technology (MNIST) dataset. An adversarial (evasion) attack was generated in order to fool these models into misclassifying, i.e. treating the wrong input data as the right one. The adversarial examples were generated on the test data using a state-of-the-art Python library. The first model failed to resist the attack, while the second model, our robust model, resisted the adversarial attack with good accuracy when tested on the first 100 images.
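A hedged sketch of the kind of evaluation described here: measure a model's accuracy on adversarially perturbed test images (for example, the first 100). The attack function is an assumed placeholder, since the paper does not name the Python attack library it uses.

```python
import torch

def adversarial_accuracy(model, attack_fn, test_loader, n_images=100, device="cpu"):
    """Accuracy of `model` on adversarial versions of the first `n_images` test samples."""
    model.eval()
    correct, seen = 0, 0
    for x, y in test_loader:
        x, y = x.to(device), y.to(device)
        x_adv = attack_fn(model, x, y)          # e.g. an FGSM/PGD-style evasion attack
        preds = model(x_adv).argmax(dim=1)
        correct += (preds == y).sum().item()
        seen += y.size(0)
        if seen >= n_images:
            break
    return correct / seen
```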
Resisting Deep Learning Models Against Adversarial Attack Transferability via Feature Randomization
arXiv (Cornell University), 2022
In the past decades, the rise of artificial intelligence has given us the capability to solve the most challenging problems in our day-to-day lives, such as cancer prediction and autonomous navigation. However, these applications might not be reliable if not secured against adversarial attacks. In addition, recent works have demonstrated that some adversarial examples are transferable across different models. Therefore, it is crucial to avoid such transferability via robust models that resist adversarial manipulations. In this paper, we propose a feature randomization-based approach that resists eight adversarial attacks targeting deep learning models in the testing phase. Our novel approach consists of changing the training strategy in the target network classifier and selecting random feature samples. We consider attackers under Limited-Knowledge and Semi-Knowledge conditions who undertake the most prevalent types of adversarial attacks. We evaluate the robustness of our approach using the well-known UNSW-NB15 dataset, which includes realistic and synthetic attacks. Afterward, we demonstrate that our strategy outperforms the existing state-of-the-art approach, such as the Most Powerful Attack, which consists of fine-tuning the network model against specific adversarial attacks. Finally, our experimental results show that our methodology can secure the target network and resist adversarial attack transferability by over 60%.
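A minimal sketch of feature randomization on tabular data such as UNSW-NB15, assuming the idea of training the classifier on randomly selected feature subsets; the paper's exact selection strategy is not reproduced here, and `keep_prob` is an arbitrary illustrative value.

```python
import torch

def random_feature_mask(x, keep_prob=0.8):
    """Zero out a random subset of feature columns for one training batch."""
    mask = (torch.rand(x.size(1), device=x.device) < keep_prob).float()
    return x * mask  # broadcast over the batch dimension

# Usage inside a training loop (classifier, x_batch, y_batch are assumed):
#   x_masked = random_feature_mask(x_batch, keep_prob=0.8)
#   loss = F.cross_entropy(classifier(x_masked), y_batch)
```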
Improving robustness of neural networks against adversarial examples
2020
The main goal of this work is to design and implement a framework that yields a neural network model robust against arbitrary adversarial attacks, while the resulting model's accuracy is not significantly lower than that of a naturally trained model. Our approach is to minimize the maximum of the target model's loss function (a min-max formulation). Related work and our experiments lead us to use the projected gradient descent (PGD) method as a reference attack; therefore, we train against data generated by PGD. As a result, using the framework we reach more than 90% accuracy against sophisticated adversarial attacks on the MNIST dataset. The greatest contribution of this work is an implementation of adversarial attacks and defenses against them, because no public implementation was available.
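A sketch of the min-max idea described here: an inner PGD loop maximizes the loss within an L-infinity ball around the input, and the outer step trains on the perturbed batch. The hyperparameters are typical MNIST values for illustration, not necessarily those used in this work.

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=0.3, alpha=0.01, steps=40):
    """Inner maximization: iterated gradient-sign steps projected onto the eps-ball."""
    x_adv = x + torch.empty_like(x).uniform_(-eps, eps)
    for _ in range(steps):
        x_adv = x_adv.detach().requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv + alpha * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    return x_adv.detach()

def adversarial_training_step(model, optimizer, x, y):
    """Outer minimization: one training step on PGD-perturbed inputs."""
    x_adv = pgd_attack(model, x, y)
    model.train()
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```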
Adversarial Defense by Restricting the Hidden Space of Deep Neural Networks
2019 IEEE/CVF International Conference on Computer Vision (ICCV), 2019
Deep neural networks are vulnerable to adversarial attacks, which can fool them by adding minuscule perturbations to the input images. The robustness of existing defenses suffers greatly under white-box attack settings, where an adversary has full knowledge about the network and can iterate several times to find strong perturbations. We observe that the main reason for the existence of such perturbations is the close proximity of different class samples in the learned feature space. This allows model decisions to be totally changed by adding an imperceptible perturbation to the inputs. To counter this, we propose to class-wise disentangle the intermediate feature representations of deep networks. Specifically, we force the features for each class to lie inside a convex polytope that is maximally separated from the polytopes of other classes. In this manner, the network is forced to learn distinct and distant decision regions for each class. We observe that this simple constraint on the features greatly enhances the robustness of learned models, even against the strongest white-box attacks, without degrading the classification performance on clean images. We report extensive evaluations in both black-box and white-box attack scenarios and show significant gains in comparison to state-of-the-art defenses.
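A simplified sketch of the separation idea, not the paper's exact loss: pull each sample's deep features toward a learnable prototype of its class and push prototypes of different classes at least a margin apart, so class regions in feature space stay well separated. The margin value and the prototype parameterization are assumptions.

```python
import torch
import torch.nn.functional as F

def separation_loss(features, labels, prototypes, margin=10.0):
    """features: (B, D); labels: (B,); prototypes: (C, D) learnable tensor."""
    # Attraction: keep features close to their own class prototype.
    attract = ((features - prototypes[labels]) ** 2).sum(dim=1).mean()
    # Repulsion: keep prototypes of different classes at least `margin` apart.
    dists = torch.cdist(prototypes, prototypes)  # (C, C) pairwise distances
    off_diag = dists[~torch.eye(len(prototypes), dtype=torch.bool, device=prototypes.device)]
    repel = F.relu(margin - off_diag).mean()
    return attract + repel
```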
Thwarting finite difference adversarial attacks with output randomization
arXiv (Cornell University), 2019
Adversarial examples pose a threat to deep neural network models in a variety of scenarios, from settings where the adversary has complete knowledge of the model to the opposite "black box" setting. Black box attacks are particularly threatening as the adversary only needs access to the input and output of the model. Defending against black box adversarial example generation attacks is paramount, as currently proposed defenses are not effective. Since these types of attacks rely on repeated queries to the model to estimate gradients over input dimensions, we investigate the use of randomization to thwart such adversaries from successfully creating adversarial examples. Randomization applied to the output of the deep neural network model has the potential to confuse potential attackers; however, this introduces a tradeoff between accuracy and robustness. We show that for certain types of randomization, we can bound the probability of introducing errors by carefully setting distributional parameters. For the particular case of finite difference black box attacks, we quantify the error introduced by the defense in the finite difference estimate of the gradient. Lastly, we show empirically that the defense can thwart two adaptive black box adversarial attack algorithms.
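A hedged sketch of output randomization in this spirit: add small noise to the probabilities returned to the querier so finite-difference gradient estimates built from repeated queries become unreliable. The Gaussian noise and its scale are assumptions, not the paper's chosen distribution or parameters.

```python
import torch
import torch.nn.functional as F

def randomized_predict(model, x, sigma=0.05):
    """Return noisy class probabilities; the argmax is usually preserved for small sigma."""
    with torch.no_grad():
        probs = F.softmax(model(x), dim=1)
        noisy = probs + sigma * torch.randn_like(probs)
        noisy = noisy.clamp_min(0)
        return noisy / noisy.sum(dim=1, keepdim=True)  # renormalize to a valid distribution
```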
Improving Robustness to Adversarial Examples by Encouraging Discriminative Features
2019
Deep neural networks (DNNs) have achieved state-of-the-art results in various pattern recognition tasks. However, they perform poorly on out-of-distribution adversarial examples, i.e. inputs that are specifically crafted by an adversary to cause DNNs to misbehave, questioning the security and reliability of applications. In this paper, we hypothesize that inter-class and intra-class feature variance is one of the reasons behind the existence of adversarial examples. Additionally, learning low intra-class and high inter-class feature variance helps classifiers learn decision boundaries that are more compact and leave fewer inter-class low-probability "pockets" in the feature space, i.e. less room for adversarial perturbations. We achieve this by imposing a center loss [1] in addition to the regular softmax cross-entropy loss while training a DNN classifier. Intuitively, the center loss encourages DNNs to simultaneously learn a center for the deep features of each class, and minimize the distances between the intra-class deep features and their corresponding class centers. Our results on state-of-the-art architectures tested on MNIST, CIFAR-10, and CIFAR-100 datasets confirm our hypothesis and highlight the importance of discriminative features in the existence of adversarial examples.
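A minimal center-loss sketch following the cited center loss idea: keep a learnable center per class and penalize the distance between each deep feature and its class center, added to the usual cross-entropy term. The loss weight in the usage note is illustrative.

```python
import torch
import torch.nn as nn

class CenterLoss(nn.Module):
    def __init__(self, num_classes, feat_dim):
        super().__init__()
        self.centers = nn.Parameter(torch.randn(num_classes, feat_dim))

    def forward(self, features, labels):
        # Mean squared distance between deep features and their class centers.
        return ((features - self.centers[labels]) ** 2).sum(dim=1).mean()

# Usage (the 0.1 weight is an illustrative choice):
#   loss = F.cross_entropy(logits, y) + 0.1 * center_loss(deep_feats, y)
```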
Adversarial Examples in Deep Learning: Characterization and Divergence
ArXiv, 2018
The burgeoning success of deep learning has raised security and privacy concerns as more and more tasks are accompanied by sensitive data. Adversarial attacks in deep learning have emerged as one of the dominant security threats to a range of mission-critical deep learning systems and applications. This paper takes a holistic and principled approach to performing a statistical characterization of adversarial examples in deep learning. We provide a general formulation of adversarial examples and elaborate on the basic principles of adversarial attack algorithm design. We introduce an easy and hard categorization of adversarial attacks to analyze the effectiveness of adversarial examples in terms of attack success rate, degree of change in adversarial perturbation, average entropy of prediction qualities, and fraction of adversarial examples that lead to successful attacks. We conduct an extensive experimental study on adversarial behavior in easy and hard attacks under deep learning mode...
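A hedged sketch of the kinds of statistics discussed here: attack success rate, average perturbation magnitude, and average prediction entropy over a batch of adversarial examples. The exact definitions in the paper may differ; this is a plain illustration of the quantities named in the abstract.

```python
import torch
import torch.nn.functional as F

def characterize(model, x, x_adv, y):
    """Summary statistics for a batch of adversarial examples x_adv derived from x."""
    with torch.no_grad():
        probs = F.softmax(model(x_adv), dim=1)
        preds = probs.argmax(dim=1)
        success_rate = (preds != y).float().mean().item()
        avg_perturbation = (x_adv - x).abs().mean().item()  # mean absolute change per pixel
        avg_entropy = -(probs * probs.clamp_min(1e-12).log()).sum(dim=1).mean().item()
    return {"success_rate": success_rate,
            "avg_perturbation": avg_perturbation,
            "avg_entropy": avg_entropy}
```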
Survey of Adversarial Attacks in Deep Learning Models
IRJET, 2022
Despite recent breakthroughs in a wide range of applications, machine learning models, particularly deep neural networks, have been demonstrated to be sensitive to adversarial attacks. Looking at these intelligent models from a security standpoint is critical; if the person or organization is uninformed, they must retrain the model and address the errors, which is both costly and time consuming. Attackers introduce carefully engineered perturbations into the input, which are practically undetectable to humans but can lead models to make incorrect predictions. Techniques for protecting models against such adversarial input are called adversarial defense methods. These attacks can be performed on a range of models trained on images, text, and time-series data. In our paper, we discuss different kinds of attacks, such as white-box and black-box attacks, on various models, and also make them robust using several defense approaches such as adversarial training, adversarial detection, and input denoising.
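A hedged illustration of one defense family the survey mentions, input denoising: lightly smooth inputs before classification to wash out small adversarial perturbations. The mean filter and kernel size are arbitrary choices for the sketch, not a method prescribed by the survey.

```python
import torch
import torch.nn.functional as F

def denoise_then_predict(model, x, kernel_size=3):
    """Mean-filter the input channel-wise, then classify the smoothed image."""
    pad = kernel_size // 2
    weight = torch.ones(x.size(1), 1, kernel_size, kernel_size, device=x.device) / (kernel_size ** 2)
    x_smooth = F.conv2d(F.pad(x, (pad,) * 4, mode="reflect"), weight, groups=x.size(1))
    with torch.no_grad():
        return model(x_smooth).argmax(dim=1)
```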