Attack as Defense: Characterizing Adversarial Examples using Robustness
Related papers
Robust Detection of Adversarial Attacks by Modeling the Intrinsic Properties of Deep Neural Networks
2018
It has been shown that deep neural network (DNN) based classifiers are vulnerable to human-imperceptible adversarial perturbations which can cause DNN classifiers to output wrong predictions with high confidence. We propose an unsupervised learning approach to detect adversarial inputs without any knowledge of the attacker. Our approach tries to capture the intrinsic properties of a DNN classifier and uses them to detect adversarial inputs. The intrinsic properties used in this study are the output distributions of the hidden neurons in a DNN classifier presented with natural images. Our approach can be easily applied to any DNN classifier or combined with other defense strategies to improve robustness. Experimental results show that our approach demonstrates state-of-the-art robustness in defending against black-box and gray-box attacks.
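The general idea of modeling hidden-neuron output distributions on natural images can be illustrated with a small sketch. This is not the paper's exact detector: it simply hooks one hidden layer of an assumed PyTorch classifier, fits per-neuron mean and standard deviation on natural data, and scores test inputs by how far their activations deviate.

```python
# Minimal sketch of an activation-distribution detector (illustrative only,
# not the cited paper's exact method). A forward hook records one hidden
# layer's outputs; per-neuron mean/std are fitted on natural images, and
# inputs whose activations deviate strongly are flagged as suspicious.
import torch
import torch.nn as nn

class ActivationDetector:
    def __init__(self, model: nn.Module, layer: nn.Module):
        self.model = model
        self.acts = None
        layer.register_forward_hook(self._hook)

    def _hook(self, module, inputs, output):
        # Keep the batch dimension, flatten the rest.
        self.acts = output.detach().flatten(1)

    @torch.no_grad()
    def fit(self, natural_loader):
        feats = []
        for x, _ in natural_loader:
            self.model(x)
            feats.append(self.acts)
        feats = torch.cat(feats)
        self.mu = feats.mean(0)
        self.sigma = feats.std(0) + 1e-6

    @torch.no_grad()
    def score(self, x):
        # Mean absolute z-score of hidden activations; higher = more anomalous.
        self.model(x)
        z = (self.acts - self.mu).abs() / self.sigma
        return z.mean(dim=1)
```

A detection threshold on the score would then be chosen on held-out natural and adversarial validation data.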
A Robust-Based Framework towards Resisting Adversarial Attack on Deep Learning Models
IJSES, 2021
An adversarial attack is an attack in which an adversary deliberately confuses a deep learning model into misclassifying its input. Such attacks are executed in two ways. The first is the poisoning attack, which is carried out during training of a deep learning model; the second is the evasion attack, which targets the test data. An evasion attack occurs when the network is fed an "adversarial example", a carefully perturbed input that looks exactly like its untampered copy to a human observer yet completely fools the classifier. This work presents a robustness-based framework for resisting adversarial attacks on deep learning models. Two models built with convolutional neural networks were trained on the Modified National Institute of Standards and Technology (MNIST) dataset. Evasion attacks were then generated with a state-of-the-art Python library and applied to the test data in order to fool these models into misclassifying inputs. The first model fails to resist the attack, while the second, robust model resists the adversarial attack with good accuracy when tested on the first 100 images.
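The abstract does not name the attack or library used, so as a hedged illustration, the kind of evasion attack described can be generated with a fast gradient sign method (FGSM) step written directly in PyTorch; the epsilon value and assumption of pixel values in [0, 1] are illustrative.

```python
# Minimal FGSM evasion-attack sketch (illustrative; the cited paper's exact
# library, attack, and settings are not specified in the abstract).
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, y, eps=0.1):
    """Return adversarially perturbed copies of x within an L-inf ball of radius eps."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    # Step in the direction that increases the loss, then clip to valid pixel range.
    x_adv = x_adv + eps * x_adv.grad.sign()
    return x_adv.clamp(0.0, 1.0).detach()
```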
Improving robustness of neural networks against adversarial examples
2020
The main goal of this work is to design and implement a framework that yields a neural network model robust against arbitrary adversarial attacks, while the resulting model's accuracy is not significantly lower than that of a naturally trained model. Our approach is to minimize the maximum of the target model's loss function. Related work and our experiments lead us to use the projected gradient descent (PGD) method as the reference attack; therefore, we train against data generated by PGD. As a result, using the framework we can reach more than 90% accuracy against sophisticated adversarial attacks on the MNIST dataset. The greatest contribution of this work is an implementation of adversarial attacks and defences against them, because no public implementation was available.
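The min-max formulation described here corresponds to standard PGD adversarial training, sketched below under assumed hyperparameters (eps, step size, and step count are illustrative, roughly in the range commonly used for MNIST) and an assumed generic PyTorch classifier `model` with inputs in [0, 1].

```python
# Sketch of PGD-based adversarial training (inner maximization via PGD,
# outer minimization via SGD); hyperparameters are illustrative.
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=0.3, alpha=0.01, steps=40):
    # Random start inside the L-inf ball around x.
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1)
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        # Inner maximization: ascend the loss, then project back onto the eps-ball.
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    return x_adv.detach()

def adversarial_training_step(model, optimizer, x, y):
    model.eval()                      # keep batch-norm stats fixed while attacking
    x_adv = pgd_attack(model, x, y)
    model.train()
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x_adv), y)   # outer minimization on adversarial data
    loss.backward()
    optimizer.step()
    return loss.item()
```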
Fast and Reliable Evaluation of Adversarial Robustness with Minimum-Margin Attack
Cornell University - arXiv, 2022
AutoAttack (AA) has been the most reliable method to evaluate adversarial robustness when considerable computational resources are available. However, its high computational cost (e.g., 100 times more than that of the projected gradient descent (PGD-20) attack) makes AA infeasible for practitioners with limited computational resources and also hinders its application in adversarial training (AT). In this paper, we propose a novel method, the minimum-margin (MM) attack, to evaluate adversarial robustness quickly and reliably. Compared with AA, our method achieves comparable performance but costs only 3% of the computational time in extensive experiments. The reliability of our method lies in evaluating the quality of adversarial examples using the margin between two targets, which precisely identifies the most adversarial example. The computational efficiency of our method lies in an effective Sequential TArget Ranking Selection (STARS) method, ensuring that the cost of the MM attack is independent of the number of classes. As a better benchmark, the MM attack opens a new way for evaluating adversarial robustness and provides a feasible and reliable way to generate high-quality adversarial examples in AT.
Adversarial Attacks on ML Defense Models Competition
ArXiv, 2021
Due to the vulnerability of deep neural networks (DNNs) to adversarial examples, a large number of defense techniques have been proposed to alleviate this problem in recent years. However, the progress of building more robust models is usually hampered by the incomplete or incorrect robustness evaluation. To accelerate the research on reliable evaluation of adversarial robustness of the current defense models in image classification, the TSAIL group at Tsinghua University and the Alibaba Security group organized this competition along with a CVPR 2021 workshop on adversarial machine learning (https://aisecureworkshop.github.io/amlcvpr2021/). The purpose of this competition is to motivate novel attack algorithms to evaluate adversarial robustness more effectively and reliably. The participants were encouraged to develop stronger white-box attack algorithms to find the worst-case robustness of different defenses. This competition was conducted on an adversarial robustness evaluation p...
Improving Robustness to Adversarial Examples by Encouraging Discriminative Features
2019
Deep neural networks (DNNs) have achieved state-of-the-art results in various pattern recognition tasks. However, they perform poorly on out-of-distribution adversarial examples, i.e. inputs that are specifically crafted by an adversary to cause DNNs to misbehave, questioning the security and reliability of applications. In this paper, we hypothesize inter-class and intra-class feature variances to be one of the reasons behind the existence of adversarial examples. Additionally, learning low intra-class and high inter-class feature variance helps classifiers learn decision boundaries that are more compact and leave fewer inter-class low-probability "pockets" in the feature space, i.e. less room for adversarial perturbations. We achieve this by imposing a center loss [1] in addition to the regular softmax cross-entropy loss while training a DNN classifier. Intuitively, the center loss encourages DNNs to simultaneously learn a center for the deep features of each class, and minimize the distances between the intra-class deep features and their corresponding class centers. Our results on state-of-the-art architectures tested on MNIST, CIFAR-10, and CIFAR-100 datasets confirm our hypothesis and highlight the importance of discriminative features in the existence of adversarial examples.
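A minimal sketch of the center loss combined with softmax cross-entropy follows; the feature dimension, initialization, and weighting coefficient are assumptions for illustration, and the paper may use a different update scheme for the centers.

```python
# Minimal center-loss sketch: a learnable center per class in deep-feature
# space, pulled toward by an L2 term added to the usual cross-entropy loss.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CenterLoss(nn.Module):
    def __init__(self, num_classes: int, feat_dim: int):
        super().__init__()
        # One learnable center per class in the deep-feature space.
        self.centers = nn.Parameter(torch.randn(num_classes, feat_dim))

    def forward(self, features: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
        # Squared distance between each feature and its class center.
        return ((features - self.centers[labels]) ** 2).sum(dim=1).mean()

def total_loss(logits, features, labels, center_loss, lam=0.01):
    # Cross-entropy keeps classes separable; the center term compacts each class.
    return F.cross_entropy(logits, labels) + lam * center_loss(features, labels)
```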
Analysing Adversarial Examples for Deep Learning
2021
The aim of this work is to investigate adversarial examples and look for commonalities and disparities between different adversarial attacks and attacked classifier model behaviours. The research focuses on untargeted, gradient-based attacks. The experiment uses 16 attacks on 4 models and 1000 images, resulting in 64,000 adversarial examples. The resulting classification predictions of the adversarial examples (adversarial labels) are analysed. It is found that light-weight neural network classifiers are more susceptible to attacks compared to models with larger or more complex architectures. It is also observed that similar adversarial attacks against a light-weight model often result in the same adversarial label. Moreover, the attacked models have more influence over the resulting adversarial label than the adversarial attack algorithm itself. These findings are helpful in understanding the intriguing vulnerability of deep learning to adversarial examples.
Harnessing adversarial examples with a surprisingly simple defense
2020
I introduce a very simple method to defend against adversarial examples. The basic idea is to raise the slope of the ReLU function at test time. Experiments on the MNIST and CIFAR-10 datasets demonstrate the effectiveness of the proposed defense against a number of strong attacks in both untargeted and targeted settings. While perhaps not as effective as state-of-the-art adversarial defenses, this approach can provide insights to understand and mitigate adversarial attacks. It can also be used in conjunction with other defenses.
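The "raise the ReLU slope at test time" idea amounts to replacing f(x) = max(0, x) with f(x) = alpha * max(0, x), alpha > 1, at inference only. A small sketch of swapping the activation in a trained PyTorch model is shown below; the alpha value is illustrative, not the paper's tuned setting.

```python
# Sketch of a test-time sloped ReLU and a helper that swaps it into a trained model.
import torch
import torch.nn as nn

class SlopedReLU(nn.Module):
    def __init__(self, alpha: float = 1.0):
        super().__init__()
        self.alpha = alpha

    def forward(self, x):
        return self.alpha * torch.relu(x)

def raise_relu_slope(model: nn.Module, alpha: float = 5.0):
    """Recursively replace every nn.ReLU with a steeper test-time variant."""
    for name, child in model.named_children():
        if isinstance(child, nn.ReLU):
            setattr(model, name, SlopedReLU(alpha))
        else:
            raise_relu_slope(child, alpha)
    return model
```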
ReabsNet: Detecting and Revising Adversarial Examples
2017
Though deep neural networks have achieved huge success in recent studies and applications, they remain vulnerable to adversarial perturbations which are imperceptible to humans. To address this problem, we propose a novel network called ReabsNet to achieve high classification accuracy in the face of various attacks. The approach is to augment an existing classification network with a guardian network that detects whether a sample is natural or has been adversarially perturbed. Critically, instead of simply rejecting adversarial examples, we revise them to recover their true labels. We exploit the observation that a sample containing adversarial perturbations can return to its true class after revision. We demonstrate that our ReabsNet outperforms the state-of-the-art defense method under various adversarial attacks.
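The detect-then-revise pipeline can be sketched conceptually as follows. The guardian network, revision rule, step size, and stopping criterion here are assumptions for illustration and not ReabsNet's exact method; the guardian is assumed to output, for a single image, the probability that the input is adversarial.

```python
# Conceptual detect-then-revise sketch (illustrative only): if the guardian
# flags an input as adversarial, nudge the input to reduce that score before
# classifying it.
import torch

def classify_with_revision(classifier, guardian, x, steps=20, alpha=0.01, thresh=0.5):
    x = x.clone().detach()
    for _ in range(steps):
        if guardian(x).item() < thresh:        # looks natural: stop revising
            break
        x.requires_grad_(True)
        score = guardian(x).sum()              # scalar "adversarialness" score
        grad = torch.autograd.grad(score, x)[0]
        # Move against the gradient of the guardian's score, stay in pixel range.
        x = (x.detach() - alpha * grad.sign()).clamp(0, 1)
    return classifier(x).argmax(dim=1)
```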
Indicators of Attack Failure: Debugging and Improving Optimization of Adversarial Examples
ArXiv, 2021
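As a hedged illustration of what a quantitative failure indicator can look like (the paper's own indicators and thresholds are not reproduced here), one simple check is whether the attack's loss actually increases over its iterations; a flat loss curve often signals gradient masking or a badly tuned step size.

```python
# Sketch of one simple attack-failure indicator: flag attack runs whose loss
# barely increases across iterations. Threshold and definition are illustrative.
def loss_progress_indicator(loss_history, min_rel_increase=0.05):
    """loss_history: list of per-iteration attack losses for one example."""
    start, end = loss_history[0], loss_history[-1]
    rel_gain = (end - start) / (abs(start) + 1e-12)
    return rel_gain < min_rel_increase   # True => suspicious attack run
```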
Evaluating robustness of machine-learning models to adversarial examples is a challenging problem. Many defenses have been shown to provide a false sense of security by causing gradient-based attacks to fail, and they have been broken under more rigorous evaluations. Although guidelines and best practices have been suggested to improve current adversarial robustness evaluations, the lack of automatic testing and debugging tools makes it difficult to apply these recommendations in a systematic manner. In this work, we overcome these limitations by (i) defining a set of quantitative indicators which unveil common failures in the optimization of gradient-based attacks, and (ii) proposing specific mitigation strategies within a systematic evaluation protocol. Our extensive experimental analysis shows that the proposed indicators of failure can be used to visualize, debug and improve current adversarial robustness evaluations, providing a first concrete step towards automatizing and syst...