Toward Improving the Robustness of Deep Learning Models via Model Transformation

Towards Alternative Techniques for Improving Adversarial Robustness: Analysis of Adversarial Training at a Spectrum of Perturbations

Cornell University - arXiv, 2022

Adversarial training (AT) and its variants have spearheaded progress in improving neural network robustness to adversarial perturbations and common corruptions in the last few years. Algorithm design of AT and its variants is focused on training models at a specified perturbation strength ε and only using the feedback from the performance of that ε-robust model to improve the algorithm. In this work, we focus on models trained on a spectrum of ε values. We analyze three perspectives: model performance, intermediate feature precision, and convolution filter sensitivity. In each, we identify alternative improvements to AT that otherwise wouldn't have been apparent at a single ε. Specifically, we find that for a PGD attack at some strength δ, there is an AT model at some slightly larger strength ε, but no greater, that generalizes best to it. Hence, we propose overdesigning for robustness, where we suggest training models at an ε just above δ. Second, we observe (across various ε values) that robustness is highly sensitive to the precision of intermediate features, particularly those after the first and second layer. Thus, we propose adding a simple quantization to defenses that improves accuracy on seen and unseen adaptive attacks. Third, we analyze convolution filters of each layer of models at increasing ε and notice that those of the first and second layer may be solely responsible for amplifying input perturbations. We present our findings and demonstrate our techniques through experiments with ResNet and WideResNet models on the CIFAR-10 and CIFAR-10-C datasets.
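A minimal sketch of the intermediate-feature quantization idea described above, in PyTorch; the class name, number of levels, clipping range, and placement are assumptions for illustration, not the authors' exact operator. A straight-through estimator keeps the quantization usable inside (adversarial) training:

```python
import torch
import torch.nn as nn

class QuantizeFeatures(nn.Module):
    """Round intermediate activations to a small number of levels over a clipped
    range. The abstract reports robustness is especially sensitive to the precision
    of features after the first and second layers; this is a generic sketch of such
    a quantization step, not the paper's exact code."""
    def __init__(self, levels=16, max_val=4.0):
        super().__init__()
        self.levels = levels
        self.max_val = max_val

    def forward(self, feat):
        step = self.max_val / (self.levels - 1)
        q = torch.round(torch.clamp(feat, 0.0, self.max_val) / step) * step
        # straight-through estimator: quantized forward pass, identity gradient
        return feat + (q - feat).detach()

# Example placement (assumed): wrap an early block of a ResNet.
# model.layer1 = nn.Sequential(model.layer1, QuantizeFeatures())
```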

Learn2Perturb: an End-to-end Feature Perturbation Learning to Improve Adversarial Robustness

arXiv (Cornell University), 2020

While deep neural networks have been achieving state-of-the-art performance across a wide variety of applications, their vulnerability to adversarial attacks limits their widespread deployment for safety-critical applications. Alongside other adversarial defense approaches being investigated, there has been a very recent interest in improving adversarial robustness in deep neural networks through the introduction of perturbations during the training process. However, such methods leverage fixed, pre-defined perturbations and require significant hyperparameter tuning that makes them very difficult to leverage in a general fashion. In this study, we introduce Learn2Perturb, an end-to-end feature perturbation learning approach for improving the adversarial robustness of deep neural networks. More specifically, we introduce novel perturbation-injection modules that are incorporated at each layer to perturb the feature space and increase uncertainty in the network. This feature perturbation is performed at both the training and the inference stages. Furthermore, inspired by Expectation-Maximization, an alternating back-propagation training algorithm is introduced to train the network and noise parameters consecutively. Experimental results on CIFAR-10 and CIFAR-100 datasets show that the proposed Learn2Perturb method can result in deep neural networks which are 4-7% more robust against ℓ∞ FGSM and PGD adversarial attacks and significantly outperforms the state of the art against the ℓ2 C&W attack and a wide range of well-known black-box attacks.
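A rough sketch of the two ingredients the abstract describes: a per-layer noise-injection module kept active at training and inference, and one alternating update of network weights versus noise parameters. Module and function names are assumptions, and the paper's full objective for the noise parameters is not reproduced here:

```python
import torch
import torch.nn as nn

class PerturbInject(nn.Module):
    """Adds zero-mean Gaussian noise with a learnable per-channel scale to a
    feature map; kept active at both train and test time (sketch only)."""
    def __init__(self, channels):
        super().__init__()
        self.log_sigma = nn.Parameter(torch.full((1, channels, 1, 1), -3.0))

    def forward(self, feat):
        return feat + self.log_sigma.exp() * torch.randn_like(feat)

def alternating_step(model, batch, criterion, opt_net, opt_noise,
                     net_params, noise_params):
    """EM-inspired alternation: update network weights with noise scales frozen,
    then update noise scales with weights frozen."""
    x, y = batch
    # step 1: train the network weights
    for p in noise_params: p.requires_grad_(False)
    for p in net_params:   p.requires_grad_(True)
    opt_net.zero_grad(); criterion(model(x), y).backward(); opt_net.step()
    # step 2: train the noise scales
    for p in net_params:   p.requires_grad_(False)
    for p in noise_params: p.requires_grad_(True)
    opt_noise.zero_grad(); criterion(model(x), y).backward(); opt_noise.step()
    for p in net_params:   p.requires_grad_(True)   # restore for the next step
```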

A Comprehensive Evaluation Framework for Deep Model Robustness

2021

Deep neural networks (DNNs) have achieved remarkable performance across a wide range of applications. However, they are vulnerable to adversarial examples, which motivates adversarial defense. By adopting simple evaluation metrics, most of the current defenses only conduct incomplete evaluations, which are far from providing a comprehensive understanding of the limitations of these defenses. Thus, most proposed defenses are quickly shown to be attacked successfully, which results in the “arms race” phenomenon between attack and defense. To mitigate this problem, we establish a model robustness evaluation framework containing a comprehensive, rigorous, and coherent set of evaluation metrics, which could fully evaluate model robustness and provide deep insights into building robust models. With 23 evaluation metrics in total, our framework primarily focuses on the two key factors of adversarial learning (i.e., data and model). Through neuron coverage and data imperceptibility, we use ...
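As one concrete example of the kind of metric such a framework includes, here is a hedged sketch of a simple neuron-coverage measurement; the threshold, the decision to hook ReLU modules, and the per-activation granularity are assumptions, not the framework's exact definition:

```python
import torch
import torch.nn as nn

@torch.no_grad()
def neuron_coverage(model, loader, threshold=0.0):
    """Fraction of post-ReLU activations that exceed `threshold` at least once
    over the evaluation data -- an illustrative coverage metric."""
    model.eval()
    fired_per_layer = {}

    def make_hook(name):
        def hook(_module, _inputs, out):
            fired = (out.flatten(1) > threshold).any(dim=0)   # per activation
            prev = fired_per_layer.get(name)
            fired_per_layer[name] = fired if prev is None else (prev | fired)
        return hook

    handles = [m.register_forward_hook(make_hook(n))
               for n, m in model.named_modules() if isinstance(m, nn.ReLU)]
    for x, _ in loader:
        model(x)
    for h in handles:
        h.remove()

    covered = sum(v.sum().item() for v in fired_per_layer.values())
    total = sum(v.numel() for v in fired_per_layer.values())
    return covered / max(total, 1)
```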

SpecAttack: Specification-Based Adversarial Training for Deep Neural Networks

arXiv (Cornell University), 2021

Deep neural networks (DNNs) are increasingly applied in safety-critical domains, such as self-driving cars, unmanned aircraft, and medical diagnosis. It is of fundamental importance to certify the safety of these DNNs, i.e. that they comply with a formal safety specification. While safety certification tools answer this question exactly, they are of no help in debugging unsafe DNNs, requiring the developer to iteratively verify and modify the DNN until safety is eventually achieved. Hence, a repair technique needs to be developed that can produce a safe DNN automatically. To address this need, we present SpecRepair, a tool that efficiently eliminates counterexamples from a DNN and produces a provably safe DNN without harming its classification accuracy. SpecRepair combines specification-based counterexample search with resumed training of the DNN that penalizes counterexamples, and then certifies the resulting DNN. We evaluate SpecRepair's effectiveness on the ACAS Xu benchmark, a DNN-based controller for unmanned aircraft, and two image classification benchmarks. The results show that SpecRepair is more successful in producing safe DNNs than comparable methods, has a shorter runtime, and produces safe DNNs while preserving their classification accuracy.
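The counterexample-guided repair idea can be summarized as a loop. The sketch below uses a hypothetical `verifier` interface, penalty weighting, and round budget; none of these names or choices come from the tool itself:

```python
def repair_loop(model, spec, train_loader, verifier, optimizer, criterion,
                penalty_weight=1.0, max_rounds=20):
    """Hypothetical counterexample-guided repair loop: search for specification
    violations, penalize them while resuming training, and stop once the
    verifier certifies the network. `verifier` is a stand-in interface."""
    counterexamples = []
    for _ in range(max_rounds):
        cex = verifier.find_counterexample(model, spec)   # None if certified
        if cex is None:
            return model, True                            # provably safe
        counterexamples.append(cex)
        for x, y in train_loader:                         # resume training
            loss = criterion(model(x), y)
            for cx, cy in counterexamples:                # (cx, cy): counterexample batch and intended labels
                loss = loss + penalty_weight * criterion(model(cx), cy)
            optimizer.zero_grad(); loss.backward(); optimizer.step()
    return model, False                                   # budget exhausted, not certified
```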

Advanced Techniques for Improving Model Robustness in Adversarial Machine Learning

ignited, 2024

This work investigates advanced methods for improving the resilience of machine learning models against adversarial attacks. Ensuring that these models can withstand deliberately crafted inputs, known as adversarial examples, has become critical as machine learning expands into high-stakes fields such as computer vision, cybersecurity, and healthcare. The study examines several types of adversarial attacks, including black-box attacks, where the attacker has no direct knowledge of the model, and white-box attacks, where the attacker has complete access to the model. Widely used attack methods, such as the Fast Gradient Sign Method (FGSM), Iterative FGSM (I-FGSM), and the Carlini & Wagner (C&W) attack, are also discussed, along with popular defense strategies. The work emphasizes how adversarial learning contributes to creating more resilient models by addressing both theoretical foundations and practical applications. This thorough investigation highlights the strengths and weaknesses of current approaches, as well as the ongoing need for advancements to protect model integrity against evolving threats.
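For reference, FGSM and its iterative variant are short enough to sketch directly (PyTorch assumed; the step size and clipping to [0, 1] are common defaults, not taken from this article):

```python
import torch
import torch.nn.functional as F

def fgsm(model, x, y, eps):
    """FGSM: one gradient-sign step of size eps in the L-inf sense."""
    x_adv = x.clone().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    grad, = torch.autograd.grad(loss, x_adv)
    return (x_adv + eps * grad.sign()).clamp(0, 1).detach()

def i_fgsm(model, x, y, eps, steps=10):
    """Iterative FGSM (BIM): repeated small sign steps, projected back into
    the eps-ball around the original input after each step."""
    alpha = eps / steps
    x_adv = x.clone()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad, = torch.autograd.grad(loss, x_adv)
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    return x_adv.detach()
```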

Resisting Deep Learning Models Against Adversarial Attack Transferability via Feature Randomization

arXiv (Cornell University), 2022

In the past decades, the rise of artificial intelligence has given us the capability to solve the most challenging problems in our day-to-day lives, such as cancer prediction and autonomous navigation. However, these applications might not be reliable if not secured against adversarial attacks. In addition, recent works demonstrated that some adversarial examples are transferable across different models. Therefore, it is crucial to avoid such transferability via robust models that resist adversarial manipulations. In this paper, we propose a feature randomization-based approach that resists eight adversarial attacks targeting deep learning models in the testing phase. Our novel approach consists of changing the training strategy in the target network classifier and selecting random feature samples. We consider attackers under Limited-Knowledge and Semi-Knowledge conditions to undertake the most prevalent types of adversarial attacks. We evaluate the robustness of our approach using the well-known UNSW-NB15 dataset, which includes realistic and synthetic attacks. Afterward, we demonstrate that our strategy outperforms existing state-of-the-art approaches such as the Most Powerful Attack, which consists of fine-tuning the network model against specific adversarial attacks. Finally, our experimental results show that our methodology can secure the target network and resist adversarial attack transferability by over 60%.
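One way to picture "selecting random feature samples" is a randomized masking layer before the classifier head. The sketch below is an illustrative interpretation, not the authors' exact scheme; the class name, keep ratio, and placement are assumptions:

```python
import torch
import torch.nn as nn

class RandomFeatureSelect(nn.Module):
    """Illustrative feature-randomization layer: each forward pass keeps a random
    subset of the penultimate features and zeroes out the rest, so a surrogate
    model cannot rely on one fixed feature set; applied at train and test time."""
    def __init__(self, num_features, keep_ratio=0.8):
        super().__init__()
        self.num_features = num_features
        self.keep_ratio = keep_ratio

    def forward(self, feat):                 # feat: (batch, num_features)
        k = max(1, int(self.keep_ratio * self.num_features))
        idx = torch.randperm(self.num_features, device=feat.device)[:k]
        mask = torch.zeros(self.num_features, device=feat.device)
        mask[idx] = 1.0
        return feat * mask
```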

Adversarial Concurrent Training: Optimizing Robustness and Accuracy Trade-off of Deep Neural Networks

arXiv (Cornell University), 2020

Adversarial training has been proven to be an effective technique for improving the adversarial robustness of models. However, there seems to be an inherent trade-off between optimizing the model for accuracy and for robustness. To this end, we propose Adversarial Concurrent Training (ACT), which employs adversarial training in a collaborative learning framework whereby we train a robust model in conjunction with a natural model in a minimax game. ACT encourages the two models to align their feature spaces by using task-specific decision boundaries and to explore the input space more broadly. Furthermore, the natural model acts as a regularizer, enforcing priors on the features that the robust model should learn. Our analyses of the behavior of the models show that ACT leads to a robust model with lower model complexity, higher information compression in the learned representations, and high posterior entropy solutions indicative of convergence to flatter minima. We demonstrate the effectiveness of the proposed approach across different datasets and network architectures. On ImageNet, ACT achieves 68.20% standard accuracy and 44.29% robust accuracy under a 100-iteration untargeted attack, improving upon standard adversarial training's 65.70% standard accuracy and 42.36% robust accuracy.
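A hedged sketch of one concurrent-training step in this spirit; the KL alignment terms, the loss weights `alpha`/`beta`, and the update order are assumptions rather than the paper's precise objective:

```python
import torch
import torch.nn.functional as F

def act_step(robust_model, natural_model, x, x_adv, y, opt_r, opt_n,
             alpha=1.0, beta=1.0):
    """One illustrative step: the robust model learns on adversarial inputs while
    aligning to the natural model; the natural model learns on clean inputs while
    aligning back to the robust model."""
    # update the robust model
    logits_r = robust_model(x_adv)
    with torch.no_grad():
        logits_n = natural_model(x)
    loss_r = F.cross_entropy(logits_r, y) + alpha * F.kl_div(
        F.log_softmax(logits_r, dim=1), F.softmax(logits_n, dim=1),
        reduction="batchmean")
    opt_r.zero_grad(); loss_r.backward(); opt_r.step()

    # update the natural model
    logits_n = natural_model(x)
    with torch.no_grad():
        logits_r = robust_model(x_adv)
    loss_n = F.cross_entropy(logits_n, y) + beta * F.kl_div(
        F.log_softmax(logits_n, dim=1), F.softmax(logits_r, dim=1),
        reduction="batchmean")
    opt_n.zero_grad(); loss_n.backward(); opt_n.step()
```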

Improving robustness of neural networks against adversarial examples

2020

The main goal of this work is to design and implement a framework that yields a neural network model robust against arbitrary adversarial attacks, while the resulting model's accuracy is not significantly lower than that of a naturally trained model. Our approach is to minimize the maximum of the loss function of the target model, i.e. to solve a min-max problem. Related work and our experiments lead us to use the Projected Gradient Descent (PGD) method as the reference attack; therefore, we train against data generated by PGD. As a result, using the framework we can reach more than 90% accuracy against sophisticated adversarial attacks on the MNIST dataset. The main contribution of this work is an implementation of adversarial attacks and defenses against them, since no public implementation was previously available.
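Once the inner maximization is delegated to an attack such as PGD, the min-max formulation reduces to a short training loop. The sketch below is a generic PyTorch version under that assumption, not this thesis' exact framework; `attack_fn` stands for any attack with the signature shown (e.g., a PGD or I-FGSM routine like the one sketched earlier in this section):

```python
import torch.nn.functional as F

def adversarial_train(model, loader, optimizer, attack_fn, epochs=10):
    """Outer minimization over model weights; inner maximization via `attack_fn`."""
    model.train()
    for _ in range(epochs):
        for x, y in loader:
            x_adv = attack_fn(model, x, y)            # inner max: worst-case inputs
            loss = F.cross_entropy(model(x_adv), y)   # outer min: fit them
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return model
```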

CARLA-GeAR: a Dataset Generator for a Systematic Evaluation of Adversarial Robustness of Vision Models

Cornell University - arXiv, 2022

Adversarial examples represent a serious threat for deep neural networks in several application domains, and a huge amount of work has been produced to investigate them and mitigate their effects. Nevertheless, not much work has been devoted to the generation of datasets specifically designed to evaluate the adversarial robustness of neural models. This paper presents CARLA-GeAR, a tool for the automatic generation of photo-realistic synthetic datasets that can be used for a systematic evaluation of the adversarial robustness of neural models against physical adversarial patches, as well as for comparing the performance of different adversarial defense/detection methods. The tool is built on the CARLA simulator [1], using its Python API, and allows the generation of datasets for several vision tasks in the context of autonomous driving. The adversarial patches included in the generated datasets are attached to billboards or the back of a truck and are crafted by using state-of-the-art white-box attack strategies to maximize the prediction error of the model under test. Finally, the paper presents an experimental study to evaluate the performance of some defense methods against such attacks, showing how the datasets generated with CARLA-GeAR might be used in future work as a benchmark for adversarial defense in the real world. All the code and datasets used in this paper are available at http://carlagear.retis.santannapisa.it.
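A hedged sketch of how such a benchmark might be consumed downstream: compare a model on a clean split against the corresponding patch-attacked split and report the gap. The loader names are assumptions, and the task is treated as classification for simplicity even though the benchmark also covers detection and segmentation:

```python
import torch

@torch.no_grad()
def patch_robustness_gap(model, clean_loader, patched_loader):
    """Accuracy on the clean split vs. the split whose scenes contain adversarial
    patches, plus the resulting robustness gap (illustrative usage only)."""
    def accuracy(loader):
        correct = total = 0
        for x, y in loader:
            correct += (model(x).argmax(dim=1) == y).sum().item()
            total += y.numel()
        return correct / max(total, 1)
    clean_acc = accuracy(clean_loader)
    patched_acc = accuracy(patched_loader)
    return {"clean": clean_acc, "patched": patched_acc, "gap": clean_acc - patched_acc}
```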

Enhancing Adversarial Robustness via Test-time Transformation Ensembling

2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW)

Deep learning models are prone to being fooled by imperceptible perturbations known as adversarial attacks. In this work, we study how equipping models with Test-time Transformation Ensembling (TTE) can work as a reliable defense against such attacks. While transforming the input data, both at train and test times, is known to enhance model performance, its effects on adversarial robustness have not been studied. Here, we present a comprehensive empirical study of the impact of TTE, in the form of widely-used image transforms, on adversarial robustness. We show that TTE consistently improves model robustness against a variety of powerful attacks without any need for retraining, and that this improvement comes at virtually no trade-off with accuracy on clean samples. Finally, we show that the benefits of TTE transfer even to the certified robustness domain, in which TTE provides sizable and consistent improvements.
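TTE itself is simple enough to sketch: average the model's predictive distribution over a small set of input transforms. The transform list below is an assumption for illustration, not the paper's exact set:

```python
import torch

@torch.no_grad()
def tte_predict(model, x, transforms):
    """Average softmax outputs over the given input transforms and predict."""
    probs = [torch.softmax(model(t(x)), dim=1) for t in transforms]
    return torch.stack(probs).mean(dim=0).argmax(dim=1)

# Example (assumed) transform set built from plain tensor ops:
identity = lambda t: t
hflip = lambda t: torch.flip(t, dims=[-1])
# preds = tte_predict(model, images, [identity, hflip])
```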