A Survey of Safety and Trustworthiness of Deep Neural Networks

A Survey of Safety and Trustworthiness of Deep Neural Networks: Verification, Testing, Adversarial Attack and Defence, and Interpretability

arXiv (Cornell University), 2018

In the past few years, significant progress has been made on deep neural networks (DNNs) in achieving human-level performance on several long-standing tasks. With the broader deployment of DNNs on various applications, the concerns over their safety and trustworthiness have been raised in public, especially after the widely reported fatal incidents involving self-driving cars. Research to address these concerns is particularly active, with a significant number of papers released in the past few years. This survey paper conducts a review of the current research effort into making DNNs safe and trustworthy, by focusing on four aspects: verification, testing, adversarial attack and defence, and interpretability. In total, we survey 202 papers, most of which were published after 2017.
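
As one concrete illustration of the verification strand surveyed here, the following sketch propagates interval bounds through a small fully connected ReLU network to check a local robustness property. It is a generic, minimal example of interval bound propagation with arbitrary toy weights, not a method taken from the survey itself.

import numpy as np

def interval_affine(lo, hi, W, b):
    # Propagate an interval [lo, hi] through an affine layer x -> W @ x + b.
    W_pos = np.maximum(W, 0.0)
    W_neg = np.minimum(W, 0.0)
    new_lo = W_pos @ lo + W_neg @ hi + b
    new_hi = W_pos @ hi + W_neg @ lo + b
    return new_lo, new_hi

def interval_relu(lo, hi):
    # ReLU is monotone, so it maps interval bounds elementwise.
    return np.maximum(lo, 0.0), np.maximum(hi, 0.0)

def certify_robust(x, eps, layers, true_class):
    # Returns True if every input in the L-infinity ball of radius eps around x
    # is provably classified as true_class by the given ReLU network.
    lo, hi = x - eps, x + eps
    for i, (W, b) in enumerate(layers):
        lo, hi = interval_affine(lo, hi, W, b)
        if i < len(layers) - 1:  # ReLU on hidden layers only
            lo, hi = interval_relu(lo, hi)
    worst_other = max(hi[j] for j in range(len(hi)) if j != true_class)
    return bool(lo[true_class] > worst_other)

# Toy 2-4-2 network with arbitrary random weights (illustration only).
rng = np.random.default_rng(0)
layers = [(rng.standard_normal((4, 2)), np.zeros(4)),
          (rng.standard_normal((2, 4)), np.zeros(2))]
x = np.array([0.5, -0.2])
print(certify_robust(x, eps=0.01, layers=layers, true_class=0))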

Adversarial Robustness of Deep Neural Networks: A Survey from a Formal Verification Perspective

IEEE Transactions on Dependable and Secure Computing

Neural networks have been widely applied in security applications such as spam and phishing detection, intrusion prevention, and malware detection. These black-box models, however, often suffer from uncertainty and poor explainability in applications. Furthermore, neural networks themselves are often vulnerable to adversarial attacks. For those reasons, there is a high demand for trustworthy and rigorous methods to verify the robustness of neural network models. Adversarial robustness, which concerns the reliability of a neural network when dealing with maliciously manipulated inputs, is one of the hottest topics in security and machine learning. In this work, we survey existing literature in adversarial robustness verification for neural networks and collect 39 diversified research works across the machine learning, security, and software engineering domains. We systematically analyze their approaches, including how robustness is formulated, what verification techniques are used, and the strengths and limitations of each technique. We provide a taxonomy from a formal verification perspective for a comprehensive understanding of this topic. We classify the existing techniques based on property specification, problem reduction, and reasoning strategies. We also demonstrate representative techniques that have been applied in existing studies with a sample model. Finally, we discuss open questions for future research.
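
The local robustness property targeted by most of these verification techniques is commonly specified as follows; this is the standard formulation in the literature rather than a definition tied to any single surveyed work:

\forall x'.\ \lVert x' - x_0 \rVert_p \le \varepsilon \;\Longrightarrow\; \operatorname*{arg\,max}_i f_i(x') = \operatorname*{arg\,max}_i f_i(x_0)

In words, every input within an L_p ball of radius \varepsilon around x_0 must receive the same label as x_0; a verifier either proves this property or returns a counterexample, i.e., an adversarial example.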

Two to Trust: AutoML for Safe Modelling and Interpretable Deep Learning for Robustness

Trustworthy AI - Integrating Learning, Optimization and Reasoning, 2021

With great power comes great responsibility. The success of machine learning, especially deep learning, in research and practice has attracted a great deal of interest, which in turn necessitates increased trust. Sources of mistrust include matters of model genesis ("Is this really the appropriate model?") and interpretability ("Why did the model come to this conclusion?", "Is the model safe from being easily fooled by adversaries?"). In this paper, two partners for the trustworthiness tango are presented: recent advances, ideas, and practical industry applications in (a) Automated machine learning (AutoML), a powerful tool to optimize deep neural network architectures and fine-tune hyperparameters, which promises to build models in a safer and more comprehensive way; (b) Interpretability of neural network outputs, which addresses the vital question regarding the reasoning behind model predictions and provides insights to improve robustness against adversarial attacks.
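
To make the AutoML half of this pairing concrete, the sketch below runs a plain random search over a hypothetical hyperparameter space; the search space, the toy scoring function, and all parameter names are illustrative placeholders rather than the tooling discussed by the authors.

import math
import random

# Hypothetical search space; the parameter names and ranges are illustrative only.
SEARCH_SPACE = {
    "learning_rate": (1e-5, 1e-1),   # sampled log-uniformly
    "num_layers": (2, 8),
    "hidden_units": (32, 512),
    "dropout": (0.0, 0.5),
}

def sample_config(rng):
    # Draw one candidate configuration from the search space.
    lr_lo, lr_hi = SEARCH_SPACE["learning_rate"]
    return {
        "learning_rate": 10 ** rng.uniform(math.log10(lr_lo), math.log10(lr_hi)),
        "num_layers": rng.randint(*SEARCH_SPACE["num_layers"]),
        "hidden_units": rng.randint(*SEARCH_SPACE["hidden_units"]),
        "dropout": rng.uniform(*SEARCH_SPACE["dropout"]),
    }

def evaluate(config):
    # Stand-in for training plus validation; a real pipeline would fit a model here.
    # This toy score simply prefers learning rates near 1e-3 and moderate dropout.
    return (1.0
            - 0.05 * abs(math.log10(config["learning_rate"]) + 3.0)
            - abs(config["dropout"] - 0.2))

def random_search(n_trials=50, seed=0):
    rng = random.Random(seed)
    best_cfg, best_score = None, float("-inf")
    for _ in range(n_trials):
        cfg = sample_config(rng)
        score = evaluate(cfg)
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score

best_cfg, best_score = random_search()
print(best_score, best_cfg)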

Increasing the Confidence of Deep Neural Networks by Coverage Analysis

ArXiv, 2021

The great performance of machine learning algorithms and deep neural networks in several perception and control tasks is pushing the industry to adopt such technologies in safety-critical applications, such as autonomous robots and self-driving vehicles. At present, however, several issues need to be solved to make deep learning methods more trustworthy, predictable, safe, and secure against adversarial attacks. Although several methods have been proposed to improve the trustworthiness of deep neural networks, most of them are tailored for specific classes of adversarial examples, hence failing to detect other corner cases or unsafe inputs that heavily deviate from the training samples. This paper presents a lightweight monitoring architecture based on coverage paradigms to enhance the model robustness against different unsafe inputs. In particular, four coverage analysis methods are proposed and tested in the architecture to evaluate multiple detection logics. Experimental results sho...
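
The coverage-based monitoring idea can be pictured with a minimal sketch (an illustration only, not one of the four methods proposed in the paper): record per-neuron activation ranges on in-distribution data, then flag test-time inputs whose activations escape those ranges.

import numpy as np

class RangeCoverageMonitor:
    # Toy activation-range monitor: records min/max activations per neuron on
    # in-distribution data and flags inputs whose activations leave those
    # ranges by more than a relative margin.

    def __init__(self, margin=0.05):
        self.margin = margin
        self.lo = None
        self.hi = None

    def fit(self, activations):
        # activations: (n_samples, n_neurons) array collected from a chosen layer.
        self.lo = activations.min(axis=0)
        self.hi = activations.max(axis=0)

    def is_unsafe(self, activation):
        # activation: (n_neurons,) vector for a single test-time input.
        span = self.hi - self.lo
        below = activation < self.lo - self.margin * span
        above = activation > self.hi + self.margin * span
        return bool(np.any(below | above))

# Synthetic activations stand in for a hidden layer of a DNN.
rng = np.random.default_rng(0)
monitor = RangeCoverageMonitor(margin=0.05)
monitor.fit(rng.normal(size=(1000, 64)))
print(monitor.is_unsafe(rng.normal(size=64)))         # typically within range
print(monitor.is_unsafe(10.0 + rng.normal(size=64)))  # far outside: flagged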

Inspect, Understand, Overcome: A Survey of Practical Methods for AI Safety

Deep Neural Networks and Data for Automated Driving

Deployment of modern data-driven machine learning methods, most often realized by deep neural networks (DNNs), in safety-critical applications such as health care, industrial plant control, or autonomous driving is highly challenging due to numerous model-inherent shortcomings. These shortcomings are diverse and range from a lack of generalization over insufficient interpretability and implausible predictions to directed attacks by means of malicious inputs. Cyber-physical systems employing DNNs are therefore likely to suffer from so-called safety concerns, properties that preclude their deployment as no argument or experimental setup can help to assess the remaining risk. In recent years, an abundance of state-of-the-art techniques aiming to address these safety concerns has emerged. This chapter provides a structured and broad overview of them. We first identify categories of insufficiencies to then describe research activities aiming at their detection, quantification, or mitigat...

Safely Entering the Deep: A Review of Verification and Validation for Machine Learning and a Challenge Elicitation in the Automotive Industry

Deep Neural Networks (DNN) will emerge as a cornerstone in automotive software engineering. However, developing systems with DNNs introduces novel challenges for safety assessments. This paper reviews the state-of-the-art in verification and validation of safety-critical systems that rely on machine learning. Furthermore, we report from a workshop series on DNNs for perception with automotive experts in Sweden, confirming that ISO 26262 largely contravenes the nature of DNNs. We recommend aerospace-to-automotive knowledge transfer and systems-based safety approaches, e.g., safety cage architectures and simulated system test cases.

Evaluating Adversarial Attacks on Driving Safety in Vision-Based Autonomous Vehicles

IEEE Internet of Things Journal, 2021

In recent years, many deep learning models have been adopted in autonomous driving. At the same time, these models introduce new vulnerabilities that may compromise the safety of autonomous vehicles. Specifically, recent studies have demonstrated that adversarial attacks can cause a significant decline in detection precision of deep learning-based 3D object detection models. Although driving safety is the ultimate concern for autonomous driving, there is no comprehensive study on the linkage between the performance of deep learning models and the driving safety of autonomous vehicles under adversarial attacks. In this paper, we investigate the impact of two primary types of adversarial attacks, perturbation attacks and patch attacks, on the driving safety of vision-based autonomous vehicles rather than the detection precision of deep learning models. In particular, we consider two state-of-the-art models in vision-based 3D object detection, Stereo R-CNN and DSGN. To evaluate driving safety, we propose an end-to-end evaluation framework with a set of driving safety performance metrics. By analyzing the results of our extensive evaluation experiments, we find that (1) the attack's impact on the driving safety of autonomous vehicles and the attack's impact on the precision of 3D object detectors are decoupled, and (2) the DSGN model demonstrates stronger robustness to adversarial attacks than the Stereo R-CNN model. In addition, we further investigate the causes behind the two findings with an ablation study. The findings of this paper provide a new perspective to evaluate adversarial attacks and guide the selection of deep learning models in autonomous driving.
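
For readers unfamiliar with the attack families mentioned here, the sketch below shows the fast gradient sign method (FGSM), a standard single-step perturbation attack; it is a generic PyTorch illustration, not the attack pipeline the paper evaluates against Stereo R-CNN and DSGN.

import torch

def fgsm_perturb(model, x, y, loss_fn, eps):
    # Fast Gradient Sign Method: a one-step L-infinity perturbation of size eps.
    # model maps an image batch to logits; y holds the ground-truth labels.
    x_adv = x.clone().detach().requires_grad_(True)
    loss = loss_fn(model(x_adv), y)
    loss.backward()
    # Step in the direction that increases the loss, then clip to the valid pixel range.
    x_adv = (x_adv + eps * x_adv.grad.sign()).clamp(0.0, 1.0)
    return x_adv.detach()

# Usage with a toy classifier; any differentiable vision model works the same way.
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 10))
images = torch.rand(4, 3, 32, 32)
labels = torch.randint(0, 10, (4,))
adv = fgsm_perturb(model, images, labels, torch.nn.functional.cross_entropy, eps=8 / 255)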

Checking Robustness of Representations Learned by Deep Neural Networks

Machine Learning and Knowledge Discovery in Databases. Applied Data Science Track, 2021

Recent works have shown the vulnerability of deep neural networks to adversarial or out-of-distribution examples. This weakness may come from the fact that training deep models often leads to extracting spurious correlations between image classes and some characteristics of the images used for training. As demonstrated, popular, ready-to-use models like ResNet or EfficientNet may rely on non-obvious and counterintuitive features. Detection of these weaknesses is often difficult as classification accuracy is excellent and does not indicate that the model is non-robust. To address this problem, we propose a new method and a measure called the robustness score. The method indicates which classes are recognized by the deep model using non-robust representations, i.e., representations based on spurious correlations. Since the root of this problem lies in the quality of the training data, our method allows us to analyze the training dataset in terms of the existence of these non-obvious spurious correlations. This knowledge can be used to attack the model by finding adversarial images. Consequently, our method can expose threats to the model's reliability, which should be addressed to increase the certainty of classification decisions. The method was verified using the ImageNet and Pascal VOC datasets, revealing many flaws that affect the final quality of deep models trained on these datasets.
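
The paper's robustness score is not reproduced here; purely to illustrate the kind of per-class analysis described, the sketch below computes a hypothetical score as the fraction of each class's correctly classified samples whose prediction survives small random perturbations, so that low-scoring classes hint at reliance on fragile, possibly spurious features.

import torch

def per_class_robustness(model, images, labels, num_classes, eps=2 / 255, trials=8):
    # Hypothetical per-class score (not the paper's definition): the fraction of
    # correctly classified samples of each class whose prediction is unchanged
    # under small random L-infinity perturbations of size eps.
    model.eval()
    with torch.no_grad():
        preds = model(images).argmax(dim=1)
        correct = preds == labels
        stable = torch.ones_like(correct)
        for _ in range(trials):
            noise = (torch.rand_like(images) * 2 - 1) * eps
            noisy_preds = model((images + noise).clamp(0, 1)).argmax(dim=1)
            stable &= noisy_preds == preds
        scores = []
        for c in range(num_classes):
            mask = correct & (labels == c)
            scores.append(float((stable & mask).sum()) / max(int(mask.sum()), 1))
    return scores  # low scores hint at classes recognized via fragile features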

Input verification for deep neural networks

2018

As deep learning systems are more frequently being applied to safety-critical domains, the protection against irrelevant and malicious data is of greater significance than ever before. By applying an algorithm, or supervisor, which removes this irrelevant data from the deep learning system, the credibility of the system is increased. The purpose of this project was to find and evaluate existing algorithms, and to develop a new method which can protect a neural network from irrelevant inputs. The thesis is centered on a hypothesis of capturing the learned domain of a CNN by using adversarial examples. A thorough literature review resulted in 18 analyzed methods of which five methods were implemented. The methods were evaluated on three scenarios: MNIST versus Omniglot; CIFAR-10 versus CIFAR-100; Retinal OCT-images with a held out class. Results of the study point out the difficulties in using adversarial examples to represent the infinite set of novelties. Rather than using adversari...
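
A minimal example of the kind of supervisor evaluated in the thesis is the softmax-confidence baseline from the out-of-distribution detection literature: reject inputs whose maximum softmax probability falls below a threshold calibrated on in-distribution data. The sketch below is that baseline, not the adversarial-example-based method the thesis proposes.

import torch

class SoftmaxSupervisor:
    # Baseline input supervisor: reject inputs whose maximum softmax probability
    # falls below a threshold calibrated on in-distribution data.

    def __init__(self, model, threshold=0.5):
        self.model = model
        self.threshold = threshold

    def _confidence(self, x):
        with torch.no_grad():
            return torch.softmax(self.model(x), dim=1).max(dim=1).values

    def calibrate(self, in_dist_inputs, quantile=0.05):
        # Pick the threshold so roughly 5% of in-distribution inputs are rejected.
        self.threshold = torch.quantile(self._confidence(in_dist_inputs), quantile).item()

    def accept(self, x):
        # Boolean mask: True where the input is passed on to the DNN.
        return self._confidence(x) >= self.threshold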