SafeML: Safety Monitoring of Machine Learning Classifiers Through Statistical Difference Measures

Springer, 2020

Ensuring safety and explainability of machine learning (ML) is a topic of increasing relevance as data-driven applications venture into safety-critical domains, which are traditionally committed to high safety standards that cannot be satisfied by testing alone on otherwise inaccessible black-box systems. The interaction between safety and security is a particular challenge, as security violations can lead to compromised safety. This paper's contribution to addressing both safety and security within a single protection concept, applicable during the operation of ML systems, is active monitoring of the behavior and the operational context of the data-driven system based on distance measures of the Empirical Cumulative Distribution Function (ECDF). We investigate abstract datasets (XOR, Spiral, Circle) and current security-specific datasets for intrusion detection (CICIDS2017) of simulated network traffic, using statistical distance measures including the Kolmogorov-Smirnov, Kuiper, Anderson-Darling, Wasserstein, and mixed Wasserstein-Anderson-Darling measures. Our preliminary findings indicate a meaningful correlation between ML decisions and the ECDF-based distance measures of the input features. Thus, these measures can provide a confidence level that can be used for a) analyzing the applicability of the ML system in a given field (safety/security) and b) analyzing whether the field data was maliciously manipulated.
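As a rough illustration of the monitoring idea, the sketch below compares the empirical distribution of one feature in the training data against field data using distance measures named in the abstract; scipy provides two-sample Kolmogorov-Smirnov, k-sample Anderson-Darling, and Wasserstein implementations. The alarm threshold and the per-feature treatment here are illustrative assumptions, not the paper's calibrated procedure.

```python
import numpy as np
from scipy.stats import ks_2samp, anderson_ksamp, wasserstein_distance

def ecdf_distance_report(train_feature, field_feature, ks_threshold=0.1):
    """Compare one feature's training vs. field distribution using
    ECDF-based distance measures (threshold chosen arbitrarily here)."""
    ks = ks_2samp(train_feature, field_feature)          # Kolmogorov-Smirnov
    ad = anderson_ksamp([train_feature, field_feature])  # Anderson-Darling (k-sample)
    wd = wasserstein_distance(train_feature, field_feature)
    return {
        "ks_statistic": ks.statistic,
        "ks_pvalue": ks.pvalue,
        "ad_statistic": ad.statistic,
        "wasserstein": wd,
        "distribution_shift_suspected": ks.statistic > ks_threshold,
    }

# Toy usage: field data drawn from a shifted distribution should raise the flag.
rng = np.random.default_rng(0)
train = rng.normal(loc=0.0, scale=1.0, size=5000)
field = rng.normal(loc=0.5, scale=1.0, size=5000)   # simulated operational drift
print(ecdf_distance_report(train, field))
```

In the SafeML setting, large distances between training-time and operation-time feature distributions would lower the confidence placed in the classifier's decisions rather than block them outright.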

Security Theater: On the Vulnerability of Classifiers to Exploratory Attacks

2018

The increasing scale and sophistication of cyberattacks have led to the adoption of machine learning based classification techniques at the core of cybersecurity systems. These techniques promise scale and accuracy that traditional rule- or signature-based methods cannot provide. However, classifiers operating in adversarial domains are vulnerable to evasion attacks by an adversary who is capable of learning the behavior of the system by employing intelligently crafted probes. Classification accuracy in such domains provides a false sense of security, as detection can easily be evaded by carefully perturbing the input samples. In this paper, a generic data-driven framework is presented to analyze the vulnerability of classification systems to black-box probing based attacks. The framework uses an exploration-exploitation based strategy to understand an adversary's point of view of the attack-defense cycle. The adversary assumes a black-box model of the defender's classifier and ...
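To make the probing idea concrete, here is a hypothetical black-box evasion loop: the attacker can only query the defender's classifier for labels, and perturbs a detected sample until it is misclassified. The stand-in classifier, step size, and query budget are illustrative assumptions, and the random-walk probing is a much simpler exploration strategy than the framework described in the paper.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)

# Stand-in "defender": a classifier the attacker can query but not inspect.
X = rng.normal(size=(500, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(int)   # label 1 = "malicious"
defender = LogisticRegression().fit(X, y)

def query(sample):
    """Black-box oracle: returns only the predicted label."""
    return defender.predict(sample.reshape(1, -1))[0]

def probe_until_evasion(sample, step=0.1, budget=1000):
    """Perturb a malicious sample until the oracle says 'benign'.
    Each query explores the decision boundary from the attacker's side."""
    x = sample.copy()
    for n_queries in range(1, budget + 1):
        if query(x) == 0:                             # evasion achieved
            return x, n_queries
        x = x + rng.normal(scale=step, size=x.shape)  # exploratory probe
    return None, budget                               # budget exhausted

malicious = X[y == 1][0]
evading, used = probe_until_evasion(malicious)
print(f"evaded: {evading is not None}, queries used: {used}")
```

The point of such an exercise is the query count: if a classifier can be evaded in a handful of cheap probes, its reported test accuracy says little about its security.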

Performance Assessment of Supervised Classifiers for Designing Intrusion Detection Systems: A Comprehensive Review and Recommendations for Future Research

Mathematics, 2021

Supervised learning and pattern recognition are crucial areas of research in information retrieval, knowledge engineering, image processing, medical imaging, and intrusion detection. Numerous algorithms have been designed to address such complex application domains. Despite an enormous array of supervised classifiers, researchers have yet to recognize a robust classification mechanism that accurately and quickly classifies the target dataset, especially in the field of intrusion detection systems (IDSs). Most of the existing literature considers only accuracy and false-positive rate when assessing the performance of classification algorithms. The absence of other performance measures, such as model build time, misclassification rate, and precision, should be considered the main limitation of classifier performance evaluation. This paper's main contribution is to analyze the current state of the literature in the field of network intrusion detection, highlighting the number of classifiers us...
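The broader set of measures the abstract calls for is straightforward to compute together. The sketch below evaluates a classifier on a synthetic class-imbalanced stand-in for an IDS dataset; the dataset shape, model choice, and split are assumptions for demonstration only.

```python
import time
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix, accuracy_score, precision_score
from sklearn.model_selection import train_test_split

# Synthetic two-class stand-in for an IDS dataset (0 = benign, 1 = attack).
X, y = make_classification(n_samples=5000, n_features=20, weights=[0.9, 0.1],
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

clf = RandomForestClassifier(random_state=0)

start = time.perf_counter()
clf.fit(X_tr, y_tr)
build_time = time.perf_counter() - start      # model build time (seconds)

y_pred = clf.predict(X_te)
tn, fp, fn, tp = confusion_matrix(y_te, y_pred).ravel()

accuracy = accuracy_score(y_te, y_pred)
misclassification_rate = 1.0 - accuracy
false_positive_rate = fp / (fp + tn)          # benign traffic flagged as attack
precision = precision_score(y_te, y_pred)

print(f"build time: {build_time:.3f}s  accuracy: {accuracy:.3f}  "
      f"FPR: {false_positive_rate:.3f}  precision: {precision:.3f}  "
      f"misclassification: {misclassification_rate:.3f}")
```

On imbalanced traffic like this, accuracy alone is misleading (a classifier that labels everything benign scores 90%), which is exactly why the paper argues for reporting the fuller set of measures.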

When Good Machine Learning Leads to Bad Security

Ubiquity, 2018

While machine learning has proven to be promising in several application domains, our understanding of its behavior and limitations is still in its nascent stages. One such domain is cybersecurity, where machine learning models are replacing traditional rule-based systems owing to their ability to generalize and deal with large-scale attacks that have not been seen before. However, the naive transfer of machine learning principles to the domain of security needs to be treated with caution. Machine learning was not designed with security in mind and as such is prone to adversarial manipulation and reverse engineering. While most data-based learning models rely on a static assumption of the world, the security landscape is an especially dynamic one, with an ongoing, never-ending arms race between the system designer and the attackers. Any solution designed for such a domain needs to take into account an active adversary and needs to evolve over time in the face of emerging thre...
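One way to read the "needs to evolve over time" point is as a requirement for incremental retraining on newly labeled traffic rather than a one-off static fit. The sketch below uses scikit-learn's partial_fit on a linear model as a minimal stand-in for such an evolving defense; the batch sizes and the simulated drift in attacker behavior are assumptions for demonstration.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(2)
clf = SGDClassifier(random_state=0)
classes = np.array([0, 1])   # must be declared on the first partial_fit call

def traffic_batch(shift):
    """Simulated labeled traffic whose attack pattern drifts over time."""
    X = rng.normal(size=(200, 5))
    y = (X[:, 0] + shift * X[:, 1] > 0).astype(int)
    return X, y

# The evolving model is updated every "day" as attacker behavior changes;
# a model fit once on day 0 would degrade under the same drift.
for day in range(10):
    X, y = traffic_batch(shift=day / 5.0)
    clf.partial_fit(X, y, classes=classes)  # incremental update on new data
    X_eval, y_eval = traffic_batch(shift=day / 5.0)
    print(f"day {day}: accuracy on current traffic = "
          f"{clf.score(X_eval, y_eval):.2f}")
```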