Data Mining for Intrusion Detection: From Outliers to True Intrusions

Mining Common Outliers for Intrusion Detection

Studies in Computational Intelligence, 2010

Data mining for intrusion detection can be divided into several sub-topics, one of which is unsupervised clustering, a technique with controversial properties. Unsupervised clustering for intrusion detection aims to i) group behaviours together according to their similarity and ii) detect groups containing only one (or very few) behaviour(s). Such isolated behaviours appear to deviate from the model of normality and are therefore considered malicious. Obviously, not all atypical behaviours are attacks or intrusion attempts, which is one drawback of clustering-based intrusion detection methods. We therefore add a new feature to isolated behaviours before they are considered malicious. This feature is based on the possible repeated occurrence of the behaviour on many information systems. Based on this feature, we propose a new outlier mining method, which we validate through a set of experiments.
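The idea of keeping only outliers that recur across several information systems can be illustrated with a minimal sketch. It is not the authors' method: exact-match counting stands in for the clustering step, and the names `local_outliers`, `common_outliers`, `max_cluster_size`, `min_sites` and the sample request strings are all hypothetical.

```python
from collections import Counter


def local_outliers(behaviours, max_cluster_size=1):
    """Behaviours seen at most `max_cluster_size` times on one site.

    Assumption: exact-match grouping replaces a real clustering algorithm.
    """
    counts = Counter(behaviours)
    return {b for b, n in counts.items() if n <= max_cluster_size}


def common_outliers(sites, min_sites=2):
    """Keep outliers that are isolated on at least `min_sites` sites."""
    seen = Counter()
    for behaviours in sites.values():
        # Each site contributes at most one vote per outlying behaviour.
        seen.update(local_outliers(behaviours))
    return {b for b, n in seen.items() if n >= min_sites}


# Hypothetical access logs from three independent sites.
sites = {
    "site_a": ["GET /index", "GET /index", "GET /admin.php?cmd=ls", "GET /promo2010"],
    "site_b": ["GET /home", "GET /home", "GET /home", "GET /admin.php?cmd=ls"],
    "site_c": ["GET /news", "GET /news", "GET /admin.php?cmd=ls", "GET /old-page"],
}

suspicious = common_outliers(sites)
print(suspicious)
```

In this toy data, `GET /promo2010` and `GET /old-page` are atypical on a single site only and are discarded, while the request that is isolated on all three sites is retained.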

Anomaly Based Network Intrusion Detection with Unsupervised Outlier Detection

2006 IEEE International Conference on Communications, 2006

Anomaly detection is a critical issue in Network Intrusion Detection Systems (NIDSs). Most anomaly-based NIDSs employ supervised algorithms, whose performance depends heavily on attack-free training data. However, this kind of training data is difficult to obtain in real-world network environments. Moreover, as the network environment or services change, the patterns of normal traffic change as well. This leads to a high false positive rate in supervised NIDSs. Unsupervised outlier detection can overcome the drawbacks of supervised anomaly detection. Therefore, we apply an efficient data mining algorithm, the random forests algorithm, in anomaly-based NIDSs. Without attack-free training data, the random forests algorithm can detect outliers in datasets of network traffic. In this paper, we discuss our framework for anomaly-based network intrusion detection. In the framework, patterns of network services are built by the random forests algorithm over traffic data. Intrusions are detected by determining outliers relative to the built patterns. We present our modification of the outlier detection algorithm of random forests. We also report our experimental results over the KDD'99 dataset. The results show that the proposed approach is comparable to previously reported unsupervised anomaly detection approaches evaluated over the KDD'99 dataset.

An optimization process to identify outliers generated by intrusion detection systems

Security and Communication Networks, 2015

An outlier is an inconsistent observation characterized by its dissimilarity from other observations in a given data set. Many research works have focused on outlier detection in fields such as network security. To protect computer network systems from attacks, an intrusion detection system (IDS) is usually required. However, IDSs generate many outliers, which can severely affect their accuracy. In this work, we propose a three-stage method to detect outliers. First, alerts are clustered using the k-means algorithm; then, the generated set of meta-alerts is filtered based on distances between the centroids of the different clusters. Finally, outliers are identified from the filtered meta-alerts using a binary optimization algorithm. Our method is evaluated using data sets from the University of California-Irvine machine learning repository and the Defense Advanced Research Projects Agency. Experimental results show that the proposed method outperforms competing methods for outlier detection.

COMPARATIVE ANALYSIS OF K-MEANS DATA MINING AND OUTLIER DETECTION APPROACH FOR NETWORK-BASED INTRUSION DETECTION

New kinds of intrusions cause deviations in the normal behaviour of traffic flow in computer networks every day. This study focused on enhancing the learning capabilities of IDSs to detect anomalies in network traffic flow by comparing the k-means data mining approach to intrusion detection with the outlier detection approach. The k-means approach uses clustering mechanisms to group the traffic flow data into normal and abnormal clusters. Outlier detection calculates an outlier score (the neighbourhood outlier factor, NOF) for each flow record, whose value decides whether a traffic flow is normal or abnormal. The two methods were then compared in terms of various performance metrics and the amount of computer resources they consumed. Overall, k-means was more accurate and precise and had a better classification rate than outlier detection for intrusion detection using traffic flows. This will help systems administrators in their choice of IDS.

Unsupervised Clustering Approach for Network Anomaly Detection

Communications in Computer and Information Science, 2012

This paper describes the advantages of using the anomaly detection approach over the misuse detection technique in detecting unknown network intrusions or attacks. It also investigates the performance of various clustering algorithms when applied to anomaly detection. Five different algorithms are used: k-Means, improved k-Means, k-Medoids, EM clustering and distance-based outlier detection. Our experiment shows that misuse detection techniques, implemented with four different classifiers (naïve Bayes, rule induction, decision tree and nearest neighbour), failed to detect intrusions in network traffic that contained a large number of unknown intrusions: the highest accuracy was only 63.97% and the lowest false positive rate was 17.90%. On the other hand, the anomaly detection module showed promising results, with the distance-based outlier detection algorithm outperforming the other algorithms at an accuracy of 80.15%. The accuracy was 78.06% for EM clustering, 76.71% for k-Medoids, 65.40% for improved k-Means and 57.81% for k-Means. Unfortunately, our anomaly detection module produces a high false positive rate (more than 20%) for all four clustering algorithms. Therefore, our future work will focus on reducing the false positive rate and improving the accuracy using more advanced machine learning techniques.
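The abstract does not say which distance-based definition was used. As one common possibility, the sketch below flags a point when at least a given fraction of the other points lie farther than a fixed radius from it. The function name, the parameters `radius` and `min_far_fraction`, and the sample points are assumptions made for illustration.

```python
import math


def distance_based_outliers(points, radius, min_far_fraction):
    """Indices of points for which at least `min_far_fraction` of the
    other points lie farther away than `radius` (brute force, O(n^2))."""
    n = len(points)
    outliers = []
    for i, p in enumerate(points):
        far = sum(
            1
            for j, q in enumerate(points)
            if j != i and math.dist(p, q) > radius
        )
        if far / (n - 1) >= min_far_fraction:
            outliers.append(i)
    return outliers


# Four nearby records and one distant record (hypothetical 2-D features).
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (0.15, 0.15), (5.0, 5.0)]
flagged = distance_based_outliers(points, radius=1.0, min_far_fraction=0.9)
print(flagged)
```

Both thresholds have to be tuned per dataset; a radius that is too small flags nearly everything, which is one way a high false positive rate can arise.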

A Clustering-Based Unsupervised Approach to Anomaly Intrusion Detection

Proceedings of the 2nd International Symposium on Computer, Communication, Control and Automation, 2013

In the present paper, a 2-means clustering-based anomaly detection technique is proposed. The presented method parses the set of training data, consisting of normal and anomalous data, and separates the data into two clusters. Each cluster is represented by its centroid: one for the normal observations and the other for the anomalies. The paper also provides appropriate methods for clustering, training and detection of attacks. The performance of the presented methodology is evaluated using the following measures: Recall, Precision and F1-measure. Clustering quality is measured with the Dunn index and the Davies-Bouldin index.
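A minimal sketch of the general 2-means idea, followed by precision, recall and F1, might look as follows. It is not the paper's implementation: the deterministic initialisation, the rule that the smaller cluster is the anomalous one, and the sample data are all assumptions.

```python
import math


def two_means(points, iterations=20):
    """Plain k-means with k=2 and a deterministic start (an assumption):
    the first point and the point farthest from it."""
    c0 = points[0]
    c1 = max(points, key=lambda p: math.dist(p, c0))
    centroids = [c0, c1]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid wins.
        labels = [
            0 if math.dist(p, centroids[0]) <= math.dist(p, centroids[1]) else 1
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        for k in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centroids[k] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels, centroids


def precision_recall_f1(predicted, actual):
    """Standard definitions, with anomalies as the positive class."""
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical records: four normal, two anomalous.
points = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1), (1.1, 1.0), (8.0, 8.5), (8.2, 7.9)]
truth = [False, False, False, False, True, True]

labels, centroids = two_means(points)
# Assumption: the smaller cluster holds the anomalies.
anomaly_cluster = min((0, 1), key=labels.count)
predicted = [lab == anomaly_cluster for lab in labels]
precision, recall, f1 = precision_recall_f1(predicted, truth)
print(labels, precision, recall, f1)
```

On real traffic the two clusters are rarely this well separated, which is why the clustering itself is usually checked with indices such as Dunn or Davies-Bouldin before the detection metrics are interpreted.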

E. Nikolova, V. Jecheva, A Clustering-Based Unsupervised Approach to Anomaly Intrusion Detection , 2nd International Symposium on Computer, Communication, Control and Automation (3CA 2013), December 1-2, 2013, Singapore, pp.154-160 , 2013


An Outlier Mining-Based Method for Anomaly Detection

2007 International Workshop on Anti-Counterfeiting, Security and Identification (ASID), 2007

In this paper, a new technique is proposed to address two problems of anomaly detection: the high false positive rate and the difficulty of building a model of normal behavior. Our technique is based on the similarity between outliers and intrusions. We therefore propose a new outlier mining algorithm, based on an index tree, to detect intrusions. The algorithm improves on the HilOut algorithm by avoiding the complex generation of Hilbert values. It calculates upper and lower bounds on the weight of each record using the r-region and the index tree to avoid unnecessary distance calculations. The algorithm is easy to implement and better suited to detecting intrusions in audit data. We have performed many experiments on the KDDCup99 dataset to validate the effect of the proposed algorithm, TreeOut, and obtained good results.
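The role of a bound on the weight can be illustrated with a small sketch. It assumes the HilOut-style definition in which the weight of a record is the sum of the distances to its k nearest neighbours and the outliers are the n records with the largest weights. The index tree and the r-region are not reproduced here; a simpler running upper bound stands in for them, and the function name `top_outliers` and the sample points are hypothetical.

```python
import heapq
import math


def top_outliers(points, k, n):
    """Indices of the n points with the largest weight, where
    weight = sum of distances to the k nearest neighbours."""
    best = []  # min-heap of (weight, index) holding the current top n
    for i, p in enumerate(points):
        threshold = best[0][0] if len(best) == n else -math.inf
        nearest = []  # negated distances: a max-heap of the k smallest so far
        pruned = False
        for j, q in enumerate(points):
            if i == j:
                continue
            d = math.dist(p, q)
            if len(nearest) < k:
                heapq.heappush(nearest, -d)
            elif d < -nearest[0]:
                heapq.heapreplace(nearest, -d)
            # The sum of the k smallest distances seen so far is an upper
            # bound on the final weight; if it is already below the weakest
            # top-n weight, the remaining distances need not be computed.
            if len(nearest) == k and -sum(nearest) < threshold:
                pruned = True
                break
        if pruned:
            continue
        weight = -sum(nearest)
        if len(best) < n:
            heapq.heappush(best, (weight, i))
        elif weight > best[0][0]:
            heapq.heapreplace(best, (weight, i))
    return [i for _, i in sorted(best, reverse=True)]


# Hypothetical 2-D records: a tight group plus two isolated points.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (0.5, 0.5), (-6, 3)]
result = top_outliers(points, k=2, n=2)
print(result)
```

The pruning only pays off once strong outliers have entered the top-n list, so processing order matters; index structures like the one described above are one way to tighten the bounds earlier.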

Applying Outlier Detection Techniques in Anomaly-based Network Intrusion Systems – A Theoretical Analysis

2015

With the advent of the Internet, security has become a major concern. An intrusion detection system is used to enhance the security of networks by inspecting all inbound and outbound network activities and by identifying suspicious patterns as possible intrusions. For the past two decades, many researchers have been working on intrusion detection systems. In recent years, anomaly detection has gained popularity owing to its ability to detect novel attacks. Researchers now focus on applying outlier detection techniques to anomaly detection because of their promising results in identifying true attacks and in reducing the false alarm rate. In this paper, some of the works that applied outlier analysis to anomaly detection are studied and their results are analyzed.

Unsupervised Anomaly Detectors to Detect Intrusions in the Current Threat Landscape

ACM/IMS Transactions on Data Science, 2021

Anomaly detection aims at identifying unexpected fluctuations in the expected behavior of a given system. It is acknowledged as a reliable answer to the identification of zero-day attacks, to such an extent that several ML algorithms suited to binary classification have been proposed over the years. However, an experimental comparison of a wide pool of unsupervised algorithms for anomaly-based intrusion detection against a comprehensive set of attack datasets has not yet been carried out. To fill this gap, we exercise 17 unsupervised anomaly detection algorithms on 11 attack datasets. The results allow us to elaborate on a wide range of arguments, from the behavior of individual algorithms to the suitability of the datasets for anomaly detection. We conclude that algorithms such as Isolation Forests, One-Class Support Vector Machines, and Self-Organizing Maps are more effective than their counterparts for intrusion detection, while clustering algorithms represent a good alternative due to their lo...