Anomaly-Based Intrusion Detection System

Anomaly-based Intrusion Detection using Machine Learning Algorithms-A Review Paper

2020

An intrusion is an activity that attempts to compromise the confidentiality or availability of a resource. Intrusion detection is one of the most important areas of network security: an intrusion detection system (IDS) monitors the state of the software and hardware running in a network. In the past few years, intrusion detection using machine learning techniques has attracted the attention of many researchers, who have proposed different algorithms for the distinct features used in each dataset. The KDD-Cup99 intrusion detection dataset plays a vital role in network intrusion detection research and is the dataset most widely used by researchers in the field; NSL-KDD is a revised version of KDD-Cup99. This paper presents an overview of various IDSs and a detailed analysis of the machine learning techniques and datasets used to improve them.

Design and performance analysis of various feature selection methods for anomaly-based techniques in intrusion detection system

Security and Privacy

An intrusion detection system (IDS) is essential for a network, because an intruder can steal sensitive information about it. The IDS must be able to handle large volumes of real-time data, and its prediction rate must be high given the available attributes. This work deals with a real intrusion detection problem: detecting an intrusion by its behavior. In this paper, we develop a hybrid model that can detect an intrusion by its actions. We use the NSL-KDD data set, and the multiclass and binary problems are each tested on 25% of the data. The model can be used to predict the presence of an intrusion and to determine the scope of intrusions based on the data transactions in the network; training requires the optimal features of a network transaction. The accuracy of the model is better for both the binary and the multiclass problems on the NSL-KDD data set. The false alarm rate, whether low or high, is the most significant challenge in an IDS, and the proposed work also addresses this problem. In the first step, the data are filtered by the Vote algorithm, with Information Gain associated with a base learner to choose the necessary features, which directly affect the accuracy of the model. The following classifiers are used: RandomTree, REPTree, RandomForest, AdaBoostM1, Meta Pagging, DecisionStump, J48, LMT, Bagging, and Naive Bayes. With the proposed model, a low false alarm rate and high accuracy are observed.
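
The Information Gain ranking mentioned in the abstract above can be sketched in a few lines. This is a minimal illustration for categorical features, not the paper's Vote-based pipeline; the feature names, records and labels are hypothetical toy values, not NSL-KDD data.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """Reduction in label entropy after splitting on a categorical feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def rank_features(rows, labels, feature_names):
    """Return (feature, gain) pairs sorted from most to least informative."""
    scores = {name: information_gain([row[i] for row in rows], labels)
              for i, name in enumerate(feature_names)}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy records: (protocol, flag) per connection.
FEATURES = ["protocol", "flag"]
ROWS = [("tcp", "SF"), ("tcp", "S0"), ("udp", "SF"),
        ("udp", "S0"), ("tcp", "SF"), ("udp", "S0")]
LABELS = ["normal", "attack", "normal", "attack", "normal", "attack"]

ranking = rank_features(ROWS, LABELS, FEATURES)
for name, gain in ranking:
    print(f"{name}: {gain:.4f}")
```

In this toy data the `flag` column separates the two classes perfectly, so it is ranked first; a selection step would keep the top-ranked features and pass only those to the classifiers.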

HFO-ANID: Hierarchical Feature Optimization for Anomaly based Network Intrusion Detection

Third International Conference on Computing, Communication and Networking Technologies (ICCCNT'12), 2012

In the area of feature reduction for anomaly-based Intrusion Detection Systems, Computational Intelligence (CI) methods are increasingly being used for problem solving. This paper concerns using Computational Intelligence based learning machines for intrusion detection in hierarchical order of attacking scenarios, which is a problem of general interest to transportation infrastructure protection since a necessary task thereof is to protect the computers responsible for the infrastructure's operational control, and an effective Intrusion Detection System (IDS) is essential for ensuring network security. We argue that the features chosen to detect an attack scenario are not the same for all kinds of attacks. Hence, in this paper a hierarchical feature optimization for an Anomaly-based Intrusion Detection System (HAB-IDS) is proposed. Two classes of learning machines for IDSs are Artificial Neural Networks (ANNs) and Support Vector Machines (SVMs). We consider the SVM in three critical respects of IDSs: SVMs train and run an order of magnitude faster; SVMs scale much better; and SVMs give higher classification accuracy. Hence we use SVMs for our proposed hierarchical feature reduction for intrusion detection. Intrusion detection is the process of monitoring the events occurring in a computer system or network and analyzing them for signs of intrusion [1]. Intrusions are defined as attempts to compromise the confidentiality, integrity or availability of a computer or network. They are caused by attackers accessing a system from the internet, by authorized users of the systems who attempt to gain additional privileges for which they are not authorized, and by authorized users who misuse the privileges given to them [9]. Anomaly detection and misuse detection [11] are two general approaches to computer intrusion detection.
Unlike misuse detection, which generates an alarm when a known attack signature is matched, anomaly detection identifies activities that deviate from the normal behavior of the monitored system and thus has the potential to detect novel attacks [14]. In this work our aim is to make anomaly-based intrusion detection feasible. In our experiment, we used the DARPA data set. It has solved some of the inherent problems. It is considered a standard benchmark for intrusion detection evaluation [8]. The training dataset of DARPA consists of approximately 4,900,000 single connection vectors, each of which contains 41 features and is labeled as either normal or attack, with exactly one specific attack type. Empirical studies indicate that feature reduction techniques are capable of reducing the size of the dataset. The time and space complexities of most classifiers used are exponential functions of their input vector size [15]. Moreover, the number of samples needed to train the classifier grows exponentially with the dimension of the feature space. This limitation is called the 'curse of dimensionality'. In the literature, a number of works can be cited wherein several machine learning paradigms, fuzzy inference systems and expert systems were used to develop IDSs [4][5]. The authors of [8] have demonstrated that a large number of features are unimportant and may be eliminated without significantly lowering the performance of the IDS. Very little scientific effort has been devoted to modeling efficient IDS feature optimization. The IDS task is often modeled as a classification problem in a machine-learning context. Section II explores the model proposed, Section III introduces the methodologies and resources used, Section IV explores the hierarchical feature optimization procedure and Section V discusses the proposed HFO-ANIDS, followed by results discussion, conclusion and references.
II. Related Work
A new method that could achieve more accuracy than the existing six classification patterns [Gaussian Mixture, Radial Basis Function, Binary Tree Classifier, SOM, ART and LAMSTAR], called the Hierarchical Gaussian Mixture Model [HMM] for IDM, was put forward by M. Bahrololum et al. [1]. Jiankun Hu and Xinghuo Yu et al. [2] studied the development of host-based anomaly intrusion detection, focusing on system-call-based HMM training. This was later enhanced with the inclusion of data pre-processing for recognizing and eliminating redundant sub-sequences of system calls, resulting in fewer HMM sub-models. Experimental results on three public databases showed that training cost can be reduced by 50% without affecting the intrusion detection performance. The false alarm rate is higher yet reasonable compared to the batch training method with a 58% data reduction. R. Nakkeeran et al. [3] proposed an anomaly detection system comprising detection modules for detecting anomalies in each layer. The anomaly detection result of the neighbor node(s) is taken by the current node and its result in turn is sent to the ...
ICCCNT'12, 26th-28th July 2012, Coimbatore, India
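
The contrast drawn in the excerpt above between misuse detection (alarm on a known signature) and anomaly detection (alarm on deviation from normal behavior) can be illustrated with a small sketch. It is not the paper's SVM-based method: the signatures, the connections-per-minute feature and the three-standard-deviation threshold are all assumptions chosen for illustration.

```python
import statistics

# Hypothetical attack signatures; a real misuse detector uses a curated rule set.
KNOWN_SIGNATURES = {"GET /etc/passwd", "' OR 1=1 --"}


def misuse_alert(payload):
    """Misuse detection: alarm only when a known signature appears."""
    return any(signature in payload for signature in KNOWN_SIGNATURES)


def fit_baseline(normal_values):
    """Learn the mean and sample standard deviation of a feature
    measured on traffic that is assumed to be normal."""
    return statistics.mean(normal_values), statistics.stdev(normal_values)


def anomaly_alert(value, baseline, threshold=3.0):
    """Anomaly detection: alarm when the value lies more than
    `threshold` standard deviations away from the learned mean."""
    mean, std = baseline
    return abs(value - mean) > threshold * std


# Hypothetical connections-per-minute counts observed during normal operation.
normal_conn_per_min = [18, 22, 20, 19, 21, 20, 23, 17]
baseline = fit_baseline(normal_conn_per_min)

# A previously unseen attack: no signature matches, but the rate is abnormal.
novel_payload = "previously unseen exploit bytes"
print(misuse_alert(novel_payload), anomaly_alert(400, baseline))
```

The signature check misses the unseen payload while the deviation check still fires, which is the property that gives anomaly detection its potential to catch novel attacks, at the cost of false alarms whenever legitimate traffic drifts from the baseline.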

Application of Machine Learning Approaches in Intrusion Detection System: A Survey

Network security is one of the major concerns of the modern era. With the rapid development and massive usage of the internet over the past decade, the vulnerabilities of network security have become an important issue. An intrusion detection system is used to identify unauthorized access and unusual attacks over secured networks. Over the past years, many studies have been conducted on intrusion detection systems. To clarify the current status of the implementation of machine learning techniques for solving intrusion detection problems, this survey paper lists 49 related studies published between 2009 and 2014, focusing on the architecture of single, hybrid and ensemble classifier designs. The survey also includes a statistical comparison of the classifier algorithms, the datasets used and other experimental setups, as well as consideration of the feature selection step.

Network Intrusion Detection and Classification System: A Supervised Machine Learning Approach

International Journal for Research in Applied Science & Engineering Technology (IJRASET), 2024

Intrusion detection systems (IDSs) are crucial for computer security, as they identify and counteract malicious activities within computer networks. Anomaly-based IDSs, specifically, use classification models trained on historical data to detect these harmful activities. This paper proposes an enhanced IDS based on 3-level training and testing of machine learning models, feature selection, resampling, and normalization using Decision Tree, Gaussian Naïve Bayes, K-Nearest Neighbours, Logistic Regression, Random Forest, and Support Vector Machine. In the first stage, the six models are trained and evaluated using the original datasets after pre-processing. In the second stage, the models are built and tested with a resampled version of the dataset using the Synthetic Minority Oversampling Technique (SMOTE). In the third stage, the models are trained and tested with a dataset that has been both resampled and normalized using the standard scaling method. We employ the feature importance technique using the random forest model to select the essential features from NSL-KDD and UNSW-NB15 datasets. The results of our study surpass previous related research, with the decision tree achieving an accuracy, precision, recall, and F1 score of 99.99% on the UNSW-NB15 dataset. Additionally, the decision tree recorded an accuracy of 99.98%, precision of 99.97%, recall of 99.97%, and F1 score of 99.99% on the NSL-KDD dataset.
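
The resampling and normalization stages described in the abstract above can be sketched with the standard library alone. The `smote_like` function below is a simplified stand-in for SMOTE (interpolation between a minority sample and one of its nearest minority neighbours), not the reference implementation, and the two-dimensional data points are hypothetical. For brevity the scaler is fitted on the whole toy set, whereas in practice it would be fitted on the training split only.

```python
import math
import random


def standard_scale(rows):
    """Scale each column to zero mean and unit variance (population std)."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # A constant column has std 0; fall back to 1.0 to avoid division by zero.
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points, each interpolated between a random
    minority sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted((p for p in minority if p is not base),
                            key=lambda p: math.dist(p, base))[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Hypothetical imbalanced data: six majority points, three minority points.
majority = [[0.0, 1.0], [1.0, 0.0], [0.5, 0.5],
            [1.0, 1.0], [0.0, 0.0], [0.2, 0.8]]
minority = [[5.0, 5.0], [6.0, 5.0], [5.0, 6.0]]

# Stage 2: oversample the minority class until the classes are balanced.
new_points = smote_like(minority, n_new=len(majority) - len(minority))
balanced = majority + minority + new_points

# Stage 3: standard scaling applied to the resampled data.
scaled = standard_scale(balanced)
print(len(balanced), len(new_points))
```

Each synthetic point lies on a line segment between two existing minority samples, so the minority region is filled in rather than merely duplicated.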

Network Intrusion Classification Employing Machine Learning: A Survey

2019

In this modern era, computer network security is a vital issue. Network security is supported by an efficient Intrusion Detection System (IDS), which is used to identify unauthorized access and malicious attacks and to raise an alert when it observes any kind of unusual activity. Over the past 30 years, there has been a great deal of work on intrusion detection systems using machine learning algorithms. This review work offers a guideline for understanding the present status of the application of machine learning algorithms to the intrusion classification task. The survey selected 84 papers, based on the highest citation counts, from the years 2009-2018. This thesis gives an overview of different intrusion detection systems and a statistical comparison based on different classifier types: single, hybrid and ensemble learning. In addition, we discuss the best machine learning classifiers, the best datasets and some feature selection processes.

Zero-day Network Intrusion Detection using Machine Learning Approach

International Journal on Recent and Innovation Trends in Computing and Communication

Zero-day network attacks are a growing global cybersecurity concern. Hackers exploit vulnerabilities in network systems, making network traffic analysis crucial in detecting and mitigating unauthorized attacks. However, inadequate and ineffective network traffic analysis can lead to prolonged network compromises. To address this, machine learning-based zero-day network intrusion detection systems (ZDNIDS) rely on monitoring and collecting relevant information from network traffic data. The selection of pertinent features is essential for optimal ZDNIDS performance given the voluminous nature of network traffic data, characterized by attributes. Unfortunately, current machine learning models utilized in this field exhibit inefficiency in detecting zero-day network attacks, resulting in a high false alarm rate and overall performance degradation. To overcome these limitations, this paper introduces a novel approach combining the anomaly-based extended isolation forest algorithm with t...

Design and Development of an Efficient Network Intrusion Detection System Using Machine Learning Techniques

Wireless Communications and Mobile Computing, 2021

Today’s internets are made up of nearly half a million different networks. In any network connection, identifying attacks by their types is a difficult task, as different attacks may involve various connections, whose number may vary from a few to hundreds. To solve this problem, a novel hybrid network IDS called NID-Shield is proposed in the manuscript that classifies the dataset according to different attack types. Furthermore, the attack names found within each attack type are classified individually, which helps considerably in predicting the vulnerability of individual attacks in various networks. The hybrid NID-Shield NIDS applies an efficient feature subset selection technique called CAPPER and distinct machine learning methods. The UNSW-NB15 and NSL-KDD datasets are utilized for the evaluation of metrics. Machine learning algorithms are trained on the reduced, accurate, high-merit feature subsets obtained from CAPPER and then assessed by the cros...

Adaptive Real-time Anomaly-based Intrusion Detection using Data Mining and Machine Learning Techniques

2014

Interconnections between various networks give rise to several shortcomings, such as generating voluminous data flows, making services vulnerable, and rapidly increasing the number of suspicious connections. In addition, malware solutions and standard security gateways such as the firewall system or the URL blocker have become partially untrustworthy due to the complexity of network traffic and the increase in vulnerabilities and attacks. These problems in network management are unrestrained and threaten the overall system components. Hence, one of the key aspects of securing computer and communication networks nowadays is the ability to uncover malicious connections (the so-called zero-day attacks) effectively, accurately and within sufficient detection time. On the other hand, current intrusion detection systems are most likely signature-based applications, which operate in the offline operational mode and are thus unable to detect new attacks. Accordingly, the fundamental problem of current Intrusion Detection Systems (IDS) can be summarized in two points. The first has to do with the difficulty of processing the massive data flows, and the second is proposing an adaptive intrusion detection model which operates in real-time and efficiently reveals anomalies. In this dissertation, a comprehensive IDS framework has been proposed to overcome these shortcomings. The framework consists of two main parts. The first part, known as OptiFilter, is in charge of aggregating massive data flows, deploying a dynamic queuing concept to process the data flows, constructing sequential connection vectors accordingly, and exporting datasets in an appropriate format. The second part is an adaptive classifier that includes a classifier model based on the Enhanced Growing Hierarchical Self Organizing Map (EGHSOM), a Normal Network Behavior model (NNB), and update models to keep the proposed framework adaptive in real-time.
In OptiFilter, tcpdump and SNMP traps have been exploited to aggregate the network packets and host events continuously. They have also been subject to further analysis, in that they have been converted to connection vectors, which are constructed based on several valuable and important features in the area of intrusion detection. Regarding the adaptive classifier, the intelligent artificial neural network model GHSOM has been intensively investigated and improved upon in order to be effective in classifying the constructed connection vectors into normal, anomaly or unknown during the online operational mode in real-time. In the current study, the original GHSOM approach has been enhanced with several contributions. Namely, it offers a classification-confidence margin threshold to uncover unknown malicious connections, a stable growth topology achieved by an expressive initialization process for the weight vectors and by reinforcing the final winner units, and a self-adaptive process to update the model constantly. Moreover, the main task of the NNB model is to further investigate the unknown connections detected by the EGHSOM and examine whether they belong to the normal behavior model or not. However, during the online classification in real-time, network traffic keeps changing because of concept drift, which in turn generates non-stationary data flows. Thus, in this dissertation, this phenomenon has been controlled by the proposed update models, which make use of the detected anomalous and normal connections to adapt the current EGHSOM and NNB models. Hence, the updated EGHSOM model can detect new anomalies even if they appear in a different structure, and the updated NNB model can accommodate the changes in the data flows of the computer network. In the experimental study, the performance evaluation of the proposed framework shows very satisfactory results. The first experiment has evaluated the framework in the offline operational mode.
In this regard, OptiFilter has been evaluated with available, synthetic and realistic data flows. In contrast, 10-fold cross-validation has been performed on the adaptive classifier to estimate the overall accuracy using the above-mentioned data flows. In the second experiment, the framework has been evaluated in a real 1 to 10 GB computer network, i.e. in an online operational mode in real-time. OptiFilter and the adaptive classifier have performed accurately, in the sense that the first part constructs continuous connections from the massive data flow and the second part classifies them precisely. The final comparison study between the proposed framework and other well-known IDS approaches shows that the proposed IDS framework outperforms all approaches, especially in the following major points: handling the massive data flow, achieving the best performance metrics (such as the overall accuracy), uncovering unknown connections, and proposing an adaptive technique. Summary (Zusammenfassung): The increasing interconnection of information and communication systems leads to a further rise in complexity and thus also to a further increase in security vulnerabilities. Classical protection mechanisms such as firewall systems and anti-malware solutions have long since ceased to offer protection against intrusion attempts into IT infrastructures. Intrusion detection systems (IDS) have established themselves as a very effective instrument for protection against cyber attacks. Such systems collect and analyze information from network components and hosts in order to automatically detect unusual behavior and security violations. While signature-based approaches can only detect already known attack patterns, anomaly-based IDS are also able to detect new, previously unknown attacks (zero-day attacks) at an early stage.
The core problem of intrusion detection systems, however, lies in the optimal processing of the enormous volume of network data and the development of an adaptive detection model that works in real time. To meet these challenges, this dissertation provides a framework consisting of two main parts. The first part, called OptiFilter, uses a dynamic queuing concept to further process the large amounts of incoming network data, continuously constructs network connections, and exports structured input data for the IDS. The second part is an adaptive classifier comprising a classifier model based on the Enhanced Growing Hierarchical Self Organizing Map (EGHSOM), a model of normal network behavior (NNB), and an update model. In OptiFilter, tcpdump and SNMP traps are used to continuously aggregate the network packets and host events. These aggregated network packets and host events are analyzed further and converted into connection vectors. To improve the detection rate of the adaptive classifier, the artificial neural network GHSOM is investigated intensively and developed substantially further. Different approaches are proposed and discussed in this dissertation. A classification-confidence margin threshold is defined to uncover unknown malicious connections, the stability of the growth topology is increased through novel approaches for initializing the weight vectors and through strengthening the winner neurons, and a self-adaptive procedure is introduced so that the model can be updated continuously. Furthermore, the main task of the NNB model is to further investigate the unknown connections detected by the EGHSOM and to check whether they are normal. However, the network traffic data change constantly because of the concept drift phenomenon, which in real time leads to the generation of non-stationary network data.
This phenomenon is better controlled by the update model. The EGHSOM model can detect the new anomalies effectively, and the NNB model adapts optimally to the changes in network data. In the experimental investigations, the framework showed promising results. In the first experiment, the framework was evaluated in offline operational mode. OptiFilter was evaluated with offline, synthetic and realistic data. The adaptive classifier was evaluated with the 10-fold cross-validation procedure in order to estimate its accuracy. In the second experiment, the framework was installed on a 1 to 10 GB network link and evaluated in online operational mode in real time. OptiFilter successfully converted the enormous volume of network data into structured connection vectors, and the adaptive classifier classified them precisely. The comparison study between the developed framework and other well-known IDS approaches shows that the proposed IDS framework outperforms all other approaches. This can be attributed to the following key points: processing of the collected network data, achieving the best performance (such as the overall accuracy), detecting unknown connections, and developing the real-time detection model for intrusion attempts.
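
The idea of a classification-confidence margin threshold that sorts connection vectors into normal, anomaly or unknown can be sketched as follows. This is an illustration only: a plain nearest-prototype classifier stands in for the EGHSOM, and the margin formula, the threshold value and the prototype vectors are assumptions, not taken from the dissertation.

```python
import math


def classify_with_margin(vector, prototypes, margin_threshold=0.2):
    """Return the label of the nearest prototype, or "unknown" when the
    nearest prototype of a different label is almost equally close."""
    ranked = sorted((math.dist(vector, proto), label)
                    for label, proto in prototypes)
    best_dist, best_label = ranked[0]
    # Distance to the closest prototype that carries a different label.
    rival_dist = next((d for d, label in ranked[1:] if label != best_label),
                      math.inf)
    if math.isinf(rival_dist):
        return best_label  # only one class is known, so no rival exists
    # Relative margin in [0, 1]: 0 means a tie, 1 means a clear winner.
    denominator = (rival_dist + best_dist) or 1.0
    margin = (rival_dist - best_dist) / denominator
    return best_label if margin >= margin_threshold else "unknown"


# Hypothetical labeled prototype vectors (stand-ins for trained map units).
PROTOTYPES = [
    ("normal", [0.1, 0.1]),
    ("normal", [0.2, 0.0]),
    ("anomaly", [0.9, 0.9]),
]

results = [classify_with_margin(v, PROTOTYPES)
           for v in ([0.1, 0.05], [0.95, 0.9], [0.5, 0.55])]
print(results)
```

Vectors that fall between the normal and anomalous prototypes are returned as unknown instead of being forced into a class; in the framework described above, such connections would then be handed to the normal-behavior model for further examination.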

Anomaly Based Intrusion Detection through Efficient Machine Learning Model

IJEER, 2023

Machine learning is commonly utilised to construct an intrusion detection system (IDS) that automatically detects and classifies network intrusions and host-level threats. Malicious assaults change and occur in high numbers, needing a scalable solution. Cyber security researchers may use public malware databases for research and related work. No research has examined machine learning algorithm performance on publicly accessible datasets. Data-level and physical-level security and analysis for data protection have become more important as data volumes grow. IDSs collect and analyse data to identify system or network intrusions for data protection. The amount, diversity, and speed of network data make data analysis to identify assaults challenging. IDSs use machine learning methods for the precise and efficient development of data security mechanisms. This work presents an intrusion detection model using machine learning, which uses feature extraction, feature selection and feature modelling to build the intrusion detection classifier.