A Stacking Ensemble for Network Intrusion Detection Using Heterogeneous Datasets
Related papers
International Journal of Advanced Computer Science and Applications
Network intrusion detection is a key step in securing today's constantly developing networks. Various studies have proposed new methods for resisting harmful cyber behaviors. However, as cyber-attacks become more complex, present methodologies fail to adequately solve the problem. Thus, network intrusion detection is now a significant decision-making challenge that requires an effective and intelligent approach. Various machine learning algorithms such as decision trees, neural networks, K nearest neighbor, logistic regression, support vector machine, and Naive Bayes have been utilized to detect anomalies in network traffic. However, such algorithms require adequate datasets to train and evaluate anomaly-based network intrusion detection systems. This paper presents a testbed that could be a model for building real-world datasets, as well as a newly generated dataset, derived from real network traffic, for intrusion detection. To utilize this real dataset, the paper also presents an ensemble intrusion detection model using a meta-classification approach enabled by stacked generalization to address the issue of detection accuracy and false alarm rate in intrusion detection systems.
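The two evaluation measures named in this abstract, detection rate and false alarm rate, can be illustrated with a minimal sketch. It assumes binary labels (1 for attack, 0 for normal) and the usual definitions TP / (TP + FN) and FP / (FP + TN); the function name and the sample labels are hypothetical and are not taken from the paper.

```python
def detection_metrics(y_true, y_pred, attack_label=1):
    """Return (detection_rate, false_alarm_rate) for binary labels."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == attack_label:
            if pred == attack_label:
                tp += 1  # attack correctly flagged
            else:
                fn += 1  # attack missed
        else:
            if pred == attack_label:
                fp += 1  # normal traffic flagged as attack
            else:
                tn += 1  # normal traffic correctly passed
    detection_rate = tp / (tp + fn) if (tp + fn) else 0.0
    false_alarm_rate = fp / (fp + tn) if (fp + tn) else 0.0
    return detection_rate, false_alarm_rate


# Hypothetical labels: 4 attack flows and 4 normal flows.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
dr, far = detection_metrics(y_true, y_pred)
print(dr, far)
```

Here three of the four attacks are caught and one of the four normal flows is wrongly flagged, which is the trade-off an ensemble is meant to improve on.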
Ensemble Classifiers for Network Intrusion Detection Using a Novel Network Attack Dataset
Future Internet
Due to the extensive use of computer networks, new risks have arisen, and improving the speed and accuracy of security mechanisms has become a critical need. Although new security tools have been developed, the fast growth of malicious activities continues to be a pressing issue that creates severe threats to network security. Classical security tools such as firewalls are used as a first-line defense against security problems. However, firewalls do not entirely or perfectly eliminate intrusions. Thus, network administrators rely heavily on intrusion detection systems (IDSs) to detect such network intrusion activities. Machine learning (ML) is a practical approach to intrusion detection that, based on data, learns how to differentiate between abnormal and regular traffic. This paper provides a comprehensive analysis of some existing ML classifiers for identifying intrusions in network traffic. It also produces a new reliable dataset called GTCS (Game Theory and Cyber Security) that ...
Modified stacking ensemble approach to detect network intrusion
TURKISH JOURNAL OF ELECTRICAL ENGINEERING & COMPUTER SCIENCES, 2018
Detecting intrusions in network traffic has remained an issue for researchers over the years. Advances in the area of machine learning provide opportunities for researchers to detect network intrusion without using a signature database. We studied and analyzed the performance of a stacking technique, which is an ensemble method that is used to combine different classification models to create a better classifier, on the KDD'99 dataset. In this study, the stacking method is improved by modifying the model generation and selection techniques and by using different classification algorithms as a combiner method. Model generation is performed using subsets of the dataset with randomly selected features, and not all of these models are used as input for the combiner. Various metrics are used in model selection, and only selected models are used as input for the combiner method. In our experiments, the stacking technique consistently provided higher accuracy than pure machine learning techniques. The second important result in our experiments was obtaining the highest detection rate for user-to-root attacks compared to other studies.
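The basic stacking idea described in this abstract (base models whose predictions feed a combiner) can be sketched in plain Python. This is an illustrative simplification, not the modified method of the paper: it omits random feature subsets and model selection, and the base learners (nearest centroid, 1-nearest-neighbour), the table-lookup combiner, the toy data, and all function names are assumptions chosen to keep the example self-contained.

```python
from collections import Counter, defaultdict


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fit_centroid(X, y):
    """Nearest-centroid base learner: returns a predict function."""
    groups = defaultdict(list)
    for row, label in zip(X, y):
        groups[label].append(row)
    centroids = {
        label: [sum(col) / len(rows) for col in zip(*rows)]
        for label, rows in groups.items()
    }
    return lambda row: min(centroids, key=lambda c: sq_dist(row, centroids[c]))


def fit_1nn(X, y):
    """1-nearest-neighbour base learner: returns a predict function."""
    pairs = list(zip(X, y))
    return lambda row: min(pairs, key=lambda p: sq_dist(row, p[0]))[1]


def fit_stacking(X, y, base_fitters, n_folds=3):
    # Level 0: collect out-of-fold predictions, so the combiner is never
    # trained on a prediction for a row its base model has already seen.
    meta_rows = [None] * len(X)
    for fold in range(n_folds):
        train_idx = [i for i in range(len(X)) if i % n_folds != fold]
        test_idx = [i for i in range(len(X)) if i % n_folds == fold]
        models = [fit([X[i] for i in train_idx], [y[i] for i in train_idx])
                  for fit in base_fitters]
        for i in test_idx:
            meta_rows[i] = tuple(m(X[i]) for m in models)

    # Level 1: the combiner maps each pattern of base predictions to the
    # true label most often seen with that pattern.
    votes = defaultdict(Counter)
    for pattern, label in zip(meta_rows, y):
        votes[pattern][label] += 1
    table = {p: c.most_common(1)[0][0] for p, c in votes.items()}
    fallback = Counter(y).most_common(1)[0][0]

    # Refit the base learners on all rows for use at prediction time.
    final_models = [fit(X, y) for fit in base_fitters]

    def predict(row):
        pattern = tuple(m(row) for m in final_models)
        return table.get(pattern, fallback)

    return predict


# Hypothetical two-feature flows, already scaled to [0, 1].
X = [[0.1, 0.2], [0.2, 0.1], [0.15, 0.3], [0.9, 0.8], [0.8, 0.9],
     [0.85, 0.7], [0.2, 0.25], [0.95, 0.9], [0.1, 0.1]]
y = ["normal", "normal", "normal", "attack", "attack",
     "attack", "normal", "attack", "normal"]
stacked = fit_stacking(X, y, [fit_centroid, fit_1nn])
print(stacked([0.12, 0.18]), stacked([0.88, 0.82]))
```

In practice the combiner would be a trained classifier such as logistic regression or a decision tree; the lookup table is used here only because it needs no external library.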
I2CACIS 2021, 2021
The exponential rise in internet technologies and allied applications, encompassing a significantly large number of networked devices, has alarmed academia and industry into seeking more effective and robust security solutions. Undeniably, digitization has led to a global revolution; however, the security threats, breaches, and subsequent losses indicate the need for a robust cybersecurity solution. Unlike classical intrusion detection systems (IDS), network IDS (NIDS) has become more challenging due to continuous changes in attack patterns and anomalous behavior. As a solution, data-driven machine learning methods have performed better by learning from network traffic information and detecting anomalies; however, their generalization over a network with both known and unknown patterns remains questionable. Moreover, most classical approaches fail to address the key issues of class imbalance, level-of-significance-centric feature selection, normalization, and overfitting, resulting in different performance across machine learning models. In this paper, a novel and robust heterogeneous ensemble machine learning model is developed to detect anomalies in NIDS. The proposed model first applies sub-sampling to alleviate the class-imbalance problem of NIDS datasets. It then performs Min-Max normalization, mapping the input data to the range 0 to 1 to alleviate overfitting and convergence problems. Feature reduction then retains the most suitable features without imposing the computational overheads often incurred by meta-heuristic-based approaches. Finally, the proposed NIDS solution uses a heterogeneous ensemble learning model with J48, k-NN, SVM, Bagging, AdaBoost, and RF algorithms as base classifiers to perform two-class as well as multiclass classification over feature-selected NSL-KDD, KDD99, and UNSW-NB-15 datasets. Performance assessment in terms of true positive rate, false positive rate, and AUC revealed that the proposed NIDS model performed better than the standalone classifiers and other existing anomaly detection methods.
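The Min-Max normalization step mentioned in this abstract can be sketched as follows. The sketch assumes the standard per-feature formula (x - min) / (max - min) and maps constant columns to 0.0; the function name and the sample rows are hypothetical.

```python
def min_max_scale(rows):
    """Scale each column of a list of numeric rows to the range [0, 1]."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    scaled = []
    for row in rows:
        scaled.append([
            # A constant column has no spread, so map it to 0.0
            # instead of dividing by zero.
            (value - lo) / (hi - lo) if hi > lo else 0.0
            for value, lo, hi in zip(row, lows, highs)
        ])
    return scaled


# Hypothetical flow records: duration, bytes, and a constant flag.
flows = [[10, 200.0, 5], [20, 400.0, 5], [30, 300.0, 5]]
scaled = min_max_scale(flows)
print(scaled)
```

When evaluating a classifier, the minimum and maximum would normally be computed on the training split only and then reused for the test split, so that no test information leaks into training.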
A Stacked Generalization Ensemble Approach for Improved Intrusion Detection
IJCSIS Vol. 18 No. 5 May 2020 Issue, 2020
Classical machine learning techniques have been employed repeatedly in intrusion detection. But due to the rising number and sophistication of attacks, more advanced machine learning techniques, including ensemble-based methods, neural networks, and deep learning techniques, have been applied. However, there is still a need for an improved machine learning approach to detect attacks more effectively and efficiently. The stacked generalization approach has been shown to be capable of learning from features and meta-features but has been limited by the deficiencies of base classifiers and a lack of optimization in the choice of meta-feature combination. This paper therefore proposes a stacked generalization ensemble approach based on a two-tier meta-learner, in which the outputs of a classical stacked ensemble are passed to a multi-feature-based stacked ensemble, which is optimized. A grid-search approach is used for the optimization. Nine data features and four meta-features derived from Logistic Regression, Support Vector Machine, Naïve Bayes, and Multilayer Perceptron neural network are used for the machine learning classification task. By applying neural networks as the meta-learner for the classification of NSL-KDD data, improved performance in terms of accuracy, precision, recall, and F-measure of 0.97, 0.98, 0.98, and 0.98, respectively, is achieved.
Ensemble Models for Intrusion Detection System Classification
International Journal of Smart Sensor and Adhoc Network, 2022
Using data analytics in the problem of Intrusion Detection and Prevention Systems (IDS/IPS) is a continuous research problem due to the evolutionary nature of the problem and the changes in major influencing factors. The main challenges in this area are designing rules that can predict malware in unknown territories and dealing with the complexity of the problem and the conflicting requirements regarding high accuracy of detection and high efficiency. In this scope, we evaluated the usage of state-of-the-art ensemble learning models in improving the performance and efficiency of IDS/IPS. We compared our approaches with other existing approaches using popular open-source datasets available in this area.
Evaluation of Selected Stacked Ensemble Models for the Optimal Multi-class Cyber-Attacks Detection
Intl. Journal on Cyber Situational Awareness, Vol. 5, No. 1, 2020
The significant rise in the frequency, sophistication, and diversity of cyber-attacks has prompted researchers to develop strong and effective approaches to address recurring cyber threat challenges. This study evaluated the performance of three selected meta-learning models for optimal multiclass detection of cyber-attacks using the University of New South Wales 2015 Network benchmark (UNSW-NB15) Intrusion Dataset. The results of this study confirm the ability of the three base models (Naive Bayes, C4.5 Decision Tree, and K-Nearest Neighbor) to solve multi-class problems. They further affirm the ability of feature selection techniques combined with stacked ensemble learning to optimize ML models' performance. Stacking the predictions of the information gain base models with the Model Decision Tree meta-algorithm achieved better cyber-attack detection accuracy and Matthews correlation coefficient than stacking with the Multiple Model Trees (MMT) and Multi Response Linear Regression (MLR) meta-algorithms.
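The Matthews correlation coefficient reported in this abstract can be illustrated with a short sketch. It uses the binary form of the coefficient, (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)), as a simplification; the study itself is multi-class, and the function name and confusion counts below are hypothetical.

```python
import math


def matthews_corrcoef_binary(tp, tn, fp, fn):
    """Binary Matthews correlation coefficient from confusion counts."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # By convention the coefficient is taken as 0 when a row or
        # column of the confusion matrix is empty.
        return 0.0
    return (tp * tn - fp * fn) / denominator


# Hypothetical counts: 3 attacks caught, 1 missed, 1 false alarm, 3 true negatives.
mcc = matthews_corrcoef_binary(tp=3, tn=3, fp=1, fn=1)
print(mcc)
```

The coefficient ranges from -1 to 1 and, unlike plain accuracy, stays informative when attack and normal classes are heavily imbalanced.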
A predictive model for network intrusion detection using stacking approach
International Journal of Electrical and Computer Engineering (IJECE), 2020
Due to emerging technological advances, cyber-attacks continue to hamper information systems. The changing dimensionality of the cyber threat landscape compels security experts to devise novel approaches to address the problem of network intrusion detection. Machine learning algorithms are extensively used to detect intrusions by dint of their remarkable predictive power. This work presents an ensemble approach for network intrusion detection using a concept called stacking. As per the popular no free lunch theorem of machine learning, employing a single classifier for a problem at hand may not be ideal for achieving generalization. Therefore, the proposed work on network intrusion detection emphasizes a combinative approach to improve performance. A robust processing paradigm called Graphlab Create, capable of handling massive data, has been used to implement the proposed methodology. Two benchmark datasets, UNSW-NB15 and UGR'16, are considered to demonstrate the validity of predictions. Empirical investigation has illustrated that the performance of the proposed approach has been reasonably good. The contribution of the proposed approach lies in its ability to generate fewer misclassifications pertaining to the various attack vectors considered in the study.
Identifying and Benchmarking Key Features for Cyber Intrusion Detection: An Ensemble Approach
IEEE Access
In today's interconnected era, intrusion detection systems (IDSs) have the potential to be the frontier of defense against cyberattacks and play an essential role in achieving security of networking resources and infrastructures. The performance of an IDS depends highly on data features. Selecting the most informative features while eliminating redundant and irrelevant features from network traffic data for IDS is still an open research issue. The key impetus of this work is to identify and benchmark the potential set of features that can characterize network traffic for intrusion detection. In this correspondence, an ensemble approach is proposed. As a first step, the approach applies four different feature evaluation measures (correlation, consistency, information, and distance) to select the more crucial features for intrusion detection. Second, it applies a subset combination strategy to merge the output of the four measures and obtain the potential feature set. Along with this, a new framework that adopts data analytic lifecycle practices is explored to employ the proposed ensemble for building an effective IDS. The effectiveness of the proposed approach is demonstrated by conducting several experiments on four intrusion detection evaluation datasets, namely, KDDCup'99, NSL-KDD, UNSW-NB15, and CICIDS2017. The obtained results show that the proposed approach contributes more potential features compared to state-of-the-art approaches, leading to a promising performance gain in detection rate of 3.2%, false alarm rate of 38%, and detection time of 12%. Further, ROC and statistical significance are analyzed for the identified feature subset to confirm its acceptability as a future benchmark for building an effective IDS.
ARRAY, 2023
Intrusion detection is a critical aspect of network security to protect computer systems from unauthorized access and attacks. The capacity of traditional intrusion detection systems (IDS) to identify unknown sophisticated threats is constrained by their reliance on signature-based detection. Approaches based on machine learning have shown promising results in identifying unknown malicious attacks. No learning algorithm-based model, however, is able to accurately and consistently detect all different kinds of attacks. Besides that, the existing models are tested for a specific dataset. In this research, a novel ensemble-based machine-learning technique for intrusion detection is presented. Numerous public datasets and multiple ensemble strategies, including Random Forest, Gradient Boosting, Adaboost, Gradient XGBoost, Bagging, and Simple Stacking, will be employed to evaluate the performance of the proposed approach. The most relevant features for the detection of intrusion are selected using correlation analysis, mutual information, and principal component analysis. Our research using different ensemble methods demonstrates that the proposed approach using the Random Forest technique outperforms existing approaches in terms of accuracy and FPR, typically exceeding 99% with better evaluation metrics like Precision, Recall, F1-score, Balanced Accuracy, Cohen's Kappa, etc. This strategy may be a useful tool for strengthening the safety of computer systems and networks against emerging cyber threats.