Evaluating Unbalanced Network Data for Attack Detection
Related papers
A Multiple-Layer Representation Learning Model for Network-Based Attack Detection
IEEE Access, 2019
Accurate detection of network-based attacks is crucial to prevent security breaches of information systems. The recent application of deep learning approaches to network intrusion detection has shown promise. However, challenges remain in handling imbalanced data and small samples and in reducing the false alarm rate (FAR). To address these issues, this work proposes a multiple-layer representation learning model for accurate end-to-end network intrusion detection that combines deep convolutional neural networks (CNN) with gcForest. The contributions of this work are: 1) a new data encoding scheme based on P-Zigzag that encodes network traffic data into two-dimensional grayscale images for representation learning without loss of the original information; and 2) a combination of gcForest and CNN that allows accurate detection on imbalanced and small-scale data with fewer hyperparameters than most existing deep learning models, which increases computational efficiency. The proposed approach consists of a coarse layer and a fine layer. The coarse layer, an improved CNN model (GoogLeNetNP), identifies N abnormal classes and a normal class; the fine layer, an improved model based on gcForest (caXGBoost), further classifies the abnormal classes into N-1 subclasses. This enables fine-grained detection of various attacks. The proposed framework has been compared with existing deep learning models on three real datasets, including NBC, a new 101-class dataset that combines UNSW-NB15 and CICIDS2017. The experimental results show that the proposed method outperforms single deep learning models (AlexNet, VGG19, GoogLeNet, InceptionV3, and ResNet18) in accuracy, detection rate, and FAR, which demonstrates its effectiveness in detecting fine-grained attacks and handling imbalanced datasets with high precision and low FAR.
INDEX TERMS Network intrusion detection, convolutional neural networks, deep random forests, representation learning.
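The abstract does not define the P-Zigzag ordering, so the following is only a sketch of the general idea of a lossless traffic-to-image encoding. It assumes a simple row-alternating (boustrophedon) zigzag path and zero padding up to the next square size; `zigzag_encode` and `zigzag_decode` are hypothetical names, and the authors' actual scheme may differ. Keeping the original byte count alongside the image is what makes the round trip lossless.

```python
import math


def zigzag_encode(data):
    """Pack bytes into a square grayscale image (pixel values 0-255).

    Assumption: a row-alternating zigzag path with zero padding; this is an
    illustration, not the P-Zigzag scheme from the paper.
    Returns (image, original_length) so decoding can strip the padding.
    """
    n = len(data)
    side = max(1, math.ceil(math.sqrt(n)))
    padded = list(data) + [0] * (side * side - n)
    image = []
    for r in range(side):
        row = padded[r * side:(r + 1) * side]
        if r % 2 == 1:
            row.reverse()  # odd rows run right-to-left, producing the zigzag path
        image.append(row)
    return image, n


def zigzag_decode(image, n):
    """Invert zigzag_encode by walking the same path and dropping the padding."""
    flat = []
    for r, row in enumerate(image):
        flat.extend(reversed(row) if r % 2 == 1 else row)
    return bytes(flat[:n])


packet = bytes(range(10))
img, length = zigzag_encode(packet)  # 4x4 image; row 1 is [7, 6, 5, 4]
assert zigzag_decode(img, length) == packet
```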
Addressing the Class Imbalance Problem in Network-Based Anomaly Detection
2024 IEEE 14th Symposium on Computer Applications & Industrial Electronics (ISCAIE), 2024
Network anomaly detection systems are vital for identifying malicious activity in computer networks. However, they face the challenge of class imbalance: normal traffic far outnumbers anomalies, which biases models toward the majority classes at the expense of minority anomalies. In this study, we propose a comprehensive approach to this issue in network anomaly detection using the NSL-KDD and UNSW-NB15 datasets. The approach incorporates random oversampling (ROS), random undersampling (RUS), the Synthetic Minority Oversampling Technique (SMOTE), Adaptive Synthetic Sampling (ADASYN), SMOTE combined with Edited Nearest Neighbors (SMOTEENN), and class reduction. Evaluation on both datasets shows improved performance metrics for a bidirectional long short-term memory (Bi-LSTM) model. Our results highlight the importance of addressing class imbalance for robust network anomaly detection and for cybersecurity in modern networks.
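Of the techniques listed, SMOTE (and its variants ADASYN and SMOTEENN) is the least self-explanatory. Below is a minimal pure-Python sketch of the standard SMOTE idea: each synthetic sample lies on the line segment between a randomly chosen minority sample and one of its k nearest minority neighbours. It is not the study's code; the function name and defaults are illustrative, and in practice a library such as imbalanced-learn (`imblearn.over_sampling.SMOTE`) would normally be used.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Minimal SMOTE sketch: create n_new synthetic minority samples by
    interpolating between a sample and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base sample.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # position along the segment base -> other, in [0, 1)
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


minority_flows = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_flows = smote(minority_flows, n_new=20, k=2, seed=1)
```

ADASYN differs mainly in generating more synthetic samples around minority points that are harder to learn, i.e. those surrounded by many majority-class neighbours.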
The Journal of Supercomputing
Network intrusion detection systems (NIDS) are the most common tool for detecting malicious attacks on a network. They help counter an ever-increasing variety of attacks and provide better security for the network. NIDS are classified into signature-based and anomaly-based detection. The most common type is the anomaly-based NIDS, which relies on machine learning models and can detect attacks with high accuracy. In recent years, NIDS have achieved even better results in detecting both known and novel attacks through the adoption of deep learning models. Benchmark datasets in intrusion detection try to simulate real network traffic by including more normal traffic samples than attack samples. This makes the training data imbalanced and makes certain types of attacks difficult for the NIDS to detect. In this paper, a data resampling technique is proposed based on the Adaptive Synthetic (ADASYN) and Tomek Links algorithms in combination with diffe...
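The abstract is truncated before the full pipeline is described, so the sketch below illustrates only the Tomek-links half, using the standard definition: two samples of different classes form a Tomek link when each is the other's nearest neighbour. Removing the majority-class member of each link cleans the class boundary after oversampling. The helper names are hypothetical; imbalanced-learn provides a production version as `imblearn.under_sampling.TomekLinks`.

```python
import math


def nearest_neighbour(points, i):
    """Index of the point closest to points[i], excluding i itself."""
    return min(
        (j for j in range(len(points)) if j != i),
        key=lambda j: math.dist(points[i], points[j]),
    )


def tomek_links(points, labels):
    """Return index pairs (i, j), i < j, that are mutual nearest neighbours
    with different class labels."""
    nn = [nearest_neighbour(points, i) for i in range(len(points))]
    links = []
    for i, j in enumerate(nn):
        if i < j and nn[j] == i and labels[i] != labels[j]:
            links.append((i, j))
    return links


def remove_majority_in_links(points, labels, majority):
    """Drop the majority-class member of every Tomek link (the usual cleaning step)."""
    drop = set()
    for i, j in tomek_links(points, labels):
        drop.update(k for k in (i, j) if labels[k] == majority)
    kept = [k for k in range(len(points)) if k not in drop]
    return [points[k] for k in kept], [labels[k] for k in kept]
```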
Detecting Unbalanced Network Traffic Intrusions with Deep Learning
IEEE Access, 2024
The growth of cyber threats demands a robust and adaptive intrusion detection system (IDS) capable of effectively recognizing malicious activity in network traffic. However, class imbalance in network data poses a significant challenge to traditional IDS. To overcome this challenge, this project proposes a novel hybrid IDS that combines XGBoost, Long Short-Term Memory (LSTM), Mini-VGGNet, and AlexNet models to handle unbalanced network traffic data. Furthermore, a Random Forest Regressor is used to assess feature importance, improving detection accuracy and interpretability. Addressing the inherent class imbalance in network data is crucial to the IDS's effectiveness. The proposed system combines oversampling of minority classes with undersampling of majority classes during data preprocessing. This balanced representation of network traffic keeps the IDS from being biased toward the majority class and improves its ability to detect rare or novel intrusions. Using the Random Forest Regressor for feature selection serves a dual purpose: it identifies the features of the network traffic data that contribute most to detecting intrusions, and it lets the system focus on those features during model training, which improves detection accuracy while reducing computational complexity. This research contributes to ongoing efforts to mitigate cyber threats and safeguard critical network infrastructure.
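The abstract does not say which oversampling and undersampling methods are combined. As a minimal sketch, assuming plain random resampling and a hypothetical per-class `target` count, the following brings every class to the same size: classes larger than the target are undersampled without replacement, and smaller ones are oversampled with replacement.

```python
import random
from collections import defaultdict


def hybrid_resample(samples, labels, target, seed=0):
    """Bring every class to exactly `target` samples (illustrative assumption:
    random undersampling for large classes, random oversampling for small ones)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)
    out_x, out_y = [], []
    for y, xs in by_class.items():
        if len(xs) >= target:
            chosen = rng.sample(xs, target)  # undersample without replacement
        else:
            # Keep every original minority sample, then add random duplicates.
            chosen = xs + rng.choices(xs, k=target - len(xs))
        out_x.extend(chosen)
        out_y.extend([y] * target)
    return out_x, out_y


X = list(range(100)) + [1000, 1001, 1002]
Y = ["normal"] * 100 + ["attack"] * 3
X_bal, Y_bal = hybrid_resample(X, Y, target=20)  # 20 "normal" + 20 "attack"
```

Resampling of this kind should be applied to the training split only; resampling before the train/test split leaks duplicated minority samples into the evaluation data.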
A Deep Learning Ensemble for Network Anomaly and Cyber-Attack Detection
Sensors
Currently, expert systems and applied machine learning algorithms are widely used to automate network intrusion detection. In critical infrastructure applications of communication technologies, the interaction between industrial control systems and the Internet environment intrinsic to IoT technology makes them susceptible to cyber-attacks. Given the enormous volume of network traffic in critical Cyber-Physical Systems (CPSs), traditional machine learning methods for network anomaly detection are inefficient. Therefore, recently developed machine learning techniques, particularly deep learning, are being applied successfully to detect and classify anomalies at both the network and host levels. This paper presents an ensemble method that leverages deep models such as the Deep Neural Network (DNN) and Long Short-Term Memory (LSTM) and a meta-classifier (i.e., logistic regression) following the principle of stacked ge...
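In stacked generalization, the base models' predicted probabilities become the input features of a meta-classifier. The sketch below assumes the DNN and LSTM have already produced out-of-fold attack probabilities (the `meta_X` values are invented for illustration) and fits the logistic-regression meta-classifier with plain batch gradient descent. It is a teaching sketch, not the paper's implementation; scikit-learn's `StackingClassifier` is a ready-made alternative.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(features, targets, lr=0.5, epochs=2000):
    """Batch gradient-descent logistic regression used as the meta-classifier."""
    n_feat = len(features[0])
    w, b = [0.0] * n_feat, 0.0
    n = len(features)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_feat, 0.0
        for x, t in zip(features, targets):
            # Prediction error of the current model on one meta-sample.
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - t
            for k in range(n_feat):
                grad_w[k] += err * x[k]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_stacked(w, b, base_probs):
    """base_probs: the attack probabilities the base models assign to one sample."""
    return sigmoid(sum(wi * p for wi, p in zip(w, base_probs)) + b)


# Hypothetical out-of-fold attack probabilities, [DNN, LSTM], per training flow.
meta_X = [[0.9, 0.2], [0.8, 0.7], [0.95, 0.4], [0.1, 0.6], [0.2, 0.3], [0.05, 0.9]]
meta_y = [1, 1, 1, 0, 0, 0]
w, b = fit_logistic(meta_X, meta_y)
```

Here the first base model separates the classes on its own, so the meta-classifier learns to weight it more heavily than the noisier second one. Training the meta-classifier on in-fold predictions instead would let it overfit to base models that have memorised the training data.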
International Journal of Advanced Computer Science and Applications
Cyber-security, an emerging field of research, involves developing and managing techniques and technologies that protect data, information, and devices. The need to protect network devices from internal and external attacks, threats, and vulnerabilities has driven ongoing research into Network Intrusion Detection Systems (NIDS). This empirical study therefore examines the effectiveness of deep learning and ensemble methods in NIDS, contributing a NIDS built from machine learning and deep learning algorithms in various forms and evaluated on a recent network dataset (UNSW-NB15) that contains newer attack types and attacker behaviours. The research implements one deep learning algorithm, Long Short-Term Memory (LSTM), and two ensemble methods: a homogeneous method using an optimised bagged Random Forest, and a heterogeneous method using averaged-probability voting. The heterogeneous ensemble combines four standard classifiers with different computational characteristics (Naïve Bayes, kNN, RIPPER, and Decision Tree). Each model was applied to the UNSW-NB15 dataset in two forms: as a two-class attack dataset and as a multi-attack dataset. LSTM achieved detection accuracies of 80% on the two-class dataset and 72% on the multi-attack dataset. The homogeneous method achieved 98% and 87.4%, and the heterogeneous method 97% and 85.23%, on the two-class and multi-attack datasets, respectively.
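The averaged-probability (soft) voting used by the heterogeneous ensemble can be sketched as follows. The four probability vectors are invented for illustration and stand in for the Naïve Bayes, kNN, RIPPER, and Decision Tree outputs over the classes `[normal, attack]`; `soft_vote` is a hypothetical helper name.

```python
def soft_vote(prob_rows):
    """prob_rows: one class-probability vector per base classifier, all over the
    same class order. Returns (winning class index, averaged probabilities)."""
    n_models = len(prob_rows)
    n_classes = len(prob_rows[0])
    avg = [sum(row[c] for row in prob_rows) / n_models for c in range(n_classes)]
    winner = max(range(n_classes), key=lambda c: avg[c])
    return winner, avg


CLASSES = ["normal", "attack"]
naive_bayes = [0.95, 0.05]
knn = [0.40, 0.60]
ripper = [0.45, 0.55]
decision_tree = [0.45, 0.55]
winner, avg = soft_vote([naive_bayes, knn, ripper, decision_tree])
print(CLASSES[winner])  # normal (averages are about 0.5625 vs 0.4375)
```

Three of the four classifiers lean towards "attack", so a majority (hard) vote would pick it, but averaging lets the confident Naïve Bayes output tip the result to "normal". This sensitivity to confidence is the main difference between soft and hard voting.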
Utilising Deep Learning Techniques for Effective Zero-Day Attack Detection
Machine Learning (ML) and Deep Learning (DL) have been used to build Intrusion Detection Systems (IDS). The growth in both the number and the variety of new cyber-attacks poses a tremendous challenge for IDS solutions that rely on a database of historical attack signatures. As a result, industry demand for robust IDSs capable of flagging zero-day attacks is growing. Current outlier-based zero-day detection research suffers from high false-negative rates, which limits its practical use and performance. This paper proposes an autoencoder implementation for detecting zero-day attacks. The aim is to build an IDS model with high recall while keeping the miss rate (false negatives) to an acceptable minimum. Two well-known IDS datasets are used for evaluation: CICIDS2017 and NSL-KDD. To demonstrate the efficacy of the model, its results are compared against a One-Class Support Vector Machine (SVM). The manuscript highlights the performance of a One-Class SVM when zero-day attacks are distinct from normal behaviour. The proposed model benefits greatly from the autoencoder's encoding-decoding capabilities. The results show that autoencoders are well suited to detecting complex zero-day attacks, with a zero-day detection accuracy of 89-99% on the NSL-KDD dataset and 75-98% on the CICIDS2017 dataset. Finally, the paper outlines the observed trade-off between recall and fallout.
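The recall/fallout trade-off follows directly from thresholding an anomaly score. In the sketch below, the scores stand in for autoencoder reconstruction errors (the values are made up, and training the autoencoder itself is out of scope); lowering the threshold raises recall but also raises fallout, i.e. the false positive rate. The function names are hypothetical.

```python
def reconstruction_error(x, x_hat):
    """Mean squared error between an input vector and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def recall_and_fallout(scores, is_attack, threshold):
    """Flag a sample as an attack when its anomaly score exceeds `threshold`.
    Returns (recall, fallout): recall = TP / (TP + FN), fallout = FP / (FP + TN)."""
    tp = fn = fp = tn = 0
    for s, attack in zip(scores, is_attack):
        flagged = s > threshold
        if attack and flagged:
            tp += 1
        elif attack:
            fn += 1
        elif flagged:
            fp += 1
        else:
            tn += 1
    recall = tp / (tp + fn) if tp + fn else 0.0
    fallout = fp / (fp + tn) if fp + tn else 0.0
    return recall, fallout


# Made-up reconstruction errors for three benign and three attack flows.
scores = [0.1, 0.2, 0.3, 0.8, 0.9, 0.4]
is_attack = [False, False, False, True, True, True]
for threshold in (0.5, 0.35, 0.15):
    print(threshold, recall_and_fallout(scores, is_attack, threshold))
# 0.5  -> recall 2/3, fallout 0
# 0.35 -> recall 1,   fallout 0
# 0.15 -> recall 1,   fallout 2/3
```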
HELAD: A novel network anomaly detection model based on heterogeneous ensemble learning
Computer Networks, 2020
Network traffic anomaly detection is an important technique for ensuring network security. However, existing machine learning based anomaly detection algorithms usually have three problems. First, most models are built on stale datasets, making them less adaptable to real-world environments. Second, most anomaly detection algorithms cannot learn new models as the attack environment changes. Third, given the multi-dimensionality of the data, any single detection algorithm has a performance ceiling and cannot adapt well to a complex network attack environment. We therefore propose a new anomaly detection framework based on the integration of multiple deep learning techniques. First, the Damped Incremental Statistics algorithm extracts features from network traffic. Second, an Autoencoder is trained with a small amount of labeled data. Third, the Autoencoder assigns an abnormal score to network traffic. Fourth, the data labeled with abnormal scores is used to train an LSTM. Finally, a weighted method produces the final abnormal score. The experimental results show that HELAD achieves better adaptability and accuracy than other state-of-the-art algorithms.
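Damped incremental statistics keep exponentially decayed running statistics, so older packets count for less. The sketch below uses the formulation popularised by the Kitsune NIDS, in which the damped count, linear sum, and squared sum are each scaled by 2^(-λ·Δt) before a new value is added. HELAD's exact parameters are not given in the abstract, and the fusion weight `alpha` in `combined_score` is a hypothetical placeholder for the final weighting step.

```python
class DampedStats:
    """Exponentially damped running mean and variance of a value stream."""

    def __init__(self, decay_rate):
        self.decay_rate = decay_rate  # lambda: larger values forget faster
        self.w = 0.0   # damped count
        self.ls = 0.0  # damped linear sum
        self.ss = 0.0  # damped squared sum
        self.last_t = None

    def update(self, x, t):
        if self.last_t is not None:
            # Decay all statistics by 2^(-lambda * elapsed time).
            factor = 2.0 ** (-self.decay_rate * (t - self.last_t))
            self.w *= factor
            self.ls *= factor
            self.ss *= factor
        self.last_t = t
        self.w += 1.0
        self.ls += x
        self.ss += x * x

    def mean(self):
        return self.ls / self.w

    def variance(self):
        return abs(self.ss / self.w - self.mean() ** 2)


def combined_score(ae_score, lstm_score, alpha=0.5):
    """Hypothetical weighted fusion of the autoencoder and LSTM anomaly scores."""
    return alpha * ae_score + (1.0 - alpha) * lstm_score


pkt_sizes = DampedStats(decay_rate=1.0)
pkt_sizes.update(0.0, t=0.0)
pkt_sizes.update(10.0, t=1.0)  # the older value now carries half its weight
```

With `decay_rate=0` the statistics reduce to the ordinary mean and variance; with a positive rate, recent traffic dominates, which is what lets the features track a changing network.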
Deep Comparison Analysis: Statistical Methods and Deep Learning for Network Anomaly Detection
International Journal of Computer Science and Information Security (IJCSIS), Vol. 22, No. 5, October 2024
The detection of attacks and anomalies in supervised datasets is critical for maintaining the security and integrity of systems and networks. Traditional attack detection methods often rely on supervised learning techniques, which may struggle to identify novel or complex threats accurately. In this study, we propose a novel approach that enhances attack detection models by leveraging deep learning for anomaly detection. Specifically, we introduce a deep learning framework that captures intricate patterns and relationships within supervised datasets to identify anomalous behavior indicative of potential attacks. Extensive experiments on diverse datasets demonstrate the superior performance of the proposed approach compared to traditional methods. Our results highlight the effectiveness of deep learning in improving the accuracy and efficiency of attack detection in supervised datasets, paving the way for more robust and adaptive security systems.
Hunter in the Dark: Discover Anomalous Network Activity Using Deep Ensemble Network
2021 IEEE 21st International Conference on Software Quality, Reliability and Security (QRS), 2021
Machine learning (ML)-based intrusion detection systems (IDSs) play a critical role in discovering unknown threats in large-scale cyberspace. They have been adopted as a mainstream threat-hunting method in many organizations, such as financial institutions, manufacturing companies, and government agencies. However, existing designs achieve high threat detection performance at the cost of a large number of false alarms, leading to alert fatigue. To tackle this issue, this paper proposes a neural-network-based defense mechanism named DarkHunter. DarkHunter incorporates both supervised and unsupervised learning: it uses a deep ensemble network (trained through supervised learning) to detect anomalous network activity and an unsupervised learning-based scheme to trim mis-detections. For each detected threat, DarkHunter can trace the threat to its source and present it in its original traffic format. Our evaluations on the UNSW-NB15 dataset show that DarkHunter outperforms existing ML-based IDSs and achieves high detection accuracy while keeping a low false positive rate.
Index Terms: Network intrusion detection, ensemble learning, neural networks, deep learning, machine learning.
• We develop a deep ensemble neural network, EnsembleNet, for efficient threat detection. Unlike traditional ensemble designs, which are mainly based on simple, weak ML models, our ensemble is constructed from DNN models so that the high learning potential of DNNs can be exploited for good detection performance.