Towards a Standard Feature Set of NIDS Datasets

NetFlow Datasets for Machine Learning-Based Network Intrusion Detection Systems

Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, 2021

Machine Learning (ML)-based Network Intrusion Detection Systems (NIDSs) have proven to be a reliable intelligence tool for protecting networks against cyberattacks. Network data features have a great impact on the performance of ML-based NIDSs. However, evaluations of ML models are often unreliable, as each ML-enabled NIDS is trained and validated using different data features that may not contain security events. Therefore, a common feature set shared across multiple datasets is required to evaluate an ML model's detection accuracy and its ability to generalise across datasets. This paper presents NetFlow features from four benchmark NIDS datasets, UNSW-NB15, BoT-IoT, ToN-IoT, and CSE-CIC-IDS2018, generated from their publicly available packet capture files. In a real-world scenario, NetFlow features are relatively easy to extract from network traffic compared to the complex features used in the original datasets, as they are usually extracted from packet headers. The generated NetFlow datasets have been labelled for binary- and multiclass-based learning challenges. Preliminary results indicate that, across the four datasets, NetFlow features lead to similar binary-class results and lower multi-class classification results compared to the respective original feature sets.
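The idea of a common feature set and of binary versus multiclass labels can be illustrated with a minimal Python sketch. The column names, the per-dataset column lists, and the "Benign" class name below are assumptions made for illustration; they are not the published schemas of the four datasets.

```python
from functools import reduce

# Hypothetical column lists: illustrative names only, not the real schemas.
DATASET_COLUMNS = {
    "UNSW-NB15": ["IN_BYTES", "OUT_BYTES", "PROTOCOL", "L4_DST_PORT", "sttl"],
    "BoT-IoT": ["IN_BYTES", "OUT_BYTES", "PROTOCOL", "L4_DST_PORT", "drate"],
    "ToN-IoT": ["IN_BYTES", "OUT_BYTES", "PROTOCOL", "L4_DST_PORT", "dns_query"],
    "CSE-CIC-IDS2018": ["IN_BYTES", "OUT_BYTES", "PROTOCOL", "L4_DST_PORT", "flow_iat_mean"],
}


def common_features(columns_by_dataset):
    """Return the sorted features that appear in every dataset."""
    sets = [set(cols) for cols in columns_by_dataset.values()]
    if not sets:
        return []
    return sorted(reduce(set.intersection, sets))


def binary_label(attack_class, benign_name="Benign"):
    """Collapse a multiclass label into 0 (benign) or 1 (any attack)."""
    return 0 if attack_class == benign_name else 1


shared = common_features(DATASET_COLUMNS)
print(shared)
print(binary_label("DoS"), binary_label("Benign"))
```

A model trained only on the shared columns can be evaluated on any of the datasets, which is what makes cross-dataset comparison possible.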

Examining the Suitability of NetFlow Features in Detecting IoT Network Intrusions

Sensors

The past few years have witnessed a substantial increase in cyberattacks on Internet of Things (IoT) devices and their networks. Such attacks pose a significant threat to organizational security and user privacy. Utilizing Machine Learning (ML) in Network Intrusion Detection Systems (NIDS) has proven advantageous in countering novel zero-day attacks. However, the performance of such systems relies on several factors, one of which is prediction time. Processing speed in anomaly-based NIDS depends on a few elements, including the number of features fed to the ML model. NetFlow, a networking industry-standard protocol, offers many features that can be used to predict malicious attacks accurately. This paper examines NetFlow features and assesses their suitability in classifying network traffic. Our paper presents a model that detects attacks with 98–100% accuracy using as few as 13 features. This study was conducted using a large dataset of over 16 million records released in 2021.

A Detailed Analysis of Benchmark Datasets for Network Intrusion Detection System

Mossa Ghurab, 2021

The enormous increase in the use of the Internet in daily life has given intruders an opportunity to compromise the security principles of availability, confidentiality, and integrity. As a result, organizations are working to increase their level of security by using attack detection techniques such as the Network Intrusion Detection System (NIDS), which monitors and analyzes network flows and detects attacks. Much research has been proposed to develop NIDSs, and this research depends on datasets for evaluation. Datasets make it possible to evaluate the ability to detect intrusive behavior. This paper introduces a detailed analysis of benchmark and recent datasets for NIDS. Specifically, we describe eight well-known datasets: KDD99, NSL-KDD, KYOTO 2006+, ISCX2012, UNSW-NB 15, CIDDS-001, CICIDS2017, and CSE-CIC-IDS2018. For each dataset, we provide a detailed analysis of its instances, features, classes, and the nature of the features. The main objective of this paper is to offer an overview of the datasets that are available for NIDS and of what each dataset comprises. Furthermore, some recommendations are made for using network-based datasets.

Overview and Exploratory Analyses of CICIDS 2017 Intrusion Detection Dataset

Journal of Systems Engineering and Information Technology (JOSEIT)

Intrusion detection systems are used to detect attacks on a network. Machine learning (ML) approaches have been widely used to build such intrusion detection systems (IDSs) because they are more accurate when built from a very large and representative dataset. Recently, one of the benchmark datasets used to build ML-based intrusion detection models is the CICIDS2017 dataset. The data set is contained in eight groups and was collected from the Data Set & Repository of the Canadian Institute for Cybersecurity. The data set is available in both PCAP and net flow formats. This study used the net flow records in the CICIDS2017 dataset, as they were found to contain newer attacks, to be very large, and to be useful for traffic analysis. Exploratory data analysis (EDA) techniques were used to reveal various characteristics of the dataset. The general objective is to provide more insight into the nature, structure, and issues of the data set so as to identify the best ways to use it to achieve...

Improving the Performance of Machine Learning-Based Network Intrusion Detection Systems on the UNSW-NB15 Dataset

Computational Intelligence and Neuroscience, 2021

Networks are exposed to an increasing number of cyberattacks due to their vulnerabilities. Cybersecurity therefore strives to make networks as safe as possible by introducing defense systems that detect suspicious activities. However, firewalls and classical intrusion detection systems (IDSs) suffer from the need to continuously update their defined databases in order to detect threats. New directions in IDS research aim to leverage machine learning models to design more robust systems with higher detection rates and lower false alarm rates. This research presents a novel network IDS, which plays an important role in network security and addresses current cyberattacks on networks, using the UNSW-NB15 benchmark dataset. Our proposed system is a dynamically scalable multiclass machine learning-based network IDS. It consists of several stages based on supervised machine learning. It starts with the Synthetic Minority Oversampling Technique (SMOTE) method to solve the imbalanced classes problem in the da...
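The core step of SMOTE mentioned above can be sketched in a few lines of Python: a synthetic minority sample is placed at a random point on the segment between a minority sample and one of its nearest minority neighbours. This is a simplified illustration, not the authors' pipeline; the function name, the parameters `k` and `seed`, and the toy points are all assumptions.

```python
import math
import random


def smote_sketch(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic points by interpolating between minority samples.

    Simplified illustration of SMOTE; needs at least two minority samples.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (excluding itself)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Toy minority class with two hypothetical features per flow.
minority_flows = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
new_flows = smote_sketch(minority_flows, n_new=5)
print(len(new_flows))
```

Because every synthetic point lies between two real minority samples, the minority class grows without simply duplicating existing records. In practice a maintained library implementation would normally be preferred over a hand-written one.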

Evaluating the Performance of Classification Algorithms on the UNSW-NB15 Dataset for Network Intrusion Detection

Jurnal Ilmiah FIFO, 2024

Network intrusion detection is a critical aspect of cybersecurity, aiming to distinguish between normal and malicious network activities. This study evaluates the performance of various machine learning algorithms on the UNSW-NB15 dataset for binary classification of network traffic into normal and attack categories. We employed several preprocessing steps, including handling missing values, encoding categorical features, and addressing class imbalance using a mix of Synthetic Minority Over-sampling Technique (SMOTE) and undersampling. The models evaluated include k-Nearest Neighbors (k-NN), Naive Bayes, Logistic Regression, Support Vector Machines (SVM), and Neural Networks. Our experimental results show that complex models like Neural Networks and SVMs significantly outperform simpler models. The Neural Network model achieved the highest accuracy of 92%, with a precision of 91%, recall of 93%, and an F1-score of 92%. SVM also performed robustly with an accuracy of 90%. Simpler models, while less effective, still achieved respectable performance, with Logistic Regression and k-NN reaching accuracies of 88% and 85%, respectively. The study highlights the importance of comprehensive preprocessing and the implementation of advanced machine learning techniques for effective network intrusion detection. The results suggest that while complex models offer superior detection capabilities, simpler models can still be valuable in resource-constrained environments. Future research should focus on applying these models to real-world data, exploring more advanced neural network architectures, and implementing cost-sensitive learning techniques to further enhance detection performance and efficiency.
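The reported Neural Network metrics can be checked against each other, since the F1-score is the harmonic mean of precision and recall. The short Python sketch below does that arithmetic; the function name is chosen here for illustration.

```python
def f1_from_precision_recall(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall reported for the Neural Network model.
f1 = f1_from_precision_recall(0.91, 0.93)
print(round(f1, 2))  # 0.92
```

A precision of 91% and a recall of 93% give an F1-score of about 0.92, which is consistent with the value stated in the abstract.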

Identifying Relevant Features of CSE-CIC-IDS2018 Dataset for the Development of an Intrusion Detection System

arXiv (Cornell University), 2023

Intrusion detection systems (IDSs) are essential elements of IT systems. Their key component is a classification module that continuously evaluates some features of the network traffic and identifies possible threats. Its efficiency is greatly affected by the right selection of the features to be monitored. Therefore, the identification of a minimal set of features that are necessary to safely distinguish malicious traffic from benign traffic is indispensable in the course of the development of an IDS. This paper presents the preprocessing and feature selection workflow as well as its results in the case of the CSE-CIC-IDS2018 on AWS dataset, focusing on five attack types. To identify the relevant features, six feature selection methods were applied, and the final ranking of the features was elaborated based on their average score. Next, several subsets of the features were formed based on different ranking threshold values, and each subset was tried with five classification algorithms to determine the optimal feature set for each attack type. During the evaluation, four widely used metrics were taken into consideration.
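The ranking step described above, averaging per-method scores and then forming subsets at different thresholds, might look roughly like the following Python sketch. It assumes each method's scores are already normalised to the range 0 to 1; the method names, feature names, score values, and thresholds are invented for illustration, and only two methods are shown rather than the six used in the paper.

```python
def average_scores(scores_by_method):
    """Average each feature's score over all feature selection methods."""
    methods = list(scores_by_method.values())
    features = methods[0].keys()
    return {f: sum(m[f] for m in methods) / len(methods) for f in features}


def subsets_by_threshold(avg_scores, thresholds):
    """For each threshold, keep the features whose average score reaches it."""
    return {
        t: sorted(f for f, s in avg_scores.items() if s >= t)
        for t in thresholds
    }


# Hypothetical normalised scores from two selection methods.
scores = {
    "method_a": {"flow_duration": 1.0, "dst_port": 0.5, "fwd_pkts": 0.0},
    "method_b": {"flow_duration": 0.5, "dst_port": 0.5, "fwd_pkts": 0.25},
}

avg = average_scores(scores)
candidate_subsets = subsets_by_threshold(avg, [0.5, 0.7])
print(avg)
print(candidate_subsets)
```

Each candidate subset would then be tried with the classifiers, and the subset that scores best on the chosen metrics is kept for that attack type.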

Overview on Intrusion Detection Systems Design Exploiting Machine Learning for Networking Cybersecurity

Applied Sciences

The Intrusion Detection System (IDS) is an effective tool utilized in cybersecurity systems to detect and identify intrusion attacks. With the increasing volume of data generation, the possibility of various forms of intrusion attacks also increases. Feature selection is crucial and often necessary to enhance performance. The structure of the dataset can impact the efficiency of the machine learning model. Furthermore, data imbalance can pose a problem, but sampling approaches can help mitigate it. This research aims to explore machine learning (ML) approaches for IDS, specifically focusing on datasets, machine learning algorithms, and metrics. Three datasets were utilized in this study: KDD 99, UNSW-NB15, and CSE-CIC-IDS 2018. Various machine learning algorithms were chosen and examined to assess IDS performance. The primary objective was to provide a taxonomy for interconnected intrusion detection systems and supervised machine learning algorithms. The selection of datasets is crucial to e...

Network Based Intrusion Detection Using the UNSW-NB15 Dataset

International Journal of Computing and Digital Systems, 2019

In this work, we apply a two-stage anomaly-based network intrusion detection process using the UNSW-NB15 dataset. We use Recursive Feature Elimination and Random Forests, among other techniques, to select the best dataset features for machine learning; we then perform a binary classification to distinguish intrusive traffic from normal traffic, using a number of data mining techniques, including Logistic Regression, Gradient Boost Machine, and Support Vector Machine. Results of this first-stage classification show that Support Vector Machine achieves the highest accuracy (82.11%). We then feed the output of Support Vector Machine to a range of multinomial classifiers in order to improve the accuracy of predicting the type of attack. Specifically, we evaluate the performance of Decision Trees (C5.0), Naïve Bayes, and multinomial Support Vector Machine. Applying C5.0 yielded the highest accuracy (74%) and F1 score (86%), and the two-stage hybrid classification improved the accuracy of results by up to 12% (achieving a multi-classification accuracy of 86.04%). Finally, with the support of our results, we present constructive criticism of the UNSW-NB15 dataset.
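The control flow of such a two-stage process can be sketched in Python as follows. This shows only the general pattern, in which a binary stage filters traffic and a multiclass stage names the attack type; the toy rules, field names, and class names are hypothetical stand-ins for trained models and are not taken from the paper.

```python
def two_stage_predict(flow, is_attack, attack_type):
    """Stage 1 decides normal vs attack; stage 2 names the attack type."""
    if not is_attack(flow):
        return "normal"
    return attack_type(flow)


# Toy stand-ins for trained classifiers (hypothetical rules and fields).
def toy_binary_stage(flow):
    return flow["pkts_per_sec"] > 1000


def toy_multiclass_stage(flow):
    return "dos" if flow["dst_port"] == 80 else "reconnaissance"


flows = [
    {"pkts_per_sec": 12, "dst_port": 443},
    {"pkts_per_sec": 5000, "dst_port": 80},
    {"pkts_per_sec": 3000, "dst_port": 22},
]
labels = [two_stage_predict(f, toy_binary_stage, toy_multiclass_stage) for f in flows]
print(labels)
```

Only traffic flagged by the first stage reaches the second, so the multiclass model never has to separate attack types from normal traffic.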

Comparative Study of Datasets used in Cyber Security Intrusion Detection

International Journal of Scientific Research in Computer Science, Engineering and Information Technology, 2020

In recent years, deep learning frameworks have been applied in various domains and have shown promising performance, including in malware detection software, self-driving cars, and identity recognition cameras. Adversarial attacks have become a crucial security threat to several deep learning applications in today's world. Deep learning techniques have become a core part of several cybersecurity applications such as intrusion detection, Android malware detection, spam detection, malware classification, binary analysis, and phishing detection. One of the major research challenges in this field is the lack of a comprehensive dataset that reflects contemporary network traffic scenarios, a broad range of low-footprint intrusions, and in-depth structured information about the network traffic. Many benchmark datasets for evaluating network intrusion detection systems were developed a decade ago. In this paper, we provide a focused literature survey of datasets used for network-based intrusion detection and characterize in detail the underlying packet- and flow-based network data used for intrusion detection in cybersecurity. Because datasets play a vital role in intrusion detection, we describe cyber datasets and provide a categorization of those datasets.