Comprehensive DDoS Attack Classification Using Machine Learning Algorithms

Computers, Materials & Continua

The rapid development of Internet technologies has driven the growth of information security techniques that protect data, networks, systems, and applications from a wide range of threats. The distributed denial-of-service (DDoS) attack is one of the most serious and widespread attacks on Internet resources. It is intended to paralyze the victim's system and cause the service to fail. This work is devoted to the classification of DDoS attacks in Software-Defined Networking (SDN) environments using machine learning algorithms. The analyzed dataset included instances of two classes: benign and malicious. Because the dataset contained twenty-two features, feature selection techniques were required for dimensionality reduction. In these experiments, Information gain, Chi-square, and the F-test were applied to reduce the number of features to ten. The classes were also not completely balanced, so undersampling, oversampling, and the synthetic minority oversampling technique (SMOTE) were used to equalize the class sizes. Previous research examined the classification of DDoS attacks using various feature selection techniques and one or more machine learning algorithms, but paid little attention to combinations of feature selection and balancing methods with different machine learning algorithms. This work classifies the datasets with eight machine learning algorithms: naïve Bayes, logistic regression, support vector machine, k-nearest neighbors, decision tree, random forest, XGBoost, and CatBoost. In the experimental results, the Information gain and F-test feature selection methods achieved better performance with all eight ML algorithms than the Chi-square technique. Furthermore, the accuracy values of the oversampled and SMOTE datasets were higher than those of the undersampled and imbalanced datasets.
Among the machine learning algorithms, the accuracy of the support vector machine, logistic regression, and naïve Bayes fluctuates between 0.59 and 0.75, while decision tree, random forest, XGBoost, and
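
The preprocessing described in the abstract (F-test feature ranking followed by class balancing) can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation. The toy dataset (three features, 8 malicious and 22 benign rows), the function names, and the simplified SMOTE variant are all assumptions made for illustration; the paper itself reduces twenty-two features to ten.

```python
import random
from statistics import mean


def f_score(values, labels):
    """One-way ANOVA F statistic of a single feature across the classes."""
    groups = {}
    for v, label in zip(values, labels):
        groups.setdefault(label, []).append(v)
    grand = mean(values)
    ss_between = ss_within = 0.0
    for g in groups.values():
        m = mean(g)
        ss_between += len(g) * (m - grand) ** 2
        ss_within += sum((v - m) ** 2 for v in g)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


def select_top_k(X, y, k):
    """Return the (sorted) indices of the k features with the highest F score."""
    n_features = len(X[0])
    scores = [f_score([row[j] for row in X], y) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


def random_oversample(X, y, minority, rng):
    """Duplicate random minority rows until both classes have equal size."""
    minority_rows = [row for row, label in zip(X, y) if label == minority]
    deficit = (len(y) - len(minority_rows)) - len(minority_rows)
    extra = [list(rng.choice(minority_rows)) for _ in range(deficit)]
    return X + extra, y + [minority] * deficit


def smote(X, y, minority, rng, k=5):
    """Simplified SMOTE: interpolate between a minority row and one of its
    k nearest minority neighbours. Needs at least two minority rows."""
    minority_rows = [row for row, label in zip(X, y) if label == minority]
    deficit = (len(y) - len(minority_rows)) - len(minority_rows)
    synthetic = []
    for _ in range(deficit):
        base = rng.choice(minority_rows)
        others = [r for r in minority_rows if r is not base]
        # Sort the other minority rows by squared Euclidean distance to base.
        others.sort(key=lambda r: sum((a - b) ** 2 for a, b in zip(base, r)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return X + synthetic, y + [minority] * deficit


# Hypothetical toy data: label 1 = malicious (8 rows), label 0 = benign (22 rows).
# Features 0 and 2 depend on the label; feature 1 is unrelated to it.
X, y = [], []
for i in range(30):
    label = 1 if i < 8 else 0
    X.append([5 * label + (i % 3) * 0.1,
              ((i * 7) % 10) / 10,
              2 * label + (i % 4) * 0.1])
    y.append(label)

kept = select_top_k(X, y, k=2)                  # F-test feature selection
X_sel = [[row[j] for j in kept] for row in X]   # keep only the selected columns

rng = random.Random(0)                          # fixed seed for repeatability
X_over, y_over = random_oversample(X_sel, y, minority=1, rng=rng)
X_smote, y_smote = smote(X_sel, y, minority=1, rng=rng, k=3)

print("selected features:", kept)
print("oversampled class counts:", y_over.count(0), y_over.count(1))
print("SMOTE class counts:", y_smote.count(0), y_smote.count(1))
```

In a real experiment these steps would more likely be delegated to established libraries, for example `SelectKBest` with `f_classif` in scikit-learn and `SMOTE` in imbalanced-learn, rather than hand-written as above.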