Malware Predictor using Machine Learning Techniques

Malware Detection and Classification System Using Random Forest

Malware programs attack computer systems, smart mobile devices, and some applications. Malware must be watched for because it threatens computer users and internet networks. It is created to steal a computer user's personal information or to control a user's device over a network. Computers are easily infiltrated by various malware programs that can interfere with and even damage user files. Many users are unaware when malware enters a computer, for example through a network that carries it. The dataset used in this study was taken from Kaggle and VirusShare and comprises 17,845 records in a comma-separated values (CSV) file, captured from network traffic that contains both malware and non-malware. Training and testing on the dataset are carried out using TensorFlow as a binary classification with two classes, malicious and benign. The method is machine learning, comparing the Random Forest, Support Vector Machine, and Bayesian Network algorithms. The system was implemented using the Python programming language for its backend and HTML/CSS for its frontend. The results from the three algorithms show that Random Forest has the highest accuracy at 99.95%, with a precision of 0.998 and a recall of 0.999, and an average detection time of 3 to 8 seconds, enabling quick, early mitigation before damage occurs.
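The accuracy, precision, and recall figures quoted above all derive from the four confusion-matrix counts of a binary classifier. The following is a minimal sketch of that arithmetic, not the study's own code; the function name `binary_metrics` and the toy labels are hypothetical.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, and recall for binary labels."""
    pairs = list(zip(y_true, y_pred))
    # Count true positives, false positives, and false negatives.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    # Guard each ratio against an empty denominator.
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall}


# Toy labels (hypothetical, not from the study): 1 = malicious, 0 = benign.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
metrics = binary_metrics(y_true, y_pred)
```

With malware as the positive class, precision measures how many flagged samples were truly malicious, and recall measures how many malicious samples were caught.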

Enhancing cyber security by predicting malwares using supervised machine learning models

International Journal of Computing and Artificial Intelligence, 2021

Malware poses a severe threat to computer systems and networks. Quick and accurate detection of malware is crucial to mitigating its detrimental impacts. This study aimed to develop a machine learning model to accurately classify whether a Portable Executable (P.E.) file is malware or benign. Supervised classification algorithms like Random Forest, K-Nearest Neighbors (KNN), Support Vector Classifier (SVC), Decision Tree, Multinomial Naïve Bayes, and Logistic Regression were trained on a dataset of 10,868 PE files. Each file had extracted static features like file headers, entropy, string literals, metadata, etc. The algorithms were evaluated using accuracy, precision, recall, and F1 scores. Random Forest performed the best with 99% accuracy, 0.99 precision, 1.00 recall, and a 0.99 F1 score. The features were ranked by importance, with the top ones providing the most discriminatory power. The finalized Random Forest model was saved for operationalization to classify unknown P.E. files automatically. In conclusion, machine learning, especially ensemble tree-based methods, proves highly efficacious for malware detection with the proper feature engineering of file content and characteristics. The model has promising capabilities as an anti-malware system to identify and nullify malware attacks proactively. Further research can focus on generalizability testing across different file types and integration with antivirus solutions.
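Entropy is one of the static features the abstract lists. As an illustration only, the sketch below computes Shannon entropy over raw bytes; whether the study computed it per file or per section is not stated, and the function name `shannon_entropy` is an assumption.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of a byte string, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    counts = Counter(data)
    total = len(data)
    # Sum -p * log2(p) over the observed byte values.
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Toy inputs (hypothetical): a repeated byte versus every byte value once.
low = shannon_entropy(b"\x00" * 64)
high = shannon_entropy(bytes(range(256)))
```

Values close to 8 bits per byte are often treated as a hint of compressed or encrypted content, so this number is typically used as one feature among many rather than as a verdict on its own.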

A Multi-Task System for Detecting and Classifying Malware Signatures Using Random Forest Classifier

Advances in Multidisciplinary and scientific Research Journal Publication, 2021

The rapid increase in the use of information technology has made cyber-attacks a major concern for internet users globally. These attacks take different forms, such as phishing, man-in-the-middle, and malicious applications. This study focuses on malware attacks. Malicious applications have been a major challenge in the use of applications on the Windows operating system. These attacks come in different forms, including trojans, ransomware, and keyloggers. Detecting and classifying these malicious attacks on the Windows operating system is therefore an important task. This paper presents a smart system for detecting and classifying eight categories of malware attack on the Windows operating system using a random forest classifier. The system starts by collecting signatures of malware attacks on Windows from Virus Share, Virus Sign, and a GitHub repository. The collected malware signatures went t...

Enhancing Malware Detection through Machine Learning: A Comparative Analysis of Random Forest and Naive Bayes Classification Systems

PEARL INC, 2024

Malware, a type of malicious software encompassing viruses, worms, Trojans, backdoors, and spyware, poses a grave threat to the confidentiality, integrity, and functionality of computer systems, given their integral role in everyday life. To combat the escalating sophistication of malware attacks, deep-learning-based Malware Detection Systems (MDSs) have emerged as indispensable components of both economic and national security. Utilizing a dataset sourced from a repository, our research focuses on classifying observations into benign and malicious software for Android devices, employing machine learning algorithms such as Random Forest and Naïve Bayes. The dataset comprises 100,000 observations with 35 features, and our evaluation metrics encompass accuracy, precision, recall, and F1-score. This study underscores the significance of MDSs in safeguarding against evolving cyber threats, utilizing cutting-edge machine learning techniques to bolster defense mechanisms.

Malware Detection Using Machine Learning

International Journal of Scientific Research in Computer Science, Engineering and Information Technology, 2023

Malware detection is a critical cybersecurity task, and this research explores the application of machine learning techniques to enhance detection accuracy. Leveraging Logistic Regression, Decision Tree, and Random Forest Classifier algorithms, our approach effectively classifies files as benign or malicious based on extracted features. Feature selection is performed to identify the most informative attributes. The models are evaluated on performance metrics, including accuracy and ROC curves, demonstrating their effectiveness. By utilizing ensemble methods and interpretability of Decision Trees, we aim to provide robust, explainable, and high-accuracy malware detection solutions. In a comparative analysis, we assess the strengths and weaknesses of each algorithm, enabling practitioners to make informed choices. Furthermore, we address the challenge of handling imbalanced datasets, which is common in real-world scenarios, ensuring that our approach maintains a high detection rate for both benign and malicious samples.
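The abstract does not say how the imbalanced data was handled. One common option, shown here purely as an assumption, is to weight each class inversely to its frequency using the heuristic n_samples / (n_classes * class_count). The function name and toy labels below are hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * n) for cls, n in counts.items()}


# Toy label set (hypothetical): 90 benign samples and 10 malicious ones.
labels = ["benign"] * 90 + ["malicious"] * 10
weights = balanced_class_weights(labels)
```

The rarer class receives the larger weight, so misclassifying a malicious sample costs more during training than misclassifying a benign one.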

Machine Learning Techniques for Malware Detection

International Journal of Scientific Research in Science, Engineering and Technology, 2021

The introduction of Transport Layer Security has been one of the most important contributors to the privacy and security of internet communications during the last decade. Malware authors have followed suit, using TLS to hide potentially dangerous network connections. Because of the growing use of encryption and other evasion measures, traditional content-based network traffic categorization is becoming more challenging. In this paper, we provide a malware classification technique that uses packet information and machine learning algorithms to detect malware. We employ classification algorithms such as support vector machine and random forest. We start by eliminating characteristics that are highly correlated. We then use the Random Forest method to choose only the 10 best characteristics from the remaining features. Following the feature selection phase, we employ several classification algorithms and evaluate their performance. The random forest algorithm performed exceptionally well in our experiments, resulting in an accuracy score of over 0.99.
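The first step described above, eliminating highly correlated characteristics, can be sketched as a greedy filter on pairwise Pearson correlation. This is an illustration under assumptions: the 0.95 threshold, the keep-the-first rule, and the feature names are hypothetical, and the paper's exact procedure is not given. The subsequent Random Forest ranking step is not shown.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # constant column: treated as uncorrelated here
    return cov / math.sqrt(vx * vy)


def drop_correlated(columns, threshold=0.95):
    """Keep a feature only if it is not highly correlated with a kept one."""
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Toy packet features (hypothetical names and values).
columns = {
    "pkt_count": [10, 20, 30, 40, 50],
    "byte_count": [100, 200, 300, 400, 500],  # linear in pkt_count
    "duration": [5, 1, 4, 2, 3],
}
selected = drop_correlated(columns)
```

Here `byte_count` is dropped because it is a linear function of `pkt_count`, while `duration` is kept.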

A Novel Malware Detection Approach Using Performance Importance Weighted Random Forest (Peri-WRF) Learning Model

Indian Journal of Computer Science and Engineering

Malware detection has gained huge attention in recent times, mainly because of the increase in new malware variants that pose a significant threat to information security. Conventional malware detection systems are not capable of detecting new-generation malware because of constant changes in network behavior. An efficient malware detection approach must handle the dynamic changes in malware behavior with minimal processing time so that malicious attacks are identified at the initial stage. This paper presents a novel Performance Importance Weighted Random Forest (PERI-WRF) for detecting different types of malware in network systems. The proposed PERI-WRF incorporates a novel data reduction technique that reduces the size of the training data to maximize classification accuracy. A clustering algorithm consisting of GWO and the K-means++ algorithm is implemented to group the malicious data samples collected from the input. To validate the effectiveness of the detection framework, the system was tested using various evaluation metrics. Results show that the proposed malware detection model with novel data reduction techniques achieves superior classification accuracy, and that the approach is appropriate for detecting malware in real time with superior accuracy and a low MAE score.

Malware Detection Software Powered by Machine Learning

2021

Malicious software is overflowing in a world of countless computer users, who continuously face these threats from sources such as the internet, local networks, and portable drives. The continued evolution of malware is potentially a major threat in this cyber world. This paper aims to create software powered by machine learning that detects whether a given program is malicious before it is installed on the system. The task is accomplished using the Random Forest classifier, a supervised learning algorithm, to detect malware without relying on traditional signature-based techniques, which are processor-intensive and inefficient given the large amount of malware created daily. Instead, the approach relies on static analysis of the PE file format with feature extraction to build effective, processor-efficient malware detection software with high accuracy and a low false-positive rate.

Machine learning in malware detection: Analytical perspective

INTERNATIONAL SCIENTIFIC AND PRACTICAL CONFERENCE “TECHNOLOGY IN AGRICULTURE, ENERGY AND ECOLOGY” (TAEE2022)

Computer technology has become a necessity in human life in areas such as online education, the financial sector, entertainment, and communication. But computer security is vulnerable to malware, which is code written to damage the computer system. Some primary tools, known as malware detectors, can detect malware, and their quality depends on the techniques they use. Malware analysis is the method of investigating the intention and functionality of malware samples such as worms, viruses, and trojan horses. Static, dynamic, and hybrid approaches are used for malware analysis by various researchers. Machine learning techniques are the most popular techniques that employ these approaches. Machine learning approaches are also categorized as supervised, unsupervised, and reinforcement learning. Researchers employ one, two, or a blend of these approaches for malware detection. This research paper includes a study of these malware analysis techniques, and we analyze several machine learning algorithms and demonstrate the results obtained from them. We compare outcomes of algorithms such as J48, Logistic Regression, and Random Forest. Moreover, we also employ a voting approach and show that Random Forest works better than other algorithms.
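The abstract mentions a voting approach without specifying it. A minimal sketch of hard (majority) voting across several classifiers' predictions might look as follows; treating the vote as a simple majority, and the toy predictions themselves, are assumptions.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-model label lists into one list by majority vote.

    predictions: list of equal-length label lists, one per model.
    On a tie, the label seen first among the models wins.
    """
    voted = []
    for labels in zip(*predictions):
        voted.append(Counter(labels).most_common(1)[0][0])
    return voted


# Toy predictions (hypothetical): 1 = malware, 0 = benign.
j48_pred = [1, 0, 1, 1]
logreg_pred = [1, 1, 0, 1]
rf_pred = [0, 0, 1, 1]
combined = majority_vote([j48_pred, logreg_pred, rf_pred])
```

With three models and two labels a tie cannot occur, which is one reason an odd number of voters is a common choice for binary problems.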

Malware Classification and Machine Learning: A Survey

International Journal of Latest Research in Engineering and Technology, 2016

Malicious software, referred to as malware, is one of the major threats on the Internet today. Because of the vast use of the Internet, many systems are found to be infected by malware in the form of computer viruses, Trojan horses, worms, rootkits, backdoors, evasion, etc. In this paper we review different approaches to malware analysis. Malware analysis and classification with machine learning techniques are also discussed. This paper aims to provide an introduction to malware and its related issues.