MCF: MultiComponent Features for Malware Analysis

Machine Learning Techniques to Detect Maliciousness of Portable Executable Files

2019 International Conference on Promising Electronic Technologies (ICPET), 2019

In the past few years, malware has become one of the most significant threats to computer security. Malware, or malicious software, is software that attackers write or use to disrupt the operation of a computer, to collect secret or private information, or to access computer systems without authorization. In this paper, we present a machine-learning-based approach to classifying a portable executable (PE) file as benign or malware with high accuracy. The proposed approach uses static analysis to extract an integrated feature set, created by combining a few raw features selected from the three main headers of PE files with a set of derived features. Seven supervised learning algorithms are used to classify malware. We compare the performance of each classifier in terms of accuracy, precision, and F-measure. The experimental results indicate that the integrated feature set performs better than the raw feature set on all metrics. With a 70/30 split, accuracy on the integrated dataset ranges from 91% to 99%, against 71% to 97% on the raw dataset. Random Forest outperformed all other classifiers on both datasets (with an accuracy of 99.23%).
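
The three metrics named in the abstract can be computed from predicted and true labels as in the minimal sketch below. The function name, the label encoding (1 = malware, 0 = benign), and the toy labels are assumptions made for illustration; they are not taken from the paper.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall, and F-measure for binary labels."""
    pairs = list(zip(y_true, y_pred))
    # Count the confusion-matrix cells, treating `positive` as the malware class.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F-measure is the harmonic mean of precision and recall.
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }


# Toy labels, invented for illustration only (1 = malware, 0 = benign).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)
```

Accuracy alone can look strong when one class dominates the dataset, which is one reason to report precision and F-measure alongside it.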

Detecting Malware in Portable Executable Files using Machine Learning Approach

International Journal of Network Security & Its Applications

Many solutions have been proposed to improve the detection of malware in executable files in general and in Portable Executable files in particular. In this paper, we rely on the PE header structure of Portable Executable files to propose another machine learning approach to classifying these files as malware or benign. Experimental results show that the proposed approach, which still uses the Random Forest algorithm for classification, improves on the accuracy and execution time of some recent publications (accuracy reaches 99.71%).
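
As a rough illustration of what reading features from the PE header structure involves, the sketch below pulls a handful of fields out of the DOS header, the COFF file header, and the start of the optional header using only the standard library. The choice of fields is an assumption, since the abstract does not list the features the authors use, and `build_toy_pe` is a hypothetical helper that fabricates a tiny header-only buffer rather than a real executable.

```python
import struct


def parse_pe_headers(data: bytes) -> dict:
    """Extract a few raw header fields from the bytes of a PE file."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        raise ValueError("missing DOS header ('MZ' magic not found)")
    # e_lfanew, at offset 0x3C of the DOS header, points to the PE signature.
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    if data[e_lfanew:e_lfanew + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature")
    # The 20-byte COFF file header follows the 4-byte signature.
    (machine, n_sections, timestamp, _symtab_ptr, _n_symbols,
     opt_size, characteristics) = struct.unpack_from(
        "<HHIIIHH", data, e_lfanew + 4)
    features = {
        "Machine": machine,
        "NumberOfSections": n_sections,
        "TimeDateStamp": timestamp,
        "SizeOfOptionalHeader": opt_size,
        "Characteristics": characteristics,
    }
    # The optional header starts right after the file header; its first
    # field is a 2-byte magic (0x10B for PE32, 0x20B for PE32+).
    if opt_size >= 2:
        (magic,) = struct.unpack_from("<H", data, e_lfanew + 24)
        features["OptionalHeaderMagic"] = magic
    return features


def build_toy_pe() -> bytes:
    """Hypothetical helper: a synthetic header-only buffer, not a real program."""
    dos = bytearray(0x40)
    dos[:2] = b"MZ"
    struct.pack_into("<I", dos, 0x3C, 0x40)  # PE signature follows directly
    coff = struct.pack("<HHIIIHH", 0x14C, 3, 0, 0, 0, 2, 0x0102)
    return bytes(dos) + b"PE\x00\x00" + coff + struct.pack("<H", 0x10B)


features = parse_pe_headers(build_toy_pe())
```

A dictionary like this maps directly onto one row of a feature table, which can then be passed to a classifier such as Random Forest.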

Feature Selection and Machine Learning Classification for Malware Detection

Jurnal Teknologi, 2015

Malware is a computer security problem that can morph to evade traditional detection methods based on known signature matching. Since new malware variants contain patterns that are similar to those in observed malware, machine learning techniques can be used to identify new malware. This work presents a comparative study of several feature selection methods with four different machine learning classifiers in the context of static malware detection based on n-gram analysis. The results show that the use of Principal Component Analysis (PCA) feature selection and Support Vector Machines (SVM) classification gives the best classification accuracy using a minimum number of features.
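
To make the n-gram representation concrete, here is a small sketch that counts overlapping byte n-grams in a file and turns them into a fixed-length frequency vector. It assumes byte-level n-grams (the abstract does not say whether bytes or opcodes are used), and the function names and sample bytes are invented for illustration. The PCA and SVM stages would operate on vectors like these and are not shown.

```python
from collections import Counter


def byte_ngrams(data: bytes, n: int = 2) -> Counter:
    """Count every overlapping n-byte window in `data`."""
    if n <= 0:
        raise ValueError("n must be positive")
    return Counter(data[i:i + n] for i in range(len(data) - n + 1))


def ngram_vector(data: bytes, vocabulary, n: int = 2) -> list:
    """Relative frequency of each vocabulary n-gram, in vocabulary order."""
    counts = byte_ngrams(data, n)
    total = sum(counts.values()) or 1  # avoid dividing by zero on tiny inputs
    return [counts[gram] / total for gram in vocabulary]


# Invented sample: three NOP bytes followed by a RET byte.
sample = b"\x90\x90\x90\xc3"
vocabulary = [b"\x90\x90", b"\x90\xc3", b"\x00\x00"]
counts = byte_ngrams(sample)
vector = ngram_vector(sample, vocabulary)
```

Because the number of distinct n-grams grows quickly with n, a reduction step such as the feature selection methods compared in this study is usually needed before classification.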

An Exploratory Analysis of Feature Selection for Malware Detection with Simple Machine Learning Algorithms

Journal of Communications Software and Systems

Computers have become increasingly vulnerable to malicious attacks with their growing popularity and the proliferation of open system architectures. Numerous malware detection technologies are available to protect the computer operating system from such attacks. These malware detectors flag programs based on patterns detected in the properties of computer applications. As the amount of analytical data increases, the computer defense system is adversely affected: the presence of numerous irrelevant characteristics hinders the performance of the detection mechanism. The goal of this study is to provide a feature selection approach that makes malware detection systems more accurate by identifying pertinent and significant traits. Furthermore, by selecting the most important features, it is possible to maintain an acceptable level of accuracy in the detection of malware while significantly lowering the computational cost. The proposed method, which includes data cleaning and feature selection, displays the most important features (MIFs) obtained from each machine learning method. It then applies six machine learning classification techniques to the selected feature set: Support Vector Machines (SVM), Logistic Regression (LR), K-nearest neighbor (K-NN), Decision Tree (DT), Naive Bayes (NB), and Random Forest (RF). Our suggested model was tested on two malware datasets to determine its effectiveness. In terms of accuracy, precision, F1 score, and recall, the experimental findings show that the RF and DT classifiers outperform the other techniques.

Analysis of Feature Importance and Interpretation for Malware Classification

Computers, Materials & Continua, 2020

This study was conducted to enable prompt classification of malware, which is becoming increasingly sophisticated. To do this, we analyzed the important features of malware and the relative importance of the selected features according to a learning model, to assess how those important features were identified. First, the analysis features were extracted using Cuckoo Sandbox, an open-source malware analysis tool, and divided into five categories using the extracted information. The 804 extracted features were reduced by 70% by selecting only those most suitable for malware classification with a learning-model-based feature selection method called recursive feature elimination. Next, these important features were analyzed, and the contribution of each one was assessed with a Random Forest classifier. The results showed that system call features accounted for most of the selected features. In the end, it was possible to accurately identify the malware type using only 36 to 76 features for each of the four malware types with the most analysis samples available: Trojan, Adware, Downloader, and Backdoor.

Impact of PCA Feature Extraction Method used in Malware Detection for Security Enhancement

International Journal of Engineering and Advanced Technology, 2020

Malware is one of the foremost security threats on the Internet today. Some Internet problems, such as denial-of-service attacks and spam e-mail, are caused by malware. Computers infected with malware are networked together to form botnets, and most threats or attacks are launched with the help of these malicious, attacker-controlled networks. Downloading executable files such as .exe, .bat, and .msi files from untrusted Internet sources carries a risk of infection. Furthermore, attackers cleverly obfuscate these executables to bypass antivirus software. In this research work, we propose an enhanced approach for detecting malicious executable files by analysing the traced Portable Executable (PE) files that are extracted from executable files and by using the PCA feature extraction method. The method used in this pa...

Comparative Analysis of Feature Extraction Methods of Malware Detection

2015

Recent years have seen massive growth in malware, which poses a severe threat to modern computers and Internet security. Existing malware detection systems are confronted with unknown malware variants. Recently developed malware detection systems have found that diverse forms of malware exhibit similar patterns in their structure, with minor variations. Hence, it is necessary to distinguish the types of features extracted for detecting malware, so that the potential of malware detection systems can be leveraged to combat unfamiliar malware. We mainly focus on the categorization of features based on malware analysis. This paper highlights a general framework of malware detection systems and pinpoints the strengths and weaknesses of each method. Finally, we present an overview of the performance of present malware detection systems based on features.

A New Classification Based Model for Malicious PE Files Detection

International Journal of Computer Network and Information Security, 2019

Malware presents a major threat to the security of computer systems, smart devices, and applications. It can also endanger sensitive data by modifying or destroying them. Thus, electronic exchanges through different communicating entities can be compromised. However, currently used signature-based methods cannot provide accurate detection of zero-day attacks or of polymorphic and metamorphic programs, which have the ability to change their code during propagation. To solve this issue, static and dynamic malware analysis is being used along with machine learning algorithms for malware detection and classification. Machine learning methods play an important role in automated malware detection. Several approaches have been applied to classify and to detect malware. The most challenging task is selecting a relevant set of features from a large dataset so that the classification model can be built in less time with higher accuracy. The purpose of this work is firstly to give a general review of the existing classification and detection methods, and secondly to develop an automated system to detect malicious Portable Executable files based on their headers with low performance overhead and greater efficiency. Experimental results will be presented for the best classifier selected in this study, namely Random Forest; accuracy and time performance will be discussed.

Enhancing cyber security by predicting malwares using supervised machine learning models

International Journal of Computing and Artificial Intelligence, 2021

Malware poses a severe threat to computer systems and networks. Quick and accurate detection of malware is crucial to mitigating its detrimental impacts. This study aimed to develop a machine learning model to accurately classify whether a Portable Executable (P.E.) file is malware or benign. Supervised classification algorithms like Random Forest, K-Nearest Neighbors (KNN), Support Vector Classifier (SVC), Decision Tree, Multinomial Naïve Bayes, and Logistic Regression were trained on a dataset of 10,868 PE files. Each file had extracted static features like file headers, entropy, string literals, metadata, etc. The algorithms were evaluated using accuracy, precision, recall, and F1 scores. Random Forest performed the best with 99% accuracy, 0.99 precision, 1.00 recall, and a 0.99 F1 score. The features were ranked by importance, with the top ones providing the most discriminatory power. The finalized Random Forest model was saved for operationalization to classify unknown P.E. files automatically. In conclusion, machine learning, especially ensemble tree-based methods, proves highly efficacious for malware detection with the proper feature engineering of file content and characteristics. The model has promising capabilities as an anti-malware system to identify and nullify malware attacks proactively. Further research can focus on generalizability testing across different file types and integration with antivirus solutions.
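
Two of the static features mentioned above, entropy and string literals, can be computed with the standard library alone, as in the sketch below. It assumes byte-level Shannon entropy and printable ASCII runs of at least four characters; the abstract does not specify either choice, and the function names and sample bytes are invented for illustration.

```python
import math
import re
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of the byte distribution, in bits per byte (0 to 8)."""
    if not data:
        return 0.0
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in Counter(data).values())


def printable_strings(data: bytes, min_len: int = 4) -> list:
    """Runs of printable ASCII characters at least `min_len` bytes long."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return [match.decode("ascii") for match in re.findall(pattern, data)]


# Invented sample bytes; "ab" is too short to be reported as a string.
blob = b"\x00\x01kernel32.dll\x00ab\x00LoadLibraryA\xff"
strings_found = printable_strings(blob)
uniform_entropy = shannon_entropy(bytes(range(256)))
constant_entropy = shannon_entropy(b"AAAA")
```

Entropy close to 8 bits per byte is often associated with packed or encrypted content, which is why it tends to appear among static features, though on its own it is not proof of maliciousness.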

MALGRA: Machine Learning and N-Gram Malware Feature Extraction and Detection System

Electronics

Detection and mitigation of modern malware are critical for the normal operation of an organisation. Traditional defence mechanisms are becoming increasingly ineffective due to the techniques used by attackers such as code obfuscation, metamorphism, and polymorphism, which strengthen the resilience of malware. In this context, the development of adaptive, more effective malware detection methods has been identified as an urgent requirement for protecting the IT infrastructure against such threats, and for ensuring security. In this paper, we investigate an alternative method for malware detection that is based on N-grams and machine learning. We use a dynamic analysis technique to extract an Indicator of Compromise (IOC) for malicious files, which are represented using N-grams. The paper also proposes TF-IDF as a novel alternative used to identify the most significant N-grams features for training a machine learning algorithm. Finally, the paper evaluates the proposed technique usin...
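
The idea of weighting N-grams with TF-IDF can be sketched as follows. The sketch assumes that each sample is a sequence of tokens (for example, event or API call names from a dynamic analysis trace) and uses the plain, unsmoothed TF-IDF formula. The exact IOC representation and weighting variant used in the paper are not given in the abstract, so the traces and names here are hypothetical.

```python
import math
from collections import Counter


def token_ngrams(tokens, n=2):
    """All overlapping n-grams of a token sequence, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tfidf(documents, n=2):
    """Per-document dict mapping each n-gram to tf * log(N / df)."""
    grams_per_doc = [Counter(token_ngrams(doc, n)) for doc in documents]
    # Document frequency: in how many documents each n-gram occurs.
    doc_freq = Counter()
    for grams in grams_per_doc:
        doc_freq.update(grams.keys())
    n_docs = len(documents)
    scores = []
    for grams in grams_per_doc:
        total = sum(grams.values()) or 1
        scores.append({
            gram: (count / total) * math.log(n_docs / doc_freq[gram])
            for gram, count in grams.items()
        })
    return scores


# Hypothetical traces, invented for illustration only.
traces = [
    ["open", "write", "close"],
    ["open", "write", "connect"],
    ["open", "read", "close"],
]
scores = tfidf(traces)
```

In the second trace, the bigram `("write", "connect")` scores higher than `("open", "write")` because it occurs in only one trace. That is the property that lets TF-IDF surface the N-grams most specific to a sample.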