Classification of Malicious Code Variants using Deep Learning

MalDeep: A Deep Learning Classification Framework against Malware Variants Based on Texture Visualization

Security and Communication Networks, 2019

The increasing sophistication of malware variants, which use techniques such as encryption, polymorphism, and obfuscation, calls for new detection and classification technology. In this paper, MalDeep, a novel deep learning framework for malware classification based on texture visualization, is proposed to counter malicious variants. Through code mapping, texture partitioning, and texture extraction, we can study malware classification in a new feature space of image texture representation without decryption or disassembly. Furthermore, we built a malware classifier on a convolutional neural network with two convolutional layers, two downsampling layers, and multiple fully connected layers. We train the model on the dataset from the Microsoft Malware Classification Challenge, which includes 9 malware families and 10868 variant samples. The experimental results show that MalDeep achieves a higher accuracy rate for malware classification. In particular, for some backdoor families, the classi...
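The layer arrangement described above (two convolutional layers, each followed by a downsampling layer) can be illustrated with the standard output-size formula for a sliding window. This is a minimal sketch only: the 32-pixel input, the 5x5 kernels, and the 2x2 pooling windows are assumptions chosen for illustration, not values stated in the abstract, and `conv_out` and `feature_map_sizes` are hypothetical helper names.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial size after a convolution or pooling window slides over `size` pixels."""
    return (size + 2 * padding - kernel) // stride + 1


def feature_map_sizes(input_size, conv_kernel=5, pool=2):
    """Trace the side length of the feature map through conv-pool-conv-pool.

    The kernel and pooling sizes are illustrative assumptions, not MalDeep's.
    """
    sizes = [("input", input_size)]
    size = input_size
    for i in (1, 2):
        size = conv_out(size, conv_kernel)        # unpadded convolution, stride 1
        sizes.append((f"conv{i}", size))
        size = conv_out(size, pool, stride=pool)  # non-overlapping downsampling
        sizes.append((f"pool{i}", size))
    return sizes


if __name__ == "__main__":
    for name, side in feature_map_sizes(32):
        print(f"{name}: {side} x {side}")
```

The final side length determines how many values per channel are flattened into the first fully connected layer.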

Malware Images Classification Using Convolutional Neural Network

Journal of Computer and Communications

Deep learning has recently achieved strong performance on the malware classification task. Several research studies, such as those that convert malware into gray-scale images, have helped to improve classification, since an image is easy to use as input to a deep learning model such as a Convolutional Neural Network. In this paper, we propose a Convolutional Neural Network model for malware image classification that reaches 98% accuracy.

Using convolutional neural networks for classification of malware represented as images

Journal of Computer Virology and Hacking Techniques, 2018

Malware authors introduce obfuscation techniques into existing malware in order to evade detection and hide its purpose. As a result, the number of malicious programs has grown in both volume and sophistication. Thus, effective categorization of malware based on its characteristics and behavior is required. In this paper, malicious software is visualized as gray-scale images, since this representation captures minor changes while retaining the global structure, which helps to detect variations. Motivated by the visual similarity between malware samples of the same family, we propose a file-agnostic deep learning approach for malware categorization that efficiently groups malicious software into families based on a set of discriminant patterns extracted from their visualization as images. The suitability of our approach is evaluated against two benchmarks: the MalImg dataset and the BigData Innovators Gathering. Experimental comparison demonstrates its superior performance with respect to state-of-the-art techniques.
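The gray-scale visualization mentioned above is commonly done by treating each byte of the file as one pixel intensity and wrapping the byte stream into fixed-width rows. The sketch below assumes that mapping; the row width of 256, the zero padding of the last row, and the helper names `bytes_to_gray_rows` and `to_pgm` are illustrative choices, not details taken from the paper.

```python
def bytes_to_gray_rows(data, width=256):
    """Map each byte (0-255) to one gray pixel and wrap into rows of `width` pixels.

    The last row is padded with zeros (black) so that every row has equal length.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    rows = []
    for start in range(0, len(data), width):
        row = list(data[start:start + width])
        row.extend([0] * (width - len(row)))
        rows.append(row)
    return rows


def to_pgm(rows):
    """Serialize the pixel rows as a binary PGM (P5) image, viewable in most image tools."""
    height = len(rows)
    width = len(rows[0]) if rows else 0
    header = f"P5\n{width} {height}\n255\n".encode("ascii")
    return header + b"".join(bytes(row) for row in rows)


if __name__ == "__main__":
    sample = bytes([0, 127, 255, 10, 20])  # stand-in for the bytes of a binary file
    image = bytes_to_gray_rows(sample, width=3)
    print(image)
    print(to_pgm(image))
```

Because each byte maps to exactly one pixel, a small edit to the file changes only a few pixels, while the overall layout of the image stays the same.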

Malware Classification with Improved Convolutional Neural Network Model

International Journal of Computer Network and Information Security

Malware is a threat to people in the cyber world. It steals personal information and harms computer systems. Developers and information security specialists around the globe continuously work on strategies for detecting malware. Over the last few years, many researchers have investigated machine learning for malware classification. The existing solutions require substantial computing resources and are not efficient for datasets with large numbers of samples. Using existing feature extractors to extract image features consumes further resources. This paper presents a Convolutional Neural Network model with pre-processing and augmentation techniques for the classification of malware gray-scale images. An investigation is conducted on the Malimg dataset, which contains 9339 gray-scale images created from malware binaries belonging to 25 different families. To create a precise approach and considering the success of deep learning techniques for the classificat...

Visual Profiling and Automated Classification of Malware Samples using Deep Learning

Atlantis Highlights in Computer Sciences, 2022

Information security faces a significant problem due to the proliferation of malware. Malware analysis refers to the process of interpreting malicious software to determine its functionality and intent and to assist in detection. Conventional methods, which rely on both static and dynamic analyses for malware identification and categorization, often struggle to keep up with the ever-rising evolution of malware. Therefore, we propose a deep learning powered malware analysis system divided into three essential modules: data processing, feature extraction, and detection and classification. The data processing module converts binary data into grayscale images, includes an import feature, and extracts essential malware information. This module makes effective use of these extracted attributes to identify potentially suspicious samples and classify malware cases. The detection and classification module, which completes the architecture, uses deep learning algorithms to identify malware and classify it into its respective families, resulting in a strong and proactive approach to cybersecurity. This paper contributes to enhanced cybersecurity by providing a method that not only improves accuracy but also has the potential to adapt to emerging malware threats.

Deep Convolution Neural Networks and Image Processing for Malware Detection

Current anti-malware technologies have exposed their glaring vulnerabilities, a result of their signature-based approach, as more sophisticated malware has appeared in recent years, particularly on the Android operating system. The state-of-the-art literature offers a wide range of possibilities, but none of them are flawless in terms of providing clear and timely solutions. The current study used a CNN-based deep learning architecture to address this problem. The proposed method produced RGB images from unprocessed malware binaries. We explored complex high-level features that effectively identify malware families using an image-based method rather than feature representations. The RGB images were extracted from the raw APK files because colour images may hold more of the data in the source code. We developed deep CNNs using the produced images that extract higher-level semantics associated with malware. This has allowed us to develop more c...
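The abstract does not say how bytes are mapped to colour channels. One simple possibility, shown here purely as an assumption, is to read three consecutive bytes as one (R, G, B) pixel, so that a colour image packs three times as many bytes per pixel as a gray-scale one. The helper name `bytes_to_rgb_pixels` is hypothetical.

```python
def bytes_to_rgb_pixels(data):
    """Group raw bytes into (R, G, B) tuples, three bytes per pixel.

    Assumed mapping for illustration only; the trailing pixel is zero-padded
    when the input length is not a multiple of three.
    """
    padded = data + b"\x00" * (-len(data) % 3)
    return [tuple(padded[i:i + 3]) for i in range(0, len(padded), 3)]


if __name__ == "__main__":
    raw = b"\x01\x02\x03\x04"  # stand-in for the bytes of an APK file
    print(bytes_to_rgb_pixels(raw))
```

The resulting pixel list can then be wrapped into rows of a fixed width, in the same way as for a gray-scale image.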

Malware Detection using Deep Learning

International Journal of Modern Agriculture, 2021

Malicious software, or malware, continues to pose a major security concern in this digital age as computer users, corporations, and governments witness an exponential growth in malware attacks. Current malware detection solutions adopt static and dynamic analysis of malware signatures and behaviour patterns, which is time consuming and ineffective at identifying unknown malware. Recent malware uses polymorphic, metamorphic, and other evasive techniques to change its behaviour quickly and to generate large numbers of new samples. Since new malware consists predominantly of variants of existing malware, machine learning algorithms (MLAs) have recently been employed to conduct effective malware analysis. This requires extensive feature engineering, feature learning, and feature representation. By using advanced MLAs such as deep learning, the feature engineering phase can be completely avoided. Though some recent research studies exist in this direction, the performance of their algorithms is biased by the training data. There is a need to mitigate bias and evaluate these methods independently in order to arrive at new enhanced methods for effective zero-day malware detection. To fill this gap in the literature, this work evaluates classical MLAs and deep learning architectures for malware detection, classification, and categorization with both public and private datasets. The train and test splits of the public and private datasets used in the experimental analysis are disjoint from each other and were collected over different timescales. In addition, we propose a novel image processing technique with optimal parameters for MLAs and deep learning architectures. A comprehensive experimental evaluation of these methods indicates that deep learning architectures outperform classical MLAs. Overall, this work proposes an effective visual detection of malware using a scalable and hybrid deep learning framework for real-time deployments. The combination of visualization and deep learning architectures in a static, dynamic, and image processing-based hybrid approach in a big data environment is a new enhanced method for effective zero-day malware detection.

Recent Innovations and Comparison of Deep Learning Techniques in Malware Classification : A Review

2021

The internet has made an individual's life easier and more productive, but there are threats associated with the internet and connected devices. Malware has been considered the most severe threat to the digital world for decades, and the identification and classification of malware variants is a vital and critical research problem. Malware is invasive malicious code that accesses devices, information, and services without the permission or knowledge of the user. Researchers, analysts, and antivirus companies are incessantly inventing and implementing new strategies to fight back against malware and its variants. In the last decade, one of the strategies extensively used in the field of malware detection and classification has been deep learning with malware visualization. Results revealed that, using visualization, malware can be identified and classified more promptly, efficiently, and accurately. Deep learning algorithms vary according to applications, architecture, and uses, so it is required to...

An Intelligent Malware Classification Model Based on Image Transformation

International Journal of Advanced Computer Science and Applications

Due to financial incentives, the number of malware infections is steadily rising. Accuracy and effectiveness are essential because malware detection systems serve as the first line of defense against harmful attacks. A zero-day vulnerability is a hole in the target operating system, device driver, application, or other tool in a computing environment that was previously unknown to anybody other than the attacker. Traditional malware detection systems usually use conventional machine learning algorithms, which call for time-consuming and error-prone feature gathering and extraction. Convolutional neural networks (CNNs) have been demonstrated to outperform conventional learning techniques in a number of applications, including the classification of images. This success prompts us to suggest a CNN-based malware categorization architecture. We evaluated our methodology using a bigger dataset made up of 25 families within a corpus of 9342 malware samples. Finally, the model's measurements and performance are compared with those of other cutting-edge deep learning techniques. The overall testing accuracy of 98.31% in the provided results attests to the accuracy and robustness of the suggested procedure at a lower computational cost.

Robust Intelligent Malware Detection Using Deep Learning

IEEE Access, 2019

Security breaches due to attacks by malicious software (malware) continue to escalate, posing a major security concern in this digital age. With many computer users, corporations, and governments affected by an exponential growth in malware attacks, malware detection continues to be a hot research topic. Current malware detection solutions that adopt the static and dynamic analysis of malware signatures and behavior patterns are time consuming and have proven to be ineffective in identifying unknown malware in real time. Recent malware uses polymorphic, metamorphic, and other evasive techniques to change its behavior quickly and to generate a large number of new samples. Such new malware consists predominantly of variants of existing malware, and machine learning algorithms (MLAs) have recently been employed to conduct effective malware analysis. However, such approaches are time consuming as they require extensive feature engineering, feature learning, and feature representation. By using advanced MLAs such as deep learning, the feature engineering phase can be completely avoided. Recently reported research studies in this direction show the performance of their algorithms with biased training data, which limits their practical use in real-time situations. There is a compelling need to mitigate bias and evaluate these methods independently in order to arrive at a new enhanced method for effective zero-day malware detection. To fill the gap in the literature, this paper, first, evaluates the classical MLAs and deep learning architectures for malware detection, classification, and categorization using different public and private datasets. Second, we remove the dataset bias in the experimental analysis by using different splits of the public and private datasets to train and test the model in a disjoint way using different timescales. Third, our major contribution is in proposing a novel image processing technique with optimal parameters for MLAs and deep learning architectures to arrive at an effective zero-day malware detection model. A comprehensive comparative study of our model demonstrates that our proposed deep learning architectures outperform classical MLAs. Our novelty in combining visualization and deep learning architectures for a static, dynamic, and image processing-based hybrid approach applied in a big data environment is the first of its kind toward achieving robust intelligent zero-day malware detection. Overall, this paper paves the way for an effective visual detection of malware using a scalable and hybrid deep learning framework for real-time deployments.
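The idea of training and testing on disjoint data from different timescales can be sketched as a cutoff-date split: samples first seen before the cutoff go to training, and later ones go to testing, so the test set only contains samples the model could not have seen. This is an illustrative sketch, not the paper's procedure; the record layout `(sample_id, first_seen, label)`, the cutoff date, and the name `split_by_time` are assumptions.

```python
from datetime import date


def split_by_time(samples, cutoff):
    """Split (sample_id, first_seen, label) records into time-disjoint train and test sets.

    Records first seen strictly before `cutoff` are used for training;
    records first seen on or after `cutoff` are held out for testing.
    """
    train = [s for s in samples if s[1] < cutoff]
    test = [s for s in samples if s[1] >= cutoff]
    return train, test


if __name__ == "__main__":
    # Hypothetical records used only to demonstrate the split.
    records = [
        ("a", date(2016, 3, 1), "malware"),
        ("b", date(2016, 9, 15), "benign"),
        ("c", date(2017, 2, 10), "malware"),
        ("d", date(2017, 6, 30), "benign"),
    ]
    train, test = split_by_time(records, cutoff=date(2017, 1, 1))
    print([s[0] for s in train], [s[0] for s in test])
```

A random shuffle split, by contrast, can place near-identical variants of the same sample in both sets, which tends to inflate measured accuracy.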