Malware Detection by Control-Flow Graph Level Representation Learning With Graph Isomorphism Network

Classifying Malware Represented as Control Flow Graphs using Deep Graph Convolutional Neural Network

2019 49th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), 2019

Malware has been one of the biggest cyber threats in the digital world for a long time. Existing machine learning-based malware classification methods rely on handcrafted features extracted from raw binary files or disassembled code. The diversity of such features has made it hard to build generic malware classification systems that work effectively across different operational environments. To strike a balance between generality and performance, we explore new machine learning techniques to classify malware programs represented as their control flow graphs (CFGs). To overcome the drawbacks of existing malware analysis methods that use inefficient and nonadaptive graph matching techniques, in this work we build a new system that uses a deep graph convolutional neural network to embed structural information inherent in CFGs for effective yet efficient malware classification. We use two large independent datasets that contain more than 20K malware samples to evaluate our proposed system, and the experimental results show that it can classify CFG-represented malware programs with performance comparable to that of state-of-the-art methods applied to handcrafted malware features.
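To make the idea of embedding CFG structure with graph convolution concrete, the following is a minimal sketch of a single propagation step of the form ReLU(D^-1 (A + I) X W), written with plain Python lists. The toy CFG, the two per-block features (instruction count and call count), the identity weight matrix, and the sum pooling at the end are all assumptions made for illustration; the abstract does not describe the paper's actual features, layers, or pooling.

```python
def graph_conv_layer(adjacency, features, weights):
    """One propagation step, ReLU(D^-1 (A + I) X W), using plain lists."""
    n = len(adjacency)
    # Add self-loops so every basic block keeps its own features.
    a_hat = [[adjacency[i][j] + (1 if i == j else 0) for j in range(n)]
             for i in range(n)]
    out = []
    for i in range(n):
        degree = sum(a_hat[i])
        # Average the features of the block and its successors.
        agg = [sum(a_hat[i][j] * features[j][k] for j in range(n)) / degree
               for k in range(len(features[0]))]
        # Linear transform followed by ReLU.
        row = [max(0.0, sum(agg[k] * weights[k][c] for k in range(len(weights))))
               for c in range(len(weights[0]))]
        out.append(row)
    return out


# Hypothetical CFG with three basic blocks and edges 0->1, 0->2, 1->2.
adjacency = [[0, 1, 1],
             [0, 0, 1],
             [0, 0, 0]]
# Assumed per-block features: [instruction count, call count].
features = [[4.0, 1.0], [2.0, 0.0], [6.0, 2.0]]
# Identity weights keep the arithmetic easy to follow; real weights are learned.
weights = [[1.0, 0.0], [0.0, 1.0]]

node_embeddings = graph_conv_layer(adjacency, features, weights)
# Sum pooling is a simplification used here to get one vector per graph.
graph_embedding = [sum(column) for column in zip(*node_embeddings)]
print(node_embeddings)   # [[4.0, 1.0], [4.0, 1.0], [6.0, 2.0]]
print(graph_embedding)   # [14.0, 4.0]
```

In a trained model, several such layers would be stacked and the weights learned end to end; the graph-level vector would then be passed to a classifier.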

A Comparison of Graph Neural Networks for Malware Classification

arXiv (Cornell University), 2023

Managing the threat posed by malware requires accurate detection and classification techniques. Traditional detection strategies, such as signature scanning, rely on manual analysis of malware to extract relevant features, which is labor-intensive and requires expert knowledge. Function call graphs consist of a set of program functions and their inter-procedural calls, providing a rich source of information that can be leveraged to classify malware without the labor-intensive feature extraction step of traditional techniques. In this research, we treat malware classification as a graph classification problem. Based on Local Degree Profile features, we train a wide range of Graph Neural Network (GNN) architectures to generate embeddings which we then classify. We find that our best GNN models outperform previous comparable research involving the well-known MalNet-Tiny Android malware dataset. In addition, our GNN models do not suffer from the overfitting issues that commonly afflict non-GNN techniques, although GNN models require longer training times.
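As an illustration of Local Degree Profile features, the sketch below computes, for each node, its own degree plus the minimum, maximum, mean, and standard deviation of its neighbors' degrees. The toy call graph and function names are hypothetical, the graph is treated as undirected, and the population standard deviation is assumed; the abstract does not specify these details.

```python
from statistics import mean, pstdev


def local_degree_profile(adjacency):
    """Return {node: [deg, min, max, mean, std]} over neighbor degrees."""
    degree = {node: len(neighbors) for node, neighbors in adjacency.items()}
    profile = {}
    for node, neighbors in adjacency.items():
        neighbor_degrees = [degree[n] for n in neighbors]
        if neighbor_degrees:
            stats = [min(neighbor_degrees), max(neighbor_degrees),
                     mean(neighbor_degrees), pstdev(neighbor_degrees)]
        else:
            # Isolated node: no neighbors to summarize.
            stats = [0, 0, 0.0, 0.0]
        profile[node] = [degree[node]] + stats
    return profile


# Hypothetical function call graph, stored as undirected adjacency sets.
call_graph = {
    "main": {"parse", "send"},
    "parse": {"main", "send"},
    "send": {"main", "parse", "log"},
    "log": {"send"},
}

ldp = local_degree_profile(call_graph)
print(ldp["main"])  # degree 2; neighbor degrees 2 and 3
```

Because these features depend only on graph structure, they can serve as node inputs to a GNN when no per-function attributes are available.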

A Survey on Malware Detection with Graph Representation Learning

arXiv (Cornell University), 2023

Malware detection has become a major concern due to the increasing number and complexity of malware. Traditional detection methods based on signatures and heuristics are used for malware detection, but unfortunately, they suffer from poor generalization to unknown attacks and can be easily circumvented using obfuscation techniques. In recent years, Machine Learning (ML) and notably Deep Learning (DL) achieved impressive results in malware detection by learning useful representations from data and have become a solution preferred over traditional methods. More recently, the application of such techniques on graph-structured data has achieved state-of-the-art performance in various domains and demonstrates promising results in learning more robust representations from malware. Yet, no literature review focusing on graph-based deep learning for malware detection exists. In this survey, we provide an in-depth literature review to summarize and unify existing works under the common approaches and architectures. We notably demonstrate that Graph Neural Networks (GNNs) reach competitive results in learning robust embeddings from malware represented as expressive graph structures, leading to an efficient detection by downstream classifiers. This paper also reviews adversarial attacks that are utilized to fool graph-based detection methods. Challenges and future research directions are discussed at the end of the paper. CCS Concepts: • General and reference → Surveys and overviews; • Security and privacy → Malware and its mitigation; • Computing methodologies → Learning latent representations; Neural networks; Spectral methods.

Detecting Malware Based on Dynamic Analysis Techniques Using Deep Graph Learning

Future Data and Security Engineering, 2020

Detecting malware using dynamic analysis techniques is an efficient method. Familiar techniques such as signature-based detection perform poorly when attempting to identify zero-day malware, and manually engineering malicious behaviors is a challenging and time-consuming task. Several studies have tried to detect unknown behaviors automatically. One effective approach introduced in recent years is to use graphs to represent the behavior of an executable and to learn from these graphs. However, current graph representations ignore much important information, such as parameters and variable changes… In this paper, we present a new method for malware detection that applies a graph attention network to multi-edge directional heterogeneous graphs constructed from Windows API calls collected after a file is executed in the Cuckoo sandbox… The experiments show that our model achieves better performance than other baseline models on both TPR and FAR.

Behavioral Malware Detection Using Deep Graph Convolutional Neural Networks

Malware behavioral graphs provide a rich source of information that can be leveraged for detection and classification tasks. In this paper, we propose a novel behavioral malware detection method based on Deep Graph Convolutional Neural Networks (DGCNNs) to learn directly from API call sequences and their associated behavioral graphs. In order to train and evaluate the models, we created a new public domain dataset of more than 40,000 API call sequences resulting from the execution of malware and goodware instances in a sandboxed environment. Experimental results show that our models achieve Area Under the ROC Curve (AUC-ROC) and F1-Score values similar to those of Long Short-Term Memory (LSTM) networks, widely used as the base architecture for behavioral malware detection methods, thus indicating that the models can effectively learn to distinguish between malicious and benign temporal patterns through convolution operations on graphs. To the best of our knowledge, this is the first paper that inve...

Fast malware classification by automated behavioral graph matching

2010

Malicious software (malware) is a serious problem on the Internet. Malware classification is useful for detection and analysis of new threats for which signatures are not available, or not possible (due to polymorphism). This paper proposes a new malware classification method based on maximal common subgraph detection. A behavior graph is obtained by capturing system calls during the execution (in a sandboxed environment) of the suspicious software. The method has been implemented and tested on a set of 300 malware instances in 6 families. Results demonstrate that, compared with previous classification methods, the method groups the malware instances effectively, is fast, and has a low false positive rate when presented with benign software.
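The following is a rough sketch of comparing two behavior graphs by their common subgraph. It assumes a deliberately simple graph construction (one node per system call name, one directed edge per pair of consecutive calls) and a similarity defined as the number of shared edges divided by the size of the larger graph. With uniquely labeled nodes, the common subgraph reduces to an edge-set intersection, which avoids the general, much harder, subgraph search. The traces, the construction, and the metric are all illustrative assumptions and are not taken from the paper.

```python
def behavior_graph(syscalls):
    """Directed edges between consecutive system calls in a trace."""
    return {(a, b) for a, b in zip(syscalls, syscalls[1:])}


def common_subgraph_similarity(g1, g2):
    """Shared edges divided by the edge count of the larger graph."""
    if not g1 or not g2:
        return 0.0
    return len(g1 & g2) / max(len(g1), len(g2))


# Hypothetical system call traces captured from two sandbox runs.
trace_a = ["open", "read", "connect", "send", "close"]
trace_b = ["open", "read", "write", "connect", "send"]

graph_a = behavior_graph(trace_a)
graph_b = behavior_graph(trace_b)
similarity = common_subgraph_similarity(graph_a, graph_b)
print(similarity)  # 0.5: two of four edges are shared
```

A sample could then be assigned to the family whose representative graph gives the highest similarity, with a threshold below which it is treated as unknown or benign.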

Frequent sub-graph mining for intelligent malware detection

Security and Communication Networks, 2014

Malware is a serious threat that has caused catastrophic damage in recent decades. To deal with this issue, various approaches have been proposed. One effective and widely used method is signature-based detection. However, it has substantial difficulty detecting new instances; therefore, this method is useful only against repeat attacks by already-known malware. In addition, owing to the rapid proliferation of malware and the significant human effort required to extract signatures, this approach is an inadequate solution; thus, an intelligent malware detection system is required. One of the major phases of such a system is feature extraction, used to construct a learning model. This paper introduces an approach to generate a group of semantic signatures, represented by a set of learning models, in which various features indicate the different programming styles of the executable files. A set of these signatures is obtained by mining frequent sub-graphs, that is, common code sub-structures employed in malware writing, in a group of control flow graphs. The experimental results show an improved F-measure in comparison with the classic graph-based approach.

A Survey on Mining Program-Graph Features for Malware Analysis

Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, 2015

Malware is malevolent software, mostly programmed by attackers either to disrupt normal computer operation or to gain access to private computer systems. A malware detector determines the malicious intent of a program and, if the program is malicious, stops it from executing. While a substantial number of malware detection techniques based on static and dynamic analysis have been studied for decades, malware detection based on mining program-graph features has attracted recent attention. It is commonly believed that a graph-based representation of a program is a natural way to understand its semantics and thereby unveil its execution intent. This paper presents a state-of-the-art survey on mining program-graph features for malware detection. We also outline the challenges that malware detection based on mining program-graph features must overcome for successful deployment, and opportunities that can be explored in the future.

Android Malware Detection Using One-Class Graph Neural Networks

The ISC International Journal of Information Security, 2022

With the widespread use of Android smartphones, the Android platform has become an attractive target for cybersecurity attackers and malware authors. Meanwhile, the growing emergence of zero-day malware has long been a major concern for cybersecurity researchers. This is because malware that has not been seen before often exhibits new or unknown behaviors, and there is no documented defense against it. In recent years, deep learning has become the dominant machine learning technique for malware detection and has achieved outstanding results. Currently, most deep malware detection techniques are supervised in nature and require training on large datasets of benign and malicious samples. However, supervised techniques usually do not perform well against zero-day malware. Semi-supervised and unsupervised deep malware detection techniques have more potential to detect previously unseen malware. In this paper, we present MalGAE, a novel end-to-end deep malware detection technique that leverages one-class graph neural networks to detect Android malware in a semi-supervised manner. MalGAE represents each Android application with an attributed function call graph (AFCG) to benefit from the ability of graphs to model complex relationships between data. It builds a deep one-class classifier by training a stacked graph autoencoder with graph convolutional layers on benign AFCGs. Experimental results show that MalGAE can achieve good detection performance in terms of different evaluation measures. https://www.isecure-journal.com/article_159681.html

A Novel Feature Representation for Malware Classification

arXiv (Cornell University), 2022

In this study we present a novel representation for features of malicious programs. This representation is based on hashes of data dependency graphs, which are directly tied to both the structure and operational semantics of a program. We present a comparison with existing term frequency representations and show an increase in accuracy and robustness. Existing methods of deep learning are often based on tf-idf feature representations, and a more robust feature representation enables better classification and pattern recognition.