PIndroid: a novel android malware detection system using ensemble learning methods
Related papers
High Accuracy Android Malware Detection Using Ensemble Learning
With over 50 billion downloads and more than 1.3 million apps in Google’s official market, Android has continued to gain popularity amongst smartphone users worldwide. At the same time there has been a rise in malware targeting the platform, with more recent strains employing highly sophisticated detection avoidance techniques. As traditional signature-based methods become less potent in detecting unknown malware, alternatives are needed for timely zero-day discovery. Thus this paper proposes an approach that utilizes ensemble learning for Android malware detection. It combines advantages of static analysis with the efficiency and performance of ensemble machine learning to improve Android malware detection accuracy. The machine learning models are built using a large repository of malware samples and benign apps from a leading antivirus vendor. The experimental results and analysis presented show that the proposed method, which uses a large feature space to leverage the power of ensemble learning, is capable of 97.3% to 99% detection accuracy with very low false positive rates.
Optimizing Android Malware Detection Via Ensemble Learning
International Journal of Interactive Mobile Technologies (iJIM)
The Android operating system has become very popular, holding the highest market share among mobile operating systems owing to its open-source nature and user-friendliness. This has brought about an uncontrolled rise in malicious applications targeting the Android platform. Emerging Android malware employs highly sophisticated detection and analysis avoidance techniques, such that traditional signature-based detection methods have become less potent in their ability to detect new and unknown malware. Alternative approaches, such as machine learning techniques, have taken the lead for timely zero-day anomaly detection. The study aimed at developing an optimized Android malware detection model using an ensemble learning technique. Random Forest, Support Vector Machine, and k-Nearest Neighbours were used to develop three distinct base models, and their predictive results were further combined using a Majority Vote combination function to produce an ensemble model...
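The majority-vote combination described in the abstract above can be sketched in a few lines of Python. This is a minimal illustration, not the study's implementation: the function name `majority_vote`, the per-app prediction lists, and the tie-breaking rule (the first label seen wins) are all assumptions made for the example.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label chosen by most base models.

    Ties are broken in favour of the label that appears first in
    `predictions` (an assumption made for this sketch).
    """
    counts = Counter(predictions)
    top = max(counts.values())
    for label in predictions:
        if counts[label] == top:
            return label


# Hypothetical predictions for four apps from three base models
# (1 = malware, 0 = benign); these values are illustrative only.
rf_preds = [1, 0, 1, 1]
svm_preds = [1, 0, 0, 1]
knn_preds = [0, 0, 1, 1]

# Combine the three votes for each app into one ensemble decision.
ensemble_preds = [
    majority_vote(votes) for votes in zip(rf_preds, svm_preds, knn_preds)
]
print(ensemble_preds)
```

With an odd number of binary classifiers, as here, a tie cannot occur, so the tie-breaking rule only matters when the number of base models is even or there are more than two classes.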
Android Malware Detection System Based on Ensemble Learning
The rapid advancement of smartphones, as well as their widespread use, has resulted in a significant increase in new security concerns. Malware’s covert techniques make it difficult for signature-based anti-virus/anti-malware solutions to detect it. The features used in such solutions are extracted from static or dynamic analysis. In this paper, an Android malware detection system is proposed. It consists of two main subsystems that work in parallel: one is trained on benign-labeled apps, while the second is trained on malware-labeled apps. Each subsystem is based on an ensemble approach that consists of OC-SVM, LOF, and modified isolation forest (M-iForest) classifiers, and uses these three one-class classifiers to reach its decision independently of the other subsystem. Moreover, each subsystem uses features extracted from both static and dynamic malware analysis. The evaluation has been conducted on two Android malware benchmark datasets, which are DREBI...
Evaluation of Advanced Ensemble Learning Techniques for Android Malware Detection
Vietnam Journal of Computer Science
Android is the most well-known mobile operating system, with billions of active users worldwide, which has attracted promoters, programmers, and cybercriminals to create malware for different purposes. Recently, wide-ranging research has been conducted on malware analysis and detection for Android devices, while Android has also implemented various security controls to deal with malware, including a User ID (UID) for every application and system permissions. In this paper, we develop and evaluate various kinds of machine learning (ML) by applying ensemble-based learning techniques for detecting Android malware in conjunction with a substring-based feature selection (SBFS) method for the classifiers. In this study, we extend our previous work, showing that the ensemble-based learning techniques obtain better results than previously reported results on the DREBIN dataset, and thus provide a solid basis for building ...
Machine Learning-Based Android Malware Detection Using Manifest Permissions
Proceedings of the Annual Hawaii International Conference on System Sciences, 2021
The Android operating system is currently the most prevalent mobile device operating system, holding roughly 54 percent of the total global market share. Due to Android's substantial presence, it has gained the attention of those with malicious intent, namely, malware authors. As such, there exists a need for validating and improving current malware detection techniques. Automated detection methods such as anti-virus programs are critical in protecting the wide variety of Android-powered mobile devices on the market. This research investigates the effectiveness of four different machine learning algorithms in conjunction with features selected from Android manifest file permissions to classify applications as malicious or benign. Case study results, on a test set consisting of 5,243 samples, produce accuracy, recall, and precision rates above 80%. Of the considered algorithms (Random Forest, Support Vector Machine, Gaussian Naïve Bayes, and K-Means), Random Forest performed the best with 82.5% precision and 81.5% accuracy.
In recent years, Android-powered devices have become increasingly targeted due in part to their increased use for business and financial tasks. Apps now routinely process sensitive financial and personal information as part of mobile banking, social media, and communication programs. Norton Anti-virus (AV) defines malware as "software that is specifically designed to gain access to or damage a computer, usually without the knowledge of the owner" [3]. Norton further delineates types of malware as spyware, ransomware, viruses, worms, Trojan horses, and adware. In 2017, Kaspersky Labs reported the detection of 5,730,916 malicious installation packages, 94,368 mobile banking Trojans, and 544,107 mobile ransomware Trojans [4]. As such, there exists a strong need for accurate and reliable commercial anti-virus (AV) tools in the Android environment, and malware on mobile devices can be a substantial threat [5].
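As a rough illustration of how manifest permissions might be turned into classifier features, the Python sketch below builds a binary vector from the `uses-permission` entries of a manifest. It is not the paper's feature-selection procedure: the function `permission_vector`, the three-entry vocabulary, and the sample manifest are hypothetical, and the manifest is assumed to be already decoded to plain-text XML (manifests inside APK files are stored in a binary format).

```python
import xml.etree.ElementTree as ET

# Namespace used for android:* attributes in AndroidManifest.xml.
ANDROID_NS = "{http://schemas.android.com/apk/res/android}"


def permission_vector(manifest_xml, vocabulary):
    """Return a 0/1 list marking which permissions in `vocabulary` are requested."""
    root = ET.fromstring(manifest_xml)
    requested = {
        element.get(ANDROID_NS + "name")
        for element in root.iter("uses-permission")
    }
    return [1 if permission in requested else 0 for permission in vocabulary]


# Hypothetical decoded manifest, for illustration only.
MANIFEST = """
<manifest xmlns:android="http://schemas.android.com/apk/res/android"
          package="com.example.app">
    <uses-permission android:name="android.permission.INTERNET"/>
    <uses-permission android:name="android.permission.SEND_SMS"/>
</manifest>
"""

# Hypothetical feature vocabulary; a real study would select its own set.
VOCABULARY = [
    "android.permission.INTERNET",
    "android.permission.READ_CONTACTS",
    "android.permission.SEND_SMS",
]

features = permission_vector(MANIFEST, VOCABULARY)
print(features)
```

Vectors of this kind, one per app, could then be passed to any of the classifiers named in the abstract.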
While academicians are interested in detecting malicious activity [17,30-31], opportunities abound to improve Android malware detection accuracy in commercial AV. Zhou and Jiang [7] evaluated Android malware detection using the following antivirus programs: AVG Antivirus Free v2.9 (AVG), Lookout Security & Antivirus v6.9 (or Lookout), Norton Mobile Security Lite v2.5.0.379 (Norton), and TrendMicro Mobile Security Personal Edition v2.0.0.1294 (TrendMicro). The anti-virus programs were used to scan separate devices afflicted with 1,260 samples of malware. Of the 1,260 samples, AVG was able to detect 689 samples (54.7%), Lookout 1,003 samples (79.6%), Norton 254 samples
IMIAD: Intelligent Malware Identification for Android Platform
2019 International Conference on Computer and Information Sciences (ICCIS), 2019
Android malware applications and their detection have been under study by security experts for quite some time, but they have gained special attention with the ever-growing use of smartphones. Normally, two methods are used for their identification. In static analysis, the code and information flow are analyzed, whereas in dynamic analysis, malware behaviour is observed at runtime (by executing it in a sandbox environment). It has been observed that both techniques, when used separately, fail to identify all malware, and an analysis based on either one alone fails to achieve good accuracy. There is a need to make use of both strategies for malware identification so that, if identification of a malicious application fails during static analysis, it gets caught during dynamic analysis. Though researchers have used a combination of these two approaches and proposed different malware detection strategies, to the best of our knowledge none of them has examined the consent model associated with the application's intent in combination with others. Keeping this observation in mind, our proposed technique is a hybrid approach based on the application's intents, its permissions, and static and dynamic data. Our supervised learning-based approach has shown 96% accuracy in detecting malware applications using a gradient boosting classifier.
Empirical Study on Intelligent Android Malware Detection based on Supervised Machine Learning
International Journal of Advanced Computer Science and Applications, 2020
The increasing number of mobile devices using the Android operating system in the market makes these devices the first target for malicious applications. In recent years, several Android malware applications were developed to perform certain illegitimate activities and harmful actions on mobile devices. In response, specific tools and anti-virus programs used conventional signature-based methods in order to detect such Android malware applications. However, the most recent Android malware apps, such as zero-day, cannot be detected through conventional methods that are still based on fixed signatures or identifiers. Therefore, the most recently published research studies have suggested machine learning techniques as an alternative method to detect Android malware due to their ability to learn and use the existing information to detect the new Android malware apps. This paper presents the basic concepts of Android architecture, Android malware, and permission features utilized as effective malware predictors. Furthermore, a comprehensive review of the existing static, dynamic, and hybrid Android malware detection approaches is presented in this study. More significantly, this paper empirically discusses and compares the performances of six supervised machine learning algorithms, known as K-Nearest Neighbors (K-NN), Decision Tree (DT), Support Vector Machine (SVM), Random Forest (RF), Naïve Bayes (NB), and Logistic Regression (LR), which are commonly used in the literature for detecting malware apps.
DroidNMD: Network-based Malware Detection in Android Using an Ensemble of One-Class Classifiers
Modares Journal of Electrical Engineering, 2016
During the past few years, the number of malware designed for Android devices has increased dramatically. To confront Android malware, some anomaly detection techniques have been proposed that are able to detect zero-day malware, but they often produce many false alarms that make them impractical for real-world use. In this paper, we address this problem by presenting DroidNMD, an ensemble-based anomaly detection technique that focuses on the network behavior of Android applications in order to detect Android malware. DroidNMD constructs an ensemble classifier consisting of multiple heterogeneous one-class classifiers and uses an ordered weighted averaging (OWA) operator to aggregate the outputs of the one-class classifiers. Our work is motivated by the observation that combining multiple one-class classifiers often produces higher overall classification accuracy than any individual one-class classifier. We demonstrate the effectiveness of DroidNMD using a real dataset of Android...
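For readers unfamiliar with ordered weighted averaging, the Python sketch below shows the general form of the operator: the inputs are sorted in descending order, and each weight is applied to a rank position rather than to a particular classifier. The function name `owa`, the anomaly scores, and the weight vector are assumptions chosen for illustration and are not taken from DroidNMD.

```python
def owa(scores, weights):
    """Ordered weighted averaging of `scores`.

    The weights must be non-negative and sum to 1. They are applied to
    the scores after sorting in descending order, so weights[0] always
    multiplies the largest score.
    """
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    ordered = sorted(scores, reverse=True)
    return sum(w * s for w, s in zip(weights, ordered))


# Hypothetical anomaly scores from three one-class classifiers.
scores = [0.2, 0.9, 0.6]

# Hypothetical weights that emphasise the highest-ranked score.
weights = [0.5, 0.3, 0.2]

aggregated = owa(scores, weights)
print(round(aggregated, 2))
```

The choice of weights moves the operator between familiar aggregators: `[1, 0, 0]` returns the maximum score, `[0, 0, 1]` the minimum, and equal weights the arithmetic mean.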
Android Malware Classification Using Optimized Ensemble Learning Based on Genetic Algorithms
Sustainability
The continuous increase in Android malware applications (apps) represents a significant danger to the privacy and security of users’ information. Therefore, effective and efficient Android malware app-classification techniques are needed. This paper presents a method for Android malware classification using optimized ensemble learning based on genetic algorithms. The suggested method is divided into two steps. First, a base learner is used to handle various machine learning algorithms, including support vector machine (SVM), logistic regression (LR), gradient boosting (GB), decision tree (DT), and AdaBoost (ADA) classifiers. Second, a meta learner RF-GA, utilizing genetic algorithm (GA) to optimize the parameters of a random forest (RF) algorithm, is employed to classify the prediction probabilities from the base learner. The genetic algorithm is used to optimize the parameter settings in the RF algorithm in order to obtain the highest Android malware classification accuracy. The ef...
AdDroid: Rule-Based Machine Learning Framework for Android Malware Analysis
Mobile Networks and Applications, 2019
Recent years have witnessed huge growth in Android malware development. The colossal reliance on Android applications for day-to-day work and their massive development call for an automated mechanism to distinguish malicious applications from benign ones. A significant amount of research has been devoted to analyzing and mitigating this growing problem; however, attackers are using more complicated techniques to evade detection. This paper proposes a framework, AdDroid, for analyzing and detecting malicious behaviour in Android applications based on various combinations of artefacts called Rules. The artefacts represent actions of an Android application, such as connecting to the Internet, uploading a file to a remote server, or installing another package on the device. AdDroid employs an ensemble-based machine learning technique where Adaboost is combined with traditional classifiers in order to train a model, founded on static analysis of Android applications, that is capable of recognizing malicious applications. Feature selection and extraction techniques are used to get the most distinguishing Rules. The proposed model is created using a dataset comprising 1420 Android applications with 910 malicious and 510 benign applications. Our proposed system achieved an accuracy of 99.11% with a 98.61% True Positive (TP) rate and a 99.33% True Negative (TN) rate. The high TP and TN rates reflect its efficacy on both the major and minor classes. Since the proposed solution has exceptionally low computational complexity, it can analyze applications in real time.