Detecting Internet Abuse by Analyzing Passive DNS Traffic: A Survey of Implemented Systems

Analysis and Investigation of Malicious DNS Queries Using CIRA-CIC-DoHBrw-2020 Dataset

The Domain Name System (DNS) is one of the earliest vulnerable network protocols, with various security gaps that have been exploited repeatedly over the last decades. DNS abuse is one of the most challenging threats for cybersecurity specialists. Providing secure DNS remains difficult because attackers use complicated methodologies to inject malicious code into DNS queries. Many researchers have explored different machine learning (ML) techniques to counter this challenge. However, there are still several challenges and barriers to utilizing ML. This paper introduces a systematic approach for identifying malicious and encrypted DNS queries by examining the network traffic and deriving statistical characteristics. Afterward, implementing several ML methods:

Leveraging client-side DNS failure patterns to identify malicious behaviors

2015 IEEE Conference on Communications and Network Security (CNS), 2015

DNS has been increasingly abused by adversaries for cyber-attacks. Recent research has leveraged DNS failures (i.e., DNS queries that result in a Non-Existent-Domain response from the server) to identify malware activities, especially domain-flux botnets that generate many random domains as a rendezvous technique for command-and-control. Using ISP network traces, we conduct a systematic analysis of DNS failure characteristics, with the goal of uncovering how attackers exploit DNS for malicious activities. In addition to DNS failures generated by domain-flux bots, we discover many diverse and stealthy failure patterns that have received little attention. Based on these findings, we present a framework that detects diverse clusters of suspicious domain names that cause DNS failures, by considering multiple types of syntactic as well as temporal patterns. Our evolutionary learning framework evaluates the clusters produced over time to eliminate spurious cases while retaining sustaining (i.e., highly suspicious) clusters. One of the advantages of our framework is that it analyzes DNS failures on a per-client basis and does not hinge on the existence of multiple clients infected by the same malware. Our evaluation on a large ISP network trace shows that our framework detects at least 97% of the clients with suspicious DNS behaviors, with over 81% precision.
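The per-client view of DNS failures described in this abstract can be illustrated with a small sketch. It is not the paper's framework, which clusters failing domain names by syntactic and temporal patterns; it only counts Non-Existent-Domain (NXDOMAIN) responses per client and flags clients whose failure ratio is high. The record layout, the function names, and the `threshold` and `min_queries` values are assumptions made for illustration.

```python
from collections import defaultdict

def client_failure_stats(records):
    """Return {client_ip: (total_queries, nxdomain_queries)}.

    records: iterable of (client_ip, qname, rcode) tuples (hypothetical layout).
    """
    stats = defaultdict(lambda: [0, 0])  # client -> [total, failed]
    for client, _qname, rcode in records:
        stats[client][0] += 1
        if rcode == "NXDOMAIN":
            stats[client][1] += 1
    return {client: (total, failed) for client, (total, failed) in stats.items()}

def suspicious_clients(records, threshold=0.5, min_queries=3):
    """Flag clients whose NXDOMAIN ratio reaches the (assumed) threshold."""
    flagged = []
    for client, (total, failed) in client_failure_stats(records).items():
        if total >= min_queries and failed / total >= threshold:
            flagged.append(client)
    return sorted(flagged)

# Hypothetical log entries, for illustration only.
records = [
    ("10.0.0.1", "example.com", "NOERROR"),
    ("10.0.0.1", "example.org", "NOERROR"),
    ("10.0.0.1", "exmaple.com", "NXDOMAIN"),
    ("10.0.0.2", "qzxkvw.example", "NXDOMAIN"),
    ("10.0.0.2", "pqjrtn.example", "NXDOMAIN"),
    ("10.0.0.2", "example.net", "NOERROR"),
    ("10.0.0.2", "wkzzyt.example", "NXDOMAIN"),
]
print(suspicious_clients(records))
```

Because each client is scored independently, a single infected host can be flagged without any other client showing the same behavior, which is the property the abstract highlights.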

DomainProfiler: toward accurate and early discovery of domain names abused in future

International Journal of Information Security

Domain names are at the base of today's cyber-attacks. Attackers abuse the domain name system (DNS) to mystify their attack ecosystems; they systematically generate a huge volume of distinct domain names to make it infeasible for blacklisting approaches to keep up with newly generated malicious domain names. To solve this problem, we propose DomainProfiler for discovering malicious domain names that are likely to be abused in the future. The key idea of our system is to exploit temporal variation patterns (TVPs) of domain names. The TVPs of domain names include information about how and when a domain name has been listed in legitimate/popular and/or malicious domain name lists. On the basis of this idea, our system actively collects historical DNS logs, analyzes their TVPs, and predicts whether a given domain name will be used for malicious purposes. Our evaluation revealed that DomainProfiler can predict malicious domain names 220 days beforehand with a true positive rate of 0.985. Moreover, we verified the effectiveness of our system in terms of the benefits from our TVPs and defense against cyber-attacks.

Keywords: Network-level security and protection • Domain name • DNS • Malware • Temporal variation pattern

This paper is the extended version of the paper presented at IEEE/IFIP DSN 2016 [15].

REMaDD: Resource-Efficient Malicious Domains Detector in Large-Scale Networks

IEEE Access, 2020

Detecting malicious activities in cyber systems is a major challenge for cybersecurity service providers. Due to the large amount of network traffic, it is often likened to finding a needle in a haystack. The domain name system (DNS) is one of the fundamental protocols of the internet, and therefore it can give a broad view of those malicious activities, which abuse it and leave fingerprints as part of their attack vector. In this collaborative research between Ben-Gurion University and IBM, a significant performance improvement was achieved in detecting malicious domains as compared to the state-of-the-art software solutions. Specifically, we establish a novel algorithm to detect malicious domains in large-scale DNS traffic, named Resource-Efficient Malicious Domain Detector (REMaDD), with the following desired properties. First, the algorithm does not require prior knowledge of historical malicious activities in its real-time operations. Second, the development used real live streaming data from The Inter-University Computation Center (IUCC), and operated on a real-time IBM system. The algorithm is highly computationally efficient and satisfies real-time requirements in terms of running time and computational complexity. REMaDD demonstrated strong performance in terms of both detection accuracy and computational efficiency as compared to existing algorithms. Specifically, experimental results on an IBM production environment demonstrated that REMaDD achieved a precision of 89.4% and a recall of 82.9%. By contrast, the DomainObserver and LSTM.MI algorithms achieved precision of only 76.7% and 67.2%, and recall of 81.7% and 75.3%, respectively.

Index terms: Cyber security, Domain name system (DNS), Detection algorithms, Real-time algorithms.

Feature Engineering and Machine Learning Model Comparison for Malicious Activity Detection in the DNS-Over-HTTPS Protocol

IEEE Access, 2021

This work was supported by the state of Colorado through funds appropriated for cybersecurity law dubbed "Cyber Coding Cryptology for State Records."

The Domain Name System (DNS) is among the most ubiquitous and important protocols for network communication; however, security concerns regarding DNS have been on the rise and demand for encrypted traffic has followed suit. Using a publicly available dataset, this work compares 10 different machine learning classifiers using stratified 10-fold cross-validation. The classifiers are used to determine the most effective and efficient way of detecting malicious DNS over Hypertext Transfer Protocol Secure (HTTPS) traffic, dubbed DoH traffic. Model performance is evaluated on Non-DoH vs. DoH traffic, then tested on benign vs. malicious DoH traffic. Additionally, this paper seeks to build upon existing research by removing noise and introducing feature selection methods and feature explainability to produce a better model for real-world deployment. After eliminating five overfitting features, our findings indicate that light gradient boosting machine (LGBM) yielded the highest accuracy-to-training-time ratio while approaching 0% error using 20 top features.
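To make the term "stratified 10-fold cross-validation" concrete, here is a minimal sketch of how samples can be split into folds that preserve class proportions. It is a simplification with assumed names and toy labels: it assigns samples of each class to folds round-robin and does no shuffling, whereas library implementations such as scikit-learn's `StratifiedKFold` offer more options. It does not reproduce the paper's experimental setup.

```python
from collections import Counter, defaultdict

def stratified_folds(labels, k):
    """Split sample indices into k folds with similar class proportions.

    Samples of each class are dealt to the folds round-robin, so every
    fold receives roughly the same share of each class.
    """
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    return folds

# Toy, imbalanced label set (hypothetical): 20 DoH and 80 non-DoH flows.
labels = ["DoH"] * 20 + ["NonDoH"] * 80
folds = stratified_folds(labels, k=10)

for fold in folds:
    # Each fold serves once as the test set; the other nine form the training set.
    print(Counter(labels[i] for i in fold))
```

With these toy labels every fold holds 2 DoH and 8 non-DoH samples, so each test fold mirrors the 20/80 class balance of the full set.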

Mining IP to Domain Name Interactions to Detect DNS Flood Attacks on Recursive DNS Servers

Sensors, 2016

The Domain Name System (DNS) is a critical infrastructure of any network, and, not surprisingly, a common target of cybercrime. There are numerous works that analyse higher level DNS traffic to detect anomalies in the DNS or any other network service. By contrast, few efforts have been made to study and protect the recursive DNS level. In this paper, we introduce a novel abstraction of the recursive DNS traffic to detect a flooding attack, a kind of Distributed Denial of Service (DDoS). The crux of our abstraction lies in a simple observation: Recursive DNS queries, from IP addresses to domain names, form social groups; hence, a DDoS attack should result in drastic changes in the DNS social structure. We have built an anomaly-based detection mechanism, which, given a time window of DNS usage, makes use of features that attempt to capture the DNS social structure, including a heuristic that estimates group composition. Our detection mechanism has been successfully validated (in a simulated and controlled setting) and with it the suitability of our abstraction to detect flooding attacks. To the best of our knowledge, this is the first work to successfully use this abstraction to detect these kinds of attacks at the recursive level. Before concluding the paper, we motivate further research directions considering this new abstraction, so we have designed and tested two additional experiments which exhibit promising results to detect other types of anomalies in recursive DNS servers.

Detecting Malicious Activity With DNS Backscatter Over Time

IEEE/ACM Transactions on Networking, 2017

Network-wide activity is when one computer (the originator) touches many others (the targets). Motives for activity may be benign (mailing lists, content-delivery networks, and research scanning), malicious (spammers and scanners for security vulnerabilities), or perhaps indeterminate (ad trackers). Knowledge of malicious activity may help anticipate attacks, and understanding benign activity may set a baseline or characterize growth. This paper identifies domain name system (DNS) backscatter as a new source of information about network-wide activity. Backscatter is the reverse DNS queries caused when targets or middleboxes automatically look up the domain name of the originator. Queries are visible to the authoritative DNS servers that handle reverse DNS. While the fraction of backscatter they see depends on the server's location in the DNS hierarchy, we show that activity that touches many targets appears even in sampled observations. We use information about the queriers to classify originator activity using machine learning. Our algorithm has reasonable accuracy and precision (70-80%) as shown by data from three different organizations operating DNS servers at the root or country level. Using this technique, we examine nine months of activity from one authority to identify trends in scanning, identifying bursts corresponding to Heartbleed, and broad and continuous scanning of secure shell.
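As a small illustration of what a backscatter query looks like, the sketch below builds the reverse DNS (PTR) name that a target would look up for an originator's address, using Python's standard `ipaddress` module, and counts how many distinct queriers asked about each originator. The query tuples and the function names are hypothetical, and counting queriers is only a stand-in for the richer querier features the paper feeds to its classifier.

```python
import ipaddress
from collections import defaultdict

def reverse_query_name(originator_ip):
    """Return the reverse DNS name a target would query for this originator."""
    return ipaddress.ip_address(originator_ip).reverse_pointer

def queriers_per_originator(ptr_queries):
    """Count distinct queriers per reverse name.

    ptr_queries: iterable of (querier_ip, ptr_qname) pairs, as an
    authoritative reverse DNS server might log them (assumed layout).
    """
    seen = defaultdict(set)
    for querier, qname in ptr_queries:
        seen[qname].add(querier)
    return {qname: len(queriers) for qname, queriers in seen.items()}

# Hypothetical observations using documentation address ranges.
scanner = reverse_query_name("192.0.2.10")
quiet_host = reverse_query_name("198.51.100.5")
ptr_queries = [
    ("203.0.113.1", scanner),
    ("203.0.113.2", scanner),
    ("203.0.113.3", scanner),
    ("203.0.113.1", scanner),  # repeated querier, counted once
    ("203.0.113.9", quiet_host),
]
print(scanner)
print(queriers_per_originator(ptr_queries))
```

An originator that touches many targets tends to attract reverse lookups from many distinct queriers, which is why such activity can surface even when the server sees only a sample of the queries.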

Detecting Malicious DNS over HTTPS Traffic in Domain Name System using Machine Learning Classifiers

2020

This paper presents a systematic two-layer approach for detecting DNS over HTTPS (DoH) traffic and distinguishing Benign-DoH traffic from Malicious-DoH traffic using six machine learning algorithms. The capability of machine learning classifiers is evaluated considering their accuracy, precision, recall, F-score, confusion matrices, ROC curves, and feature importance. The results show that the LGBM and XGBoost algorithms outperform the other algorithms in almost all the classification metrics, reaching the maximum accuracy of 100% in the classification tasks of layers 1 and 2. The LGBM algorithm misclassified only one DoH traffic test as non-DoH out of 4000 test datasets. It was also found that, out of 34 features extracted from the CIRA-CIC-DoHBrw-2020 dataset, SourceIP is the critical feature for classifying DoH traffic from non-DoH traffic in layer one, followed by the DestinationIP feature. However, only DestinationIP is an important feature for LGBM and gradient boosting algorithms when classifying Benign-DoH from Malicious-DoH traffic in layer 2.
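For readers less familiar with the metrics named in this abstract, the following sketch derives accuracy, precision, recall, and F-score (here the F1 variant) from the four cells of a binary confusion matrix. The function name and the counts in the example are hypothetical and are not taken from the paper's results.

```python
def binary_metrics(tp, fp, fn, tn):
    """Compute standard metrics from binary confusion-matrix counts.

    tp/fp/fn/tn: true positives, false positives, false negatives,
    true negatives. Ratios with an empty denominator are reported as 0.0.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Hypothetical counts for a "malicious DoH" positive class.
metrics = binary_metrics(tp=90, fp=10, fn=30, tn=70)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Reporting all four values matters on imbalanced traffic, where a classifier can reach high accuracy while still missing many positive samples.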

Mining agile DNS traffic using graph analysis for cybercrime detection

Computer Networks, 2016

We consider the analysis of network traffic data for identifying highly agile DNS patterns which are widely considered indicative of cybercrime. In contrast to related approaches, our methodology is capable of explicitly distinguishing between the individual, inherent agility of benign Internet services and criminal sites. Although some benign services use a large number of addresses, they are confined to a subset of IP addresses, due to operational requirements and contractual agreements with certain Content Distribution Networks. We discuss DNSMap, a system which analyzes observed DNS traffic, and continuously learns which FQDNs are hosted on which IP addresses. Any significant changes over time are mapped to bipartite graphs, which are then further pruned for cybercrime activity. Graph analysis enables the detection of transitive relations between FQDNs and IPs, and reveals clusters of malicious FQDNs and IP addresses hosting them. We developed a prototype system which is designed for real-time analysis, requires no costly classifier retraining, and no excessive whitelisting. We evaluate our system using large data sets from an ISP with several 100,000 customers, and demonstrate that even moderately agile criminal sites can be detected reliably and almost immediately.
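The idea of mapping FQDN-to-IP relations onto a bipartite graph and following transitive links can be sketched in a few lines. This is not DNSMap itself: the mapping pairs, the function names, and the use of plain connected components as the grouping step are assumptions chosen to show how two FQDNs that never share an address directly can still end up in one cluster.

```python
from collections import defaultdict

def build_bipartite_graph(mappings):
    """Build an undirected FQDN <-> IP graph from (fqdn, ip) pairs."""
    graph = defaultdict(set)
    for fqdn, ip in mappings:
        fqdn_node, ip_node = ("fqdn", fqdn), ("ip", ip)
        graph[fqdn_node].add(ip_node)
        graph[ip_node].add(fqdn_node)
    return dict(graph)

def connected_components(graph):
    """Return the clusters of nodes linked by any chain of mappings."""
    seen = set()
    components = []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        component = set()
        while stack:  # iterative depth-first traversal
            node = stack.pop()
            component.add(node)
            for neighbour in graph[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        components.append(component)
    return components

# Hypothetical observed mappings (documentation address ranges).
mappings = [
    ("a.example", "192.0.2.1"),
    ("b.example", "192.0.2.1"),
    ("b.example", "192.0.2.2"),
    ("c.example", "198.51.100.7"),
]
clusters = connected_components(build_bipartite_graph(mappings))
for cluster in clusters:
    print(sorted(cluster))
```

Here `a.example` is tied to `192.0.2.2` only through `b.example`, which is the kind of transitive relation the abstract says graph analysis exposes.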

Hybrid rule-based botnet detection approach using machine learning for analysing DNS traffic

2021

Botnets can simultaneously control millions of Internet-connected devices to launch damaging cyber-attacks that pose significant threats to the Internet. In a botnet, bot-masters communicate with the command and control server using various communication protocols. One of the widely used communication protocols is the ‘Domain Name System’ (DNS) service, an essential Internet service. Bot-masters utilise Domain Generation Algorithms (DGA) and fast-flux techniques to avoid static blacklists and reverse engineering while remaining flexible. However, a botnet’s DNS communication generates anomalous DNS traffic throughout the botnet life cycle, and such an anomaly is considered an indicator of the presence of DNS-based botnets in the network. Although several approaches have been proposed to detect botnets based on DNS traffic analysis, the problem still exists and is challenging for several reasons, such as not considering significant features and rules that contribute to the detection of DNS-base...