Wenke Lee | Georgia Institute of Technology

Papers by Wenke Lee

BotSniffer: Detecting botnet command and control channels in network traffic

Botnets are now recognized as one of the most serious security threats. In contrast to previous malware, botnets have the characteristic of a command and control (C&C) channel. Botnets also often use existing common protocols, e.g., IRC and HTTP, in protocol-conforming manners. This makes the detection of botnet C&C a challenging problem. In this paper, we propose an approach that uses network-based anomaly detection to identify botnet C&C channels in a local area network without any prior knowledge of signatures or C&C server addresses. This detection approach can identify both the C&C servers and infected hosts in the network. Our approach is based on the observation that, because of the pre-programmed activities related to C&C, bots within the same botnet will likely demonstrate spatial-temporal correlation and similarity. For example, they engage in coordinated communication, propagation, and attack and fraudulent activities. Our prototype system, BotSniffer, can capture this spatial-temporal correlation in network traffic and utilize statistical algorithms to detect botnets with theoretical bounds on the false positive and false negative rates. We evaluated BotSniffer using many real-world network traces. The results show that BotSniffer can detect real-world botnets with high accuracy and has a very low false positive rate.
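A minimal sketch of the spatial-temporal correlation intuition (not BotSniffer's actual algorithms): clients of the same server that repeatedly perform the same activity inside a short time window form a suspicious "crowd," and a binomial tail bound indicates how unlikely such homogeneity is for independent, benign users. All names, weights, and thresholds here are illustrative.

```python
from collections import defaultdict
from math import comb


def group_activity_scores(events, window=60):
    """events: iterable of (timestamp, client_ip, server_ip, activity_tag).
    For each server, return the largest fraction of its clients that showed the
    same activity_tag inside one time window (a crude homogeneity measure)."""
    per_server = defaultdict(list)
    for ts, client, server, tag in events:
        per_server[server].append((ts, client, tag))
    scores = {}
    for server, evs in per_server.items():
        clients = {c for _, c, _ in evs}
        best = 0.0
        for ts0, _, tag0 in evs:
            crowd = {c for ts, c, tag in evs if tag == tag0 and ts0 <= ts < ts0 + window}
            best = max(best, len(crowd) / len(clients))
        scores[server] = best
    return scores


def binomial_tail(n, k, p):
    """P[X >= k] for X ~ Binomial(n, p): how unlikely it is that k of n
    independent, benign clients behave identically within the window."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))
```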

Behavioral clustering of HTTP-based malware and signature generation using malicious network traces

Networked Systems Design and Implementation, Apr 28, 2010

We present a novel network-level behavioral malware clustering system. We focus on analyzing the structural similarities among malicious HTTP traffic traces generated by executing HTTP-based malware. Our work is motivated by the need to provide quality input to algorithms that automatically generate network signatures. Accordingly, we define similarity metrics among HTTP traces and develop our system so that the resulting clusters can yield high-quality malware signatures. We implemented a proof-of-concept version of our network-level malware clustering system and performed experiments with more than 25,000 distinct malware samples. Results from our evaluation, which includes real-world deployment, confirm the effectiveness of the proposed clustering system and show that our approach can aid the process of automatically extracting network signatures for detecting HTTP traffic generated by malware-compromised machines.
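A hedged sketch of the kind of structural HTTP-trace similarity the abstract describes, not the authors' metrics: requests are compared by method, path tokens, and parameter names, and two malware samples are compared by the average closest-match distance between their request sets. Function names and weights are invented for illustration.

```python
from difflib import SequenceMatcher
from urllib.parse import urlparse, parse_qs


def request_features(request_line):
    """'GET /path/a.php?id=1&x=2' -> (method, path tokens, sorted param names)."""
    method, url = request_line.split(" ", 1)
    parsed = urlparse(url)
    return (method,
            tuple(t for t in parsed.path.split("/") if t),
            tuple(sorted(parse_qs(parsed.query).keys())))


def request_distance(r1, r2):
    """Structural distance in [0, 1]: exact payload bytes are ignored."""
    m1, p1, q1 = request_features(r1)
    m2, p2, q2 = request_features(r2)
    path_sim = SequenceMatcher(None, "/".join(p1), "/".join(p2)).ratio()
    params = set(q1) | set(q2)
    param_sim = len(set(q1) & set(q2)) / len(params) if params else 1.0
    return 1.0 - (0.3 * (m1 == m2) + 0.4 * path_sim + 0.3 * param_sim)


def sample_distance(trace1, trace2):
    """Average closest-match distance between two samples' (non-empty) HTTP traces."""
    return sum(min(request_distance(a, b) for b in trace2) for a in trace1) / len(trace1)
```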

Automated Intrusion Detection Methods Using NFR

There is often the need to update an installed Intrusion Detection System (IDS) due to new attack methods or upgraded computing environments. Since many current IDSs are constructed by manual encoding of expert security knowledge, changes to IDSs are expensive and require many hours of programming and debugging. We describe a data mining framework for adaptively building Intrusion Detection (ID) models specifically for use in Network Flight Recorder (NFR) [10]. The central idea is to utilize auditing programs to extract an extensive set of features that describe each network connection or host session, and apply data mining programs to learn rules that accurately capture the behavior of intrusions and normal activities. These rules can then be used for misuse detection and anomaly detection. Detection models are then incorporated into NFR through a machine translator, which produces a working detection model in the form of N-Code, NFR's powerful filtering language.

Data Mining Approaches for Intrusion Detection

In this paper we discuss our research in developing general and systematic methods for intrusion detection. The key ideas are to use data mining techniques to discover consistent and useful patterns of system features that describe program and user behavior, and use the set of relevant system features to compute (inductively learned) classifiers that can recognize anomalies and known intrusions. Using experiments on the sendmail system call data and the network tcpdump data, we demonstrate that we can construct concise and accurate classifiers to detect anomalies. We provide an overview of two general data mining algorithms that we have implemented: the association rules algorithm and the frequent episodes algorithm. These algorithms can be used to compute the intra- and inter-audit record patterns, which are essential in describing program or user behavior. The discovered patterns can guide the audit data gathering process and facilitate feature selection. To meet the challenges of both efficient learning (mining) and real-time detection, we propose an agent-based architecture for intrusion detection systems where the learning agents continuously compute and provide the updated (detection) models to the detection agents.
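A toy illustration of the two mining primitives named above, under the assumption that audit records are dictionaries of connection features and events are (timestamp, type) pairs; real association-rule and frequent-episode miners, and the constraints the framework places on them, are considerably more elaborate.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(records, min_support):
    """records: list of dicts of connection features. Returns frequent
    (attribute, value) pair combinations of size 2 with their support."""
    counts = Counter()
    for rec in records:
        items = sorted(rec.items())
        for pair in combinations(items, 2):
            counts[pair] += 1
    n = len(records)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}


def frequent_episodes(events, window, min_support):
    """events: list of (timestamp, event_type) sorted by time. Counts ordered
    pairs of event types occurring within `window` seconds of each other."""
    counts = Counter()
    for i, (t1, e1) in enumerate(events):
        for t2, e2 in events[i + 1:]:
            if t2 - t1 > window:
                break
            counts[(e1, e2)] += 1
    n = max(len(events), 1)
    return {ep: c / n for ep, c in counts.items() if c / n >= min_support}
```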

Toward cost-sensitive modeling for intrusion detection and response

Journal of Computer Security, 2002

Intrusion detection systems (IDSs) must maximize the realization of security goals while minimizing costs. In this paper, we study the problem of building cost-sensitive intrusion detection models. We examine the major cost factors associated with an IDS, which include development cost, operational cost, damage cost due to successful intrusions, and the cost of manual and automated response to intrusions. These cost factors can be qualified according to a defined attack taxonomy and site-specific security policies and priorities. We define cost models to formulate the total expected cost of an IDS. We present cost-sensitive machine learning techniques that can produce detection models that are optimized for user-defined cost metrics. Empirical experiments show that our cost-sensitive modeling and deployment techniques are effective in reducing the overall cost of intrusion detection.
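A hedged sketch of a per-event cost model in the spirit of the abstract: damage cost (DCost) for missed attacks, response cost (RCost) for alarms, and operational cost (OpCost) for the features consumed. The exact terms and the min(DCost, RCost) rule for true positives are illustrative assumptions, not the paper's definitions.

```python
def event_cost(is_attack, raised_alarm, dcost, rcost, opcost):
    """Consequential plus operational cost of one audited event."""
    if is_attack and raised_alarm:          # true positive: respond only if cheaper than the damage
        return min(dcost, rcost) + opcost
    if is_attack and not raised_alarm:      # false negative: full damage
        return dcost + opcost
    if not is_attack and raised_alarm:      # false positive: wasted response
        return rcost + opcost
    return opcost                           # true negative


def cumulative_cost(events):
    """events: iterable of dicts whose keys match event_cost's parameters."""
    return sum(event_cost(**e) for e in events)
```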

A cooperative intrusion detection system for ad hoc networks

Proceedings of the 1st ACM workshop on Security of ad hoc and sensor networks - SASN '03, 2003

Mobile ad hoc networking (MANET) has become an exciting and important technology in recent years because of the rapid proliferation of wireless devices. MANETs are highly vulnerable to attacks due to the open medium, dynamically changing network topology, cooperative algorithms, lack of centralized monitoring and management point, and lack of a clear line of defense. In this paper, we report our progress in developing intrusion detection (ID) capabilities for MANET. Building on our prior work on anomaly detection, we investigate how to improve the anomaly detection approach to provide more details on attack types and sources. For several well-known attacks, we can apply a simple rule to identify the attack type when an anomaly is reported. In some cases, these rules can also help identify the attackers. We address the run-time resource constraint problem using a cluster-based detection scheme where periodically a node is elected as the ID agent for a cluster. Compared with the scheme where each node is its own ID agent, this scheme is much more efficient while maintaining the same level of effectiveness. We have conducted extensive experiments using the ns-2 and MobiEmu environments to validate our research.

Attack Analysis and Detection for Ad Hoc Routing Protocols

Lecture Notes in Computer Science, 2004

Attack analysis is a challenging problem, especially in emerging environments where there are few known attack cases. One such new environment is the Mobile Ad hoc Network (MANET). In this paper, we present a systematic approach to analyze attacks. We introduce the concept of basic events. An attack can be decomposed into certain combinations of basic events. We then define a taxonomy of anomalous basic events by analyzing the basic security goals. Attack analysis provides a basis for designing detection models. We use both specification-based and statistical-based approaches. First, normal basic events of the protocol can be modeled by an extended finite state automaton (EFSA) according to the protocol specifications. The EFSA can detect anomalous basic events that are direct violations of the specifications. Statistical learning algorithms, with statistical features, i.e., statistics on the states and transitions of the EFSA, can train an effective detection model to detect those anomalous basic events that are temporal and statistical in nature. We use the AODV routing protocol as a case study to validate our research. Our experiments on the MobiEmu wireless emulation platform show that our specification-based and statistical-based models cover most of the anomalous basic events in our taxonomy.
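A highly simplified sketch of the specification-based side: a table of allowed (state, event) transitions for a hypothetical route-discovery exchange stands in for the extended finite state automaton, and any event with no allowed transition is reported as an anomalous basic event. The state and event names are invented, not AODV's actual specification.

```python
# Allowed transitions of a hypothetical route-discovery life cycle.
TRANSITIONS = {
    ("idle", "send_rreq"): "waiting_reply",
    ("waiting_reply", "recv_rrep"): "route_established",
    ("waiting_reply", "timeout"): "idle",
    ("route_established", "recv_rerr"): "idle",
}


def check_trace(events, state="idle"):
    """events: list of event names. Returns (index, event) pairs that have no
    allowed transition, i.e., direct specification violations."""
    violations = []
    for i, ev in enumerate(events):
        nxt = TRANSITIONS.get((state, ev))
        if nxt is None:
            violations.append((i, ev))
        else:
            state = nxt
    return violations
```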

An Extensible Environment for Evaluating Secure MANET

First International Conference on Security and Privacy for Emerging Areas in Communications Networks (SECURECOMM'05)

Developing and evaluating secure MANET (mobile ad hoc networks) in real systems is a complex process that involves careful design of attack test cases and security countermeasures, as well as meaningful performance measurements to evaluate both the impact of attacks and the performance of security solutions. It is desirable to have a development and testing environment that can automate this process. In this paper, we propose a software framework for such an environment and describe a system implementation in the secure MANET routing domain. This environment includes the following three major features. First, the environment is built upon a wireless network emulation tool to support repeatable experimentation. Second, it adds an attack emulation layer with the necessary API for easy development and execution of attack test cases. Third, the extensible attack library includes a full set of basic attacks at its core and a way to compose complex attacks from the atomic elements. To demonstrate the usefulness of this tool, we show the development of an Intrusion Detection System (IDS) as a case study. Our successful experience confirms that the platform can greatly facilitate the development of security solutions on MANET.

Modeling system calls for intrusion detection with dynamic window sizes

Proceedings DARPA Information Survivability Conference and Exposition II. DISCEX'01

We extend prior research on system call anomaly detection modeling methods for intrusion detection by incorporating dynamic window sizes. The window size is the length of the subsequence of a system call trace which is used as the basic unit for modeling program or process behavior. In this work we incorporate dynamic window sizes and show marked improvements in anomaly detection. We present two methods for estimating the optimal window size based on the available training data. The first method is an entropy modeling method which determines the optimal single window size for the data. The second method is a probability modeling method that takes into account context dependent window sizes. A context dependent window size model is motivated by the way that system calls are generated by processes. Sparse Markov transducers (SMTs) are used to compute the context dependent window size model. We show over actual system call traces that the entropy modeling methods lead to the optimal single window size. We also show that context dependent window sizes outperform traditional system call modeling methods.
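An illustrative sketch of the entropy-modeling idea, assuming a system-call trace is just a list of call names: estimate how predictable the next call is given the preceding calls for several window sizes, and stop growing the window once additional context barely helps. The stopping rule below is a crude stand-in for the paper's criterion, not its actual method.

```python
from collections import Counter, defaultdict
from math import log2


def conditional_entropy(trace, window):
    """H(next call | previous window-1 calls), estimated from one trace."""
    ctx_counts = defaultdict(Counter)
    for i in range(len(trace) - window + 1):
        ctx = tuple(trace[i:i + window - 1])
        ctx_counts[ctx][trace[i + window - 1]] += 1
    total = sum(sum(c.values()) for c in ctx_counts.values())
    if total == 0:
        return float("inf")
    h = 0.0
    for nxt in ctx_counts.values():
        n_ctx = sum(nxt.values())
        for count in nxt.values():
            h -= (count / total) * log2(count / n_ctx)
    return h


def pick_window(trace, candidates=range(2, 9), eps=0.05):
    """Smallest window after which more context barely reduces the entropy."""
    prev = conditional_entropy(trace, candidates[0])
    for w in candidates[1:]:
        cur = conditional_entropy(trace, w)
        if prev - cur < eps:
            return w - 1
        prev = cur
    return candidates[-1]
```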

Detecting stealthy P2P botnets using statistical traffic fingerprints

2011 IEEE/IFIP 41st International Conference on Dependable Systems & Networks (DSN), 2011

Cross-feature analysis for detecting ad-hoc routing anomalies

23rd International Conference on Distributed Computing Systems, 2003. Proceedings.

With the proliferation of wireless devices, mobile ad hoc networking (MANET) has become a very exciting and important technology due to its characteristics of open medium and dynamic topology, among others. However, MANETs are more vulnerable than wired networks. Existing security mechanisms developed for wired networks need to be redesigned for MANET. In this paper, we discuss the problem of intrusion detection in MANET. The focus of our research is on techniques for automatically constructing anomaly detection models that are capable of detecting new (or unknown) attacks. We introduce a new data mining method that uses "cross-feature analysis" to capture the inter-feature correlation patterns in normal traffic. These patterns can be used as normal profiles to detect deviation (or anomalies) caused by attacks. We have implemented our method with two well known ad-hoc routing protocols, namely, Dynamic Source Routing (DSR) and Ad-hoc On-Demand Distance Vector (AODV), and have conducted extensive experiments using the ns-2 simulator. The results show that the anomaly detection models automatically computed using our data mining method can effectively detect the anomalies caused by representative intrusions.
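A simplified sketch of cross-feature analysis: for every feature f_i, learn to predict f_i from the remaining features on normal data, and score a test record by how many of these predictions it violates. The paper trains real classifiers (e.g., decision trees); the lookup table below is used only to keep the sketch self-contained.

```python
from collections import Counter, defaultdict


def train_cross_feature_models(normal_records):
    """normal_records: list of equal-length lists of categorical feature values."""
    n_features = len(normal_records[0])
    models = []
    for i in range(n_features):
        table, fallback = defaultdict(Counter), Counter()
        for rec in normal_records:
            context = tuple(rec[:i] + rec[i + 1:])
            table[context][rec[i]] += 1     # P(f_i | other features), as counts
            fallback[rec[i]] += 1           # global distribution of f_i
        models.append((table, fallback))
    return models


def anomaly_score(record, models):
    """Fraction of features whose observed value differs from the prediction."""
    misses = 0
    for i, (table, fallback) in enumerate(models):
        context = tuple(record[:i] + record[i + 1:])
        counts = table.get(context) or fallback
        predicted, _ = counts.most_common(1)[0]
        misses += (record[i] != predicted)
    return misses / len(models)
```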

Real time data mining-based intrusion detection

Proceedings DARPA Information Survivability Conference and Exposition II. DISCEX'01

In this paper, we present an overview of our research in real time data mining-based intrusion detection systems (IDSs). We focus on issues related to deploying a data mining-based IDS in a real time environment. We describe our approaches to address three types of issues: accuracy, efficiency, and usability. To improve accuracy, data mining programs are used to analyze audit data and extract features that can distinguish normal activities from intrusions; we use artificial anomalies along with normal and/or intrusion data to produce more effective misuse and anomaly detection models. To improve efficiency, the computational costs of features are analyzed and a multiple-model cost-based approach is used to produce detection models with low cost and high accuracy. We also present a distributed architecture for evaluating cost-sensitive models in real time. To improve usability, adaptive learning algorithms are used to facilitate model construction and incremental updates; unsupervised anomaly detection algorithms are used to reduce the reliance on labeled data. We also present an architecture consisting of sensors, detectors, a data warehouse, and model generation components. This architecture facilitates the sharing and storage of audit data and the distribution of new or updated models. This architecture also improves the efficiency and scalability of the IDS.

Misleading and defeating importance-scanning malware propagation

2007 Third International Conference on Security and Privacy in Communications Networks and the Workshops - SecureComm 2007, 2007

The scan-then-exploit propagation strategy is among the most widely used methods by which malware spreads across computer networks. Recently, a new self-learning strategy for selecting target addresses during malware propagation was introduced in [1], which we refer to as importance scanning. Under the importance-scanning approach, malware employs an address sampling scheme to search for the underlying group distribution of (vulnerable) hosts in the address space through which it propagates. The malware utilizes this information to increase the rate at which it locates viable addresses during its search for infection targets. In this paper, we introduce a strategy to combat importance scanning propagation. We propose the use of white hole networks, which combine several existing components to dissuade, slow, and ultimately halt the propagation of importance scanning malware. Based on analytical reasoning and simulations using real trace and address distribution information, we demonstrate how the white hole approach can provide an effective defense, even when the deployment of this countermeasure represents a very small fraction of the address space population.

A Taxonomy of Botnet Structures

Twenty-Third Annual Computer Security Applications Conference (ACSAC 2007), 2007

We propose a taxonomy of botnet structures, based on their utility to the botmaster. We propose key metrics to measure their utility for various activities (e.g., spam, DDoS). Using these performance metrics, we consider the ability of different response techniques to degrade or disrupt botnets. In particular, our models show that targeted responses are particularly effective against scale-free botnets, and that efforts to increase the robustness of scale-free networks come at a cost of diminished transitivity. Botmasters do not appear to have any structural solutions to this problem in scale-free networks. We also show that random graph botnets (e.g., those using P2P formations) are highly resistant to both random and targeted responses. We evaluate the impact of responses on different topologies using simulation and demonstrate the utility of our proposed metrics by performing novel measurements of a P2P network. Our analysis shows how botnets may be classified according to structure and given rank or priority using our proposed metrics. This may help direct responses and suggests which general remediation strategies are more likely to succeed.

Major botnet utilities, key metrics, and suggested variables:
- Effectiveness — Giant portion (S): large numbers of victims increase the likelihood of high-bandwidth bots; diurnal behavior favors S over total population.
- Effectiveness — Average available bandwidth (B): average bandwidth available at any time, because of variations in total victim bandwidth, use by victims, and diurnal changes.
- Efficiency — Diameter (1/l): bots sending messages to each other and coordinating activities require efficient communications.
- Robustness — Local transitivity (γ): bots maintaining state (e.g., key cracking or mirroring files) require redundancy to guard against random loss; highly transitive networks are more robust.
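A sketch of how the three metrics in the list above could be computed on example topologies, assuming the networkx library is available; the paper's own simulations and P2P measurements are more involved. S is approximated by the giant-component fraction, efficiency by 1/l (inverse average shortest path), and γ by the average clustering coefficient.

```python
import networkx as nx


def botnet_metrics(g):
    giant = g.subgraph(max(nx.connected_components(g), key=len))
    return {
        "S": giant.number_of_nodes() / g.number_of_nodes(),
        "inv_l": 1.0 / nx.average_shortest_path_length(giant),
        "gamma": nx.average_clustering(g),
    }


def targeted_response(g, fraction=0.05):
    """Remove the highest-degree nodes first (a 'targeted' takedown)."""
    h = g.copy()
    victims = sorted(h.degree, key=lambda kv: kv[1], reverse=True)
    h.remove_nodes_from(n for n, _ in victims[: int(fraction * len(victims))])
    return h


if __name__ == "__main__":
    scale_free = nx.barabasi_albert_graph(2000, 2, seed=1)   # hub-dominated
    random_p2p = nx.erdos_renyi_graph(2000, 0.002, seed=1)   # random-graph P2P
    for name, g in [("scale-free", scale_free), ("random", random_p2p)]:
        print(name, botnet_metrics(g),
              "after takedown:", botnet_metrics(targeted_response(g)))
```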

Active Botnet Probing to Identify Obscure Command and Control Channels

2009 Annual Computer Security Applications Conference, 2009

We consider the problem of identifying obscure chat-like botnet command and control (C&C) communications, which are indistinguishable from human-human communication using traditional signature-based techniques. Existing passive-behavior-based anomaly detection techniques are limited because they either require monitoring multiple bot-infected machines that belong to the same botnet or require extended monitoring times. In this paper, we explore the potential use of active botnet probing techniques in a network middlebox as a means to augment and complement existing passive botnet C&C detection strategies, especially for small botnets with obfuscated C&C content and infrequent C&C interactions. We present an algorithmic framework that uses hypothesis testing to separate botnet C&C dialogs from human-human conversations with desired accuracy and implement a prototype system called BotProbe. Experimental results on multiple real-world IRC bots demonstrate that our proposed active methods can successfully identify obscure and obfuscated botnet communications. A real-world user study on about one hundred participants also shows that the technique has a low false positive rate on human-human conversations. We discuss the limitations of BotProbe and hope this preliminary feasibility study on the use of active techniques in botnet research can inspire new thoughts and directions within the malware research community.
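A hedged sketch of the hypothesis-testing core: treat each probe (e.g., replaying a suspected command) as a binary observation of whether the endpoint responded deterministically again, and run Wald's sequential probability ratio test, since bots respond deterministically far more often than humans. The probabilities and thresholds below are illustrative, not BotProbe's parameters.

```python
from math import log


def sprt(observations, p_bot=0.9, p_human=0.2, alpha=0.005, beta=0.01):
    """observations: iterable of booleans (True = deterministic response to a probe).
    Returns 'bot', 'human', or 'undecided'; thresholds follow Wald's SPRT."""
    upper = log((1 - beta) / alpha)      # accept H1: bot
    lower = log(beta / (1 - alpha))      # accept H0: human
    llr = 0.0
    for responded in observations:
        if responded:
            llr += log(p_bot / p_human)
        else:
            llr += log((1 - p_bot) / (1 - p_human))
        if llr >= upper:
            return "bot"
        if llr <= lower:
            return "human"
    return "undecided"


# Example: sprt([True, True, True, True]) -> 'bot'
```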

HoneyStat: Local Worm Detection Using Honeypots

Lecture Notes in Computer Science, 2004

Worm detection systems have traditionally used global strategies and focused on scan rates. The noise associated with this approach requires statistical techniques and large data sets (e.g., 2^20 monitored machines) to yield timely alerts and avoid false positives. Worm detection techniques for smaller local networks have not been fully explored. We consider how local networks can provide early detection and complement global monitoring strategies. We describe HoneyStat, which uses modified honeypots to generate a highly accurate alert stream with low false positive rates. Unlike traditional highly-interactive honeypots, HoneyStat nodes are script-driven, automated, and cover a large IP space. The HoneyStat nodes generate three classes of alerts: memory alerts (based on buffer overflow detection and process management), disk write alerts (such as writes to registry keys and critical files) and network alerts. Data collection is automated, and once an alert is issued, a time segment of previous traffic to the node is analyzed. A logit analysis determines what previous network activity explains the current honeypot alert. The result can indicate whether an automated or worm attack is present. We demonstrate HoneyStat's improvements over previous worm detection techniques. First, using trace files from worm attacks on small networks, we demonstrate how it detects zero-day worms. Second, we show how it detects multi-vector worms that use combinations of ports to attack. Third, the alerts from HoneyStat provide more information than traditional IDS alerts, such as binary signatures, attack vectors, and attack rates. We also use extensive (year-long) trace files to show how the logit analysis produces very low false positive rates.
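A minimal sketch of the logit-analysis step using scikit-learn (an assumption of this sketch, not the paper's implementation): fit a logistic regression of alert/no-alert windows on counts of prior inbound traffic per destination port, and rank ports by coefficient to see which traffic best explains the honeypot alerts. The feature design here is hypothetical.

```python
from sklearn.linear_model import LogisticRegression


def explain_alerts(port_counts, alert_labels, port_ids):
    """port_counts: (n_windows, n_ports) matrix of inbound packet counts per port;
    alert_labels: 1 for windows ending in a honeypot alert, 0 otherwise."""
    model = LogisticRegression(max_iter=1000).fit(port_counts, alert_labels)
    # Ports with the largest positive coefficient best "explain" the alerts.
    return sorted(zip(port_ids, model.coef_[0]), key=lambda kv: -kv[1])


# Toy example:
# explain_alerts([[5, 0, 1], [0, 2, 0], [7, 1, 0], [0, 0, 1]],
#                [1, 0, 1, 0], [445, 80, 135])
```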

A Data Mining and CIDF Based Approach for Detecting Novel and Distributed Intrusions

Lecture Notes in Computer Science, 2000

As the recent distributed Denial-of-Service (DDoS) attacks on several major Internet sites have shown us, no open computer network is immune from intrusions. Furthermore, intrusion detection systems (IDSs) need to be updated in a timely manner whenever a novel intrusion surfaces; and geographically distributed IDSs need to cooperate to detect distributed and coordinated intrusions. In this paper, we describe an experimental system, based on the Common Intrusion Detection Framework (CIDF), where multiple IDSs can exchange attack information to detect distributed intrusions. The system also includes an ID model builder, where a data mining engine can receive audit data of a novel attack from an IDS, compute a new detection model, and then distribute it to other IDSs. We describe our experiences in implementing such a system and the preliminary results of deploying the system in an experimental network.

Mining system audit data

ACM SIGMOD Record, 2001

Intrusion detection is an essential component of computer security mechanisms. It requires accurate and efficient analysis of a large amount of system and network audit data. It can thus be an application area of data mining. There are several characteristics of audit data: abundant raw data, rich system and network semantics, and ever "streaming". Accordingly, when developing data mining approaches, we need to focus on: feature extraction and construction, customization of (general) algorithms according to semantic information, and optimization of execution efficiency of the output models. In this paper, we describe a data mining framework for mining audit data for intrusion detection models. We discuss its advantages and limitations, and outline the open research problems.

Boosting the scalability of botnet detection using adaptive traffic sampling

Proceedings of the 6th ACM Symposium on Information, Computer and Communications Security - ASIACCS '11, 2011

Botnets pose a serious threat to the health of the Internet. Most current network-based botnet detection systems require deep packet inspection (DPI) to detect bots. Because DPI is a computationally costly process, such detection systems cannot handle the large volumes of traffic typical of large enterprise and ISP networks. In this paper we propose a system that aims to efficiently and effectively identify a small number of suspicious hosts that are likely bots. Their traffic can then be forwarded to DPI-based botnet detection systems for fine-grained inspection and accurate botnet detection. By using a novel adaptive packet sampling algorithm and a scalable spatial-temporal flow correlation approach, our system is able to substantially reduce the volume of network traffic that goes through DPI, thereby boosting the scalability of existing botnet detection systems. We implemented a proof-of-concept version of our system, and evaluated it using real-world legitimate and botnet-related network traces. Our experimental results are very promising and suggest that our approach can enable the deployment of botnet-detection systems in large, high-speed networks.
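An illustrative sketch (not the paper's algorithm) of adaptive per-host sampling: hosts that already look suspicious, or from which little traffic has been sampled, receive a higher sampling probability, so the DPI budget is spent where it is most informative. The rate formula, constants, and class name are invented.

```python
import random


class AdaptiveSampler:
    def __init__(self, base_rate=0.01, max_rate=0.5):
        self.base_rate, self.max_rate = base_rate, max_rate
        self.packets_seen = {}
        self.suspicion = {}           # e.g., fed by coarse flow-level heuristics

    def rate(self, host):
        seen = self.packets_seen.get(host, 0)
        boost = 1.0 / (1.0 + seen / 1000.0)          # favor under-sampled hosts
        boost += self.suspicion.get(host, 0.0)       # favor flagged hosts
        return min(self.max_rate, self.base_rate * (1.0 + 10.0 * boost))

    def sample(self, host):
        """Return True if this packet should be forwarded to the DPI stage."""
        self.packets_seen[host] = self.packets_seen.get(host, 0) + 1
        return random.random() < self.rate(host)
```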

BotMiner: Clustering analysis of network traffic for protocol- and structure-independent botnet detection

Botnets are now the key platform for many Internet attacks, such as spam, distributed denial-of-service (DDoS), identity theft, and phishing. Most of the current botnet detection approaches work only on specific botnet command and control (C&C) protocols (e.g., IRC) and structures (e.g., centralized), and can become ineffective as botnets change their C&C techniques. In this paper, we present a general detection framework that is independent of botnet C&C protocol and structure, and requires no a priori knowledge of botnets (such as captured bot binaries and hence the botnet signatures, and C&C server names/addresses). We start from the definition and essential properties of botnets. We define a botnet as a coordinated group of malware instances that are controlled via C&C communication channels. The essential properties of a botnet are that the bots communicate with some C&C servers/peers, perform malicious activities, and do so in a similar or correlated way. Accordingly, our detection framework clusters similar communication traffic and similar malicious traffic, and performs cross-cluster correlation to identify the hosts that share both similar communication patterns and similar malicious activity patterns. These hosts are thus bots in the monitored network. We have implemented our BotMiner prototype system and evaluated it using many real network traces. The results show that it can detect real-world botnets (IRC-based, HTTP-based, and P2P botnets including Nugache and Storm worm), and has a very low false positive rate.
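A sketch of the cross-plane correlation idea with the clustering itself abstracted away: given a communication-cluster assignment and a malicious-activity-cluster assignment per host, a host scores highly when many of its communication-cluster peers also share its activity cluster. The scoring formula is illustrative, not BotMiner's.

```python
from collections import defaultdict


def botnet_scores(comm_clusters, activity_clusters):
    """Both arguments map host -> cluster id (or None if the host was not clustered).
    Score = fraction of a host's communication-cluster peers that also share its
    activity cluster; isolated or unclustered hosts score 0."""
    comm_members, act_members = defaultdict(set), defaultdict(set)
    for h, c in comm_clusters.items():
        if c is not None:
            comm_members[c].add(h)
    for h, a in activity_clusters.items():
        if a is not None:
            act_members[a].add(h)
    scores = {}
    for h in comm_clusters:
        c, a = comm_clusters.get(h), activity_clusters.get(h)
        if c is None or a is None:
            scores[h] = 0.0
            continue
        overlap = comm_members[c] & act_members[a]
        scores[h] = (len(overlap) - 1) / max(len(comm_members[c]) - 1, 1)
    return scores
```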
