ReFSM: Reverse engineering from protocol packet traces to test generation by extended finite state machines

Reverse Engineering of Protocols from Network Traces

homepages.di.fc.ul.pt

Communication protocols determine how network components interact with each other. Therefore, the ability to derive a specification of a protocol can be useful in various contexts, such as to support deeper black-box testing or effective defense mechanisms. Unfortunately, it is often hard to obtain the specification because systems implement closed (i.e., undocumented) protocols, or because a time-consuming translation has to be performed, from the textual description of the protocol to a format readable by the tools. To address these issues, we propose a new methodology to automatically infer a specification of a protocol from network traces, which generates automata for the protocol language and state machine. Since our solution relies only on interaction samples of the protocol, it is well-suited to uncover the message formats and protocol states of closed protocols and also to automate most of the process of specifying open protocols. The approach was implemented in a tool and experimentally evaluated with publicly available FTP traces. Our results show that the inferred specification is a good approximation of the reference specification, exhibiting a high level of precision and recall.
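As a rough illustration of what inferring a protocol state machine from traces can look like, the sketch below builds a prefix-tree automaton from sequences of message types. It is a minimal sketch under stated assumptions, not the algorithm of the tool described above: the helper names (`message_type`, `build_prefix_tree`, `accepts`) are hypothetical, the message type is assumed to be the first token of a text request, and no state merging or generalization is performed.

```python
from collections import defaultdict


def message_type(line):
    """Assumption: the first token of a text request identifies its type."""
    return line.split()[0].upper()


def build_prefix_tree(sessions):
    """Build a prefix-tree automaton from sequences of message types.

    Returns (transitions, accepting), where transitions maps
    state -> {message_type: next_state} and state 0 is the initial state.
    """
    transitions = defaultdict(dict)
    accepting = set()
    next_state = 1
    for session in sessions:
        state = 0
        for msg_type in session:
            if msg_type not in transitions[state]:
                transitions[state][msg_type] = next_state
                next_state += 1
            state = transitions[state][msg_type]
        accepting.add(state)  # the session ended in this state
    return dict(transitions), accepting


def accepts(transitions, accepting, session):
    """Check whether a sequence of message types is accepted by the automaton."""
    state = 0
    for msg_type in session:
        nxt = transitions.get(state, {}).get(msg_type)
        if nxt is None:
            return False
        state = nxt
    return state in accepting


# Two made-up FTP-like sessions (illustrative data, not from the evaluation).
raw_sessions = [
    ["USER anonymous", "PASS guest", "QUIT"],
    ["USER alice", "PASS secret", "CWD /pub", "QUIT"],
]
sessions = [[message_type(m) for m in s] for s in raw_sessions]
transitions, accepting = build_prefix_tree(sessions)

print(accepts(transitions, accepting, ["USER", "PASS", "QUIT"]))  # True
print(accepts(transitions, accepting, ["USER", "QUIT"]))          # False
```

A prefix tree accepts exactly the observed sessions, so on unseen traffic it gives perfect precision but poor recall. Inference techniques typically generalize from such a structure, for example by merging equivalent states.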

A Survey of Automatic Protocol Reverse Engineering Approaches, Methods, and Tools on the Inputs and Outputs View

Security and Communication Networks

A network protocol defines rules that control communications between two or more machines on the Internet, whereas Automatic Protocol Reverse Engineering (APRE) defines the way of extracting the structure of a network protocol without accessing its specifications. Sufficient knowledge of undocumented protocols is essential for security purposes, network policy implementation, and management of network resources. This paper reviews and analyzes a total of 39 approaches, methods, and tools for Protocol Reverse Engineering (PRE) and classifies them into four divisions: approaches that reverse engineer protocol finite state machines, approaches that reverse engineer protocol formats, approaches that reverse engineer both, and approaches that focus directly on neither. The efficiency of all approaches’ outputs based on their selected inputs is analyzed in general, along with appropriate reverse engineering input formats. Addition...

ReverX: Reverse engineering of protocols

2011

Communication protocols determine how network components interact with each other. Therefore, the ability to derive a specification of a protocol can be useful in various contexts, such as to support deeper black-box testing or effective defense mechanisms. Unfortunately, it is often hard to obtain the specification because systems implement closed (i.e., undocumented) protocols, or because a time-consuming translation has to be performed, from the textual description of the protocol to a format readable by the tools. To address these issues, we propose a new methodology to automatically infer a specification of a protocol from network traces, which generates automata for the protocol language and state machine. Since our solution relies only on interaction samples of the protocol, it is well-suited to uncover the message formats and protocol states of closed protocols and also to automate most of the process of specifying open protocols. The approach was implemented in ReverX and experimentally evaluated with publicly available FTP traces. Our results show that the inferred specification is a good approximation of the reference specification, exhibiting a high level of precision and recall.

Automatic Protocol Reverse-Engineering: Message Format Extraction and Field Semantics Inference

Understanding the command-and-control (C&C) protocol used by a botnet is crucial for anticipating its repertoire of nefarious activity. However, the C&C protocols of botnets, similar to many other application layer protocols, are undocumented. Automatic protocol reverse-engineering techniques enable understanding undocumented protocols and are important for many security applications, including the analysis and defense against botnets. For example, they enable active botnet infiltration, where a security analyst rewrites messages sent and received by a bot in order to contain malicious activity and to provide the botmaster with an illusion of successful and unhampered operation.

State of the art of network protocol reverse engineering tools

Journal of Computer Virology and Hacking Techniques, 2017

Communication protocols enable structured information exchanges between different entities. A description, at different levels of detail, is necessary for many applications, such as interoperability or security audits. When such a description is not available, one can resort to protocol reverse engineering to infer the format of the messages exchanged or a model of the protocol. During the past 12 years, several tools have been developed in order to automate, entirely or partially, the protocol inference process. Each of those tools has been developed with a specific application goal for the inferred model, leading to specific needs, and thus different strengths and limitations. After identifying key challenges, the paper presents a survey of protocol reverse engineering tools developed in the last decade. We consider tools focusing on the inference of the format of individual messages or of the grammar of sequences of messages. Finally, we propose a classification of these tools according to different criteria, aimed at providing relevant insights about the techniques used by each tool, and how they compare with those of other tools, for the classification of messages, the inference of their format, or the inference of the grammar of the protocol. This classification also makes it possible to identify technical areas that

Automatically complementing protocol specifications from network traces

Proceedings of the 13th European Workshop on …, 2011

Network servers can be tested for correctness by resorting to a specification of the implemented protocol. However, producing a protocol specification can be a time-consuming task. In addition, protocols are constantly evolving with new functionality and message formats that render the previously defined specifications incomplete or deprecated. This paper presents a methodology to automatically complement an existing specification with extensions to the protocol by analyzing the contents of the messages in network traces. The approach can be used on top of existing protocol reverse engineering techniques, allowing it to be applied to both open and closed protocols. This approach also has the advantage of capturing unpublished or undocumented features automatically, thus obtaining a more complete and realistic specification of the implemented protocol. The proposed solution was evaluated with a prototype tool that was able to complement an IETF protocol (FTP) specification with several extensions extracted from traffic data collected from 320 public servers.
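The idea of complementing a known specification from traffic can be illustrated with a deliberately simple sketch: collect the commands seen in trace requests and report those missing from the existing specification. This is only an assumption-laden illustration, not the paper's method. The function `candidate_extensions`, the `min_count` threshold, and the sample requests are hypothetical, and `KNOWN_COMMANDS` is a small illustrative subset of the FTP commands defined in RFC 959.

```python
from collections import Counter

# Illustrative subset of the base FTP commands (RFC 959), not the full list.
KNOWN_COMMANDS = {"USER", "PASS", "CWD", "RETR", "STOR", "QUIT"}


def candidate_extensions(requests, known, min_count=2):
    """Return commands absent from the known set that occur at least min_count times.

    Assumption: the first whitespace-separated token of a request is its command.
    The frequency threshold is a crude filter against typos and noise.
    """
    counts = Counter(
        line.split()[0].upper() for line in requests if line.strip()
    )
    return {cmd: n for cmd, n in counts.items() if cmd not in known and n >= min_count}


# Made-up request lines for illustration only.
requests = ["USER a", "FEAT", "SIZE f.txt", "size g.txt", "FEAT", "XYZZY", ""]
print(candidate_extensions(requests, KNOWN_COMMANDS))
```

A real tool would also need to infer the argument formats and the protocol states in which each new command is valid. This sketch only flags candidate command names.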

Survey on network protocol reverse engineering approaches, methods and tools

2017

A network protocol defines rules that control communications between two or more hosts on the Internet, whereas Protocol Reverse Engineering (PRE) defines the process of extracting the structure, attributes and data from a network protocol. Sufficient knowledge of protocol specifications is essential for security purposes, network policy implementation and management of network resources. Protocol Reverse Engineering is a complex process intended to uncover specifications of unknown protocols. The complexity of PRE, which is time-consuming, tedious and error-prone, has led to limited and diverse outcomes of Protocol Reverse Engineering approaches. This paper surveys the outputs of 9 PRE approaches in three divisions, with an analysis of their methodology and possible applications. Moreover, in the introductory part we review the general PRE literature in depth.

Building an Automaton Towards Reverse Protocol Engineering

2009

The communication between computer systems is dictated by network protocols, which determine how the network components interact with each other. Knowing the specification of a network protocol can greatly improve the security and dependability of both the design of the protocol and the applications implementing it.

Polyglot: automatic extraction of protocol message format using dynamic binary analysis

2007

Protocol reverse engineering, the process of extracting the application-level protocol used by an implementation, without access to the protocol specification, is important for many network security applications. Recent work has proposed protocol reverse engineering by using clustering on network traces. That kind of approach is limited by the lack of semantic information on network traces. In this paper we propose a new approach using program binaries. Our approach, shadowing, uses dynamic binary analysis and is based on a unique intuition: the way that an implementation of the protocol processes the received application data reveals a wealth of information about the protocol message format. We have implemented our approach in a system called Polyglot and evaluated it extensively using real-world implementations of five different protocols: DNS, HTTP, IRC, Samba and ICQ. We compare our results with the manually crafted message format, included in Wireshark, one of the state-of-the-art protocol analyzers. The differences we find are small and usually due to different implementations handling fields in different ways. Finding such differences between implementations is an added benefit, as they are important for problems such as fingerprint generation, fuzzing, and error detection.

Automatic executable test case generation for extended finite state machine protocols

Testing of Communicating Systems, 1997

This paper presents a method for automatic executable test case and test sequence generation which combines both control and data flow testing techniques. Compared to published methods, we use an early executability verification mechanism to reduce significantly the number of discarded paths. A heuristic which uses cycle analysis is used to handle the executability problem. This heuristic can be applied even in the presence of unbounded loops in the specification. Later, the generated paths are completed by postambles and their executability is re-verified. The final executable paths are evaluated symbolically and used for conformance testing purposes.
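The executability problem mentioned above can be made concrete with a toy extended finite state machine (EFSM), in which each transition carries a guard over context variables and an update to them. The sketch below enumerates bounded-length paths and abandons a path as soon as a guard fails, which is the spirit of an early executability check. It is a simplified illustration under stated assumptions, not the paper's method: it runs guards on concrete values from a fixed initial context rather than evaluating paths symbolically, it bounds loops with `max_len` instead of using cycle analysis, and the EFSM and all names (`Transition`, `executable_paths`, `EFSM`) are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Context = Dict[str, int]


@dataclass(frozen=True)
class Transition:
    name: str
    source: str
    target: str
    guard: Callable[[Context], bool]      # must hold for the transition to fire
    update: Callable[[Context], Context]  # returns the new context


def executable_paths(transitions: List[Transition], start: str, goal: str,
                     context: Context, max_len: int) -> List[List[str]]:
    """Enumerate executable paths (as transition names) from start to goal.

    A branch is discarded as soon as a guard is false in the current context,
    so non-executable paths are never fully generated.
    """
    results: List[List[str]] = []

    def extend(state: str, ctx: Context, path: List[str]) -> None:
        if state == goal and path:
            results.append(list(path))
        if len(path) == max_len:
            return
        for t in transitions:
            if t.source == state and t.guard(ctx):
                path.append(t.name)
                extend(t.target, t.update(ctx), path)
                path.pop()

    extend(start, dict(context), [])
    return results


# Toy EFSM: "finish" is only executable after exactly two retries.
EFSM = [
    Transition("connect", "idle", "connected",
               lambda c: True, lambda c: {**c, "retries": 0}),
    Transition("retry", "connected", "connected",
               lambda c: c["retries"] < 2, lambda c: {**c, "retries": c["retries"] + 1}),
    Transition("finish", "connected", "done",
               lambda c: c["retries"] == 2, lambda c: dict(c)),
]

print(executable_paths(EFSM, "idle", "done", {"retries": 0}, max_len=5))
# [['connect', 'retry', 'retry', 'finish']]
```

Ignoring guards, `connect, finish` would look like a valid path through the control graph, yet it can never execute. Checking guards during path construction keeps such paths out of the candidate test sequences.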