Large-scale Debloating of Binary Shared Libraries

Debloating Software through Piece-Wise Compilation and Loading

2018

Programs are bloated. Our study shows that only 5% of libc is used on average across the Ubuntu Desktop environment (more than 2,200 programs); the heaviest user, the VLC media player, uses only 18%. This is striking because bloat presents a vulnerable attack surface for software exploitation and imposes an undue burden on defenses (e.g., CFI defenses). In this paper: (1) We present a debloating framework built on a compiler toolchain that can successfully debloat software (shared/static libraries and executables). Our solution can successfully compile and load most libraries on Ubuntu Desktop 16.04. (2) We demonstrate the elimination of over 84% of code from coreutils and 85% of code from SPEC CPU 2006 benchmark programs without affecting functionality. We show that even complex COTS programs (e.g., Firefox, Curl) can be debloated without the need to recompile. (3) We demonstrate the security impact of our system by eliminating over 70% of reusable code gadgets from the coreutils suite, and show that unused...
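The usage figures above rest on a simple idea: a program needs only the library functions reachable from the symbols it imports. The sketch below illustrates that reachability computation under stated assumptions; it is not the paper's toolchain. `TOY_LIBC` and all function names are hypothetical, and the call graph covers direct calls only (real libraries also need a conservative treatment of function pointers).

```python
from collections import deque

# Hypothetical toy library: each function maps to the functions it calls directly.
TOY_LIBC = {
    "printf": ["vfprintf"],
    "vfprintf": ["write_buf"],
    "write_buf": [],
    "qsort": ["swap"],
    "swap": [],
    "getaddrinfo": ["dns_query"],
    "dns_query": ["write_buf"],
}

def reachable_functions(call_graph, roots):
    """Functions reachable from `roots` (a program's imports) via direct calls."""
    seen = set()
    worklist = deque(roots)
    while worklist:
        fn = worklist.popleft()
        if fn in seen or fn not in call_graph:
            continue  # already visited, or not defined in this library
        seen.add(fn)
        worklist.extend(call_graph[fn])
    return seen

def usage_fraction(call_graph, imports):
    """Share of the library's functions that one program can actually reach."""
    return len(reachable_functions(call_graph, imports)) / len(call_graph)

def removable_functions(call_graph, per_program_imports):
    """Functions that no program in the set can reach: candidates for removal."""
    all_imports = set().union(*per_program_imports)
    return set(call_graph) - reachable_functions(call_graph, all_imports)
```

For example, a program importing only `printf` reaches 3 of the 7 toy functions, and a set of programs importing `printf` and `getaddrinfo` leaves `qsort` and `swap` removable.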

Binary Code Continent: Finer-Grained Control Flow Integrity for Stripped Binaries

Control Flow Integrity (CFI) is an effective technique to mitigate threats such as code-injection and code-reuse attacks by protecting indirect control transfers. For stripped binaries, a CFI policy has to be made conservatively due to the lack of source-level semantics. Existing binary-only CFI solutions such as BinCFI and CCFIR demonstrate the ability to protect stripped binaries, but the policies they apply are too permissive, allowing sophisticated code-reuse attacks. In this paper, we propose a new binary-only CFI protection scheme called BinCC, which applies static binary rewriting to provide finer-grained protection for x86 stripped ELF binaries. Through code duplication and static analysis, we divide the binary code into several mutually exclusive code continents. We further classify each indirect transfer within a code continent as either an Intra-Continent transfer or an Inter-Continent transfer, and apply separate, strict CFI policies to constrain these transfers. To evaluate BinCC, we introduce new metrics to estimate the average number of legitimate targets of each kind of indirect transfer, as well as the difficulty of leveraging call-preceded gadgets to generate ROP exploits. Compared to the state-of-the-art binary-only CFI, BinCFI, the experimental results show that BinCC reduces legitimate transfer targets by 81.34% and makes it significantly harder for adversaries to bypass CFI restrictions to launch sophisticated ROP attacks. BinCC also achieves reasonable performance: compared to BinCFI, it decreases space overhead by around 14% and increases runtime overhead by only 4%.
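To make the Intra-/Inter-Continent distinction concrete, here is a minimal policy-check model. It assumes, purely for illustration, that each continent is a contiguous address range with a set of entry points that other continents may target, and that intra-continent transfers are checked against a per-site target set. The names `Continent`, `classify`, and `is_allowed` are hypothetical; BinCC's actual policies are enforced by rewritten binary code, not by a lookup like this.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Continent:
    name: str
    start: int                 # first address in the continent
    end: int                   # one past the last address
    entry_points: frozenset    # addresses other continents may legitimately target

def continent_of(continents, addr):
    """Return the continent containing `addr`, or None."""
    for c in continents:
        if c.start <= addr < c.end:
            return c
    return None

def classify(continents, src, dst):
    """Label an indirect transfer as 'intra', 'inter', or 'unknown'."""
    cs, cd = continent_of(continents, src), continent_of(continents, dst)
    if cs is None or cd is None:
        return "unknown"
    return "intra" if cs == cd else "inter"

def is_allowed(continents, src, dst, intra_targets):
    """Apply a separate rule to each kind of transfer.

    intra_targets maps a transfer site to its statically computed target set.
    """
    kind = classify(continents, src, dst)
    if kind == "intra":
        return dst in intra_targets.get(src, frozenset())
    if kind == "inter":
        return dst in continent_of(continents, dst).entry_points
    return False  # targets outside every continent are rejected

# Hypothetical two-continent layout.
CONTINENTS = [
    Continent("A", 0x1000, 0x2000, frozenset({0x1000})),
    Continent("B", 0x2000, 0x3000, frozenset({0x2000, 0x2100})),
]
```

Keeping the two rules separate is what lets each one stay strict: an inter-continent transfer may only land on a published entry point, even if the target address is otherwise valid code.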

Marlin: Mitigating Code Reuse Attacks Using Code Randomization

IEEE Transactions on Dependable and Secure Computing, 2015

Code-reuse attacks, such as return-oriented programming (ROP), are a class of buffer overflow attacks that repurpose existing executable code towards malicious purposes. These attacks bypass defenses against code injection by chaining together sequences of instructions, commonly known as gadgets, to execute the desired attack logic. A common feature of these attacks is their reliance on knowledge of the memory layout of the executable code. We propose a fine-grained, randomization-based approach that breaks this assumption by modifying the layout of the executable code, hindering code-reuse attacks. Our solution, Marlin, randomizes the internal structure of the executable code by randomly shuffling the function blocks in the target binary. This denies the attacker the a priori knowledge of instruction addresses needed to construct the desired exploit payload. Our approach can be applied to any ELF binary, and every execution of the binary uses a different randomization. We have integrated Marlin into the bash shell, which randomizes the target executable before launching it. Our work shows that this approach incurs low overhead and significantly increases the level of security against code-reuse attacks.
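The core of function-block shuffling can be sketched as a layout computation: permute the functions, place them back to back, and redirect direct calls to each callee's new address. This is an illustrative model under simplifying assumptions, not Marlin's implementation; the function names, sizes, and call-site labels are hypothetical, and a real rewriter must also fix relative offsets, jump tables, and other code pointers inside each function.

```python
import random

def shuffle_layout(functions, base, rng):
    """Place functions back to back at `base` in a random order.

    functions: list of (name, size_in_bytes) tuples.
    Returns {name: new_start_address}.
    """
    order = list(functions)
    rng.shuffle(order)
    layout, addr = {}, base
    for name, size in order:
        layout[name] = addr
        addr += size
    return layout

def patch_direct_calls(call_sites, layout):
    """Return the new target address for each direct call site.

    call_sites maps a call-site label to the callee's name.
    """
    return {site: layout[callee] for site, callee in call_sites.items()}

# Usage: an OS-seeded generator gives a fresh layout on every launch.
funcs = [("main", 0x40), ("parse_args", 0x80), ("emit", 0x20)]
layout = shuffle_layout(funcs, 0x400000, random.SystemRandom())
targets = patch_direct_calls({"main+0x12": "parse_args"}, layout)
```

Using `random.SystemRandom` rather than a fixed seed matches the goal stated in the abstract that every execution sees a different layout.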

SoK: Using Dynamic Binary Instrumentation for Security (And How You May Get Caught Red Handed)

Proceedings of the 2019 ACM Asia Conference on Computer and Communications Security, 2019

Dynamic binary instrumentation (DBI) techniques allow for monitoring and possibly altering the execution of a running program down to instruction-level granularity. The ease of use and flexibility of DBI primitives have made them popular in a large body of research in different domains, including software security. Lately, the suitability of DBI for security has been questioned in light of transparency concerns arising from artifacts that popular frameworks introduce into the execution: while these artifacts do not perturb benign programs, a dedicated adversary may detect their presence and defeat the analysis. Our contributions are two-fold. We first present the abstraction and inner workings of DBI frameworks, how DBI has assisted prominent security research, and alternative solutions. We then dive into the DBI evasion and escape problems, discussing attack surfaces, transparency concerns, and possible mitigations. We make available to the community a library of detection patterns and st...

Track Conventions, Not Attack Signatures: Fortifying X86 ABI and System Call Interfaces to Mitigate Code Reuse Attacks

2021

Code Reuse Attacks (CRAs) are dangerous exploitation strategies that allow attackers to compose malicious programs out of existing application and library code gadgets, without requiring code injection. Previously, researchers explored hardware-assisted protection schemes that track attack signatures to identify malicious behavior. This paper makes two main contributions. First, we show that previously proposed signature-based schemes are impractical because they do not always distinguish attack patterns from the behavior of benign programs. Second, we demonstrate that, instead of tracking attack signatures, a more robust defense mechanism is to track legitimate usage of system calls and ABI compliance in hardware, and to flag deviations from established conventions as possible attacks. We propose two specific tracking mechanisms: the setting of arguments for system calls, and register usage across function calls. We demonstrate that our solution severely hinders practical CRAs and completely stops code-reuse execution of sensitive system calls such as mprotect. Our solution imposes very low performance overhead and modest design complexity.
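One ABI convention that benign code follows and gadget chains tend to break is that callee-saved registers hold the same values after a call returns. The sketch below is a software model of that single check, assuming the x86-64 System V ABI, where `rbx`, `rbp`, and `r12` to `r15` are callee-saved; it is not the paper's hardware mechanism, and it omits the system-call argument tracking entirely. `AbiTracker` and its methods are hypothetical names.

```python
# Callee-saved general-purpose registers in the x86-64 System V ABI (rsp omitted).
CALLEE_SAVED = ("rbx", "rbp", "r12", "r13", "r14", "r15")

class AbiTracker:
    """Model of a per-thread monitor that checks callee-saved registers at returns."""

    def __init__(self):
        self._frames = []  # one snapshot of callee-saved values per active call

    def on_call(self, regs):
        """Record callee-saved register values when a call executes."""
        self._frames.append({r: regs[r] for r in CALLEE_SAVED})

    def on_ret(self, regs):
        """Return a list of convention violations observed at this return."""
        if not self._frames:
            # Typical of ROP: a gadget's ret with no matching call.
            return ["ret without matching call"]
        saved = self._frames.pop()
        return [f"{r} not restored" for r in CALLEE_SAVED if regs[r] != saved[r]]

# Usage: a well-behaved call, then a call whose callee clobbers rbx.
tracker = AbiTracker()
regs = dict.fromkeys(CALLEE_SAVED, 0)
tracker.on_call(regs)
clean = tracker.on_ret(regs)
tracker.on_call(regs)
clobbered = tracker.on_ret(dict(regs, rbx=0x41414141))
unmatched = tracker.on_ret(regs)
```

Tracking conventions rather than attack signatures means the check is defined by what benign code must do, so it does not need to anticipate particular gadget patterns.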

LibVM: an architecture for shared library sandboxing

Software: Practice and Experience, 2014

Many software applications extend their functionality by dynamically loading libraries into their address space. However, shared libraries are often of unknown provenance and quality and may contain accidental bugs or, in some cases, deliberately malicious code. Most sandboxing techniques that address these issues require recompiling the libraries with custom toolchains, require significant modifications to the libraries, do not retain the benefits of single address-space programming, do not completely isolate guest code, or incur substantial performance overheads. In this paper we present LibVM, a sandboxing architecture for isolating libraries within a host application without requiring any modifications to the shared libraries themselves, while retaining the benefits of a single address space and introducing a system call interposition layer that allows complete arbitration over a shared library's functionality. We show how to use contemporary hardware virtualization support towards this end with reasonable performance overheads; in the absence of such hardware support, our model can also be implemented using a software-based mechanism. We ensure that our implementation conforms as closely as possible to existing shared library manipulation functions, minimizing the effort needed to apply such isolation to existing programs. Our experimental results show that it is easy to gain immediate benefits both in scenarios where the goal is to guard the host application against unintentional programming errors in shared libraries and in more complex scenarios where a shared library is suspected of being actively hostile. In both cases, no changes are required to the shared libraries themselves.

Binary Recompilation via Dynamic Analysis and the Protection of Control and Data-flows Therein

2020

Author: Joseph Michael Nash. Advisor: Michael Franz. Abstract: Legacy binaries need to continue functioning even when no source code has been preserved, to support the workflows of government and industry. These binaries often lack recent improvements in compiler design and software engineering practices, making them slower and less secure than modern binaries. Binary rewriting seeks to patch, optimize, instrument, or harden binaries to bridge this gap, but existing practice is limited by the underlying static analysis. We created a framework, BinRec, that uses dynamic analysis to lift binaries to LLVM IR and then recompile them, which overcomes the limitations of static analysis. The protection of software against memory corruption exploits has a rich history, which this thesis both systematizes and extends. We present a study of the performance, precision, and security of control-flow integrity (CFI). Data-only attacks can bypass CFI, and so we present a defense against the...

BBDetector: A Precise and Scalable Third-Party Library Detection in Binary Executables with Fine-Grained Function-Level Features

Applied Sciences

Third-party library (TPL) reuse may introduce vulnerable or malicious code into software, exposing it to potential risks. Thus, it is essential to identify third-party dependencies and take immediate corrective action to fix critical vulnerabilities when a damaged reusable component is found or reported. However, most existing methods rely only on syntactic features, which results in low recognition accuracy and in detection performance that degrades significantly under obfuscation. In addition, the few semantic-based approaches suffer from efficiency problems. To resolve these problems, we propose and implement BBDetector, a more precise and scalable TPL detection method. In addition to syntactic features, we consider rich function-level semantic features and form a feature vector for each function. Moreover, we design a scalable function vector similarity search method to identify anchor functions and candidate libraries, based upon which we carry out...
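The anchor-function step can be illustrated with a plain cosine-similarity search: match each function in the target binary to its most similar library function, keep matches above a threshold as anchors, and treat libraries with enough anchors as candidates. This is a minimal sketch with hypothetical feature vectors, threshold, and names (`match_anchors`, `candidate_libraries`); it uses a linear scan, whereas a scalable system like the one described would need an indexed nearest-neighbor search, and BBDetector's real features and scoring are not reproduced here.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return 0.0 if nu == 0 or nv == 0 else dot / (nu * nv)

def match_anchors(target_funcs, library_index, threshold=0.95):
    """Map each target function to its best (library, function, score) above threshold.

    target_funcs: {function_name: vector}
    library_index: list of (library, function_name, vector)
    """
    anchors = {}
    for tname, tvec in target_funcs.items():
        best = max(((lib, fname, cosine(tvec, fvec))
                    for lib, fname, fvec in library_index),
                   key=lambda m: m[2], default=None)
        if best is not None and best[2] >= threshold:
            anchors[tname] = best
    return anchors

def candidate_libraries(anchors, min_anchors=2):
    """Libraries supported by at least `min_anchors` anchor functions."""
    counts = {}
    for lib, _, _ in anchors.values():
        counts[lib] = counts.get(lib, 0) + 1
    return {lib for lib, n in counts.items() if n >= min_anchors}

# Hypothetical three-dimensional feature vectors.
INDEX = [("zlib", "inflate", [1, 0, 2]),
         ("zlib", "deflate", [0, 3, 1]),
         ("libpng", "png_read", [5, 5, 0])]
TARGET = {"sub_1": [2, 0, 4], "sub_2": [0, 6, 2], "sub_3": [1, -1, 0]}
```

Here `sub_1` and `sub_2` are scaled copies of zlib functions and become anchors, while `sub_3` matches nothing closely enough, so only zlib is reported as a candidate.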

Mitigation of Advanced Code Reuse Attacks

2021

People around the world use more and more software in their daily lives, to the point that the boundary between the digital and the human is about to vanish. A natural observation is that the interest of many types of actors (e.g., nation-state actors, state-sponsored actors) rises exponentially as this software success story unfolds. As digital data is the new oil of the economy, these actors have many incentives to obtain it. Comparable efforts are therefore undertaken to reach the point at which private devices are no longer private but are instead controlled and owned by others, in many cases without the user's knowledge. This is achieved by attacks that exploit a system weakness, such as a code-based vulnerability in a program. It is therefore important to have a deep understanding of code reuse attacks (CRAs), as these are often used by such actors to reach the goals mentioned above. This thesis provides tools and techniques to mitigate these problems, addressing CRAs along two main lines of research based on static and dynamic code analysis. In the first part of this thesis, we present INTDETECT, a static symbolic execution based framework that detects integer overflows in C source code, since integer overflows often lead to memory corruption and even to CRAs. INTDETECT reliably detects integer overflows and does not suffer from false negatives for the tested programs. We integrate it into the Eclipse IDE, a well-established and widely used Integrated Development Environment. Next, we extend the static C source code analysis framework on which INTDETECT relies by implementing on top of it an integer overflow detection and repair generation tool called INTREPAIR. INTREPAIR generates C source code repairs that help a programmer automatically repair a previously detected integer overflow.
INTREPAIR can efficiently remove a fault, automatically validate a repair, and does not introduce unwanted program behavior. Further, we extend our static source code analysis framework to not only detect and repair integer overflows but also detect and repair buffer overflows, which in most cases are one of the main prerequisites for performing CRAs. For this purpose, we provide a tool called BUFFREPAIR, which automatically generates buffer overflow repairs, does not suffer from false negatives, and can also validate a repair. In the second part of this thesis, we focus on the detection of dynamic memory corruptions, which most commonly lead to (or are a prerequisite for) CRAs. We take this path because of the intrinsic limitations of the static analysis techniques used in the first part of this thesis. More precisely, we develop a compiler-based sanitizer called CASTSAN, which detects object type confusions at runtime and is fully integrated into a well-established compiler framework (Clang/LLVM). CASTSAN is a fully functional object type confusion detection tool based on a novel and efficient technique that detects only polymorphic C++ object type confusions. If used consistently, it can therefore considerably reduce the likelihood of CRAs. Next, we design a static compiler-based tool, named LLVM-CFI, which can assess state-of-the-art static CFI defenses. The intuition behind this decision is twofold. First, because memory corruptions currently cannot be fully eradicated from programs, we would like to provide a runtime defense to harden a program. Second, we would like to design and implement a novel CFI-based technique, introduced in the later parts of this thesis, for protecting indirect program control-flow transfers.
To address this task effectively, we first need to learn how effective existing state-of-the-art CFI defenses are and what level of security they offer. We therefore develop LLVM-CFI, a novel framework for assessing static CFI policies with respect to call-target set reduction after a given CFI defense has been applied. Using LLVM-CFI, we gain important knowledge that informs the design decisions for the tools presented later in this thesis. Further, we design and implement a compiler-based tool called ρFEM, which builds on the results from the second part of this thesis. ρFEM protects program CFG backward edges stemming from indirect and direct forward-edge control-flow transfers. ρFEM is based on a novel technique for protecting backward edges with a fine-grained CFI policy that provides an optimal set of return targets for each protected callee. In this way, the likelihood of successfully performing CRAs that exploit backward edges is greatly reduced, and a competitive alternative to shadow stacks is provided. Unlike shadow stack techniques (with the exception of Intel Control-flow Enforcement Technology (CET), which relies on hardware support, and Return Address Defender (RAD), which uses page permissions), ρFEM does not rely on entropy and information hiding. Thus, the attack vectors that disclose the protection, which are relevant for shadow stack techniques, do not apply to ρFEM. Finally, in the last part of this thesis, we develop a framework called τCFI for protecting legacy program binaries with novel CFI policies designed according to the lessons learned and summarized in previous chapters. These lessons helped us design and implement a tool that can effectively protect forward and backward program CFG edges in stripped program binaries, since symbol information is usually not required in production-ready binaries.
Note that most of the semantic information has vanished through the compilation process. In this way, CRAs that rely on corrupting forward and/or backward CFG edges through indirect control-flow transfer violations are mitigated: hardening the program binary with τCFI greatly reduces the likelihood that such an attack succeeds.
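The first part of the thesis targets integer overflows in C. INTDETECT uses static symbolic execution; the sketch below swaps in a simpler technique, interval reasoning, to show the underlying question a static checker asks: can `a + b` or `a * b` leave the range of a 32-bit signed `int` given what is known about the operands? The `checked_add` helper models the kind of guard a repair tool might insert. All names and bounds are illustrative assumptions, not INTDETECT's or INTREPAIR's actual output.

```python
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1

def add_may_overflow(a_range, b_range):
    """True if some a in a_range and b in b_range make a + b leave int32.

    Each range is an inclusive (lo, hi) pair of Python integers.
    """
    lo = a_range[0] + b_range[0]
    hi = a_range[1] + b_range[1]
    return lo < INT32_MIN or hi > INT32_MAX

def mul_may_overflow(a_range, b_range):
    """True if a * b can leave int32; interval extremes lie at the corner products."""
    products = [x * y for x in a_range for y in b_range]
    return min(products) < INT32_MIN or max(products) > INT32_MAX

def checked_add(a, b):
    """Guarded addition: the exact sum if it fits in int32, otherwise None."""
    result = a + b  # Python integers do not wrap, so the check below is exact
    return result if INT32_MIN <= result <= INT32_MAX else None
```

For instance, two operands known to lie in `[0, 65536]` can multiply to 2^32, so `mul_may_overflow((0, 65536), (0, 65536))` reports a possible overflow, while operands in `[-100, 100]` cannot.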

Armor Within: Defending Against Vulnerabilities in Third-Party Libraries

2020 IEEE Security and Privacy Workshops (SPW)

Vulnerabilities in third-party software modules have resulted in severe security flaws, including remote code execution and denial of service. However, current approaches to securing such libraries suffer from one of two problems. First, they do not perform well enough to be applicable in practice, incurring high CPU and memory overheads. Second, they are hard to apply to legacy and proprietary systems when the source code of the application is not available. There is, therefore, a dire need to secure the internal boundaries within an application to ensure that vulnerable software modules are not exploitable via crafted input attacks. We present a novel approach to securing third-party software modules without requiring access to the source code of the program. First, using the foundations of language-theoretic security, we build a validation filter for the vulnerable module. Using the foundations of linking and loading, we present two different ways to insert that filter between the main code and the vulnerable module. Finally, using the foundations of ELF-based access control, we ensure that any entry into the vulnerable module must first go through the filter. We evaluate our approaches using three known real-world exploits in two popular libraries, libpng and libxml. We were able to successfully prevent all three exploits from executing.
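In the language-theoretic security spirit described above, a filter accepts only inputs that parse under a strict grammar before the vulnerable module ever sees them. The sketch below models such a filter for the outer chunk structure of a PNG file (signature, then length/type/data/CRC chunks starting with `IHDR` and ending at `IEND`) and a wrapper that forwards only accepted input. It is a hypothetical illustration in Python: the paper's filters and their insertion via linking, loading, and ELF-based access control operate on native code, and `png_structure_ok` and `filtered_call` are assumed names that check far less than a real libpng filter would.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
MAX_CHUNK_LEN = 2**31 - 1  # largest chunk length the PNG specification permits

def make_chunk(ctype, payload):
    """Build one PNG chunk: length, type, data, then CRC-32 over type + data."""
    return (struct.pack(">I", len(payload)) + ctype + payload
            + struct.pack(">I", zlib.crc32(ctype + payload)))

def png_structure_ok(data):
    """Accept a signature followed by well-formed chunks, IHDR first, ending at IEND."""
    if not data.startswith(PNG_SIGNATURE):
        return False
    pos = len(PNG_SIGNATURE)
    first = True
    while pos + 8 <= len(data):
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        end = pos + 8 + length + 4
        if length > MAX_CHUNK_LEN or end > len(data):
            return False  # oversized or truncated chunk
        if first and ctype != b"IHDR":
            return False
        first = False
        (crc,) = struct.unpack(">I", data[end - 4:end])
        if zlib.crc32(data[pos + 4:end - 4]) != crc:
            return False  # the CRC covers the chunk type and data, not the length
        pos = end
        if ctype == b"IEND":
            return pos == len(data)  # nothing may follow IEND
    return False  # ran out of bytes before reaching IEND

def filtered_call(vulnerable_fn, data):
    """Forward `data` to the vulnerable module only if the filter accepts it."""
    if not png_structure_ok(data):
        raise ValueError("input rejected by validation filter")
    return vulnerable_fn(data)

# Minimal structurally valid input (the IHDR contents are not validated here).
GOOD = PNG_SIGNATURE + make_chunk(b"IHDR", bytes(13)) + make_chunk(b"IEND", b"")
```

The design choice is to reject anything the grammar does not describe, so truncated chunks, bad CRCs, and trailing bytes after `IEND` never reach the library's parser.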