Binary rewriting without relocation information

Static binary rewriting without supplemental information: Overcoming the tradeoff between coverage and correctness

2013 20th Working Conference on Reverse Engineering (WCRE), 2013

Binary rewriting is the process of transforming an executable while maintaining the original binary's functionality and improving it along one or more metrics, such as energy use, memory use, security, or reliability. Several technologies for rewriting binaries exist; static rewriting allows arbitrarily complex transformations, whereas other technologies, such as dynamic or minimally invasive rewriting, are more limited in the transformations they can perform. We have designed the first static binary rewriter that guarantees 100% code coverage without the need for relocation or symbolic information. A key challenge in static rewriting is content classification (i.e. deciding which portions of the code segment are code and which are data). Our contributions are (i) handling portions of the code segment with uncertain classification by using speculative disassembly in case they are code, while retaining the original content in case they are data; (ii) drastically limiting the number of possible speculative sequences using a new technique called binary characterization; and (iii) avoiding the need for relocation or symbolic information by using call translation at the usage points of code pointers (i.e. indirect control transfers), rather than changing addresses at address creation points. Extensive evaluation on stripped binaries for the entire SPEC 2006 benchmark suite (over 1.9 million lines of code) demonstrates the robustness of the scheme.
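
A minimal sketch of the idea behind contribution (iii), call translation at indirect control transfers (Python, with made-up addresses and function names; not the paper's implementation): instead of patching every place an address is created, the rewriter inserts a lookup at each indirect call or jump that maps an original-binary address to its location in the rewritten code, and leaves unrecognized values, which may be data, untouched.

```python
# Toy sketch of call translation at indirect control transfers (hypothetical
# names, not the paper's implementation). The rewriter builds a mapping from
# original code addresses to their addresses in the rewritten binary; every
# indirect call/jump is rewritten to route its target through `translate`.

# Mapping populated by the rewriter: original address -> rewritten address.
ADDRESS_MAP = {
    0x400520: 0x600100,   # e.g. a function moved by the rewriter
    0x400890: 0x600340,   # e.g. a speculatively disassembled helper
}

def translate(target: int) -> int:
    """Called at every indirect control transfer in the rewritten binary.

    If the target is a known original code address, redirect to its new
    location; otherwise leave it unchanged (it may be data misclassified as
    a pointer, or a pointer into retained original content).
    """
    return ADDRESS_MAP.get(target, target)

# Original:   call *%rax            (rax holds an original-binary address)
# Rewritten:  rax = translate(rax); call *%rax
if __name__ == "__main__":
    for raw_target in (0x400520, 0x400890, 0x7fff0010):
        print(hex(raw_target), "->", hex(translate(raw_target)))
```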

SaBRe: Load-time selective binary rewriting

2020

Binary rewriting is a technique that consists of disassembling a program in order to modify its instructions, with many applications, e.g. monitoring, debugging, reverse engineering and reliability. However, existing solutions suffer from well-known shortcomings in terms of soundness, performance and usability. We present SaBRe, a novel load-time framework for selective binary rewriting. SaBRe rewrites specific constructs of interest — mainly system calls and function prologues — when the program is loaded into memory. This enables users to intercept those constructs at runtime via a modular architecture that allows custom plugins to be linked with SaBRe through a simple and flexible API. We also discuss the theoretical underpinnings of disassembling and rewriting, including conditions for coverage, accuracy, and correctness, and how they affect SaBRe. We developed two backends for SaBRe — one for x86_64 and one for RISC-V — which were in turn used to implement two open-source plugins: a fast sys...
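
A rough sketch of the selective-interception idea (hypothetical API and names, not SaBRe's actual plugin interface): rewritten constructs route into a dispatcher, plugins register handlers for the system calls they care about, and everything else falls through to the original behaviour.

```python
# Toy model of load-time selective interception (hypothetical API, not SaBRe's).
# Plugins register handlers for specific system call numbers; rewritten call
# sites route through `dispatch`, and unregistered calls fall through to the
# original behaviour.
from typing import Callable, Dict

_handlers: Dict[int, Callable[..., int]] = {}

def register_syscall(number: int, handler: Callable[..., int]) -> None:
    """Plugin-facing registration: intercept one system call number."""
    _handlers[number] = handler

def dispatch(number: int, *args, passthrough: Callable[..., int]) -> int:
    """Inserted in place of each rewritten syscall instruction."""
    handler = _handlers.get(number)
    if handler is None:
        return passthrough(number, *args)   # not selected: original behaviour
    return handler(*args)

# Example plugin: count write() calls (syscall 1 on x86_64 Linux).
write_count = 0
def count_writes(fd, buf, length):
    global write_count
    write_count += 1
    return len(buf)

register_syscall(1, count_writes)

if __name__ == "__main__":
    fake_kernel = lambda number, *args: 0      # stand-in for the real syscall
    dispatch(1, 1, b"hello", 5, passthrough=fake_kernel)
    dispatch(0, 0, bytearray(16), 16, passthrough=fake_kernel)  # read: untouched
    print("intercepted writes:", write_count)
```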

Link-time binary rewriting techniques for program compaction

ACM Transactions on Programming Languages and Systems, 2005

Small program size is an important requirement for embedded systems with limited amounts of memory. We describe how link-time compaction through binary rewriting can achieve code size reductions of up to 62% for statically bound languages such as C, C++, and Fortran, without compromising on performance. We demonstrate how the limited amount of information about a program at link time can be exploited to overcome overhead resulting from separate compilation. This is done with scalable, cost-effective, whole-program analyses, optimizations, and duplicate code and data elimination techniques. The discussed techniques are evaluated and their cost-effectiveness is quantified with SQUEEZE++, a prototype link-time compactor.
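
One of the compaction techniques mentioned, duplicate code and data elimination, can be illustrated with a simplified sketch (not the SQUEEZE++ implementation; names and fragments are invented): identical fragments are detected by hashing their bytes, and all but one canonical copy are replaced by redirections to it.

```python
# Simplified sketch of duplicate code/data elimination at link time (not the
# SQUEEZE++ implementation). Fragments with identical bytes are merged into a
# single canonical copy; users of the duplicates are redirected to it.
import hashlib

def deduplicate(fragments: dict) -> tuple:
    """Return (kept fragments, redirection map duplicate_name -> canonical_name)."""
    canonical_by_hash = {}
    kept = {}
    redirect = {}
    for name, body in fragments.items():
        digest = hashlib.sha256(body).hexdigest()
        if digest in canonical_by_hash:
            redirect[name] = canonical_by_hash[digest]   # drop the duplicate
        else:
            canonical_by_hash[digest] = name
            kept[name] = body
    return kept, redirect

if __name__ == "__main__":
    frags = {
        "vector_push_int": b"\x55\x48\x89\xe5",
        "vector_push_long": b"\x55\x48\x89\xe5",   # identical instantiation
        "parse_header": b"\x41\x57\x41\x56",
    }
    kept, redirect = deduplicate(frags)
    print("kept:", sorted(kept))
    print("redirected:", redirect)
```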

Binary refactoring

Proceedings of the 27th International Conference on Software Engineering (ICSE '05), 2005

We present Binary Refactoring: a software engineering technique for improving the implementation of programs without modifying their source code. While related to regular refactoring in preserving a program's functionality, binary refactoring aims to capture modifications that are often applied to source code even though they improve only the performance of the application and not its code structure. We motivate binary

Binary Recompilation via Dynamic Analysis and the Protection of Control and Data-flows Therein

2020

Author(s): Nash, Joseph Michael | Advisor(s): Franz, Michael | Abstract: Legacy binaries need to continue functioning even when no source code has been preserved, to support the workflows of government and industry. These binaries often lack recent improvements in compiler design and software engineering practices, making them slower and less secure than modern binaries. Binary rewriting seeks to patch, optimize, instrument, or harden binaries to bridge this gap, but existing practice is limited by the underlying static analysis. We created a framework, BinRec, that uses dynamic analysis to lift binaries to LLVM IR and then recompile them, which overcomes the limitations of static analysis. The protection of software against memory corruption exploits has a rich history, which this thesis both systematizes and extends. We present a study of the performance, precision, and security of control-flow integrity (CFI). Data-only attacks can bypass CFI, and so we present a defense against the...

Decompilation to Compiler High IR in a binary rewriter

A binary rewriter is a piece of software that accepts a binary executable program as input and produces an improved executable as output. This paper describes the first technique in the literature to decompile the input binary into an existing compiler's high-level intermediate representation (IR). The compiler's back-end is then used to generate the output binary from the IR. Doing so enables the use of the rich set of compiler analysis and transformation passes available in mature compilers.
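
As a rough illustration of lifting machine instructions into a compiler-level IR (a toy mapping with an invented three-address form, not the paper's actual decompiler), each instruction is translated into one or more IR statements that a compiler back-end could then analyze, optimize, and re-lower.

```python
# Toy lifter from a few x86-like instructions to a made-up three-address IR
# (illustrative only; a real rewriter targets a production compiler IR).

def lift(instruction: str) -> list:
    op, _, operands = instruction.partition(" ")
    args = [a.strip() for a in operands.split(",")] if operands else []
    if op == "mov":
        dst, src = args
        return [f"{dst} = {src}"]
    if op == "add":
        dst, src = args
        return [f"{dst} = {dst} + {src}"]
    if op == "cmp":
        a, b = args
        return [f"flags = compare({a}, {b})"]
    if op == "jle":
        (target,) = args
        return [f"if flags <= 0: goto {target}"]
    return [f"unhandled({instruction!r})"]

if __name__ == "__main__":
    block = ["mov eax, 0", "add eax, ecx", "cmp eax, 10", "jle loop_head"]
    for insn in block:
        for stmt in lift(insn):
            print(f"{insn:<16} -> {stmt}")
```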

DIABLO: a reliable, retargetable and extensible link-time rewriting framework

Proceedings of the Fifth IEEE International Symposium on Signal Processing and Information Technology, 2005

Modern software engineering techniques introduce an overhead to programs in terms of performance and code size. A traditional development environment, where only the compiler optimizes the code, cannot completely eliminate this overhead. To effectively remove the overhead, tools are needed that have a whole-program overview. Link-time binary rewriting is an effective technique for whole-program optimization and instrumentation. In this paper we describe a novel framework to reliably perform link-time program transformations. This framework is designed to be retargetable, supporting multiple architectures and development toolchains. Furthermore it is extensible, which we illustrate by describing three different applications that are built on top of the framework.

On the Effectiveness of Source Code Transformations for Binary Obfuscation

2006

Obfuscation is gaining momentum as a protection mechanism for the intellectual property contained within or encapsulated by software. Usually, one of the following three directions is followed: source code obfuscation is achieved through source code transformations, Java bytecode obfuscation through transformations on the bytecode, and binary obfuscation through binary rewriting. In this paper, we study the effectiveness of source code transformations for binary obfuscation. The transformations applied by several existing source code obfuscators are empirically shown to have no impact on the stripped binary after compilation. Subsequently, we study which source code transformations are robust enough to percolate through the compiler into the binary.

SIND: A framework for binary translation

2001

Recent work on dynamic optimization in platform-independent, virtual machine based languages such as Java has sparked interest in the possibility of applying similar techniques to arbitrary compiled binary programs. Systems such as Dynamo, DAISY, and FX!32 exploit dynamic optimization techniques to improve the performance of native or foreign-architecture binaries. However, research in this area is complicated by the lack of openly licensed, freely available, and platform-independent experimental frameworks. SIND aims to fill this void by providing an easily extensible and flexible framework for research and development of applications and techniques of binary translation. Current research focuses on dynamic optimization of running binaries and on dynamic security augmentation and integrity assurance.

Rewrite to Reinforce: Rewriting the Binary to Apply Countermeasures against Fault Injection

2021 58th ACM/IEEE Design Automation Conference (DAC), 2021

Fault injection attacks can cause errors in software for malicious purposes. Oftentimes, vulnerable points of a program are detected after its development. It is therefore critical for the user of the program to be able to apply last-minute security assurance to the executable file without having access to the source code. In this work, we explore two methodologies based on binary rewriting that aid in injecting countermeasures into the binary file. The first approach injects countermeasures by reassembling the disassembly, whereas the second approach leverages a full translation to a high-level IR and lowers it back to the target architecture.
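
One common family of countermeasures that such rewriting can inject is redundancy: a security-critical comparison is duplicated so that a single injected fault cannot silently flip its outcome. The sketch below shows the shape of the transformation at the source level as an illustration only; the abstract does not name specific countermeasures, and the paper applies them at the binary level.

```python
# Illustrative duplicated-comparison countermeasure against single-fault
# injection (conceptual sketch, not the paper's transformation).

class FaultDetected(Exception):
    pass

def check_pin(entered: int, expected: int) -> bool:
    # First evaluation of the security-critical comparison.
    first = (entered == expected)
    # Redundant re-evaluation: a single skipped or corrupted instruction
    # cannot flip both results consistently.
    second = (entered == expected)
    if first != second:
        raise FaultDetected("inconsistent redundant comparison")
    return first and second

if __name__ == "__main__":
    print(check_pin(1234, 1234))
    print(check_pin(1111, 1234))
```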

A lightweight framework for function name reassignment based on large-scale stripped binaries

Proceedings of the 30th ACM SIGSOFT International Symposium on Software Testing and Analysis

Software in the wild is usually released as stripped binaries that contain no debug information (e.g., function names). This paper studies the issue of reassigning descriptive names to functions in order to facilitate reverse engineering. Since the essence of this issue is a data-driven prediction task, persuasive research should be based on sufficiently large-scale and diverse data. However, prior studies could only use small-scale datasets because their techniques rely on heavyweight binary analysis, making them impractical for large binaries and large collections of binaries. This paper presents the Neural Function Rename Engine (NFRE), a lightweight framework for function name reassignment that utilizes both sequential and structural information of assembly code. NFRE uses fine-grained and easily acquired features to model assembly code, making it more effective and efficient than existing techniques. In addition, we construct a large-scale dataset and present two data-preprocessing approaches that improve its usability. Benefiting from the lightweight design, NFRE can be efficiently trained on the large-scale dataset and therefore generalizes better to unknown functions. Comparative experiments show that NFRE outperforms two existing techniques by relative improvements of 32% and 16%, respectively, while requiring much less time for binary analysis.
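
The two kinds of lightweight features described, sequential tokens from the instruction stream and structural context from the call graph, can be sketched roughly as follows (a simplified illustration with made-up feature names; NFRE's actual features and model differ).

```python
# Simplified sketch of lightweight feature extraction for function-name
# prediction (illustrative; not NFRE's implementation).

def sequential_features(assembly: list) -> list:
    """Normalize each instruction to its mnemonic plus coarse operand kinds."""
    tokens = []
    for insn in assembly:
        mnemonic, _, operands = insn.partition(" ")
        kinds = []
        for op in filter(None, (o.strip() for o in operands.split(","))):
            if op.startswith("0x") or op.lstrip("-").isdigit():
                kinds.append("imm")
            elif op.startswith("["):
                kinds.append("mem")
            else:
                kinds.append("reg")
        tokens.append("_".join([mnemonic] + kinds))
    return tokens

def structural_features(func: str, call_graph: dict) -> list:
    """Callees of the function, plus its callers, as coarse graph context."""
    callees = call_graph.get(func, [])
    callers = [f for f, targets in call_graph.items() if func in targets]
    return [f"calls:{c}" for c in callees] + [f"called_by:{c}" for c in callers]

if __name__ == "__main__":
    asm = ["push rbp", "mov rbp, rsp", "mov eax, 0x10", "call strlen", "ret"]
    cg = {"sub_401000": ["strlen"], "main": ["sub_401000"]}
    print(sequential_features(asm))
    print(structural_features("sub_401000", cg))
```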

Exploiting Code Mobility for Dynamic Binary Obfuscation

Software protection aims at protecting the integrity of software applications that are deployed on untrusted hosts and subject to unauthorized analysis. Within an untrusted environment, a possibly malicious user has complete access to system resources and tools with which to analyze and tamper with the application code. To address this research problem, we propose a novel binary obfuscation approach based on the deployment of an incomplete application whose code arrives from a trusted network entity as a flow of mobile code blocks that are arranged in memory with a customized memory layout. This paper presents our approach to countering reverse engineering by defeating static and dynamic analysis, and discusses its effectiveness.

Retrofitting security in cots software with binary rewriting

2011

We present a practical tool for inserting security features against low-level software attacks into third-party, proprietary or otherwise binary-only software. We are motivated by the inability of software users to select and use low-overhead protection schemes when source code is unavailable to them, by the lack of information as to what (if any) security mechanisms software producers have used in their toolchains, and by the high overhead and inaccuracy of solutions that treat software as a black box.

Binary Code Continent: Finer-Grained Control Flow Integrity for Stripped Binaries

Control Flow Integrity (CFI) is an effective technique to mitigate threats such as code-injection and code-reuse attacks by protecting indirect transfers. For stripped binaries, a CFI policy has to be made conservatively due to the lack of source-level semantics. Existing binary-only CFI solutions such as BinCFI and CCFIR demonstrate the ability to protect stripped binaries, but the policies they apply are too permissive, allowing sophisticated code-reuse attacks. In this paper, we propose a new binary-only CFI protection scheme called BinCC, which applies static binary rewriting to provide finer-grained protection for x86 stripped ELF binaries. Through code duplication and static analysis, we divide the binary code into several mutually exclusive code continents. We further classify each indirect transfer within a code continent as either an Intra-Continent transfer or an Inter-Continent transfer, and apply separate, strict CFI policies to constrain these transfers. To evaluate BinCC, we introduce new metrics to estimate the average number of legitimate targets of each kind of indirect transfer as well as the difficulty of leveraging call-preceded gadgets to generate ROP exploits. The experimental results show that, compared to the state-of-the-art binary-only CFI, BinCFI, BinCC significantly reduces the legitimate transfer targets by 81.34% and increases the difficulty for adversaries to bypass the CFI restrictions and launch sophisticated ROP attacks. BinCC also achieves reasonable performance, with around a 14% decrease in space overhead and only a 4% increase in runtime overhead compared to BinCFI.
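
The policy structure described, partitioning code into continents and constraining intra-continent and inter-continent indirect transfers with separate target sets, can be sketched as a toy runtime check (hypothetical function names and policy sets; BinCC derives the continents and target sets through static analysis and code duplication).

```python
# Toy model of continent-based CFI checking (illustrative only).

CONTINENT_OF = {           # function -> continent id
    "parse_input": 0, "dispatch_cmd": 0,
    "log_msg": 1, "flush_log": 1,
}

INTRA_TARGETS = {          # allowed targets for transfers inside a continent
    "dispatch_cmd": {"parse_input"},
}
INTER_TARGETS = {          # stricter, separately derived cross-continent set
    "dispatch_cmd": {"log_msg"},
}

def check_indirect_transfer(source: str, target: str) -> bool:
    if target not in CONTINENT_OF:
        return False                                  # not a known code address
    same = CONTINENT_OF[source] == CONTINENT_OF[target]
    allowed = INTRA_TARGETS if same else INTER_TARGETS
    return target in allowed.get(source, set())

if __name__ == "__main__":
    print(check_indirect_transfer("dispatch_cmd", "parse_input"))  # intra: True
    print(check_indirect_transfer("dispatch_cmd", "log_msg"))      # inter: True
    print(check_indirect_transfer("dispatch_cmd", "flush_log"))    # inter: False
```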

Binary translation: static, dynamic, retargetable?

Proceedings of the International Conference on Software Maintenance (ICSM-96), 1996

The porting of software to newer and faster machines using static binary translation techniques has proved successful to a large extent. Current binary translators are static in nature and require a runtime environment to successfully support the execution of the translated programs on the new machine. On the other hand, dynamic binary translation has not been considered as an alternative to static translation: we argue that these translators can achieve at least the same performance as static translators but will require a simpler runtime environment.

Disassembly of Executable Code Revisited

2002

Machine code disassembly routines form a fundamental component of software systems that statically analyze or modify executable programs. The task of disassembly is complicated by indirect jumps and by the presence of non-executable data (jump tables, alignment bytes, etc.) in the instruction stream. Existing disassembly algorithms are not always able to cope successfully with executable files containing such features and fail silently, i.e., they produce incorrect disassemblies without any indication that the results they are producing are incorrect. This can be a serious problem, since it can compromise the correctness of a binary rewriting tool. In this paper we examine two commonly used disassembly algorithms and illustrate their shortcomings. We propose a hybrid approach that performs better than these algorithms in the sense that it is able to detect situations where the disassembly may be incorrect and to limit the extent of such disassembly errors. Experimental results indicate that the algorithm is quite effective: the amount of code flagged as incurring disassembly errors is usually quite small.
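
The detection idea can be illustrated by comparing the two classic strategies, linear sweep (decode everything straight through) and recursive traversal (decode only what control flow reaches), and flagging the positions where they disagree. The sketch below uses an invented one-byte toy instruction set and is a simplification, not the paper's algorithm.

```python
# Toy illustration of comparing two disassembly strategies to flag suspect
# regions. Toy ISA (invented): 1-byte opcodes, 'J' = jump with a 1-byte
# absolute target operand, 'R' = return, anything else falls through.

CODE = b"AJ\x05\xff\xffAR"   # 'J' jumps over bytes 3-4, which are embedded data

def linear_sweep(code: bytes) -> set:
    """Decode every position straight through."""
    starts, i = set(), 0
    while i < len(code):
        starts.add(i)
        i += 2 if code[i:i+1] == b"J" else 1
    return starts

def recursive_traversal(code: bytes, entry: int = 0) -> set:
    """Follow control flow only from the entry point."""
    starts, work = set(), [entry]
    while work:
        i = work.pop()
        if i in starts or i >= len(code):
            continue
        starts.add(i)
        op = code[i:i+1]
        if op == b"J":
            work.append(code[i + 1])          # jump target
        elif op != b"R":
            work.append(i + 1)                # fall through
    return starts

if __name__ == "__main__":
    sweep, traversal = linear_sweep(CODE), recursive_traversal(CODE)
    suspect = sweep - traversal               # decoded by sweep, unreachable by flow
    print("flag for review:", sorted(suspect))   # -> [3, 4], the embedded data
```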

An approach to the problem of detranslation of computer programs

1980

A crucial problem in the decompilation or disassembly of computer programs is the identification of executable code, i.e. the separation of instructions from data. This problem, for most computer architectures, is equivalent to the Halting Problem and is therefore unsolvable in general. The problem of identifying instructions is examined in this paper and an algorithm that will 'usually' perform a correct analysis is described. In addition, other aspects of disassembly are discussed and algorithms outlined.

08441 Final Report – Emerging Uses and Paradigms for Dynamic Binary Translation

2009

Software designers and developers face many problems in designing, building, deploying, and maintaining cutting-edge software applications: reliability, security, performance, power, legacy code, use of multi-core platforms, and maintenance are just a few of the issues that must be considered. Many of these issues are fundamental parts of the grand challenges in computer science, such as reliability and security.

Large-scale Debloating of Binary Shared Libraries

Digital Threats: Research and Practice

Developers nowadays have access to an arsenal of toolkits and libraries for rapid application prototyping. However, when an application loads a library, the entirety of that library’s code is mapped into the process address space, even if only a single function is actually needed. The unused portion is bloat that can negatively impact software defenses by unnecessarily inflating their overhead or increasing the attack surface. In this article, we investigate whether debloating is possible and practical at the binary level. To this end, we present Nibbler: a system that identifies and erases unused functions within dynamic shared libraries. Nibbler works in tandem with defenses such as continuous code re-randomization and control-flow integrity, enhancing them without incurring additional run-time overhead. We developed and tested a prototype of Nibbler on x86-64 Linux; Nibbler reduces the size of shared libraries and the number of available functions, for real-world binaries and th...
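
The core idea, keeping only library functions reachable from the application and erasing the rest, can be sketched as a toy reachability computation (hypothetical symbol names; the real system analyzes binaries and must handle address-taken functions, PLT entries, and similar complications).

```python
# Toy sketch of library debloating by reachability (illustrative only).

CALL_GRAPH = {                      # function -> functions it may call
    "app_main": ["lib_open", "lib_read"],
    "lib_open": ["lib_alloc"],
    "lib_read": ["lib_alloc"],
    "lib_write": ["lib_alloc"],     # never reached by the application
    "lib_alloc": [],
    "lib_debug_dump": [],           # never reached by the application
}

def reachable(entry_points: list) -> set:
    seen, work = set(), list(entry_points)
    while work:
        fn = work.pop()
        if fn in seen:
            continue
        seen.add(fn)
        work.extend(CALL_GRAPH.get(fn, []))
    return seen

if __name__ == "__main__":
    keep = reachable(["app_main"])
    erase = set(CALL_GRAPH) - keep
    print("erase (e.g. overwrite with traps):", sorted(erase))
```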