A Retargetable Static Binary Translator for the ARM Architecture (original) (raw)

Dynamic binary translation and optimization

IEEE Transactions on Computers, 2001

We describe a VLIW architecture designed speci cally as a target for dynamic compilation of an existing instruction set architecture. This design approach o ers the simplicity and high performance of statically scheduled architectures, achieves compatibility with an established architecture, and makes use of dynamic adaptation. Thus, the original architecture is implemented using dynamic compilation, a process we refer to as DAISY Dynamically Architected Instruction Set from Yorktown. The dynamic compiler exploits runtime pro le information to optimize translations so as to extract instruction level parallelism. This work reports di erent design trade-o s in the DAISY system, and their impact on nal system performance. The results show high degrees of instruction parallelism with reasonable translation overhead and memory usage.

Binary translation: static, dynamic, retargetable?

Proceedings of International Conference on Software Maintenance ICSM-96, 1996

The porting of software to newer and faster machines using static binary translation techniques has proved successful to a large extent. Current binary translators are static in nature and require a runtime environment to successfully support the execution of the translated programs on the new machine. On the other hand, dynamic binary translation has not been considered as an alternative to static translation { we argue that these translators can achieve at least the same performance as static translators but will require a simpler runtime environment.

Memory optimization of dynamic binary translators for embedded systems

ACM Transactions on Architecture and Code Optimization, 2012

Dynamic binary translators (DBTs) are becoming increasingly important because of their power and flexibility. DBT-based services are valuable for all types of platforms. However, the high memory demands of DBTs present an obstacle for embedded systems. Most research on DBT design has a performance focus, which often drives up the DBT memory demand. In this article, we present a memory-oriented approach to DBT design. We consider the class of translation-based DBTs and their sources of memory demand; cached translated code, cached auxiliary code and DBT data structures. We explore aspects of DBT design that impact these memory demand sources and present strategies to mitigate memory demand. We also explore performance optimizations for DBTs that handle memory demand by placing a limit on it, and repeatedly flush translations to stay within the limit, thereby replacing the memory demand problem with a performance degradation problem. Our optimizations that mitigate memory demand improve performance.

Optimising dynamic binary modification across 64-bit Arm microarchitectures

Proceedings of the 16th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments, 2020

A common optimisation used in most Dynamic Binary Modification (DBM) systems is trace generation as these traces improve locality and code layout. We describe an optimised code layout for traces as well as present how to adapt the runtime algorithm to generate it. In this way, we manage to reduce the overhead on all the Arm systems evaluated; 5 different microarchitectures. A major source of overhead for DBMs comes from handling indirect branches. Indirect Branch Inlining (IBI) is a mechanism that attempts to avoid this overhead by using predictions about the target of the indirect branch. We analyse the behaviour of the indirect branch inlining and propose a new predictor, Trace Restricted IBI (TRIBI), and how to optimise IBI given the new trace generation algorithm. Our evaluation shows a geometric mean overhead for SPEC CPU2006 of 9% for a Cortex-A53 (in-order core), and for out-of-order cores 11% on an X-Gene-2, 10% on a Cortex-A57, 7% on a Cortex-A72 and 8% on a Cortex-A73, whe...

SIND: A framework for binary translation

2001

Recent work with dynamic optimization in platform independent, virtual machine based languages such as Java has sparked interest in the possibility of applying similar techniques to arbitrary compiled binary programs. Systems such as Dynamo, DAISY, and FX¢ 32 exploit dynamic optimization techniques to improve performance of native or foreign architecture binaries. However, research in this area is complicated by the lack of openly licensed, freely available, and platform-independent experimental frameworks. SIND aims to fill this void by providing a easily-extensible and flexible framework for research and development of applications and techniques of binary translation. Current research focuses are dynamic optimization of running binaries and dynamic security augmentation and integrity assurance.

BOA: The architecture of a binary translation processor

2000

High frequency design and instruction-level parallelism (ILP) are two keys to high performance microprocessor implementation. To achieve these sometimes competing goals, the Binary-translation Optimized Architecture (BOA) aims to bring code translation techniques based on continuous profiling into the mainstream. Initially, code is interpreted to detect code hot spots and gather profile information to guide dynamic optimizations. To achieve compatibility with the established Pow-erPC architecture, a binary translation layer translates PowerPC instructions into simple VLIW operation primitives. These primitives are then scheduled using VLIW scheduling techniques to a variable length, six issue VLIW/EPIC processor. Binary translation eliminates the binary compatibility problem faced by other processors, while dynamic recompilation enables adaptive re-optimization of critical program code sections and eliminates the need for dynamic scheduling hardware.

Dynamic reconfiguration with binary translation

Proceedings of the 42nd annual conference on Design automation - DAC '05, 2005

In this paper we present the impact of dynamically translating any sequence of instructions into combinational logic. The proposed approach combines a reconfigurable architecture with a binary translation mechanism, being totally transparent for the software designer. Besides ensuring software compatibility, the technique allows porting the same code for different machines tracking technological evolutions. The target processor is a Java machine able to execute Java bytecodes. Experimental results show that even code without any available parallelism can benefit from the proposed approach. Algorithms used in the embedded systems domain were accelerated 4.6 times in the mean, while spending 10.89 times less energy in the average. We present results regarding the impact of area and power, and compare the proposed approach with other Java machines, including a VLIW one.

Retargetable and reconfigurable software dynamic translation

International Symposium on Code Generation and Optimization, 2003. CGO 2003.

Software dynamic translation (SDT) is a technology that permits the modification of an executing program's instructions. In recent years, SDT has received increased attention, from both industry and academia, as a feasible and effective approach to solving a variety of significant problems. Despite this increased attention, the task of initiating a new project in software dynamic translation remains a difficult one. To address this concern, and in particular, to promote the adoption of SDT technology into an even wider range of applications, we have implemented Strata, a cross-platform infrastructure for building software dynamic translators. This paper describes Strata's architecture, our experience retargeting it to three different processors, and our use of Strata to build two novel SDT systemsone for safe execution of untrusted binaries and one for fast prototyping of architectural simulators.

A Retargetable Static Binary Translator for the ARM Architecture (original) (raw)

Related papers