Complex instruction and software library mapping for embedded software using symbolic algebra (original) (raw)

The use of compiler optimizations for embedded systems software

Crossroads, 2008

Optimizing embedded applications using a compiler can generally be broken down into two major categories: hand-optimizing code to take advantage of a particular processor's compiler and applying built-in optimization options to proven and well-polished code. The former is well documented for different processors, but little has been done to find generalized methods for optimal sets of compiler options based on common goal criteria such as application code size, execution speed, power consumption, and build time. This article discusses the fundamental differences between these two general categories of optimizations using the compiler. Examples of common, built-in compiler options are presented using a simulated ARM processor and C compiler, along with a simple methodology that can be applied to any embedded compiler for finding an optimal set of compiler options.

Address register-oriented optimizations for embedded processors

2001

Embedded systems consisting of the application program ROM, RAM, the embedded processor core, and any custom hardware on a single wafer are becoming increasingly common in application domains such as signal processing. Given the rapid deployment of these systems, programming on such systems has shifted from assembly language to high-level languages such as C, C++, and Java. The processors used in such systems are usually targeted toward specific application domains, e.g., digital signal processing (DSP). As a result, these embedded processors include application-specific instruction sets, complex and irregular data paths, etc., thereby rendering code generation for these processors difficult. In this paper, we present new code optimization techniques for embedded fixed point DSP processors which have limited on-chip program ROM and include indirect addressing modes using post-increment and decrement operations. We present a heuristic to reduce code size by taking advantage of these addressing modes. Our solution aims at improving the offset assignment produced by Liao et al.'s solution. It finds a layout of variables in RAM, so that it is possible to subsume explicit address register manipulation instructions into other instructions as a post-increment or post-decrement operation. Experimental results show the effectiveness of our solution. Next, we propose an algorithm that uses commutative transformations to change the access sequence and thereby reducing the code size. Some DSP cores allow for the post-increment or decrement value to be larger than one. For such processors, we also present an approach that is incremental and has some advantages over another proposed solution that requires the expensive generation of cliques.

Exact and approximate algorithms for the extension of embedded processor instruction sets

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2000

In embedded computing, cost, power, and performance constraints call for the design of specialized processors, rather than for the use of the existing off-the-shelf solutions. While the design of these application-specific CPUs could be tackled from scratch, a cheaper and more effective option is that of extending the existing processors and toolchains. Extensibility is indeed a feature now offered in real designs, e.g., by processors such as Tensilica Xtensa [T. R. Halfhill, Microprocess Rep., 2003], ARC ARCtangent [T. R. Halfhill, Microprocess Rep., 2000], STMicroelectronics ST200 [P. Faraboschi, G. Brown, J. A. Fisher, G. Desoli, and F. Homewood, Proc. 27th Annu. Int. Symp. Computer Architecture, 2000, p. 203], and MIPS CorExtend [T. R. Halfhill, Microprocess Rep., 2003

Instruction selection for embedded DSPs with complex instructions

1996

Abstract{We address the problem of instruction selection in code generation for embedded digital signal processors. Recent work has shown that this task can be efciently solved b y t r e e c overing with dynamic programming, even in combination with the task of register allocation. However, performing instruction selection by tree c overing only does not exploit available instructionlevel parallelism, for instance in form of multiplyaccumulate instructions or parallel data moves. In this paper we investigate how such complex instructions may aect detection of optimal tree c overs, and we present a two-phase scheme for instruction selection which exploits available instruction-level parallelism. At the expense of higher compilation time, this technique may signicantly increase the code quality compared t o p r evious work, which is demonstrated for a widespread DSP.

Code Optimization Techniques for Embedded DSP Microprocessors

32nd Design Automation Conference, 1995

We address the problem of code optimization for embedded DSP microprocessors. Such processors (e.g., those in the TMS320 series) have highly irregular datapaths, and conventional code generation methods typically result in inefficient code. In this paper we formulate and solve some optimization problems that arise in code generation for processors with irregular datapaths. In addition to instruction scheduling and register allocation, we also formulate the accumulator spilling and mode selection problems that arise in DSP microprocessors. We present optimal and heuristic algorithms that determine an instruction schedule simultaneously optimizing accumulator spilling and mode selection. Experimental results are presented.

Facilitating compiler optimizations through the dynamic mapping of alternate register structures

Proceedings of the 2007 international conference on Compilers, architecture, and synthesis for embedded systems - CASES '07, 2007

Aggressive compiler optimizations such as software pipelining and loop invariant code motion can significantly improve application performance, but these transformations often require the use of several additional registers to hold data values across one or more loop iterations. Compilers that target embedded systems may often have difficulty exploiting these optimizations since many embedded systems typically do not have as many general purpose registers available. Alternate register structures like register queues can be used to facilitate the application of these optimizations due to common reference patterns. In this paper, we propose a microarchitectural technique that permits these alternate register structures to be efficiently mapped into a given processor architecture and automatically exploited by an optimizing compiler. We show that this minimally invasive technique can be used to facilitate the application of software pipelining and loop invariant code motion for a variety of embedded benchmarks. This leads to performance improvements for the embedded processor, as well as new opportunities for further aggressive optimization of embedded systems software due to a significant decrease in the register pressure of tight loops.

A Method To Derive Application-Specific Embedded

2002

The concept of system-on-a-chip is becoming increasingly popular for the integration of complex systems. New types of processor cores are now available that enable the designer to customize their processors for the target applications. These soft cores are not tightly coupled with the target application, and this leads to processing cores sub-optimal for their specific applications. This paper proposes a method to derive applicationspecific embedded processors from soft processor cores. The derivation process involves an analysis of the resources of the processing core used by the target application. Then a series of optimizations based on the analysis results are performed on an optimizable model of the processor core. We present the tool used to perform the analysis of the resources used by an application, and results from a real-world case. Then, various optimization methods are described.