A Generic Design Flow for Application Specific Processor Customization through Instruction-Set Extensions (ISEs)

A design flow for configurable embedded processors based on optimized instruction set extension synthesis

Proceedings of the Design Automation & Test in Europe Conference, 2006

Design tools for application-specific instruction set processors (ASIPs) are an important discipline in system-level design for wireless communications and other embedded application areas. Some ASIPs are still designed completely from scratch to meet extreme efficiency demands. However, there is also a trend towards the use of partially predefined, configurable RISC-like embedded processor cores that can be quickly tuned to given applications by means of instruction set extension (ISE) techniques. While the problem of optimized ISE synthesis has been studied well from a theoretical perspective, there are still few approaches to an overall HW/SW design flow for configurable cores that take all real-life constraints into account. In this paper, we therefore present a novel procedure for automated ISE synthesis that accommodates both user-specified and processor-specific constraints in a flexible way and that produces valid, optimized ISE solutions in a short time. Driven by an advanced application C code analysis/profiling frontend, the ISE synthesis core algorithm is embedded into a complete design flow, where the backend is formed by a state-of-the-art industrial tool for processor configuration, ISE HW synthesis, and SW tool retargeting. The proposed design flow, including ISE synthesis, is demonstrated via several benchmarks for the MIPS CorExtend configurable RISC processor platform.

A high-level synthesis flow for custom instruction set extensions for application-specific processors

2010 15th Asia and South Pacific Design Automation Conference (ASP-DAC), 2010

Custom instruction set extensions (ISEs) are added to an extensible base processor to provide application-specific functionality at a low cost. As only one ISE executes at a time, resources can be shared. This paper presents a new high-level synthesis flow targeting ISEs. We emphasize a new technique for resource allocation, binding, and port assignment during synthesis. Our method is derived from prior work on datapath merging, and increases area reduction by accounting for the cost of multiplexors that must be inserted into the resulting datapath to achieve multi-operational functionality.
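The effect of multiplexer cost on resource sharing can be illustrated with a deliberately simplified sketch. It is not the paper's allocation and binding algorithm: the area figures, the per-input multiplexer cost, and the function name `merged_area` are all hypothetical, and the model assumes every shared operator needs one 2:1 multiplexer on each input.

```python
from collections import Counter

def merged_area(ise_a, ise_b, area, mux_area, inputs_per_op=2):
    """Area of two ISE datapaths when an operator is shared only if it pays off.

    ise_a, ise_b: lists of operator names used by each ISE.
    area: dict mapping operator name to its area (hypothetical units).
    mux_area: assumed area of one 2:1 multiplexer on one operator input.
    """
    a, b = Counter(ise_a), Counter(ise_b)
    # Start from the unshared design: every operator is instantiated separately.
    total = sum(area[op] * n for op, n in (a + b).items())
    for op in a.keys() & b.keys():
        pairs = min(a[op], b[op])
        # Sharing removes one operator copy but adds a mux on each input.
        saving = area[op] - inputs_per_op * mux_area
        if saving > 0:
            total -= pairs * saving
    return total

# Hypothetical numbers: sharing a small shifter costs more in muxes than it saves.
area = {"add": 100, "mul": 900, "shl": 60}
ops = ["add", "mul", "shl"]
print(merged_area(ops, ops, area, mux_area=40))  # 1280
```

With these made-up numbers, sharing all three operators would cost 1060 for the operators plus 240 for the multiplexers, i.e. 1300, so leaving the shifter unshared is the cheaper choice.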

Hardware/software instruction set configurability for system-on-chip processors

Proceedings of the 38th …, 2001

New application-focused system-on-chip platforms motivate new application-specific processors. Configurable and extensible processor architectures offer the efficiency of tuned logic solutions with the flexibility of standard high-level programming methodology. Automated extension of processor function units and the associated software environment (compilers, debuggers, simulators and real-time operating systems) satisfies these needs. At the same time, designing at the level of software and instruction set architecture significantly shortens the design cycle and reduces verification effort and risk. This paper describes the key dimensions of extensibility within the processor architecture, the instruction set extension description language and the means of automatically extending the software environment from that description. It also describes two groups of benchmarks, EEMBC's Consumer and Telecommunications suites, that show 20 to 40 times acceleration of a broad set of algorithms through application-specific instruction set extension, relative to high performance RISC processors.

An architecture framework for transparent instruction set customization in embedded processors

2005

Instruction set customization is an effective way to improve processor performance. Critical portions of application data-flow graphs are collapsed for accelerated execution on specialized hardware. Collapsing dataflow subgraphs compresses the latency along critical paths and reduces the number of intermediate results stored in the register file. While custom instructions can be effective, the time and cost of designing a new processor for each application is immense.
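A small sketch can make the effect of subgraph collapsing concrete. The graph, the one-cycle latencies, the fused latency, and the helper names `critical_path` and `collapse` are all assumptions for illustration. The sketch also assumes the collapsed subgraph is convex, so that merging it cannot create a cycle.

```python
from functools import lru_cache

def critical_path(latency, edges):
    """Length of the longest latency-weighted path through a dataflow DAG."""
    preds = {n: [] for n in latency}
    for src, dst in edges:
        preds[dst].append(src)

    @lru_cache(maxsize=None)
    def finish(node):
        # A node finishes after its slowest predecessor plus its own latency.
        return latency[node] + max((finish(p) for p in preds[node]), default=0)

    return max(finish(n) for n in latency)

def collapse(latency, edges, subgraph, name, fused_latency):
    """Replace the nodes in `subgraph` by one node `name` (assumed convex)."""
    sub = set(subgraph)
    new_latency = {n: c for n, c in latency.items() if n not in sub}
    new_latency[name] = fused_latency
    new_edges = set()
    for src, dst in edges:
        s = name if src in sub else src
        d = name if dst in sub else dst
        if s != d:  # edges internal to the subgraph disappear
            new_edges.add((s, d))
    return new_latency, sorted(new_edges)

# Hypothetical chain: and -> add -> shl -> store, one cycle each.
latency = {"and": 1, "add": 1, "shl": 1, "store": 1}
edges = [("and", "add"), ("add", "shl"), ("shl", "store")]
sub = {"and", "add", "shl"}

before = critical_path(latency, edges)
fused_latency, fused_edges = collapse(latency, edges, sub, "custom", 1)
after = critical_path(fused_latency, fused_edges)
# Each internal edge was an intermediate result written to the register file.
intermediates_removed = sum(1 for s, d in edges if s in sub and d in sub)
print(before, after, intermediates_removed)  # 4 2 2
```

Under these assumptions the critical path shrinks from four cycles to two, and the two values passed between the fused operations no longer need register-file storage.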

Fast instruction set customization

2004

This paper proposes an approach to tune embedded processor datapaths toward a specific application, so as to maximize the application performance. We customize the computation capabilities of a base processor by extending its instruction set to include custom operations, which are implemented as new specialized functional units. We describe an automatic methodology to select the custom instructions from the given application code in such a way that no compensation code or other modification of the application is needed, which simplifies code generation. By using the ArchC architecture description language, fast compilation and simulation of the resulting customized processor code are achieved, considerably reducing the turnaround time required to evaluate the best set of custom operations. Experimental results show that our framework provides large performance improvements (up to 3.6 times) when compared to the base general-purpose processor, while significantly speeding up the design process.

A Retargetable Tool-Suite for the Design of Application Specific Instruction Set Processors Using a Machine Description Language

2008

This paper presents BURAQ, a DSP development framework, which aims at optimizing the cost, efficiency and turnaround time of System-on-Chip development. BURAQ accepts an Instruction and Architecture Description (IAD) file that represents the DSP and its instruction set at a higher level of abstraction, in a proprietary language. The system then synthesizes a complete hardware description of the processor core, along with accompanying tools, i.e. an ILP Assembler, Linker and Instruction Set Simulator. The synthesized processor core is composed of a processor kernel, registers, addressing units and functional units. A user-friendly IDE for the above-mentioned framework has also been developed, and it allows easy specification and detailed analysis of the target architecture. Hence BURAQ provides a platform for hardware/software co-simulation of a DSP. Co-simulation is a very powerful tool for early design space exploration and thus reducing production cost and development time of SOC architectu...

Compiler-directed Customization of ASIP Cores.

This paper presents an automatic method to customize embedded application-specific instruction processors (ASIPs) based on compiler analysis. ASIPs, also known as embedded soft cores, allow certain hardware parameters in the processor to be customized for a specific application domain. They offer low design cost as they use pre-designed and verified components. Our design goal is choosing parameter values for fastest runtime within a given silicon area budget for a particular application set. Present-day technologies for choosing parameter values rely on exhaustive simulation of the application set on all possible combinations of parameter values, a time-consuming and non-scalable procedure. We propose a compiler-based method that automatically derives the optimal values of parameters without simulating any configuration. Further, we expand the space of parameters that can be changed from the limited set today, and evaluate the importance of each. Results show that for our benchmarks, the runtimes for different configurations are predicted with an average error of 2.5%. In the two area-constrained customization problems we evaluate, our method is able to recommend the same configuration that is recommended by brute-force exhaustive simulation.
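The selection problem stated above (fastest runtime within an area budget) can be sketched as a search over parameter combinations. Everything in the sketch is hypothetical: the parameter names, the area model, and `predicted_runtime`, which merely stands in for whatever estimate is available and is not the paper's compiler analysis.

```python
from itertools import product

# Hypothetical parameter space for a configurable core.
PARAMS = {
    "icache_kb": [2, 4, 8],
    "multiplier": [False, True],
}

def area(cfg):
    """Hypothetical area model, in arbitrary units."""
    return 10 * cfg["icache_kb"] + (50 if cfg["multiplier"] else 0)

def predicted_runtime(cfg):
    """Stand-in runtime estimate, in arbitrary units (not a real predictor)."""
    runtime = 1000 - 20 * cfg["icache_kb"]
    if cfg["multiplier"]:
        runtime -= 300
    return runtime

def best_config(budget):
    """Return the fastest configuration whose area fits the budget, or None."""
    names = list(PARAMS)
    candidates = (dict(zip(names, values)) for values in product(*PARAMS.values()))
    feasible = [c for c in candidates if area(c) <= budget]
    return min(feasible, key=predicted_runtime) if feasible else None

print(best_config(100))  # {'icache_kb': 4, 'multiplier': True}
```

The enumeration itself is cheap here because each candidate is scored by a formula. The scalability problem the abstract describes arises when each score requires a full simulation run.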

Application-specific processing on a general-purpose core via transparent instruction set customization

2004

Application-specific instruction set extensions are an effective way of improving the performance of processors. Critical computation subgraphs can be accelerated by collapsing them into new instructions that are executed on specialized function units. Collapsing the subgraphs simultaneously reduces the length of computation as well as the number of intermediate results stored in the register file. The main problem with this approach is that a new processor must be generated for each application domain.

Customizing Embedded Processors for Specific Applications

The complexity of embedded applications is growing rapidly. This growth is accompanied by severe implementation constraints on cost, size, performance and power. This has resulted in a search for ways to expand the architectural design space for implementing such complex applications. The last decade has seen the growth of application-specific instruction processors (ASIPs) as an alternative to general-purpose processors on one hand and application-specific integrated circuits (ASICs) on the other. ASIPs are considerably more flexible than ASICs while being more performance- and power-efficient than general-purpose processors. The key issue in designing ASIPs relates to customization, i.e. identifying the critical application characteristics that need to be supported by special hardware to create an application-specific processor. In this paper we address two specific research problems: storage customization of a RISC processor and design of a clustered VLIW processor. A simple RISC proces...

The ARISE Reconfigurable Instruction Set Extensions Framework

2007 International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation, 2007

In this paper, we introduce the ARISE framework for the systematic extension of typical processors with the necessary infrastructure to support an arbitrary number and type of reconfigurable hardware units. ARISE extends the microarchitecture of the processor with an interface to allow the coupling of the hardware units. Furthermore, the instruction set of the processor is extended with instructions that expose full control of the interface to the programmer/compiler. This control includes the configuration of operations on the hardware units, execution of these operations, and communication of data between the processor and the units. The new instructions are incorporated without the need to redesign the processor instruction set architecture. To evaluate our proposal, a model of an ARISE-extended MIPS processor has been designed. A simulation of the ARISE model has been performed using a turbo decoder algorithm as the benchmark application. Performance results show impressive application speedups of up to 7.5x.