Application-specific instruction generation for configurable processor architectures (original) (raw)
Related papers
Architecture-Aware Custom Instruction Generation for Reconfigurable Processors
… Architectures, Tools and …, 2010
Instruction set extension is becoming extremely popular for meeting the tight design constraints in embedded systems. This mechanism is now widely supported by commercially available FPGA (Field-Programmable Gate Array) based reconfigurable processors. In this paper, we present a design flow that automatically enumerates and selects custom instructions from an application DFG (Data-Flow Graph) in an architecture-aware manner. Unlike previously reported methods, the proposed enumeration approach identifies custom instruction patterns that can be mapped onto the target FPGA in a predictable manner. Our investigation shows that using this strategy the selection process can make a more informed decision for selecting a set of custom instructions that will lead to higher performance at lower cost. Experimental results based on six applications from a widely-used benchmark suite show that the proposed design flow can achieve significantly higher performance gain when compared to conventional design approaches.
Proceedings of the Design Automation & Test in Europe Conference, 2006
Design tools for application specific instruction set processors (ASIPs) are an important discipline in systemlevel design for wireless communications and other embedded application areas. Some ASIPs are still designed completely from scratch to meet extreme efficiency demands. However, there is also a trend towards use of partially predefined, configurable RISC-like embedded processor cores that can be quickly tuned to given applications by means of instruction set extension (ISE) techniques. While the problem of optimized ISE synthesis has been studied well from a theoretical perspective, there are still few approaches to an overall HW/SW design flow for configurable cores that take all real-life constraints into account. In this paper, we therefore present a novel procedure for automated ISE synthesis that accommodates both user-specified and processor-specific constraints in a flexible way and that produces valid, optimized ISE solutions in short time. Driven by an advanced application C code analysis/profiling frontend, the ISE synthesis core algorithm is embedded into a complete design flow, where the backend is formed by a state-of-the-art industrial tool for processor configuration, ISE HW synthesis, and SW tool retargeting. The proposed design flow, including ISE synthesis, is demonstrated via several benchmarks for the MIPS CorExtend configurable RISC processor platform.
Rapid design of area-efficient custom instructions for reconfigurable embedded processing
Journal of Systems Architecture, 2009
RISPs (Reconfigurable Instruction Set Processors) are increasingly becoming popular as they can be customized to meet design constraints. However, existing instruction set customization methodologies do not lend well for mapping custom instructions on to commercial FPGA architectures. In this paper, we propose a design exploration framework that provides for rapid identification of a reduced set of profitable custom instructions and their area costs on commercial architectures without the need for time consuming hardware synthesis process. A novel clustering strategy is used to estimate the utilization of the LUT (Look-Up Table) based FPGAs for the chosen custom instructions. Our investigations show that the area costs computations using the proposed hardware estimation technique on 20 custom instructions are shown to be within 8% of those obtained using hardware synthesis. A systematic approach has been adopted to select the most profitable custom instruction candidates. Our investigations show that this leads to notable reduction in the number of custom instructions with only marginal degradation in performance. Simulations based on domain-specific application sets from the MiBench and MediaBench benchmark suites show that on average, more than 25% area utilization efficiency (performance/area) can be achieved with the proposed technique.
The Journal of Supercomputing, 2012
Encapsulating critical computation subgraphs as application-specific instruction set extensions is an effective technique to enhance the performance and energy efficiency of embedded processors. However, the addition of custom functional units to the base processor is required to support the execution of custom instructions. Although automated tools have been developed to reduce the long design time needed to produce a new extensible processor for each application, short time-to-market, significant non-recurring engineering and design costs are issues. To address these concerns, we introduce an adaptive extensible processor in which custom instructions are generated and added after chip-fabrication. To support this feature, custom functional units (CFUs) are replaced by a reconfigurable functional unit (RFU). The proposed RFU is based on a matrix of functional units which is multi-cycle with the capability of conditional execution. To generate more effective custom instructions, they are extended over basic blocks and hence, multiple-exits custom instruction and intuition behind it are introduced. Conditional execution capability has been added to the RFU to support the multi-exit feature of custom instructions. Because the proposed RFU has limitations on hardware resources (i.e., connections and processing elements), an integrated mapping-temporal partitioning framework is proposed to guarantee that
Architecture and Compiler Optimizations for Data Bandwidth Improvement in Configurable Processors
IEEE Transactions on Very Large Scale Integration Systems, 2006
Many commercially available embedded processors are capable of extending their base instruction set for a specific domain of applications. While steady progress has been made in the tools and methodologies of automatic instruction set extension for configurable processors, the limited data bandwidth available in the core processor (e.g., the number of simultaneous accesses to the register file) becomes a potential performance bottleneck.
Architecture and compilation for data bandwidth improvement in configurable embedded processors
2005
Many commercially available embedded processors are capable of extending their base instruction set for a specific domain of applications. While steady progress has been made in the tools and methodologies of automatic instruction set extension for configurable processors, recent study has shown that the limited data bandwidth available in the core processor (e.g., the number of simultaneous accesses to the register file) becomes a serious performance bottleneck.
Fast instruction set customization
2004
This paper proposes an approach to tune embedded processor datapaths toward a specific application, so as to maximize the application performance. We customize the computation capabilities of a base processor, by extending its instruction set to include custom operations which are implemented as new specialized functional units. We describe an automatic methodology to select the custom instructions from the given application code, in a way that there is no need of compensation code or other modifications in the application, simplifying the code generation. By using the ArchC architecture description language, fast compilation and simulation of the resulting customized processor code are achieved, considerably reducing the turnaround time required to evaluate the best set of custom operations. Experimental results show that our framework provides large performance improvements (up to 3.6 times), when compared to the base general-purpose processor, while significantly speeding up the design process.
Efficient ASIP design for configurable processors with fine-grained resource sharing
Proceedings of the 16th international ACM/SIGDA symposium on Field programmable gate arrays, 2008
Application-Specific Instruction-set Processors (ASIP) can improve execution speed by using custom instructions. Several ASIP design automation flows have been proposed recently. In this paper, we investigate two techniques to improve these flows, so that ASIP can be efficiently applied to simple computer architectures in embedded applications. Firstly, we efficiently generate custom instructions with multi-cycle IO (which allows multi-outputs), thus removing the constraint imposed by the ports of the register file. Secondly, we allow identical portions of different custom instructions to be shared, thus allowing more custom instructions under the same area constraint. To handle the greatly increased exploration space, we propose several heuristics to keep the problem tractable. Experimental results show that we can achieve 3x speedup in some cases.
Tradeoffs in the Design Space Exploration of Application-Specific Processors
2003
An application-specific instruction set processor (ASIP) design methodology can exploit special characteristics of applications to meet the performance and time-to-market requirements. In this paper, tradeoffs encountered in the design of application-specific processors targeting embedded applications are discussed. The exploration of the architecture design space is a crucial step to effective instruction set generation as well as micro-architecture design. In this context, we identify operation pattern matching and hardware module allocation possibilities that should be accounted in hardware generation frameworks. For this reason, such modifications are applied on a pipelined processor model and their effect on execution performance and energy dissipation is measured via cycle-accurate simulations. As proof-of-concept, a real application, namely the full-search motion estimation algorithm used in video coding, is investigated.
Exploring opportunities to improve the performance of a reconfigurable instruction set processor
International Journal of Electronics, 2007
In this paper, we target at a Reconfigurable Instruction Set Processor (RISP), which tightly couples a coarse-grain Reconfigurable Functional Unit (RFU) to a RISC processor. Furthermore, the architecture is supported by a flexible development framework. By allowing the definition of alternate architectural parameters the framework can be used to explore the design space and fine-tune the architecture at design time. Initially, two architectural enhancements, namely partial predicated execution and virtual opcode are proposed and the extensions performed in the architecture and the framework to support them, are presented. To evaluate these issues kernels from the multimedia domain are considered and an exploration to derive an appropriate instance of the architecture is performed. The efficiency of the derived instance and the proposed enhancements are evaluated using an MPEG-2 encoder application.