Fast instruction set customization (original) (raw)
Related papers
2010 15th Asia and South Pacific Design Automation Conference (ASP-DAC), 2010
Custom instruction set extensions (ISEs) are added to an extensible base processor to provide application-specific functionality at a low cost. As only one ISE executes at a time, resources can be shared. This paper presents a new high-level synthesis flow targeting ISEs. We emphasize a new technique for resource allocation, binding, and port assignment during synthesis. Our method is derived from prior work on datapath merging, and increases area reduction by accounting for the cost of multiplexors that must be inserted into the resulting datapath to achieve multi-operational functionality.
An architecture framework for transparent instruction set customization in embedded processors
2005
Abstract Instruction set customization is an effective way to improve processor performance. Critical portions of applicationdata-flow graphs are collapsed for accelerated execution on specialized hardware. Collapsing dataflow subgraphs will compress the latency along critical paths and reduces the number of intermediate results stored in the register file. While custom instructions can be effective, the time and cost of designing a new processor for each application is immense.
The Journal of Supercomputing, 2012
Encapsulating critical computation subgraphs as application-specific instruction set extensions is an effective technique to enhance the performance and energy efficiency of embedded processors. However, the addition of custom functional units to the base processor is required to support the execution of custom instructions. Although automated tools have been developed to reduce the long design time needed to produce a new extensible processor for each application, short time-to-market, significant non-recurring engineering and design costs are issues. To address these concerns, we introduce an adaptive extensible processor in which custom instructions are generated and added after chip-fabrication. To support this feature, custom functional units (CFUs) are replaced by a reconfigurable functional unit (RFU). The proposed RFU is based on a matrix of functional units which is multi-cycle with the capability of conditional execution. To generate more effective custom instructions, they are extended over basic blocks and hence, multiple-exits custom instruction and intuition behind it are introduced. Conditional execution capability has been added to the RFU to support the multi-exit feature of custom instructions. Because the proposed RFU has limitations on hardware resources (i.e., connections and processing elements), an integrated mapping-temporal partitioning framework is proposed to guarantee that
Fast instruction set custornization
2nd Workshop onEmbedded Systems for Real-Time Multimedia, 2004. ESTImedia 2004., 2004
This paper proposes an approach to tune embedded processor datapaths toward a specific application, so as to maximize the application performance. We customize the computation capabilities of a base processor, by extending its instruction set to include custom operations which are implemented as new specialized functional units. We describe an automatic methodology to select the custom instructions from the given application code, in a way that there is no need of compensation code or other modifications in the application, simplifying the code generation. By using the ArchC architecture description language, fast compilation and simulation of the resulting customized processor code are achieved, considerably reducing the turnaround time required to evaluate the best set of custom operations. Experimental results show that our framework provides large performance improvements (up to 3.6 times), when compared to the base general-purpose processor, while significantly speeding up the design process.
2004
Abstract Application-specific instruction set extensions are an effective way of improving the performance of processors. Critical computation subgraphs can be accelerated by collapsing them into new instructions that are executed on specialized function units. Collapsing the subgraphs simultaneously reduces the length of computation as well as the number of intermediate results stored in the register file. The main problem with this approach is that a new processor must be generated for each application domain.
A flexible instruction generation framework for extending embedded processors
2006
Modern platform-based design involves the domainspecific extension of embedded processors to fit customer requirements. To accomplish this task, the possibilities offered by recent extensible processors for their instruction set and microarchitectural customization have to be exploited. In this paper, a design approach that encapsulates automated workload characterization and highly-controllable instruction and application-specific functional unit (AFU) generation is utilized for fast extension space exploration of embedded processors. It is proved that a relatively small number of unique AFUs is needed in order to support embedded applications from the MiBench and Powerstone suites. It is possible to achieve 1.8× to 6.8× performance improvements although further possibilities such as subword parallelization are not currently regarded.
Automatic instruction set extension and utilization for embedded processors
Proceedings IEEE International Conference on Application-Specific Systems, Architectures, and Processors. ASAP 2003, 2003
There is a growing demand for application-specific embedded processors in system-on-a-chip designs. Current tools and design methodologies often require designers to manually specialize the processor based on an application. Moreover, the use of the new complex instructions added to the processor is often left to designers' ingenuity. In this paper, we present a solution that automatically groups dataflow operations in the application software as potential new complex instructions. The set of possible instructions is then automatically used for code generation combined with high-level arithmetic optimizations using symbolic algebra. Symbolic arithmetic manipulations provide a novel and effective method for instruction selection that is necessary due to the complexity of the automatically identified instructions. We have used our methodology to automatically add new instructions to Tensilica processors for a set of examples. Our results show that our tools improve designers productivity and efficiently specialize an embedded processor for the given application such that the execution time is greatly improved.
Lecture Notes in Computer Science, 2009
Instruction-Set Extensions (ISEs) have gained prominence in the past few years as a useful method for tailoring the ISAs (Instruction-Set Architectures) of ASIPs (Application Specific Instruction-Set Processors) to the computational requirements of various embedded applications. This work presents a generic and easily adaptable flow for application oriented ISE design that supports both of the prevalent ASIP design paradigmscomplete ISA design from scratch through an extensive design-space exploration, or limited ISA adaptation for a pre-designed and pre-verified base-processor core. The broad applicability of this design flow is demonstrated using ISA customization case studies for both of these two design philosophies.
Processor acceleration through automated instruction set customization
2003
Abstract Application-specific extensions to the computational capabilities of a processor provide an efficient mechanism to meetthe growing performance and power demands of embeddedapplications. Hardware, in the form of new function units (or co-processors), and the corresponding instructions, areadded to a baseline processor to meet the critical computational demands of a target application.
Proceedings of the Design Automation & Test in Europe Conference, 2006
Design tools for application specific instruction set processors (ASIPs) are an important discipline in systemlevel design for wireless communications and other embedded application areas. Some ASIPs are still designed completely from scratch to meet extreme efficiency demands. However, there is also a trend towards use of partially predefined, configurable RISC-like embedded processor cores that can be quickly tuned to given applications by means of instruction set extension (ISE) techniques. While the problem of optimized ISE synthesis has been studied well from a theoretical perspective, there are still few approaches to an overall HW/SW design flow for configurable cores that take all real-life constraints into account. In this paper, we therefore present a novel procedure for automated ISE synthesis that accommodates both user-specified and processor-specific constraints in a flexible way and that produces valid, optimized ISE solutions in short time. Driven by an advanced application C code analysis/profiling frontend, the ISE synthesis core algorithm is embedded into a complete design flow, where the backend is formed by a state-of-the-art industrial tool for processor configuration, ISE HW synthesis, and SW tool retargeting. The proposed design flow, including ISE synthesis, is demonstrated via several benchmarks for the MIPS CorExtend configurable RISC processor platform.