Tailoring pipeline bypassing and functional unit mapping to application in clustered VLIW architectures
Related papers
Flexible compiler-managed L0 buffers for clustered VLIW processors
… of the 36th annual IEEE/ACM …, 2003
Wire delays are a major concern for current and forthcoming processors. One approach to attack this problem is to divide the processor into semi-independent units referred to as clusters. A cluster usually consists of a local register file and a subset of the functional units, while ...
Improving multithreading performance for clustered VLIW architectures
Very Long Instruction Word (VLIW) processors are very popular in the embedded and mobile computing domains. Uses of VLIW processors range from Digital Signal Processors (DSPs) found in a plethora of communication and multimedia devices to Graphics Processing Units (GPUs) used in gaming and high performance computing devices. The advantage of VLIWs is their low-complexity, low-power design, which enables high performance at a low cost. The scalability of VLIWs is limited by the scalability of register file ports. It is not viable to have a VLIW processor with a single large register file because of the area and power consumption implications of the register file. Clustered VLIWs solve the register file scalability issue by partitioning the register file across multiple clusters, each with a set of functional units attached to that cluster's register file. Using a clustered approach, a higher issue width can be achieved while keeping the cost of the register file within reasonable limits. Several commer...
Using Queues for Register File Organization in VLIW Architectures by Marcio
1997
Software pipelining is an effective technique for increasing the throughput of loops in superscalar or VLIW machines. However, software pipelining generates high register pressure, which in some cases requires the introduction of spill code into the schedule. This report shows that large multi-ported register files present significant problems in the construction of scalable VLIW systems. In an attempt to address this problem we are investigating the possibilities for VLIW architectures in which part of the register file is replaced by queues. We believe that this organization has distinct advantages in terms of hardware complexity, silicon area, instruction name space, and scalability. Queues also represent a natural mechanism for communication between clusters of functional units in a partitioned VLIW system. In this report we present an experimental evaluation of the machine resources required to support modulo scheduling under a variety of VLIW register file configurations. The results ...