A Framework for Power-Gating Functional Units in Embedded Microprocessors

2000, IEEE Transactions on Very Large Scale Integration (VLSI) Systems

Power gating is a technique commonly used for leakage reduction in integrated circuits. In microprocessors, power gating is implemented by using sleep transistors to selectively deactivate circuit modules that remain idle for sustained periods during program execution. In this work, we develop a new framework for power gating the functional units in embedded microprocessors without degrading performance. The proposed framework includes an efficient algorithm for idle time estimation, appropriate insertion of sleep instructions within the code, and a method for reactivating the sleeping units only when needed, without the use of wakeup instructions. We introduce the notion of loop hierarchy trees (LHTs) to represent the partial ordering of the nested loops within the program. From the control flow graph (CFG) representation of the source program, a forest of LHTs is constructed and used to identify the maximal subgraphs that represent long idle periods for the functional units. For each subgraph thus identified, a sleep instruction is introduced in the program with a list of the functional units to be deactivated. When an instruction is decoded, the control unit automatically activates the functional units needed for that instruction, so that the units are ready before the instruction reaches the execute stage. This eliminates the need to insert wakeup instructions into the object code, which reduces overhead. In our implementation, the ARM processor architecture was modified and resynthesized to include power gating by developing a CMOS cell library of functional units with the above capabilities. Experimental results are reported for a set of 12 benchmarks chosen from the MiBench suite; they indicate that, on average, our technique reduces the leakage energy in functional units by 31.1% for integer benchmarks and 26.8% for floating-point benchmarks.
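The LHT idea in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes the loops have already been extracted from the CFG as sets of basic-block ids, that loops are properly nested, and that the functional units used by each block are known. The names `LoopNode`, `build_lht_forest`, `maximal_idle_loops`, and the sample data are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class LoopNode:
    """One loop in a loop hierarchy tree (hypothetical representation)."""
    name: str
    blocks: frozenset  # basic-block ids in the loop body, nested loops included
    children: list = field(default_factory=list)


def build_lht_forest(loops):
    """Build a forest of trees from {loop name: set of basic-block ids}.

    Each loop becomes a child of the smallest loop that strictly contains it;
    loops contained in no other loop become roots.
    """
    nodes = {name: LoopNode(name, frozenset(blocks)) for name, blocks in loops.items()}
    roots = []
    for node in nodes.values():
        enclosing = [other for other in nodes.values() if node.blocks < other.blocks]
        if enclosing:
            parent = min(enclosing, key=lambda other: len(other.blocks))
            parent.children.append(node)
        else:
            roots.append(node)
    return roots


def maximal_idle_loops(roots, units_used, unit):
    """Return the names of the outermost loops in which `unit` is never used.

    Assumption: such a loop is a candidate region for a sleep instruction.
    """
    idle = []
    stack = list(roots)
    while stack:
        node = stack.pop()
        if all(unit not in units_used.get(b, set()) for b in node.blocks):
            idle.append(node.name)        # whole subtree is idle: stop here
        else:
            stack.extend(node.children)   # look for idle loops further down
    return sorted(idle)


# Hypothetical program: one loop nest and one separate loop.
loops = {"outer": {1, 2, 3, 4, 5}, "inner_a": {2, 3}, "inner_b": {4}, "other": {7, 8}}
units_used = {1: {"alu"}, 2: {"alu"}, 3: {"alu"}, 4: {"alu", "mul"},
              5: {"alu"}, 7: {"alu"}, 8: {"alu"}}

forest = build_lht_forest(loops)
print(maximal_idle_loops(forest, units_used, "mul"))
print(maximal_idle_loops(forest, units_used, "fpu"))
```

In this sketch the multiplier is used only in `inner_b`, so the search descends past `outer` and reports `inner_a` and `other`; a unit that is never used is reported once per root loop rather than once per nested loop, which is what "maximal" is taken to mean here.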

Reducing Execution Unit Leakage Power in Embedded Processors

2008

We introduce low-overhead power optimization techniques to reduce leakage power in embedded processors. Our techniques improve on previous work by a) taking into account the idle time distribution of different execution units, and b) using instruction decode and control dependencies to wake up the gated (but needed) units as soon as possible. We take into account the idle time distribution per execution unit to detect an idle period as soon as possible, which in turn increases our leakage power savings. In addition, we use information already available in the processor to predict when a gated execution unit will be needed again. This results in early and less costly reactivation of gated execution units. We evaluate our techniques on a representative subset of the MiBench benchmarks and on a processor with a configuration similar to Intel's XScale processor. We show that our techniques reduce leakage power considerably while maintaining performance.

Runtime Leakage Power Reduction Using Loop Unrolling and Fine Grained Power Gating

Journal of Low Power Electronics

The present work introduces a compilation technique to reduce runtime leakage power of functional units of a processor by combining loop unrolling with power gating. The instructions in the unrolled loop are scheduled to provide opportunities for power gating the functional units which are not used for a considerable amount of time. An algorithm is introduced that saves maximum leakage energy without performance loss due to the execution of power gating instructions. The algorithm performs loop unrolling, schedules the instructions, and finally inserts power gating instructions. The present work is explained using two illustrative examples, one without loop-carried dependence and the other with loop-carried dependence. It is observed that the number of clock cycles taken by the power gating instructions is less than or equal to the number of clock cycles saved by loop unrolling. This results in a 23-64% reduction of the total energy consumed by the benchmark programs without any degradation of performance.

Loop unrolling with fine grained power gating for runtime leakage power reduction

18th International Symposium on VLSI Design and Test, 2014

The present work introduces a compilation technique to reduce runtime leakage power of functional units of a processor by combining loop unrolling with power gating. The instructions in the unrolled loop are scheduled to provide opportunities for power gating the functional units which are not needed for a considerable amount of time. The number of clock cycles taken by the power gating instructions is less than or equal to the number of clock cycles saved by loop unrolling. This results in a 23-64% reduction of the total energy consumed by the benchmark programs without any degradation of performance. Index Terms: clustering of instructions, fine grained power gating, grouping of instructions, inter-iteration data dependence, leakage power, loop unrolling, power gating instructions.

Microarchitectural techniques for power gating of execution units

Proceedings of the 2004 international symposium on Low power electronics and design - ISLPED '04, 2004

Leakage power is a major concern in current and future microprocessor designs. In this paper, we explore the potential of architectural techniques to reduce leakage through power-gating of execution units. This paper first develops parameterized analytical equations that estimate the break-even point for application of power-gating techniques. The potential for power gating execution units is then evaluated, for the range of relevant break-even points determined by the analytical equations, using a state-of-the-art out-of-order superscalar processor model. The power gating potential of the floating-point and fixed-point units of this processor is then evaluated using three different techniques to detect opportunities for entering sleep mode: ideal, time-based, and branch-misprediction-guided. Our results show that using the time-based approach, floating-point units can be put to sleep for up to 28% of the execution cycles at a performance loss of 2%. For the more difficult to power-gate fixed-point units, the branch-misprediction-guided technique allows the fixed-point units to be put to sleep for up to 40% more of the execution cycles compared to the simpler time-based technique, with similar performance impact. Overall, our experiments demonstrate that architectural techniques can be used effectively in power-gating execution units.
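The break-even point and the time-based detection scheme mentioned above can be illustrated with a deliberately simplified sketch. It does not reproduce the paper's analytical equations: it assumes a single fixed energy overhead per sleep/wake transition, a constant leakage saving per gated cycle, and no wake-up latency. The function names, parameters, and the sample trace are hypothetical.

```python
import math


def break_even_cycles(overhead_energy, leakage_saved_per_cycle):
    """Smallest number of gated cycles whose leakage saving covers the
    transition overhead (simplified model, same energy unit for both)."""
    return math.ceil(overhead_energy / leakage_saved_per_cycle)


def time_based_sleep_cycles(busy, idle_threshold):
    """Count cycles spent asleep under a time-based policy.

    The unit is gated once it has been idle for `idle_threshold`
    consecutive cycles and is assumed to wake instantly on the next
    busy cycle (wake-up latency is ignored in this sketch).
    """
    asleep = 0
    idle_run = 0
    for is_busy in busy:
        if is_busy:
            idle_run = 0
        else:
            idle_run += 1
            if idle_run > idle_threshold:
                asleep += 1
    return asleep


# Hypothetical per-cycle activity trace of one execution unit (1 = busy).
trace = [1, 0, 0, 0, 0, 0, 1, 0, 0, 1]

print(break_even_cycles(overhead_energy=10.0, leakage_saved_per_cycle=0.5))
print(time_based_sleep_cycles(trace, idle_threshold=2))
```

Under this model a sleep period only pays off when it lasts at least as long as the break-even count, which is why the idle threshold of a time-based policy would normally be chosen with the break-even point in mind.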

On leakage power optimization in clock tree networks for ASICs and general-purpose processors

Sustainable Computing: Informatics and Systems, 2011

Leakage power has grown significantly and is a major challenge in SoC design. Among SoC components, the clock distribution network accounts for a large portion of chip power. This paper proposes to deploy sleep transistor insertion (STI) in the clock tree of datapaths in ASICs or in general-purpose processors in order to reduce leakage power. It characterizes the effect of sleep transistor sharing and sizing on clock tree wakeup time, leakage power, and propagation delay. It then uses these characteristics during leakage power optimization. It describes post-synthesis sleep transistor insertion (PSSTI), a heuristic clustering algorithm for sleep transistor insertion with the objective of total power minimization in a given clock tree. Sleep transistor sharing and sizing are deployed in order to meet the clock skew and wakeup delay constraints. The potential benefits of STI in ASIC design are evaluated using a standard industrial VLSI-CAD flow including sleep-transistor insertion and routing after the clock synthesis and place-and-route of the benchmark circuits. The results show that the clock tree leakage power is reduced by 19-32% depending on the topology of the synthesized clock tree. We also apply PSSTI in the clock tree of the datapaths in general-purpose processors using architectural control of the sleep mode. This achieves a reduction in the leakage power and the dynamic power of the clock tree within different datapath components (adder, multiplier, etc.) of as much as 80%. This approach is also applicable to other on-chip structures where inverters are large in size and commonly used, e.g. SRAMs or networks-on-a-chip, which, combined with the clock tree savings, will significantly reduce overall processor or ASIC power consumption.

IJERT-Comparison of Various Leakage Power Reduction Techniques for CMOS Circuit Design

International Journal of Engineering Research and Technology (IJERT), 2013

https://www.ijert.org/comparison-of-various-leakage-power-reduction-techniques-for-cmos-circuit-design https://www.ijert.org/research/comparison-of-various-leakage-power-reduction-techniques-for-cmos-circuit-design-IJERTV2IS100167.pdf Power consumption is now a major technical problem facing CMOS circuits in deep submicron processes. As processes move to finer technologies, leakage power increases rapidly due to high transistor density and reduced voltage and oxide thickness. We first experimentally investigate existing low-power techniques and point out problems with them. We then propose a family of circuit types for low-power design centered around inserting controlling transistors between pull-up and pull-down circuits as well as between pull-up/pull-down circuits and power/ground. We also compare a different approach, named “sleepy keeper,” which reduces leakage current while saving exact logic state. Sleepy keeper uses traditional sleep transistors plus two additional transistors – driven by a gate’s already calculated output – to save state during sleep mode. In short, like the sleepy stack approach, sleepy keeper achieves leakage power reduction equivalent to the sleep and other approaches but with the advantage of maintaining exact logic state (instead of destroying the logic state when sleep mode is entered). Unfortunately, sleepy keeper causes additional dynamic power consumption, approximately 15% more than the base case (no sleep transistors used at all). However, for applications spending the vast majority of time in sleep or standby mode while also requiring low area, high performance and maintenance of exact logic state, the sleepy keeper approach provides a new weapon in a VLSI designer's arsenal.

Reducing Functional Unit Power Consumption and its Variation Using Leakage Sensors

IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 2000

Energy reduction of functional units (FUs) is a very important concern for high-end superscalar processors, not only because FUs consume a significant portion of processor energy, but also because they are among the most important hotspots in the processor. In addition, the high sensitivity of leakage to temperature and process variation results in very high variation in FU power consumption across processor dies. Such high variation reduces the parametric yield of processors. Consequently, reducing the FU power consumption and its variation is an important problem. However, existing FU power reduction techniques assume all the FUs are similar, and do not consider the sensitivity of leakage to temperature. Consequently, they are not very effective in reducing the variation of FU power consumption. The advent of extremely small, yet accurate leakage sensors allows us to develop leakage-aware microarchitectural techniques to reduce both the power consumption and its variation among processor dies. Our leakage-aware operation-to-FU binding mechanism (LA-OFBM) and leakage-aware power gating (LA-PG) mechanism reduce the mean and standard deviation of the total arithmetic logic unit (ALU) power consumption of the ALPHA 21364 by 34% and 59%, respectively. At the processor level, this translates to a 13% reduction in the total processor energy consumption, with a 24 °C reduction in the maximum ALU temperature.

Leakage-Aware Modulo Scheduling for Embedded VLIW Processors

Journal of Computer Science and Technology, 2011

As semiconductor technologies move down to the nanometer scale, leakage power has become a significant component of the total power consumption. In this paper, we present a leakage-aware modulo scheduling algorithm to achieve leakage energy saving for applications with loops on Very Long Instruction Word (VLIW) architectures. The proposed algorithm is designed to maximize the idleness of functional units integrated with dual-threshold domino logic, and to reduce the number of transitions between the active and sleep modes. We have implemented our technique in the Trimaran compiler and conducted experiments using a set of embedded benchmarks from DSPstone and MiBench on the cycle-accurate VLIW simulator of Trimaran. The results show that our technique achieves significant leakage energy saving compared with a previously published DAG-based (Directed Acyclic Graph) leakage-aware scheduling algorithm.
