Code Transformations for TLB Power Reduction

Power efficient instruction caches for embedded systems

2005

Instruction caches typically consume 27% of the total power in modern high-end embedded systems. We propose a compiler-managed instruction store architecture (K-store) that places computation-intensive loops in a scratchpad-like SRAM memory and allocates the remaining instructions to a regular instruction cache. At runtime, execution is switched dynamically between the instructions in the traditional instruction cache and the ones in the K-store by inserting jump instructions. The necessary jump instructions add, on average, 0.038% to the total dynamic instruction count. We compare the performance and energy consumption of our K-store with that of a conventional instruction cache of equal size. When used in lieu of an 8KB, 4-way set-associative instruction cache, the K-store provides a 32% reduction in energy and a 7% reduction in execution time. Unlike loop caches, the K-store maps the frequent code in a reserved address space and hence can switch between the kernel memory and the instruction cache without any noticeable performance penalty.
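
As a rough illustration of the mechanism described in this abstract, the sketch below models how fetches could be steered by address range and how few jump instructions the relocation adds. All constants (the reserved base address, loop size, invocation and instruction counts) are hypothetical; the paper's actual compiler pass and ISA details are not reproduced here.

```python
# Toy model of K-store-style fetch steering (illustrative only).

KSTORE_BASE = 0xFFFF0000      # hypothetical reserved address space for the K-store
KSTORE_SIZE = 2048            # hypothetical kernel-memory capacity in bytes

def fetch_source(pc: int) -> str:
    """Steer a fetch by address range: reserved range -> K-store SRAM,
    everything else -> the conventional instruction cache."""
    if KSTORE_BASE <= pc < KSTORE_BASE + KSTORE_SIZE:
        return "kstore"
    return "icache"

def relocate_hot_loop(loop_start: int, loop_len: int):
    """Model the compiler transformation: the loop body is placed in the
    reserved range and jumps are patched in at entry and exit.
    Returns (new_entry, inserted_jumps)."""
    new_entry = KSTORE_BASE   # hot code now lives in the K-store
    inserted_jumps = 2        # one jump in, one jump back out
    return new_entry, inserted_jumps

entry, jumps = relocate_hot_loop(0x40001000, 256)
print(fetch_source(entry))        # -> kstore
print(fetch_source(0x40002000))   # -> icache

# Overhead in the paper's terms: inserted jumps executed / total dynamic
# instructions (counts below are made up for the example).
invocations = 1000
total_dynamic = 10_000_000
print(f"jump overhead: {100 * jumps * invocations / total_dynamic:.3f}%")
```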

Reducing TLB power requirements

Proceedings of the 1997 international symposium on Low power electronics and design - ISLPED '97, 1997

Translation look-aside buffers (TLBs) are small caches that speed up address translation in processors with virtual memory. This paper considers two issues: (1) a comparison of the power consumption of fully-associative, set-associative, and direct-mapped TLBs for the same miss rate, and (2) the proposal of modifications to the basic cells and to the structure of set-associative TLBs to reduce power. The power evaluation is done using a model, and the miss rates are obtained from simulations of the SPEC92 benchmarks. With respect to (1), we conclude that for small TLBs (high miss rates) fully-associative TLBs consume less power, but for larger TLBs (low miss rates) set-associative TLBs are better. Moreover, the proposed modifications produce significant reductions in power consumption: our evaluations show a reduction of 40 to 60% compared to the best traditional TLB. The proposed TLB implementation produces an increase in delay and in area. However, these increases are tolerable, because the cycle time is determined by the slower cache and because the TLB corresponds to only a small portion of the chip area.
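
The size-dependent crossover the authors report can be illustrated with a toy per-access energy model that includes miss-handling energy: a fully-associative TLB compares every CAM entry on each access, while a set-associative TLB probes only the ways of one set. The energy constants and miss-rate values below are invented stand-ins; the paper derives these from a circuit-level model and SPEC92 simulations.

```python
# Toy equal-size energy comparison of TLB organizations (illustrative only).

E_CAM_ENTRY = 1.0     # assumed energy per fully-associative (CAM) tag comparison
E_TAG_READ  = 0.8     # assumed energy per set-associative tag read/compare
E_DATA_READ = 2.0     # assumed energy per translation-entry read
E_MISS      = 400.0   # assumed energy for a miss (page-table walk)

def fa_access_energy(entries: int) -> float:
    # Every CAM entry is searched; only the matching translation is read.
    return entries * E_CAM_ENTRY + E_DATA_READ

def sa_access_energy(ways: int = 2) -> float:
    # Only one set is probed: `ways` tags compared, `ways` entries read.
    return ways * (E_TAG_READ + E_DATA_READ)

# Assumed miss rates (FA, SA) at equal size: FA misses less often, and both
# shrink as the TLB grows -- stand-ins for the simulation-derived rates.
miss = {16: (0.08, 0.12), 32: (0.03, 0.05), 64: (0.008, 0.012), 128: (0.002, 0.003)}

for entries, (m_fa, m_sa) in miss.items():
    e_fa = fa_access_energy(entries) + m_fa * E_MISS
    e_sa = sa_access_energy() + m_sa * E_MISS
    winner = "FA" if e_fa < e_sa else "SA"
    print(f"{entries:4d} entries: FA {e_fa:6.1f}  SA {e_sa:6.1f}  -> {winner}")
```

With these (invented) numbers, misses dominate for the small TLB, so the fully-associative design wins there, while the cheaper set-associative access wins once miss rates are low, mirroring the paper's conclusion.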

Energy Efficiency Using Loop Buffer Based Instruction Memory Organizations

2010 International Workshop on Innovative Architecture for Future Generation High Performance, 2010

Energy consumption in embedded systems is strongly dominated by the instruction memory organization. Consequently, any architectural enhancement introduced in this component will produce a significant reduction in the total energy budget of the system. Loop buffering is an effective scheme to reduce the energy consumption of the instruction memory organization. In this paper, a novel classification of architectural enhancements based on the loop buffer concept is presented. Using this classification, an energy design space exploration is performed to show the impact on energy consumption in different application scenarios. From gate-level simulations, the energy analysis demonstrates that the instruction-level parallelism of the system brings not only improvements in performance, but also improvements in the energy consumption of the system. The increase in instruction-level parallelism makes it easier to adapt the sizes of the loop buffers to the sizes of the loops that form the application, because it gives more freedom to combine the execution of those loops.
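
For readers unfamiliar with the underlying mechanism, the following is a minimal sketch of the generic loop-buffer concept that the paper's classification builds on: a short backward branch triggers capture of the loop bounds, and later iterations are fetched from the small buffer instead of the instruction memory. The capacity and detection policy here are assumptions, not any specific organization from the paper.

```python
# Minimal loop-buffer fetch model (a sketch of the general concept).

LOOP_BUFFER_SIZE = 64   # assumed capacity in instructions

class LoopBuffer:
    def __init__(self):
        self.start = None      # first PC of the buffered loop
        self.end = None        # PC of the backward branch
        self.active = False

    def on_branch(self, branch_pc: int, target_pc: int) -> None:
        """Detect a short backward branch and capture the loop bounds."""
        if target_pc < branch_pc and branch_pc - target_pc <= LOOP_BUFFER_SIZE:
            self.start, self.end, self.active = target_pc, branch_pc, True

    def serves(self, pc: int) -> bool:
        """Later iterations fetch from the buffer instead of the I-cache."""
        return self.active and self.start <= pc <= self.end

lb = LoopBuffer()
lb.on_branch(branch_pc=140, target_pc=100)   # a 41-instruction loop
print(lb.serves(120))   # True  -> fetch served by the low-energy loop buffer
print(lb.serves(200))   # False -> fetch goes to the instruction memory
```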

Reducing Energy in Instruction Caches by Using Multiple Line Buffers with Prediction

Energy efficiency plays a crucial role in the design of embedded processors, especially for portable devices with their limited energy source in the form of batteries. Since memory access (either cache or main memory) consumes a significant portion of the energy of a processor, the design of fast, low-energy caches has become a very important aspect of modern processor design. In this paper, we present a novel cache architecture for reduced-energy instruction caches. Our proposed cache architecture consists of the L1 cache, multiple line buffers, and a prediction mechanism to predict which line buffer, or the L1 cache, to access next. We used simulation to evaluate our proposed architecture and compare it with the HotSpot cache, Filter cache, Predictive line buffer cache, and Way-Halting cache. Simulation results show that our approach can reduce instruction cache energy consumption, on average, by 75% (compared to the baseline architecture) without sacrificing performance.
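
A minimal sketch of such an access flow, assuming a last-outcome predictor indexed by fetch PC (the paper's exact prediction mechanism and buffer organization may differ): only the predicted line buffer is probed first, and the more expensive L1 cache is accessed on a misprediction or cold start.

```python
# Sketch of a multiple-line-buffer fetch path with a last-outcome predictor.
# Table sizes and indexing are assumptions, not the paper's design.

LINE_SIZE = 32          # bytes per cache line (assumed)
NUM_BUFFERS = 4         # number of line buffers (assumed)

line_buffers = [{"tag": None, "data": None} for _ in range(NUM_BUFFERS)]
predictor = {}          # fetch PC -> index of the buffer that hit last time

def fetch(pc: int, l1_read):
    tag = pc // LINE_SIZE
    idx = predictor.get(pc)
    # 1) Probe only the predicted line buffer (cheap, low-energy access).
    if idx is not None and line_buffers[idx]["tag"] == tag:
        return line_buffers[idx]["data"], "buffer-hit"
    # 2) Misprediction or cold start: access the L1 cache (expensive),
    #    refill a line buffer, and train the predictor for next time.
    data = l1_read(tag)
    idx = tag % NUM_BUFFERS
    line_buffers[idx] = {"tag": tag, "data": data}
    predictor[pc] = idx
    return data, "l1-access"

print(fetch(0x100, lambda t: f"line{t}"))  # ('line8', 'l1-access')  cold
print(fetch(0x100, lambda t: f"line{t}"))  # ('line8', 'buffer-hit') predicted
```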

AN ENERGY EFFICIENT DATA CACHE FOR EMBEDDED PROCESSORS

This paper presents a new cache design technique, referred to as the early tag access (ETA) cache, to improve the energy efficiency of data caches in embedded processors. The ETA cache determines the destination ways of memory instructions before the actual cache accesses; it thus enables only the destination way to be accessed if a hit occurs during the ETA. The ETA cache can be configured under two operation modes to exploit the trade-offs between energy efficiency and performance. It is shown that our technique is very effective in reducing the number of ways accessed during cache accesses. The ETA cache achieves over 52.8% energy reduction on average in the L1 data cache and translation look-aside buffer while maintaining performance, and the technique can also be applied to other levels of the cache hierarchy and to multithreaded workloads.
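
The core ETA idea can be sketched as follows: the tag arrays are probed early (e.g., once the address is generated), and the recorded hit way lets the later data access enable a single way instead of all of them. Pipeline staging and the two operation modes are simplified away in this sketch; the geometry below is an assumption.

```python
# Sketch of early tag access (ETA) for a set-associative data cache.

WAYS = 4
SETS = 64
LINE = 64                                     # line size in bytes (assumed)
tags = [[None] * WAYS for _ in range(SETS)]   # tag array per set and way

def early_tag_access(addr: int):
    """Probed early in the pipeline; returns the hit way, or None."""
    set_idx = (addr // LINE) % SETS
    tag = addr // (LINE * SETS)
    for way in range(WAYS):
        if tags[set_idx][way] == tag:
            return way
    return None

def data_access(addr: int, hit_way):
    """The actual access: enable one way if ETA found the destination."""
    if hit_way is not None:
        return f"read only way {hit_way}"        # 1 of 4 ways enabled
    return "read all ways (conventional access)"  # ETA gave no early hit

tags[2][3] = 0                      # pretend address 0x80 is cached in way 3
way = early_tag_access(0x80)
print(data_access(0x80, way))       # -> read only way 3
```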

Architectural and compiler techniques for energy reduction in high-performance microprocessors

IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 2000

In this paper, we focus on low-power design techniques for high-performance processors at the architectural and compiler levels. We concentrate mainly on developing methods for reducing the energy dissipated in the on-chip caches, which represents a substantial portion of the energy budget of today's processors. Extrapolating current trends, this portion is likely to increase in the near future, since the devices devoted to the caches occupy an increasingly larger percentage of the total chip area.

Using dynamic cache management techniques to reduce energy in general purpose processors

IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 2000

The memory hierarchy of high-performance and embedded processors has been shown to be one of the major energy consumers. For example, the Level-1 (L1) instruction cache (I-Cache) of the StrongARM processor accounts for 27% of the power dissipation of the whole chip, whereas the instruction fetch unit (IFU) and the I-Cache of Intel's Pentium Pro processor are the single most important power-consuming modules, with 14% of the total power dissipation [2]. Extrapolating current trends, this portion is likely to increase in the near future, since the devices devoted to the caches occupy an increasingly larger percentage of the total chip area. In this paper, we propose a technique that uses an additional mini cache, the L0-Cache, located between the I-Cache and the CPU core. This mechanism can provide the instruction stream to the data path and, when managed properly, can effectively eliminate the need for high utilization of the more expensive I-Cache. We propose, implement, and evaluate five techniques for dynamic analysis of program instruction access behavior, which is then used to proactively guide accesses to the L0-Cache. The basic idea is that only the most frequently executed portions of the code should be stored in the L0-Cache, since this is where the program spends most of its time. We present experimental results to evaluate the effectiveness of our scheme in terms of performance and energy dissipation for a series of SPEC95 benchmarks. We also discuss the performance and energy trade-offs that are involved in these dynamic schemes. Results for these benchmarks indicate that more than 60% of the dissipated energy in the I-Cache subsystem can be saved.
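
One plausible rendering of such a dynamic scheme, using a saturating execution counter per basic block to decide which code is "frequent" enough to be served from the L0-Cache. The threshold and counter policy are assumptions for illustration, not the paper's five specific schemes.

```python
# Sketch of dynamically steering fetches to a small L0-Cache: a per-block
# counter marks hot code, and only hot blocks are filled into the L0.

HOT_THRESHOLD = 8       # assumed hotness threshold
counters = {}           # basic-block start PC -> saturating execution count
l0_cache = set()        # block PCs currently resident in the L0-Cache

def fetch_block(block_pc: int) -> str:
    counters[block_pc] = min(counters.get(block_pc, 0) + 1, 255)
    if counters[block_pc] >= HOT_THRESHOLD:
        l0_cache.add(block_pc)   # fill the L0 with frequently executed code
        return "L0-Cache"        # low-energy fetch path
    return "I-Cache"             # infrequent code stays on the L1 path

for _ in range(10):
    src = fetch_block(0x400)
print(src)              # -> L0-Cache once the block has proven itself hot
```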

A Survey of Architectural Techniques For Improving Cache Power Efficiency

Modern processors use increasingly large on-chip caches. Also, with each CMOS technology generation, there has been a significant increase in their leakage energy consumption. For this reason, cache power management has become a crucial research issue in modern processor design. To address this challenge, and also to meet the goals of sustainable computing, researchers have proposed several techniques for improving the energy efficiency of cache architectures. This paper surveys recent architectural techniques for improving cache power efficiency and also presents a classification of these techniques based on their characteristics. To provide an application perspective, this paper also reviews several real-world processor chips that employ cache energy saving techniques. The aim of this survey is to enable engineers and researchers to gain insight into the techniques for improving cache power efficiency and to motivate them to invent novel solutions for enabling low-power operation of caches.