Fast Instruction Memory Hierarchy Power Exploration for Embedded Systems

FALPEM: Framework for Architectural-Level Power Estimation and Optimization for Large Memory Sub-Systems

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2015

A framework is developed for power estimation at the pre-register-transfer-level (pre-RTL) stage for structured memory subsystems. A power estimation model is proposed that specifically targets the power consumed by the clock network and interconnect. The model is validated with VCD-based simulation on the back-annotated netlist of an 8 MB memory subsystem used as video RAM (VRAM) for high-end graphics applications. The methodology also forms the basis for low-power exploration, driving the choice of floor plan and the gating structure of the data and clock networks. We demonstrate a 57% reduction in dynamic power by applying low-power techniques to the 8 MB VRAM used as a frame buffer in a graphics processor. FALPEM can be extended to other applications such as processor caches and ASIC designs.
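Clock-network and interconnect estimates of this kind usually build on the switching-power relation P = a·C·Vdd²·f. The sketch below is not the FALPEM model itself. The `Net` class, the capacitance and activity values, and the `enable_fraction` gating knob are hypothetical. The activity factor counts charging (0 to 1) transitions per cycle, so a free-running clock has activity 1.

```python
from dataclasses import dataclass


@dataclass
class Net:
    """Hypothetical lumped net: total switched capacitance and activity factor."""
    name: str
    capacitance_f: float  # switched capacitance in farads (assumed value)
    activity: float       # charging transitions per cycle, 0..1 (assumed value)


def dynamic_power_w(net: Net, vdd_v: float, freq_hz: float,
                    enable_fraction: float = 1.0) -> float:
    """Switching power P = a * C * Vdd^2 * f.

    enable_fraction is an illustrative gating knob: the net toggles only
    during the fraction of cycles in which its clock/data gate is open.
    """
    return net.activity * enable_fraction * net.capacitance_f * vdd_v ** 2 * freq_hz


if __name__ == "__main__":
    clock = Net("clock", 200e-12, 1.0)  # free-running clock toggles every cycle
    data = Net("data", 500e-12, 0.15)
    ungated = dynamic_power_w(clock, 1.0, 500e6) + dynamic_power_w(data, 1.0, 500e6)
    gated = (dynamic_power_w(clock, 1.0, 500e6, enable_fraction=0.25)
             + dynamic_power_w(data, 1.0, 500e6))
    print(f"ungated {ungated:.4f} W, clock-gated {gated:.4f} W")
```

Gating enters as a simple duty-cycle scale factor here. A real floor-plan-aware model would also have to account for the capacitance of the gating cells themselves.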

Power efficient instruction caches for embedded systems

2005

Instruction caches typically consume 27% of the total power in modern high-end embedded systems. We propose a compiler-managed instruction store architecture (K-store) that places computation-intensive loops in a scratchpad-like SRAM and allocates the remaining instructions to a regular instruction cache. At runtime, execution switches dynamically between instructions in the conventional instruction cache and those in the K-store through inserted jump instructions, which add on average 0.038% to the total dynamic instruction count. We compare the performance and energy consumption of the K-store with those of a conventional instruction cache of equal size. When used in lieu of an 8 KB, 4-way set-associative instruction cache, the K-store provides a 32% reduction in energy and a 7% reduction in execution time. Unlike loop caches, the K-store maps frequently executed code into a reserved address space and can therefore switch between the kernel memory and the instruction cache without any noticeable performance penalty.
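The abstract does not say how the compiler chooses which loops go into the K-store. One plausible policy, assumed here purely for illustration, is a greedy fill ordered by profiled fetches per byte. The `Loop` record, the profile numbers, and `allocate_kstore` are hypothetical. Because a greedy fill is only a heuristic for what is really a knapsack problem, it is not guaranteed to be optimal.

```python
from typing import NamedTuple


class Loop(NamedTuple):
    """Hypothetical profile record for one candidate loop."""
    name: str
    size_bytes: int  # static code size of the loop body
    fetches: int     # profiled dynamic instruction fetches inside the loop


def allocate_kstore(loops: list[Loop], capacity_bytes: int) -> list[Loop]:
    """Greedy heuristic: place loops with the most fetches per byte first,
    skipping any loop that no longer fits in the remaining capacity."""
    chosen: list[Loop] = []
    used = 0
    for loop in sorted(loops, key=lambda lp: lp.fetches / lp.size_bytes, reverse=True):
        if used + loop.size_bytes <= capacity_bytes:
            chosen.append(loop)
            used += loop.size_bytes
    return chosen


if __name__ == "__main__":
    profile = [
        Loop("a", 512, 900_000),
        Loop("b", 4096, 1_200_000),
        Loop("c", 256, 100_000),
        Loop("d", 2048, 50_000),
    ]
    kstore = allocate_kstore(profile, capacity_bytes=4096)
    covered = sum(lp.fetches for lp in kstore) / sum(lp.fetches for lp in profile)
    print([lp.name for lp in kstore], f"{covered:.1%} of loop fetches served by K-store")
```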

Energy optimization of multi-level processor cache architectures

Proceedings of the 1995 international symposium on Low power design - ISLPED '95, 1995

To optimize the performance and power of a processor's cache, a multiple-divided module (MDM) cache architecture is proposed that saves power in the memory peripherals as well as in the bit array. For an M×B-divided MDM cache, latency is equivalent to that of the smallest module, and power consumption is only 1/(M×B) that of a regular, non-divided cache. Based on this architecture and given transistor budgets for on-chip processor caches, the paper extends the investigation to analyze the energy effects of cache parameters in a multi-level cache design. The analysis is based on the miss ratios of a RISC processor executing the SPECint92 benchmark programs.
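A first-order energy account for a two-level hierarchy shows the trade-off. Every access pays the L1 cost, L1 misses also pay the L2 cost, and L2 misses additionally pay for main memory. The sketch below assumes this simple additive model and applies the abstract's idealized 1/(M×B) scaling to a divided L1. The function names and per-access energies are hypothetical and are not figures from the paper.

```python
def mdm_access_energy_nj(undivided_nj: float, m: int, b: int) -> float:
    """Idealized per-access energy of an M x B divided (MDM) cache,
    scaled by 1/(M*B) as stated in the abstract."""
    return undivided_nj / (m * b)


def hierarchy_energy_nj(accesses: int, l1_miss: float, l2_local_miss: float,
                        e_l1_nj: float, e_l2_nj: float, e_mem_nj: float) -> float:
    """Additive energy model for an L1 / L2 / main-memory hierarchy.

    l2_local_miss is the fraction of L2 accesses (i.e. L1 misses) that miss in L2.
    """
    l2_accesses = accesses * l1_miss
    mem_accesses = l2_accesses * l2_local_miss
    return accesses * e_l1_nj + l2_accesses * e_l2_nj + mem_accesses * e_mem_nj


if __name__ == "__main__":
    base = dict(accesses=1_000_000, l1_miss=0.05, l2_local_miss=0.2,
                e_l2_nj=2.0, e_mem_nj=40.0)
    plain = hierarchy_energy_nj(e_l1_nj=0.8, **base)
    divided = hierarchy_energy_nj(e_l1_nj=mdm_access_energy_nj(0.8, 4, 2), **base)
    print(f"undivided L1: {plain / 1e6:.3f} mJ, 4x2 MDM L1: {divided / 1e6:.3f} mJ")
```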

Estimating Cache and TLB Power in Embedded Processor Using Complete Machine Simulation

2003

In this paper we propose combining power estimation and optimization techniques for the cache and TLB components of an embedded system. Power estimation is done at the architectural level using a complete machine simulation model, whereas optimization is done at the circuit level by applying a low-power design technique to content-addressable memory (CAM). It is shown that, in terms of accuracy and flexibility, such an approach complements embedded system design.
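Architectural-level estimation of this kind usually comes down to counting events in a simulator and weighting them by per-event energies. The sketch below assumes a fully associative, LRU-replaced TLB in which every lookup searches the whole CAM. The page trace, `tlb_energy_pj`, and the per-search and per-miss energies are hypothetical stand-ins for the values a complete machine simulator and a circuit-level characterization would supply.

```python
from collections import OrderedDict


def tlb_energy_pj(page_trace: list[int], entries: int,
                  e_search_pj: float, e_miss_pj: float) -> tuple[float, int]:
    """Event-count energy estimate for a fully associative LRU TLB.

    Every lookup is charged one full CAM search. Each miss is additionally
    charged e_miss_pj (e.g. a page-table walk and refill), an assumed cost.
    Returns (energy in pJ, miss count).
    """
    tlb = OrderedDict()  # keys are virtual page numbers, oldest first
    misses = 0
    for vpn in page_trace:
        if vpn in tlb:
            tlb.move_to_end(vpn)          # mark as most recently used
        else:
            misses += 1
            if len(tlb) >= entries:
                tlb.popitem(last=False)   # evict least recently used entry
            tlb[vpn] = None
    return len(page_trace) * e_search_pj + misses * e_miss_pj, misses


if __name__ == "__main__":
    trace = [1, 2, 1, 3, 2, 4, 1]  # hypothetical virtual page numbers
    energy, misses = tlb_energy_pj(trace, entries=2, e_search_pj=2.0, e_miss_pj=10.0)
    print(f"{misses} misses, {energy:.1f} pJ")
```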

An improved instruction-level energy model for RISC microprocessors

Proceedings of the 2013 9th Conference on Ph.D. Research in Microelectronics and Electronics (PRIME), 2013

The power or energy consumed by a chip has become a primary design constraint for embedded systems and is largely affected by software. However, a gap between software and hardware makes it hard to predict, before running it, which code will consume the least power. It is therefore vital to identify the factors that affect a program's energy consumption. In this paper we present an instruction-level power model for a single-core, in-order RISC processor architecture. Rather than analyzing each instruction individually, we study average power and running time. We find that processor power is nearly constant regardless of which instructions are run, whereas I/O port power depends on the behavior of the program. Furthermore, we provide a model that takes the cache miss rate into account.
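If the constant-core-power finding is taken at face value, energy reduces to average power times runtime plus an I/O term, and the cache miss rate enters through the runtime. The linear miss-penalty model below is an assumption made for illustration and is not the paper's fitted model. All parameter names and values are hypothetical.

```python
def program_energy_j(instructions: int, mem_accesses: int, miss_rate: float,
                     miss_penalty_cycles: float, base_cpi: float, freq_hz: float,
                     core_power_w: float, io_energy_j: float = 0.0) -> float:
    """Energy = constant core power x runtime + program-dependent I/O energy.

    Runtime uses an assumed linear stall model: each cache miss adds a fixed
    penalty on top of the base cycles per instruction.
    """
    cycles = instructions * base_cpi + mem_accesses * miss_rate * miss_penalty_cycles
    runtime_s = cycles / freq_hz
    return core_power_w * runtime_s + io_energy_j


if __name__ == "__main__":
    for miss_rate in (0.02, 0.05):
        e = program_energy_j(1_000_000, 300_000, miss_rate, 50, 1.0, 100e6, 0.05)
        print(f"miss rate {miss_rate:.0%}: {e * 1e3:.3f} mJ")
```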

Instruction level power model of microcontrollers

1999

In the design of low-power systems, it is important to analyze and optimize both the hardware and the software components of the system. To evaluate the software component, a good instruction-level energy model is essential. In this paper we present a methodology for instruction-level modelling of microcontrollers using gate-level power estimation tools, and we illustrate it with the M68HC11 microcontroller. We study two different implementations of the microcontroller and show that the energy consumption of each instruction differs considerably between them. Our study reveals that data correlation does not significantly affect the energy consumption of most instructions. Finally, we validate the model by running sample programs and showing that the predicted energy estimates are quite close to the actual estimates.
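Once gate-level simulation yields an average power and a cycle count for each instruction, the per-instruction energy follows from E = P × t. A program estimate is then the sum over its executed instructions. The instruction classes and numbers below are hypothetical placeholders and are not M68HC11 data. Note that milliwatts times microseconds gives nanojoules.

```python
def instruction_energy_nj(avg_power_mw: float, cycles: int, clock_mhz: float) -> float:
    """E = P * t with t = cycles / f; mW * us = nJ."""
    return avg_power_mw * cycles / clock_mhz


def program_energy_nj(counts: dict[str, int], energy_table_nj: dict[str, float]) -> float:
    """Sum of executed-instruction counts weighted by per-instruction energy."""
    return sum(n * energy_table_nj[op] for op, n in counts.items())


if __name__ == "__main__":
    # Hypothetical characterization results: (average power in mW, cycles).
    characterized = {"load": (10.0, 3), "add": (8.0, 2), "store": (11.0, 4)}
    table = {op: instruction_energy_nj(p, c, clock_mhz=2.0)
             for op, (p, c) in characterized.items()}
    print(program_energy_nj({"load": 100, "add": 50, "store": 20}, table))
```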

Instruction level power analysis and optimization of software

Journal of VLSI signal processing systems for signal, image and video technology, 1996

The increasing popularity of power-constrained mobile computers and embedded computing applications drives the need for analyzing and optimizing power in all the components of a system. Software constitutes a major component of today's systems, and its role is projected to grow even further. Thus, an ever-increasing portion of the functionality of today's systems is in the form of instructions, as opposed to gates. This motivates the need for analyzing power consumption from the point of view of instructions, a task for which traditional circuit- and gate-level power analysis tools are inadequate. This paper describes an alternative, measurement-based instruction-level power analysis approach that provides an accurate and practical way of quantifying the power cost of software. This technique has been applied to three commercial, architecturally different processors, and the salient results of these analyses are summarized. Instruction-level analysis of a processor helps in the development of models for the power consumption of software executing on that processor. The power models for the subject processors are described, and interesting observations resulting from a comparison of these models are highlighted. The ability to evaluate software in terms of power consumption makes it feasible to search for low-power implementations of given programs. In addition, it can guide the development of general tools and techniques for low-power software. Several ideas in this regard, motivated by the power analysis of the subject processors, are also described.
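Instruction-level models in this line of work are often written as a sum of three parts: a base cost for each executed instruction, an overhead for the circuit-state change between consecutive different instructions, and a term for other effects such as stalls and cache misses. The sketch below assumes that general form and treats pairwise overheads as symmetric. The cost tables and trace are hypothetical, and the exact form of the models in the paper may differ.

```python
def program_energy(trace: list[str], base_cost: dict[str, float],
                   overhead: dict[frozenset[str], float],
                   other_effects: float = 0.0) -> float:
    """Base cost for every executed instruction, plus a circuit-state overhead
    for each adjacent pair of different instructions (looked up as an
    unordered pair, i.e. assumed symmetric), plus other effects."""
    energy = sum(base_cost[op] for op in trace)
    for prev, cur in zip(trace, trace[1:]):
        if prev != cur:
            energy += overhead.get(frozenset((prev, cur)), 0.0)
    return energy + other_effects


if __name__ == "__main__":
    base = {"add": 1.0, "mul": 3.0}
    pair = {frozenset(("add", "mul")): 0.5}
    print(program_energy(["add", "add", "mul", "add"], base, pair))
```

A model like this also shows why instruction reordering can reduce energy: grouping identical instructions removes inter-instruction overhead terms without changing the base-cost sum.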

Power aware design of second level cache for multicore embedded systems

Proceedings of the IEEE SoutheastCon 2010 (SoutheastCon), 2010

Designing efficient cache, memory, and storage subsystems for modern embedded systems that support a variety of applications is an important need. Embedded systems are being deployed with multicore processors to support parallel and distributed computing and to meet the requirements for increased processing speed. Multiple cores offer many options for organizing multi-level caches, and a variety of cache memory hierarchies have been proposed to satisfy the requirements of high-performance, low-power multicore embedded systems. In this paper, we investigate the impact of CL2 organization on the performance and power consumption of multicore embedded systems. We simulate two 4-core architectures, one with a shared CL2 and the other with private CL2s, using the MPEG4, FFT, MI, and DFT applications/algorithms. Simulation results show that mean delay and total power consumption vary significantly with CL2 organization and application. Reductions in total power consumption and mean delay per task of up to 43% and 36%, respectively, are possible with an optimized CL2. The optimal choice is a 256 KB CL2 with a 64 B line size and 8-way associativity.
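The optimal CL2 parameters fix the cache's geometry. As a worked example, assume a 32-bit physical address (the abstract does not state one). A 256 KB, 8-way CL2 with 64 B lines then has 256 × 1024 / (64 × 8) = 512 sets, so each address splits into 6 offset bits, 9 index bits, and 17 tag bits. The hypothetical helper below performs the same arithmetic for other configurations with power-of-two line sizes and set counts.

```python
def cache_geometry(size_bytes: int, line_bytes: int, ways: int,
                   addr_bits: int = 32) -> tuple[int, int, int, int]:
    """Return (sets, offset_bits, index_bits, tag_bits) for a set-associative cache.

    Assumes power-of-two line size and set count, and by default a 32-bit address.
    """
    def is_pow2(n: int) -> bool:
        return n > 0 and (n & (n - 1)) == 0

    if size_bytes % (line_bytes * ways):
        raise ValueError("size must be a multiple of line_bytes * ways")
    sets = size_bytes // (line_bytes * ways)
    if not (is_pow2(line_bytes) and is_pow2(sets)):
        raise ValueError("line size and set count must be powers of two")
    offset_bits = line_bytes.bit_length() - 1
    index_bits = sets.bit_length() - 1
    return sets, offset_bits, index_bits, addr_bits - index_bits - offset_bits


if __name__ == "__main__":
    print(cache_geometry(256 * 1024, 64, 8))
```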

Functional level power analysis: an efficient approach for modeling the power consumption of complex processors

Proceedings Design, Automation and Test in Europe Conference and Exhibition

A high-level power consumption estimation methodology and its associated tool, SoftExplorer, are presented. The methodology combines a functional model of the processor with a parametric model, allowing the designer to estimate the power consumed when the embedded software is executed on the target. SoftExplorer takes as input the assembly code generated by the compiler, and its efficiency is compared with that of SimplePower's approach. Results for different processors (TI C62, C67, C55, and ARM7) and several DSP applications show an average error of less than 5%. The accuracy and speed of the estimation allow SoftExplorer to guide the designer efficiently in choosing the most appropriate processor for an application.
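The abstract does not spell out SoftExplorer's parametric models. As a hedged illustration of the general idea, the sketch below fits a one-parameter linear power law P = p0 + p1·x by ordinary least squares and reports the mean relative error against measurements. Here x stands for some algorithmic parameter, for example a parallelism rate. The parameter, the data, and the function names are all hypothetical.

```python
def fit_linear(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit of y = p0 + p1 * x; returns (p0, p1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    p1 = sxy / sxx
    return mean_y - p1 * mean_x, p1


def mean_relative_error(predicted: list[float], measured: list[float]) -> float:
    """Average of |predicted - measured| / measured."""
    return sum(abs(p - m) / m for p, m in zip(predicted, measured)) / len(measured)


if __name__ == "__main__":
    # Hypothetical characterization points: parameter x vs measured power in W.
    x = [0.2, 0.4, 0.6, 0.8, 1.0]
    power = [0.61, 0.72, 0.80, 0.93, 1.01]
    p0, p1 = fit_linear(x, power)
    predicted = [p0 + p1 * xi for xi in x]
    print(f"P = {p0:.3f} + {p1:.3f} * x, "
          f"mean error {mean_relative_error(predicted, power):.1%}")
```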