MeSAP: A fast analytic power model for DRAM memories

Precise DRAM Energy Modeling

In modern computer systems, main memory is consuming an increasing proportion of the power budget as DRAM scales. This trend has encouraged efforts to improve DRAM efficiency. Researchers still have only a limited understanding of the power consumption behavior of DRAM at a fine granularity, and memory vendors provide worst-case figures for all power consumption values. DRAM also has unknown variation in power depending on process variation, temperature, read/write location, values in rows, and other factors. This paper proposes a new approach to characterize DRAM power consumption. Our precise DRAM energy modeling enables characterizations that are more faithful to real DRAM operation at a fine granularity. This mechanism is based on two key ideas. First, it employs real hardware in the form of an FPGA and small-outline dual inline memory modules (SO-DIMMs) to obtain physical measurements rather than using worst-case estimates. Second, it uses a software test framework that can isolate DRAM power consumption of individual commands as well as power consumption at the DIMM, bank, and cell level. We also introduce two methods of power measurement compatible with this FPGA-based system: data extraction from FPGA power monitors and direct measurement using a riser board. We implement and evaluate the DRAM power characterization mechanism described above. Our evaluations of data correctness find that tests without refresh are more suitable for DRAM power measurement on an FPGA. We evaluate our measurement systems and determine that neither mechanism is suitable in its current form. Thus, we propose a redesign for the riser board, as well.

FALPEM: Framework for Architectural-Level Power Estimation and Optimization for Large Memory Sub-Systems

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2015

A framework is developed for estimating power at the pre-register-transfer-level (pre-RTL) stage for structured memory subsystems. A power estimation model is proposed that specifically targets the power consumed by the clock network and interconnect. The model is validated with VCD-based simulation on a back-annotated netlist of an 8 MB memory subsystem used as video RAM (VRAM) for high-end graphics applications. This methodology also forms the basis for low-power exploration driving the choice of floor plan, the gating structure of data, and the clock network. We demonstrate a 57% reduction in dynamic power by using low-power techniques for the 8 MB VRAM used as a frame buffer in a graphics processor. FALPEM can be extended to other applications such as processor caches and ASIC designs.

Power Modeling of SDRAMs

2004

Abstract—We present a model for estimating the power consumption of SDRAM at an architectural level. The approach is based on identifying the various operating states for a typical SDRAM, and using the knowledge of current drawn by the memory chip, and fraction of ...
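
The state-based idea named in this abstract can be illustrated as supply voltage times a time-weighted mean current over operating states. The Python sketch below is an illustration under stated assumptions, not the paper's model: the state names, current values, time fractions, and supply voltage are all hypothetical placeholders rather than figures from the paper or any datasheet.

```python
def average_power_mw(vdd_volts, state_currents_ma, state_fractions):
    """Average power in mW: Vdd times the time-weighted mean current.

    state_currents_ma: current drawn in each operating state (mA).
    state_fractions:   fraction of time spent in each state (must sum to 1).
    """
    if set(state_currents_ma) != set(state_fractions):
        raise ValueError("both mappings must cover the same states")
    if abs(sum(state_fractions.values()) - 1.0) > 1e-9:
        raise ValueError("time fractions must sum to 1")
    mean_current_ma = sum(
        state_currents_ma[s] * state_fractions[s] for s in state_currents_ma
    )
    return vdd_volts * mean_current_ma  # V * mA = mW


# Hypothetical states and values, chosen only to make the arithmetic easy.
currents_ma = {"idle": 20.0, "active": 100.0, "refresh": 150.0}
fractions = {"idle": 0.5, "active": 0.25, "refresh": 0.25}
power_mw = average_power_mw(2.0, currents_ma, fractions)
print(f"average power: {power_mw} mW")
```

With these placeholder inputs the mean current is 72.5 mA, so the sketch reports 145.0 mW.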

Investigation of the Power–Performance Trade-off in High-Performance Processors

Demand for devices that are power-conscious is obvious and growing, and the need for scaling back power dissipation for heat concerns is pressing. However, power does not linearly correspond to performance, and a balance can be achieved. Several design-space changes are considered and evaluated using sim-wattch. In cache design, an effective level 1 cache is an absolute necessity. Leakage power in level 2 cache (and lower levels) can be drastically reduced by transitioning unused blocks to a low-power state that preserves cache elements; dividing a level 2 cache into superblocks and introducing a buffer of superblocks to keep active can drastically cut leakage power at minimal performance cost. For a baseline configuration, issue width, decode width and RUU size are varied and are found to correspond directly to power consumption. Several branch prediction strategies are tested, showing bimodal to be the most useful, and 2-layer to be the most interesting.

Evaluating Power Management Strategies for High Performance Memory (DRAM): A Case Study for Achieving Effective Analysis by Combining Simulation Platforms and Real Hardware Performance Monitoring

Developing dynamic power management strategies for high performance systems requires a good understanding of the power-performance trade-offs underlying the workload-strategy-system interactions. The feasibility of detailed simulations to help here is severely limited by the range of problem sizes that can be simulated and by simulation speed. For our work we want to understand the effectiveness of DRAM power management for a 'realistic' workload: RF-CTH, a Fortran-C benchmark of about 500K lines developed at Sandia National Labs. The target architecture is a CMP-based building block for a future high performance computing architecture. We address the scalability and speed limitations of detailed simulation by adopting a multistep characterization and simulation approach incorporating analysis on current generation hardware using processor performance counters, a fast memory hierarchy simulation engine, a detailed system-level simulator and a power-performance simulator for the DRAM subsystem.

An integrated performance and power model for superscalar processor designs

2005

On current superscalar processors, performance and power issues cannot be decoupled for designers. Extensive simulations are usually required to meet both power and performance constraints. This paper describes an integrated performance and power analytical model. The model's performance and power results are in good agreement with detailed simulations, previous models and physically measured results. For designers, the model enables quick and flexible exploration of a subset of, or even the entire, huge parameter space of more than 15 workload and architectural parameters plus leakage power, feature sizes, clock and voltage.

Fast Instruction Memory Hierarchy Power Exploration for Embedded Systems

IFIP Advances in Information and Communication Technology, 2010

A typical instruction memory design exploration process using simulation tools for various cache parameters is rather time-consuming, even for low complexity applications. In order to design a power efficient memory hierarchy of an embedded system, a huge number of system simulations are needed for all the different instruction memory hierarchies, because many cache memory parameters should be explored. Exhaustive search of the design space using simulation is too slow a procedure and needs hundreds of simulations to find the optimal cache configuration. This chapter provides fast and accurate estimates of a multi-level instruction memory hierarchy. Using a detailed methodology for estimating the number of instruction cache misses at each instruction cache level, together with power models, we estimate within a reasonable time the power consumption among these hierarchies. In order to automate the estimation procedure, a novel software tool named FICA implements the proposed methodology; it automatically estimates the total energy in each instruction memory hierarchy and reports the optimal one.

Energy consumption modeling and optimization for SRAM's

IEEE Journal of Solid-State Circuits, 1995

Abstract-The recent trends in portable computing technologies have established the need for energy efficient design strategies. To achieve minimum energy design goals, system designers need a technique to accurately model the energy consumption of their design alternatives without performing a full physical design and full-circuit simulation. This paper presents and compares five approaches for modeling the energy consumption of CMOS circuits. These five modeling approaches have been chosen to represent the various levels of model complexity and accuracy found in the current literature. These modeling approaches are applied to the energy consumption of SRAM's to provide examples of their use and to allow for the comparison of their modeling qualities. It was found that a mixed characterization model, applying a CV² prediction for digital subsections and fitted simulation results for the analog subsections, is satisfactory (within f l process variation) for predicting the absolute energy consumed per cycle. This same model is also very good (within 2%) for predicting an optimum organization for the internal structures of the SRAM. Several common architectures and circuit designs for SRAM's are analyzed with these models. This analysis shows that global, rather than local improvements, produce the largest energy savings.
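
The CV² prediction mentioned for the digital subsections can be illustrated with a short Python sketch. This is a generic textbook form, not the paper's characterized model: it assumes the energy drawn from the supply for one full charge and discharge of a node is C·V² (the energy dissipated in a single transition would be half of that), and the `activity` factor, the capacitance, and the voltage are hypothetical values. The fitted analog part described in the abstract is not modeled here.

```python
def switching_energy_joules(capacitance_farads, vdd_volts, activity=1.0):
    """Energy per cycle drawn from the supply by switched capacitance.

    Assumes E = activity * C * Vdd^2, where `activity` is the fraction of
    cycles in which the capacitance is fully charged and discharged.
    """
    if not 0.0 <= activity <= 1.0:
        raise ValueError("activity must be between 0 and 1")
    return activity * capacitance_farads * vdd_volts ** 2


# Hypothetical example: 2 pF of switched capacitance at 3 V, switching every cycle.
energy_j = switching_energy_joules(2e-12, 3.0)
print(f"energy per cycle: {energy_j:.3e} J")
```

Because the energy scales with the square of the supply voltage, halving Vdd in this sketch cuts the estimate to one quarter.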

Towards a Cross-Layer Framework for Accurate Power Modeling of Microprocessor Designs

2018 28th International Symposium on Power and Timing Modeling, Optimization and Simulation (PATMOS), 2018

While state-of-the-art system-level simulators can deliver swift estimation of power dissipation for microprocessor designs, they do so at the expense of reduced accuracy. On the other hand, RTL simulators are typically cycle-accurate but overwhelmingly time consuming for real-life workloads. Consequently, the design community often has to make a compromise between accuracy and speed. In this work, we propose a novel cross-layer approach that can enable accurate power estimation by carefully integrating components from system-level and RTL simulation of the target design. We first leverage the concept of simulation points to transform the workload application and isolate its most critical segments. We then profile the highest weighted simulation point (HWSP) with an RTL simulator (AnyCore) for maximum accuracy, while the rest are simulated with a system-level simulator (gem5) for ensuring fast evaluation. Finally, we combine the integrated set of profiling data as input to the power simulator (McPAT). Our evaluation results for three different SPEC2006 benchmark applications demonstrate that our proposed cross-layer framework can improve the power estimation accuracy by up to 15% for individual simulation points and by ∼9% for the full application, compared to that of a conventional system-level simulation scheme.
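
The selection step described in this abstract, sending the highest weighted simulation point to the detailed simulator and the rest to the fast one, can be sketched in Python. This is a simplified stand-in under stated assumptions: it does not call AnyCore, gem5, or McPAT, and the point names, weights, and per-point power values are hypothetical. The weighted average is the usual way simulation-point results are combined into a whole-program estimate, whereas the paper itself feeds combined profiling data to the power simulator.

```python
def assign_simulators(weights):
    """Map each simulation point to 'rtl' (highest weight) or 'system'."""
    hwsp = max(weights, key=weights.get)  # highest weighted simulation point
    return {point: ("rtl" if point == hwsp else "system") for point in weights}


def weighted_estimate(weights, per_point_values):
    """Whole-program estimate as the weight-averaged per-point value."""
    total_weight = sum(weights.values())
    return sum(weights[p] * per_point_values[p] for p in weights) / total_weight


# Hypothetical simulation points, weights, and per-point power figures (W).
weights = {"sp0": 0.5, "sp1": 0.25, "sp2": 0.25}
per_point_power_w = {"sp0": 4.0, "sp1": 2.0, "sp2": 6.0}

plan = assign_simulators(weights)
estimate_w = weighted_estimate(weights, per_point_power_w)
print(plan)
print(f"weighted power estimate: {estimate_w} W")
```

Here `sp0` carries the largest weight, so only that point would go through the slower, more detailed path.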