Adaptive energy management features of the POWER7™ processor
Related papers
Introducing the Adaptive Energy Management Features of the Power7 Chip
IEEE Micro, 2011
Managing the power and performance trade-off of a running computer system is complex. Power7 has many low-level knobs for power management, but these also affect performance, depending on the type and combination of workloads being processed at a given time. Because there is no "one size fits all" policy, IBM's EnergyScale approach [1] employs an adaptive solution that encompasses hardware, firmware, and systems software [2]. A dedicated off-chip microcontroller, coupled with policy guidance from the customer and feedback from the Power Hypervisor and operating systems, determines operating modes and the best power and performance trade-off to implement at runtime to meet customer goals. Power7, like its predecessor Power6, provides the more traditional dynamic energy-saving techniques, such as gating clocks to circuits when they are not needed, scaling frequency and voltage at runtime to adjust to varying utilization, and sensors that measure the environment and workloads under which the chip is operating [3]. This article describes several adaptive energy management features added to Power7 to augment these capabilities, presents empirically measured results of using these features, and discusses autonomic frequency-control capabilities that will provide further improved energy efficiency in the future.
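As a rough illustration of the adaptive loop described above, the sketch below shows how a controller might combine a customer policy with sensor feedback to pick a core frequency each control interval. The mode names, thresholds, and sensor fields are hypothetical and do not reflect the actual EnergyScale firmware interface.

```python
# Minimal sketch of an adaptive power/performance mode selector, loosely
# modeled on the control loop described above. All mode names, thresholds,
# and sensor fields are hypothetical placeholders.
from dataclasses import dataclass

@dataclass
class SensorSample:
    utilization: float    # fraction of cycles the cores were busy, 0..1
    chip_power_w: float   # measured or estimated chip power in watts
    temperature_c: float  # hottest on-chip thermal sensor reading in Celsius

def select_frequency(policy: str, s: SensorSample,
                     f_min: float = 2.0, f_nom: float = 3.5,
                     power_cap_w: float = 200.0,
                     temp_limit_c: float = 85.0) -> float:
    """Return a target core frequency (GHz) for the next control interval."""
    # Hard limits always win: back off if over the thermal or power budget.
    if s.temperature_c > temp_limit_c or s.chip_power_w > power_cap_w:
        return f_min
    if policy == "static_power_save":
        return f_min
    if policy == "dynamic_power_save":
        # Scale frequency with utilization between the floor and nominal.
        return f_min + (f_nom - f_min) * min(max(s.utilization, 0.0), 1.0)
    # Default: favor performance.
    return f_nom

if __name__ == "__main__":
    sample = SensorSample(utilization=0.35, chip_power_w=140.0, temperature_c=62.0)
    for mode in ("static_power_save", "dynamic_power_save", "nominal"):
        print(mode, "->", round(select_frequency(mode, sample), 2), "GHz")
```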
Adaptive energy-management features of the IBM POWER7 chip
IBM Journal of Research and Development, 2011
The IBM POWER7 processor implements several new adaptive power-management techniques that, in concert with the EnergyScale firmware, allow it to proactively take advantage of variations in workload, environmental conditions, and overall system utilization to meet customer-directed power and performance goals. These features build on the support and capabilities provided by its predecessor, the IBM POWER6 processor. Among them are per-core frequency scaling with available autonomous frequency controls, per-chip automated voltage slewing, power-consumption estimation, soft power capping, and a hardware instrumentation assist, along with an on-chip controller that manages core voltages and serves as an on-chip proxy for the EnergyScale controller during certain power-management modes.
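Soft power capping of the kind listed above can be pictured as a small feedback loop that compares an estimated chip power against the cap and nudges the permitted frequency. The linear power proxy and controller gain below are illustrative assumptions, not the POWER7 implementation.

```python
# Sketch of a soft power cap: a proportional controller that adjusts the
# permitted frequency so an *estimated* power stays near a cap. The power
# model and gain are illustrative assumptions only.

def estimated_power(freq_ghz: float, utilization: float,
                    idle_w: float = 60.0, w_per_ghz: float = 30.0) -> float:
    # Toy proxy: idle floor plus an activity- and frequency-dependent term.
    return idle_w + w_per_ghz * freq_ghz * utilization

def apply_soft_cap(freq_ghz: float, utilization: float, cap_w: float,
                   f_min: float = 2.0, f_max: float = 3.8,
                   gain: float = 0.01) -> float:
    """One control step: move the frequency toward the cap, clamped to [f_min, f_max]."""
    error_w = cap_w - estimated_power(freq_ghz, utilization)
    new_freq = freq_ghz + gain * error_w   # raise if under the cap, lower if over
    return max(f_min, min(f_max, new_freq))

if __name__ == "__main__":
    freq, util, cap = 3.8, 0.9, 150.0
    for _ in range(10):
        freq = apply_soft_cap(freq, util, cap)
    print(f"settled frequency ~{freq:.2f} GHz, "
          f"estimated power ~{estimated_power(freq, util):.1f} W (cap {cap} W)")
```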
Architecting for power management: The IBM® POWER7™ approach
HPCA-16: The Sixteenth International Symposium on High-Performance Computer Architecture, 2010
The POWER7 processor is the newest member of the IBM POWER® family of server processors. With greater than 4X the peak performance and the same power budget as the previous generation POWER6®, POWER7 will deliver impressive energy-efficiency boosts. The improved peak energy-efficiency is accompanied by a wide array of new features in the processor and system designs that advance IBM's EnergyScale™ dynamic power management methodology. This paper provides an overview of these new features, which include better sensing, more advanced power controls, improved scalability for power management, and features to address the diverse needs of the full range of POWER servers from blades to supercomputers. We also highlight three challenges that need attention from a range of systems design and research teams: (i) power management in highly virtualized environments, (ii) power (in)efficiency of systems software and applications, and (iii) memory power costs, especially for servers with large memory footprints.
EnergyScale for IBM POWER6 microprocessor-based systems
IBM Journal of Research and Development, 2007
With increasing processor speed and density, denser system packaging, and other technology advances, system power and heat have become important design considerations. The introduction of new technology including denser circuits, improved lithography, and higher clock speeds means that power consumption and heat generation, which are already significant problems with older systems, are significantly greater with IBM POWER6 processor-based designs, including both standalone servers and those implemented as blades for the IBM BladeCenter product line. In response, IBM has developed the EnergyScale architecture, a system-level power management implementation for POWER6 processor-based machines. The EnergyScale architecture uses the basic power control facilities of the POWER6 chip, together with additional board-level hardware, firmware, and systems software, to provide a complete power and thermal management solution. The EnergyScale architecture is performance aware, taking into account the characteristics of the executing workload to ensure that it meets the goals specified by the user while reducing power consumption. This paper introduces the EnergyScale architecture and describes its implementation in two representative platform designs: an eight-way, rack-mounted machine and a server blade. The primary focus of this paper is on the algorithms and the firmware structure used in the EnergyScale architecture, although it also provides the system design considerations needed to support performance-aware power management. In addition, it describes the extensions and modifications to power management that are necessary to span the range of POWER6 processor-based system designs.
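To illustrate what "performance-aware" power management can mean in practice, the sketch below picks the lowest frequency whose predicted slowdown stays within a user-specified bound, assuming a simple model in which only the CPU-bound fraction of a workload's time scales with frequency. The model and numbers are assumptions for illustration, not the EnergyScale algorithms described in the paper.

```python
# Sketch of a performance-aware frequency choice: pick the lowest frequency
# whose predicted slowdown stays within a user-specified bound. The linear
# slowdown model (only the CPU-bound fraction of time scales with 1/f) is a
# common simplification and an assumption here.

def predicted_slowdown(f_nom: float, f: float, cpu_bound_fraction: float) -> float:
    """Relative runtime at frequency f versus nominal, when cpu_bound_fraction
    of the time scales with 1/f and the rest (memory/IO stalls) does not."""
    return cpu_bound_fraction * (f_nom / f) + (1.0 - cpu_bound_fraction)

def lowest_acceptable_frequency(f_nom: float, available,
                                cpu_bound_fraction: float,
                                max_slowdown: float = 1.05) -> float:
    """Scan the available frequencies from low to high and return the first
    one whose predicted slowdown is within the user's bound."""
    for f in sorted(available):
        if predicted_slowdown(f_nom, f, cpu_bound_fraction) <= max_slowdown:
            return f
    return f_nom

if __name__ == "__main__":
    freqs = [2.0, 2.4, 2.8, 3.2, 3.5]
    # A memory-bound workload tolerates a much lower frequency than a CPU-bound one.
    print("memory-bound:", lowest_acceptable_frequency(3.5, freqs, cpu_bound_fraction=0.2))
    print("cpu-bound:   ", lowest_acceptable_frequency(3.5, freqs, cpu_bound_fraction=0.9))
```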
Characterizing Processors for Energy and Performance Management
2015 16th International Workshop on Microprocessor and SOC Test and Verification (MTV), 2015
A processor executes a computing job in a certain number of clock cycles. The clock frequency determines the time that the job will take. Another parameter, cycle efficiency or cycles per joule, determines how much energy the job will consume. The execution time measures performance and, in combination with energy dissipation, influences power, thermal behavior, power-supply noise, and battery life. We describe a method for power management of a processor. An Intel processor in 32-nm bulk CMOS technology is used as an illustrative example. First, we characterize the technology by HSPICE simulation of a ripple-carry adder for critical-path delay, dynamic energy, and static power over a wide range of supply voltages. The adder data is then scaled based on the clock frequency, supply voltage, thermal design power (TDP), and other specifications of the processor. To optimize time and energy, the supply voltage and clock frequency are determined, yielding a 28% reduction in both execution time and energy dissipation.
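The selection step described above, choosing a voltage and clock frequency from characterized delay, dynamic-energy, and static-power data, can be sketched as follows. The table entries are made-up placeholders rather than measured 32-nm adder data, and the energy-delay product is used as one possible optimization target.

```python
# Sketch of voltage/frequency selection from a characterization table:
# evaluate execution time and energy for a job at each point and pick the
# point with the smallest energy-delay product. All numbers are placeholders.

# (supply voltage V, critical-path delay ns, dynamic energy per cycle nJ, static power W)
CHARACTERIZATION = [
    (0.7, 1.00, 0.49, 0.8),
    (0.8, 0.72, 0.64, 1.4),
    (0.9, 0.58, 0.81, 2.4),
    (1.0, 0.50, 1.00, 4.0),
]

def evaluate(job_cycles, point):
    vdd, delay_ns, e_dyn_nj, p_static_w = point
    time_s = job_cycles * delay_ns * 1e-9              # cycle time limited by the critical path
    energy_j = job_cycles * e_dyn_nj * 1e-9 + p_static_w * time_s
    return time_s, energy_j

def energy_delay_product(job_cycles, point):
    time_s, energy_j = evaluate(job_cycles, point)
    return time_s * energy_j

def best_operating_point(job_cycles=1e9):
    return min(CHARACTERIZATION, key=lambda p: energy_delay_product(job_cycles, p))

if __name__ == "__main__":
    for p in CHARACTERIZATION:
        t, e = evaluate(1e9, p)
        print(f"Vdd={p[0]:.1f} V  f={1.0 / p[1]:.2f} GHz  time={t:.2f} s  energy={e:.2f} J")
    print("best energy-delay point:", best_operating_point())
```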
Reducing energy usage with memory and computation-aware dynamic frequency scaling
2011
Over the life of a modern computer, the energy cost of running the system can exceed the cost of the original hardware purchase. This has driven the community to attempt to understand and minimize energy costs wherever possible. Towards these ends, we present an automated, fine-grained approach to selecting per-loop processor clock frequencies. The clock-frequency selection criterion is established through a combination of lightweight static analysis and runtime tracing that automatically acquires application signatures: characterizations of the patterns of execution of each loop in an application. This application characterization is matched with a series of benchmark loops, which have been run on the target system and exercise it in various ways. These benchmarks are intended to form a covering set, a machine characterization of the expected power consumption and performance traits of the machine over the space of execution patterns and clock frequencies. The frequency that confers the best power-delay product on the benchmark that most closely resembles each application loop is the one chosen for that loop. The application's frequency management strategy is then permanently integrated into the compiled executable via static binary instrumentation. This process is lightweight, only has to be done once per application (and the benchmarks just once per machine), and thus is much less laborious than running every application loop at every possible frequency on the machine to see what the optimal frequencies would be. Unlike most frequency management schemes, we toggle frequencies very frequently, potentially at every loop entry and exit, saving as much as 10% of the energy bill in the process. The set of tools that implements this scheme is fully automated, built on top of freely available open-source software, and uses an inexpensive power measurement apparatus. We use these tools to show a measured, system-wide energy savings of up to 7.6% on an 8-core Intel Xeon E5530 and 10.6% on a 32-core AMD Opteron 8380 (a Sun X4600 node) across a range of workloads.
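A minimal sketch of the matching step, assuming a two-feature loop signature and a small benchmark table whose best frequencies were found offline by minimizing the power-delay product; the features, table entries, and frequencies are illustrative.

```python
# Sketch of per-loop frequency selection: match each application loop's
# signature against pre-characterized benchmark loops and reuse the frequency
# that gave the matching benchmark its best power-delay product.
import math

# Benchmark table: signature (ops_per_byte, branch_rate) -> best frequency found offline.
BENCHMARKS = [
    {"signature": (0.5, 0.02), "best_freq_ghz": 1.6},   # streaming / memory-bound
    {"signature": (8.0, 0.05), "best_freq_ghz": 2.4},   # compute-heavy
    {"signature": (3.0, 0.15), "best_freq_ghz": 2.0},   # branchy, mixed
]

def distance(sig_a, sig_b):
    return math.dist(sig_a, sig_b)  # Euclidean distance between signatures

def frequency_for_loop(loop_signature):
    """Nearest-neighbor match of a loop signature against the benchmark set."""
    best = min(BENCHMARKS, key=lambda b: distance(loop_signature, b["signature"]))
    return best["best_freq_ghz"]

if __name__ == "__main__":
    # A loop that moves lots of data per op resembles the streaming benchmark.
    print(frequency_for_loop((0.8, 0.03)))   # -> 1.6
    print(frequency_for_loop((7.0, 0.04)))   # -> 2.4
```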
Accurate Fine-Grained Processor Power Proxies
2012 45th Annual IEEE/ACM International Symposium on Microarchitecture, 2012
There are not yet practical and accurate ways to directly measure core power in a microprocessor. This limits the granularity of measurement and control for computer power management. We overcome this limitation by presenting an accurate runtime per-core power proxy which closely estimates true core power. This enables new fine-grained microprocessor power management techniques at the core level. For example, cloud environments could manage and bill virtual machines for energy consumption associated with the core. The power model underlying our power proxy also enables energy-efficiency controllers to perform what-if analysis, instead of merely reacting to current conditions. We develop and validate a methodology for accurate power proxy training at both chip and core levels. Our implementation of power proxies uses on-chip logic in a high-performance multi-core processor and associated platform firmware. The power proxies account for full voltage and frequency ranges, as well as chip-to-chip process variations. For fixed clock-frequency operation, a mean unsigned error of 1.8% for fine-grained 32-ms samples across all workloads was achieved. For an interval of an entire workload, we achieve an average error of -0.2%. Similar results were achieved for voltage-scaling scenarios, too. We also present two sample applications of the power proxy: (1) per-core power billing for cloud computing services; and (2) simultaneous runtime energy-saving comparisons among different power management policies without running each policy separately.
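A power proxy of this general kind can be sketched as a weighted sum of activity counters, scaled with voltage and frequency and fit against measured power. The counter names, scaling terms, and training data below are illustrative assumptions (and the snippet relies on numpy), not the on-chip implementation described in the paper.

```python
# Sketch of a per-core power proxy: a weighted sum of activity counters,
# scaled by voltage and frequency, with weights fit offline against measured
# chip power. Counter names, scaling terms, and data are illustrative.
import numpy as np

def features(counters, vdd, freq_ghz):
    # Dynamic activity terms scale roughly with V^2 * f; leakage roughly with V.
    scale = vdd * vdd * freq_ghz
    return np.array([
        1.0,                                # fixed/base power
        vdd,                                # leakage proxy
        counters["inst_dispatched"] * scale,
        counters["l2_accesses"] * scale,
        counters["fpu_ops"] * scale,
    ])

def fit_weights(samples):
    """Least-squares fit of proxy weights to measured chip power."""
    X = np.array([features(c, v, f) for c, v, f, _ in samples])
    y = np.array([p for _, _, _, p in samples])
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w

def estimate_power(w, counters, vdd, freq_ghz):
    return float(features(counters, vdd, freq_ghz) @ w)

if __name__ == "__main__":
    # Synthetic training samples: (counters per cycle, Vdd, f in GHz, measured watts).
    samples = [
        ({"inst_dispatched": 0.5, "l2_accesses": 0.05, "fpu_ops": 0.1}, 1.0, 3.0, 95.0),
        ({"inst_dispatched": 1.5, "l2_accesses": 0.10, "fpu_ops": 0.6}, 1.0, 3.0, 130.0),
        ({"inst_dispatched": 1.0, "l2_accesses": 0.08, "fpu_ops": 0.3}, 0.9, 2.5, 85.0),
        ({"inst_dispatched": 0.2, "l2_accesses": 0.02, "fpu_ops": 0.0}, 0.9, 2.5, 70.0),
        ({"inst_dispatched": 1.8, "l2_accesses": 0.12, "fpu_ops": 0.8}, 1.0, 3.5, 150.0),
    ]
    w = fit_weights(samples)
    new_counters = {"inst_dispatched": 1.2, "l2_accesses": 0.09, "fpu_ops": 0.4}
    print(f"estimated power: {estimate_power(w, new_counters, 1.0, 3.0):.1f} W")
```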
Power and energy-aware processor scheduling
Proceedings of the 2nd ACM/SPEC International Conference on Performance Engineering, 2011
Power consumption is a critical consideration in high-performance computing systems. We propose a novel job scheduler that optimizes power and energy consumed by clusters when running parallel benchmarks, with minimal impact on performance. We construct accurate models for estimating power consumption. These models are based on measurements of power consumption on benchmarks with different characteristics and on systems with processors using different microarchitectures. We show that the power estimation models achieve less than 2% error versus actual measurements. We show that a job scheduler can be enhanced to make it "power-aware" and to optimize the power consumption of jobs with similar performance characteristics. The enhanced scheduler can estimate the power consumed by a particular job using the power estimation model and configure the nodes in the cluster, by suitably adjusting the processor frequency on each node, to maximize performance, minimize power, or minimize energy with a predictable impact on power, energy, and performance.
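The scheduler's decision can be pictured as follows: predict each job's power and runtime from a simple model, then pick, per node, the frequency that minimizes predicted energy while keeping the predicted slowdown within a bound. The job classes, coefficients, and frequencies below are illustrative assumptions, not the paper's validated models.

```python
# Sketch of a power-aware frequency decision for a scheduled job: predict
# power, runtime, and energy per candidate frequency, filter by an allowed
# slowdown, and minimize predicted energy. All numbers are illustrative.

FREQS_GHZ = [2.0, 2.3, 2.6, 2.9]

# Toy per-node model parameters by job class.
JOB_CLASSES = {
    "memory_bound":  {"w_per_ghz": 25.0, "cpu_fraction": 0.3},
    "compute_bound": {"w_per_ghz": 35.0, "cpu_fraction": 0.9},
}

def predict(job_class: str, freq: float, base_w: float = 100.0,
            f_nom: float = 2.9, runtime_at_nom_s: float = 100.0):
    c = JOB_CLASSES[job_class]
    power_w = base_w + c["w_per_ghz"] * freq
    runtime_s = runtime_at_nom_s * (c["cpu_fraction"] * f_nom / freq
                                    + (1.0 - c["cpu_fraction"]))
    return power_w, runtime_s, power_w * runtime_s  # watts, seconds, joules

def choose_frequency(job_class: str, max_slowdown: float = 1.10):
    """Among frequencies whose predicted slowdown is acceptable, minimize energy."""
    _, t_nom, _ = predict(job_class, max(FREQS_GHZ))
    candidates = [f for f in FREQS_GHZ
                  if predict(job_class, f)[1] / t_nom <= max_slowdown]
    return min(candidates, key=lambda f: predict(job_class, f)[2])

if __name__ == "__main__":
    for jc in JOB_CLASSES:
        f = choose_frequency(jc)
        w, t, e = predict(jc, f)
        print(f"{jc}: run at {f} GHz -> {w:.0f} W, {t:.0f} s, {e / 1000:.1f} kJ")
```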
Power-aware microarchitecture: design and modeling challenges for next-generation microprocessors
IEEE Micro, 2000
See the "Power-performance fundamentals" box. The most common (and perhaps obvious) metric to characterize the power-performance efficiency of a microprocessor is a simple ratio, such as MIPS (million instructions per second)/watts (W). This attempts to quantify efficiency by projecting the performance achieved or gained (measured in MIPS) for every watt of power consumed. Clearly, the higher the number, the "better" the machine is. While this approach seems a reasonable