Performance-Energy Trade-off in Modern CMPs

Resource Sharing Centric Dynamic Voltage and Frequency Scaling for CMP Cores, Uncore, and Memory

ACM Transactions on Design Automation of Electronic Systems, 2016

With the breakdown of Dennard’s scaling over the past decade, performance growth of modern microprocessor design has largely relied on scaling core count in chip multiprocessors (CMPs). The challenge of chip power density, however, remains and demands new power management solutions. This work investigates a coordinated CMP systemwide Dynamic Voltage and Frequency Scaling (DVFS) policy centered around shared resource utilization. This approach represents a new angle on the problem, differing from the conventional core-workload-driven approaches. The key component of our work is per-core DVFS leveraging a technique similar to TCP Vegas congestion control from networking. This TCP Vegas–based DVFS can potentially identify the synergy between power reduction and performance improvement. Further, this work includes uncore (on-chip interconnect and shared last level cache) and main memory DVFS policies coordinated with the per-core DVFS policy. Full system simulations on PARSEC benchmarks...

Low-Complexity Policies for Energy-Performance Tradeoff in Chip-Multi-Processors

2011

Chip-Multi-Processors (CMP) utilize multiple energy-efficient Processing Elements (PEs) to deliver high performance while maintaining an efficient ratio of performance to energy consumption. In order to utilize CMP resources, the software application is split into multiple tasks that are executed in parallel on the PEs. Dynamic frequency-Voltage Scaling (DVS) balances performance and energy consumption by dynamically varying a PE's frequency-voltage workpoint in order to save energy while meeting performance requirements. This work addresses DVS policies for CMP. We consider multi-task CMP applications with unknown workloads. We dynamically set frequency-voltage workpoints for each PE in the CMP, attempting to minimize a defined energy-performance criterion. Other DVS methods typically use high-complexity optimization techniques, which limits the possibility of real-time implementation in performance-driven, energy-aware systems. In contrast, we investigate simple heuristic DV...

Reducing energy usage with memory and computation-aware dynamic frequency scaling

2011

Over the life of a modern computer, the energy cost of running the system can exceed the cost of the original hardware purchase. This has driven the community to attempt to understand and minimize energy costs wherever possible. Towards these ends, we present an automated, fine-grained approach to selecting per-loop processor clock frequencies. The clock frequency selection criterion is established through a combination of lightweight static analysis and runtime tracing that automatically acquires application signatures: characterizations of the patterns of execution of each loop in an application. This application characterization is matched with a series of benchmark loops, which have been run on the target system and exercise it in various ways. These benchmarks are intended to form a covering set, a machine characterization of the expected power consumption and performance traits of the machine over the space of execution patterns and clock frequencies. The frequency that confers the best power-delay product to the benchmark that most closely resembles each application loop is the one chosen for that loop. The application's frequency management strategy is then permanently integrated into the compiled executable via static binary instrumentation. This process is lightweight, only has to be done once per application (and the benchmarks just once per machine), and thus is much less laborious than running every application loop at every possible frequency on the machine to see what the optimal frequencies would be. Unlike most frequency management schemes, we toggle frequencies very frequently, potentially at every loop entry and exit, saving as much as 10% of the energy bill in the process. The set of tools that implement this scheme is fully automated, built on top of freely available open source software, and uses an inexpensive power measurement apparatus.
We use these tools to show a measured, system-wide energy savings of up to 7.6% on an 8-core Intel Xeon E5530 and 10.6% on a 32-core AMD Opteron 8380 (a Sun X4600 Node) across a range of workloads.
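The selection rule described in this abstract (match each loop to its closest benchmark, then take the frequency with the best power-delay product for that benchmark) can be sketched roughly as follows. This is a minimal illustration, not the authors' tool: the two-element signature, the Euclidean distance metric, the benchmark names, and every power and delay number are hypothetical placeholders.

```python
import math

# Hypothetical machine characterization: each benchmark loop has a signature
# (here an assumed pair: compute intensity, memory intensity) and measured
# (average power in watts, delay in seconds) per clock frequency in GHz.
# All numbers are made up for illustration.
BENCHMARKS = {
    "compute_bound": {
        "signature": (0.9, 0.1),
        "runs": {1.6: (60.0, 10.0), 2.4: (80.0, 6.7)},
    },
    "memory_bound": {
        "signature": (0.2, 0.8),
        "runs": {1.6: (55.0, 10.0), 2.4: (78.0, 9.5)},
    },
}


def nearest_benchmark(signature, benchmarks):
    """Return the name of the benchmark whose signature is closest.

    Euclidean distance is an assumption; the abstract does not say
    which similarity measure is used.
    """
    return min(
        benchmarks,
        key=lambda name: math.dist(signature, benchmarks[name]["signature"]),
    )


def best_frequency(runs):
    """Return the frequency with the lowest power-delay product."""
    return min(runs, key=lambda freq: runs[freq][0] * runs[freq][1])


def choose_loop_frequency(signature, benchmarks):
    """Pick a clock frequency for one application loop."""
    name = nearest_benchmark(signature, benchmarks)
    return best_frequency(benchmarks[name]["runs"])


if __name__ == "__main__":
    print(choose_loop_frequency((0.3, 0.7), BENCHMARKS))
    print(choose_loop_frequency((0.8, 0.2), BENCHMARKS))
```

With these placeholder measurements, a memory-heavy loop maps to the lower frequency because the higher clock costs more power without shortening the delay much, while a compute-heavy loop maps to the higher frequency.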

Dynamic voltage and frequency scaling for shared resources in multicore processor designs

Proceedings of the 50th Annual Design Automation Conference, 2013

As the core count in processor chips grows, so do the on-die, shared resources such as on-chip communication fabric and shared cache, which are of paramount importance for chip performance and power. This paper presents a method for dynamic voltage/frequency scaling of networks-on-chip and last level caches in multicore processor designs, where the shared resources form a single voltage/frequency domain. Several new techniques for monitoring and control are developed, and validated through full system simulations on the PARSEC benchmarks. These techniques reduce energy-delay product by 56% compared to a state-of-the-art prior work.

An integrated approach to system-level CPU and memory energy efficiency on computing systems

2012 International Conference on Energy Aware Computing, 2012

Energy-efficient computing is becoming more important with the latest technology improvements. State-of-the-art dynamic voltage/frequency scaling (DVFS) policies manage resources' voltage and frequency to achieve higher energy efficiency. A DVFS policy manages a single resource by continuously evaluating its utilization. We propose a new integrated DVFS policy that manages both CPU and memory. The policy selects their frequency/voltage based on resources' combined state, rather than evaluating isolated information about each resource. For the SPEC CPU2006 benchmark, results show that our policy has an average of 9.04% energy efficiency improvement for CPU and memory compared to 4.84% savings by an independent policy.

Scheduling for Better Energy Efficiency on Many-Core Chips

Many-core chips are especially attractive for data center operators providing cloud computing service models. With the advance of many-core chips in such environments, energy-conscious scheduling of independent processes or operating systems (OSes) is gaining importance. An important research question is how the scheduler of such a system should assign the cores to the OSes in order to achieve better energy utilization. In this paper, we demonstrate that many-core chips offer new opportunities for extremely lightweight migration of independent processes (or OSes) running bare-metal on the many-core chip. We then show how this intra-chip migration can be utilized to achieve a better performance-per-watt ratio by implementing a hierarchical power-management scheme on top of dynamic voltage and frequency scaling (DVFS). We have implemented and tested the proposed techniques on the Intel Single Chip Cloud Computer (SCC). Combining migration with DVFS, we achieve, on average, a 25–35% better performance per watt over a DVFS-only solution.

SysScale: Exploiting Multi-domain Dynamic Voltage and Frequency Scaling for Energy Efficient Mobile Processors

2020

There are three domains in a modern thermally-constrained mobile system-on-chip (SoC): compute, IO, and memory. We observe that a modern SoC typically allocates a fixed power budget, corresponding to worst-case performance demands, to the IO and memory domains even if they are underutilized. The resulting unfair allocation of the power budget across domains can cause two major issues: 1) the IO and memory domains can operate at a higher frequency and voltage than necessary, increasing power consumption and 2) the unused power budget of the IO and memory domains cannot be used to increase the throughput of the compute domain, hampering performance. To avoid these issues, it is crucial to dynamically orchestrate the distribution of the SoC power budget across the three domains based on their actual performance demands. We propose SysScale, a new multi-domain power management technique to improve the energy efficiency of mobile SoCs. SysScale is based on three key ideas. First, SysScale...

Characterizing Processors for Energy and Performance Management

2015 16th International Workshop on Microprocessor and SOC Test and Verification (MTV), 2015

A processor executes a computing job in a certain number of clock cycles. The clock frequency determines the time that the job will take. Another parameter, cycle efficiency or cycles per joule, determines how much energy the job will consume. The execution time measures performance and, in combination with energy dissipation, influences power, thermal behavior, power supply noise and battery life. We describe a method for power management of a processor. An Intel processor in 32nm bulk CMOS technology is used as an illustrative example. First, we characterize the technology by H-spice simulation of a ripple carry adder for critical path delay, dynamic energy and static power at a wide range of supply voltages. The adder data is then scaled based on the clock frequency, supply voltage, thermal design power (TDP) and other specifications of the processor. To optimize the time and energy performance, voltage and clock frequency are determined showing 28% reduction both in execution time and energy dissipation.
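The relationships stated at the start of this abstract (cycle count and clock frequency give execution time; cycle count and cycle efficiency in cycles per joule give energy) can be written out as a short worked example. The function names and the sample operating point below are hypothetical and are not taken from the paper's measurements.

```python
def execution_time_s(cycles, clock_hz):
    """Execution time: cycles divided by clock frequency (cycles per second)."""
    return cycles / clock_hz


def energy_j(cycles, cycles_per_joule):
    """Energy: cycles divided by cycle efficiency (cycles per joule)."""
    return cycles / cycles_per_joule


def average_power_w(clock_hz, cycles_per_joule):
    """Average power = energy / time, which simplifies to f / efficiency."""
    return clock_hz / cycles_per_joule


if __name__ == "__main__":
    # Hypothetical job and operating point, chosen only to keep the
    # arithmetic easy to follow.
    cycles = 2e9           # cycles needed by the job
    clock_hz = 2e9         # 2 GHz clock
    cycles_per_joule = 1e8  # assumed cycle efficiency at this voltage

    print(execution_time_s(cycles, clock_hz))           # seconds
    print(energy_j(cycles, cycles_per_joule))           # joules
    print(average_power_w(clock_hz, cycles_per_joule))  # watts
```

Because average power reduces to frequency divided by cycle efficiency, raising the clock at a fixed efficiency shortens the job without changing its energy; energy only changes when the supply voltage, and with it the cycle efficiency, changes.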

Power aware microarchitecture resource scaling

2001

In this paper we present a strategy for run-time profiling to optimize the configuration of a microprocessor dynamically so as to save power with minimum performance penalty. The configuration of the processor changes according to the parallelism in the running program. Experiments on some benchmark programs show good savings in total energy consumption; we have observed a decrease of up to 23% in energy/cycle and up to 8% in energy per instruction. Our proposed approach can be used for energy-aware computing in either portable applications or in desktop environments where power density is becoming a concern. This approach can also be incorporated in larger power management strategies like ACPI.

Accurate Characterization of the Variability in Power Consumption in Modern Mobile Processors

2012

The variability in performance and power consumption is slated to grow further with continued scaling of process technologies. While this variability has been studied and modeled before, there is a lack of empirical data on its extent, as well as the factors affecting it, especially for modern general-purpose microprocessors. Using detailed power measurements, we show that the part-to-part variability for modern processors utilizing the Nehalem microarchitecture is indeed significant. We chose six Core i5-540M laptop processors marketed in the same frequency bins, and thus presumed to be identical, and characterized their power consumption for a variety of representative single-threaded and multithreaded application workloads. Our data shows processor power variation ranging from 7% to 17% across different applications and configuration options such as Hyper-Threading and Turbo Boost. We present our hypotheses on the underlying causes of this observed power variation and discuss its potenti...