Using Asymmetric Single-ISA CMPs to Save Energy on Operating Systems

Single-ISA Heterogeneous Multi-Core Architectures: The Potential for Processor Power Reduction

2003

This paper proposes and evaluates single-ISA heterogeneous multi-core architectures as a mechanism to reduce processor power dissipation. Our design incorporates heterogeneous cores representing different points in the power/performance design space; during an application's execution, system software dynamically chooses the most appropriate core to meet specific performance and power requirements.

Processor Power Reduction Via Single-ISA Heterogeneous Multi-Core Architectures

IEEE Computer Architecture Letters, 2003

This paper proposes a single-ISA heterogeneous multi-core architecture as a mechanism to reduce processor power dissipation. It assumes a single chip containing a diverse set of cores that target different performance levels and consume different levels of power. During an application's execution, system software dynamically chooses the most appropriate core to meet specific performance and power requirements. It describes an example architecture with five cores of varying performance and complexity. Initial results demonstrate a five-fold reduction in energy at a cost of only 25% performance.
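The selection step described above, where system software picks the core that best fits a performance and power requirement, might look roughly like the sketch below. It is only an illustration under stated assumptions: the `Core` type, the `choose_core` policy (lowest energy subject to a performance floor), and the three sets of relative numbers are all hypothetical and are not taken from the paper, which describes five cores.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Core:
    name: str
    relative_performance: float  # throughput relative to the fastest core (hypothetical)
    relative_power: float        # average power relative to the fastest core (hypothetical)

    @property
    def relative_energy(self) -> float:
        # Energy = power x time, and time scales as 1 / performance.
        return self.relative_power / self.relative_performance


def choose_core(cores: Sequence[Core], min_performance: float) -> Optional[Core]:
    """Return the lowest-energy core that meets the performance floor, if any."""
    eligible = [c for c in cores if c.relative_performance >= min_performance]
    return min(eligible, key=lambda c: c.relative_energy, default=None)


# Illustrative numbers only; they are not taken from the paper.
CORES = [
    Core("big", 1.00, 1.00),
    Core("medium", 0.80, 0.40),
    Core("small", 0.50, 0.10),
]

if __name__ == "__main__":
    for floor in (0.9, 0.75, 0.4):
        chosen = choose_core(CORES, floor)
        print(f"performance floor {floor}: {chosen.name}")
```

A real scheduler would presumably re-evaluate this choice as the application moves between phases and would account for the cost of switching cores; the sketch ignores both.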

The Forgotten 'Uncore': On the Energy-Efficiency of Heterogeneous Cores

USENIX ATC, 2012

Heterogeneous multicore processors (HMPs), consisting of cores with different performance/power characteristics, have been proposed to deliver higher energy efficiency than symmetric multicores. This paper investigates the opportunities and limitations in using HMPs to gain energy efficiency. Unlike previous work focused on server systems, we focus on the client workloads typically seen in modern end-user devices. Further, beyond considering core power usage, we also consider the 'uncore' subsystem shared by all cores, which, in modern platforms, is an increasingly important contributor to total SoC power. Experimental evaluations use client applications and usage scenarios seen on mobile devices and a unique testbed comprised of heterogeneous cores, with results that highlight the need for uncore-awareness and uncore scalability to maximize intended efficiency gains from heterogeneous cores.

A study on performance benefits of core morphing in an asymmetric multicore processor

2010 IEEE International Conference on Computer Design, 2010

Multicore architectures are designed so as to provide an acceptable level of performance per unit power for the majority of applications. Consequently, we must occasionally expect applications that could have benefited from a more powerful core in terms of lower execution time and/or lower energy consumed. Fusing some of the resources of two (or more) cores to configure a more powerful core for such instances is a natural approach to deal with those few applications that have very high performance demands. However, a recent study has shown that fusing homogeneous cores is unlikely to benefit applications. In this paper we study the potential performance benefits of core morphing in a heterogeneous multicore processor that can be reconfigured at runtime. We consider as an example a dual core processor with one of the two cores being designed to target integer intensive applications while the other is better suited to floating-point intensive applications. These two cores can be fused into a single powerful core when an application that can benefit from such fusion is executing. We first discuss the design principles of the two individual cores so that the majority of the benchmarks that we consider execute in a satisfactory way. We then show that a small subset of the considered applications can greatly benefit from core morphing even in the case where two applications that could have been executed in parallel on the two cores are run, for some percentage of time, on the single morphed core. Our results indicate that a performance gain of up to 100% is achievable at a small hardware overhead of less than 1%.

Performance and Power Benefits of Sharing Execution Units between a High Performance Core and a Low Power Core

2014 27th International Conference on VLSI Design and 2014 13th International Conference on Embedded Systems, 2014

Several studies and real world designs have advocated the sharing of large execution units between pairs of cores in Symmetric Multicore Processors (SMP) for area and power savings. Such sharing was shown to have negligible impact on performance. Recently, a number of Asymmetric Multicore Processor (AMP) designs have become available. The objective of this paper is to investigate whether sharing of resources across AMPs offers similar benefits. Our study shows that while the area and the power savings remain similar, the performance of the smaller core in the AMP can improve significantly making sharing even more attractive for AMPs. Simulation results indicate that for certain workloads, the performance of the small core may improve by as much as 54% by sharing certain large execution resources of the big core, while affecting the performance of the big core by only ∼4%, resulting in an overall gain in system performance of 20%. The corresponding improvement in aggregate performance/Watt is 12% while the area savings is about 7%.

Speculative Multithreading Does not (Necessarily) Waste Energy

Draft paper submitted for publication. November 6, 2003. Please keep confidential.

While Chip Multiprocessors (CMP) with Speculative Multithreading (SM) have been gaining momentum, experienced processor designers in industry have reservations about their practical implementation. In particular, it is felt that SM is too energy-inefficient to compete against conventional superscalars. This paper challenges the commonly-held view that SM consumes excessive energy. We show a CMP with SM support that is not only faster but also more energy efficient than a state-of-the-art wide-issue superscalar. We demonstrate it with a new energy-efficient CMP micro-architecture. In addition, we identify the additional sources of energy consumption in SM, and propose energy-centric optimizations that mitigate them. Experiments with the SpecInt 2000 codes show that a CMP with two 4-issue cores and support for SM delivers a speedup of 1.08 over an 8-issue superscalar and consumes only 54% of its power. Alternatively, for the same average power in both chips, the SM CMP is 1.6 times faster than the superscalar on average.

1 Introduction

Substantial research effort is currently being devoted to speeding up hard-to-parallelize non-numerical applications such as SpecInt. Designers build sophisticated out-of-order processors, with carefully-tuned execution engines and memory subsystems. Unfortunately, these systems tend to combine high design complexity with diminishing performance returns, which has motivated the search for design alternatives.

One such alternative is Speculative Multithreading (SM) on a Chip Multiprocessor (CMP) [2, 7, 8, 9, 10, 14, 15, 18, 19, 24]. Under SM, these hard-to-analyze applications are carefully partitioned into tasks, which are then optimistically executed in parallel, hoping that no data or control dependence will be violated. A hardware safety net monitors the tasks' control flow and data accesses, watching for violations at run time. When one happens, the hardware transparently rolls back the incorrect tasks and, after repairing the state, restarts them.

SM on a CMP has been the subject of intense study for nearly a decade now. Recent results appear to show that a few processors on a CMP with SM support can speed up hard-to-parallelize non-numerical applications as much as or more than wider-issue superscalars (e.g. [2, 8, 9, 22]). This is significant because CMPs are attractive platforms: unlike wide-issue superscalars, they provide a decentralized architecture with low-complexity processors. Moreover, CMPs have a natural advantage for explicitly-parallel codes.

Unfortunately, experienced processor designers in industry have reservations about the practical implementation of SM. In particular, it is felt that SM is too energy-inefficient to seriously challenge superscalars. The rationale is that aggressive speculative execution of possibly unnecessary or incorrect tasks is not the best course in a day and age when processors are primarily constrained by energy issues. Indeed, energy issues have become arguably the main concern for designers of high-end microprocessors. Energy and power consumption directly affect the cost of powering and cooling the system, influence the reliability and aging characteristics of chips, and determine battery life in portable devices. While the simpler cores in a CMP are energy-efficient, CMPs with SM will not be accepted unless their overall energy requirements are competitive against wide-issue superscalars.

In this paper, we directly address the problem of energy consumption in SM. Our main contribution is to show that contrary to popular belief, SM does not necessarily waste energy. We show that SM is not only faster, but also more energy efficient than a state-of-the-art wide-issue superscalar. We demonstrate it with a new energy-efficient micro-architecture for a CMP with SM. This is the first paper to show that SM on a CMP is an interesting design point even for high-performance power-constrained designs.

In addition, we identify and analyze the sources of energy consumption in SM. These issues are: the wasted work of squashed tasks, storage and logic in the memory hierarchy to support data versioning, additional traffic in the memory subsystem, and additional instructions. We also propose energy-centric optimizations that mitigate these SM sources of energy consumption. These optimizations have been overlooked in performance-centric SM designs because they enhance energy-savings and not performance. In our experiments with the SpecInt 2000 benchmarks, we show that a CMP with two 4-issue cores delivers a speedup of 1.08 over an 8-issue superscalar while consuming only 54% of its power. Alternatively, for the same average power in both chips, the SM CMP is 1.6 times faster than the superscalar on average.

This paper is organized as follows: Section 2 provides background on SM; Section 3 analyzes the sources of energy consumption in SM and proposes energy-centric optimizations; Section 4 describes the proposed SM CMP micro-architecture; Section 5 describes our SM compilation infrastructure; Sections 6 and 7 present our evaluation methodology and the evaluation; and Section 8 concludes.
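The quoted speedup and power figures can be combined into a single energy ratio with a short calculation. The sketch below assumes that the 54% figure is average power over the whole run, so that energy is average power multiplied by execution time; the helper name `relative_energy` is hypothetical and only serves the illustration.

```python
def relative_energy(relative_power: float, speedup: float) -> float:
    """Energy relative to a baseline, given the average power ratio and the speedup.

    Energy = average power x execution time, and execution time scales as 1 / speedup.
    """
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return relative_power / speedup


# Figures quoted in the text: 54% of the superscalar's power at a 1.08 speedup.
sm_cmp_energy = relative_energy(relative_power=0.54, speedup=1.08)

if __name__ == "__main__":
    print(f"SM CMP energy relative to the superscalar: {sm_cmp_energy:.2f}")
```

Under that assumption, the SM CMP would use about half the energy of the 8-issue superscalar for the same work.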

Execution migration in a heterogeneous-ISA chip multiprocessor

2012

Prior research has shown that single-ISA heterogeneous chip multiprocessors have the potential for greater performance and energy efficiency than homogeneous CMPs. However, restricting the cores to a single ISA removes an important opportunity for greater heterogeneity. To take full advantage of a heterogeneous-ISA CMP, we must be able to migrate execution among heterogeneous cores in order to adapt to program phase changes and changing external conditions (e.g., system power state).

Studying the impact of application-level optimizations on the power consumption of multi-core architectures

Proceedings of the 9th conference on Computing Frontiers, 2012

This paper studies the overall system power variations of two multi-core architectures, an 8-core Intel and a 32-core AMD workstation, while using these machines to execute a wide variety of sequential and multi-threaded benchmarks using varying compiler optimization settings and runtime configurations. Our extensive experimental study provides insights for answering two questions: 1) what degrees of impact can application level optimizations have on reducing the overall system power consumption of modern CMP architectures; and 2) what strategies can compilers and application developers adopt to achieve a balanced performance and power efficiency for applications from a variety of science and embedded systems domains.

Improving the energy efficiency of software systems for multi-core architectures

ICT has a huge impact on the world's CO2 emissions; a recent study estimates that it accounts for 2% of them. This growing share makes IT energy efficiency an important challenge. The state of the art has shown that the processor is the main power consumer. Processors are increasingly complex and are used in many hardware systems, such as computers and smartphones. This thesis therefore focuses on software energy efficiency for multi-core systems. In this paper, we report our motivation for understanding these architectures in depth in order to improve their energy efficiency. Manufacturers have worked tremendously to improve the performance and reduce the power consumption of their processors. However, much remains to be done on the software side. We claim that energy-efficient software can play a decisive role in reducing the IT carbon footprint. To address this challenge, we believe in a software-metric approach with minimal hardware investment. For this purpose, an efficient, scalable and non-invasive tool is needed. We therefore created PowerAPI, which provides fine-grained power estimations at the process and code level in order to optimize software energy efficiency automatically. This solution will help to clearly identify energy leaks so that the power consumed by software can be optimized automatically.