Heterogeneous chip multiprocessors

Future of multiprocessors: Heterogeneous Chip Multiprocessors

2012

Abstract: As computer applications become more complex, larger, and more versatile, chip multiprocessors have become ubiquitous. Much research is under way on the core architectures within the chip, and the heterogeneous chip multiprocessor (CMP) is leading the innovation. A heterogeneous CMP is composed of cores of varying performance and complexity. It offers a better area-to-performance ratio, higher throughput, and higher speedup, and it mitigates the Amdahl's law bottleneck to some extent.

Heterogeneous Chip Multiprocessors: A Survey

maqayum.yolasite.com

As computer applications become more complex, larger, and more versatile, chip multiprocessors have become ubiquitous. Much research is under way on the core architectures within the chip, and the heterogeneous chip multiprocessor (CMP) is leading the innovation. It is composed of cores of varying performance and complexity. It offers a better area-to-performance ratio, higher throughput, and higher speedup, and it mitigates the Amdahl's law bottleneck to some extent. This paper surveys in detail recent research on the configuration issues related to heterogeneous CMPs. It identifies four major issues, including the OS-level scheduling of applications onto different cores, the configuration of cores, and the utilization of Amdahl's law. Finally, some recommendations are drawn from the study.
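To make the Amdahl's law claim concrete, the following minimal sketch compares a homogeneous chip with a heterogeneous one. It is not taken from the survey: the heterogeneous formula is a commonly used asymmetric extension of Amdahl's law, and the names `big_perf` and `n_small`, as well as the example numbers (one big core assumed to be twice as fast as a small core and to occupy the area of four small cores), are illustrative assumptions.

```python
def amdahl_speedup(parallel_fraction: float, n_cores: int) -> float:
    """Classic Amdahl's law: speedup on n identical unit-performance cores."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / n_cores)


def heterogeneous_speedup(parallel_fraction: float,
                          big_perf: float,
                          n_small: int) -> float:
    """Asymmetric variant (an assumption, not the survey's own model).

    The serial phase runs on one big core with relative performance
    `big_perf`; the parallel phase uses the big core together with
    `n_small` unit-performance small cores.
    """
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial / big_perf
                  + parallel_fraction / (big_perf + n_small))


if __name__ == "__main__":
    f = 0.9  # assumed parallel fraction of the workload

    # Homogeneous chip: 16 small cores.
    print(f"homogeneous:   {amdahl_speedup(f, 16):.2f}")  # 6.40

    # Heterogeneous chip with the same assumed area budget:
    # one big core (area of 4 small cores, 2x performance) + 12 small cores.
    print(f"heterogeneous: {heterogeneous_speedup(f, 2.0, 12):.2f}")  # 8.75
```

Under these assumptions the heterogeneous chip is faster because the big core shortens the serial phase, which is the part that limits speedup under Amdahl's law.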

Performance-Energy Trade-off in Modern CMPs

ACM Transactions on Architecture and Code Optimization, 2021

Chip multiprocessors (CMPs) are ubiquitous in all computing systems ranging from high-end servers to mobile devices. In these systems, energy consumption is a critical design constraint as it constitutes the most significant operating cost for computing clouds. Analogous to this, longer battery life continues to be an essential user concern in mobile devices. To optimize on power consumption, modern processors are designed with Dynamic Voltage and Frequency Scaling (DVFS) support at the individual core as well as the uncore level. This allows fine-grained control of performance and energy. For an n-core processor with m core and uncore frequency choices, the total DVFS configuration space is now m^(n+1) (with the uncore accounting for the +1). In addition to that, in CMPs, the performance-energy trade-off due to core/uncore frequency scaling concerning a single application cannot be determined independently as cores share critical resources like the last level cache (LLC) and the m...
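The size of the DVFS configuration space described in this abstract can be illustrated with a short sketch. It assumes, as the abstract does, that each of the n cores and the single uncore domain independently picks one of the same m frequency levels. The function names and the frequency values are hypothetical and chosen only for illustration.

```python
from itertools import product
from typing import Iterator, Sequence, Tuple


def dvfs_space_size(n_cores: int, m_freqs: int) -> int:
    """Number of DVFS configurations: m choices for each of n cores
    plus m choices for the uncore, i.e. m ** (n + 1)."""
    return m_freqs ** (n_cores + 1)


def enumerate_dvfs_configs(
    n_cores: int, freqs: Sequence[float]
) -> Iterator[Tuple[Tuple[float, ...], float]]:
    """Yield every (per-core frequencies, uncore frequency) pair."""
    for combo in product(freqs, repeat=n_cores + 1):
        # The last slot is the uncore; the first n slots are the cores.
        yield combo[:-1], combo[-1]


if __name__ == "__main__":
    levels = [1.0, 1.5, 2.0]  # hypothetical frequency levels in GHz

    # Small case: 2 cores, 3 levels -> 3 ** 3 = 27 configurations.
    configs = list(enumerate_dvfs_configs(2, levels))
    print(len(configs), dvfs_space_size(2, len(levels)))  # 27 27

    # The space grows exponentially with the core count:
    # 8 cores with 10 levels -> 10 ** 9 configurations.
    print(dvfs_space_size(8, 10))  # 1000000000
```

The exponential growth is why exhaustive search over all configurations quickly becomes impractical as the core count rises.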

Core architecture optimization for heterogeneous chip multiprocessors

Proceedings of the 15th …, 2006

Previous studies have demonstrated the advantages of single-ISA heterogeneous multi-core architectures for power and performance. However, none of those studies examined how to design such a processor; instead, they started with an assumed combination of pre-existing cores.

The Forgotten 'Uncore': On the Energy-Efficiency of Heterogeneous Cores

USENIX ATC, 2012

Heterogeneous multicore processors (HMPs), consisting of cores with different performance/power characteristics, have been proposed to deliver higher energy efficiency than symmetric multicores. This paper investigates the opportunities and limitations in using HMPs to gain energy-efficiency. Unlike previous work focused on server systems, we focus on the client workloads typically seen in modern end-user devices. Further, beyond considering core power usage, we also consider the 'uncore' subsystem shared by all cores, which in modern platforms, is an increasingly important contributor to total SoC power. Experimental evaluations use client applications and usage scenarios seen on mobile devices and a unique testbed comprised of heterogeneous cores, with results that highlight the need for uncore-awareness and uncore scalability to maximize intended efficiency gains from heterogeneous cores.

Exploration of Performance and Energy Trade-offs for Heterogeneous Multicore Architectures

2019

Energy-efficiency has become a major challenge in modern computer systems. To address this challenge, candidate systems increasingly integrate heterogeneous cores in order to satisfy diverse computation requirements by selecting cores with suitable features. In particular, single-ISA heterogeneous multicore processors such as ARM big.LITTLE have become very attractive since they offer good opportunities in terms of performance and power consumption trade-off. While existing works already showed that this feature can improve system energy-efficiency, further gains are possible by generalizing the principle to higher levels of heterogeneity. The present paper aims to explore these gains by considering single-ISA heterogeneous multicore architectures including three different types of cores. For this purpose, we use the Samsung Exynos Octa 5422 chip as baseline architecture. Then, we model and evaluate Cortex A7, A9, and A15 cores using the gem5 simulation framework coupled to McPAT fo...

Chip multi-processor scalability for single-threaded applications

ACM SIGARCH …, 2005

The exponential increase in uniprocessor performance has begun to slow. Designers have been unable to scale performance while managing thermal, power, and electrical effects. Furthermore, design complexity limits the size of monolithic processors that can be designed while keeping costs reasonable. Industry has responded by moving toward chip multi-processor architectures (CMP). These architectures are composed from replicated processors utilizing the die area afforded by newer design processes. While this approach mitigates the issues with design complexity, power, and electrical effects, it does nothing to directly improve the performance of contemporary or future single-threaded applications.

Chip multiprocessors with speculative multithreading: design for performance and energy efficiency

2004

While Chip Multiprocessors (CMPs) with Speculative Multithreading (SM) support have been gaining momentum, experienced processor designers in industry have reservations about their practical implementation. SM CMPs must exploit multiple sources of speculative task-level parallelism if they are to achieve enough performance improvement for non-numerical applications. Additionally, it is felt that SM is too energy-inefficient to compete against conventional superscalars. This thesis challenges for the first time the commonly held view that SM consumes excessive energy. It shows a CMP with SM support that is not only faster but also more energy efficient than a state-of-the-art wide-issue superscalar. This is demonstrated with a new energy-efficient CMP microarchitecture. To achieve these results, this thesis is also the first one to propose microarchitectural mechanisms that, taken together, fundamentally enable fast SM with out-of-order spawn in a CMP. These simple mechanisms are: Splitting Timestamp Intervals, the Immediate Successor List, and Dynamic Task Merging. To evaluate them, we develop an SM compiler with and without out-of-order spawn. In addition, the thesis identifies the sources of energy consumption in SM, and proposes energy-centric optimizations that mitigate them. Experiments with the SpecInt 2000 codes show that a CMP with four 3-issue cores and support for SM delivers a speedup of 1.27 over a 3-issue superscalar. The SM CMP is even faster than a 6-issue superscalar at the same frequency, and consumes only 85% of its power. In fact, for the same average power in both chips, the SM CMP is 1.13 times faster than the 6-issue superscalar on average.
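The power and speedup figures quoted in this abstract can be turned into a relative energy estimate with one line of arithmetic, since energy for a fixed amount of work is average power multiplied by execution time. The sketch below is an illustration of that relation only. The function name is hypothetical, and the thesis may define or measure energy differently.

```python
def relative_energy(power_ratio: float, speedup: float) -> float:
    """Energy of design A relative to design B for the same work.

    power_ratio = P_A / P_B (average power),
    speedup     = T_B / T_A (how much faster A is than B).
    Since energy = average power * time, E_A / E_B = power_ratio / speedup.
    """
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return power_ratio / speedup


if __name__ == "__main__":
    # Equal average power, SM CMP 1.13x faster than the 6-issue superscalar.
    print(f"{relative_energy(1.0, 1.13):.3f}")  # 0.885

    # Same frequency: 85% of the power and "even faster". The exact
    # speedup is not given in the abstract, so a speedup of 1.0 yields
    # only an upper bound on the relative energy.
    print(f"{relative_energy(0.85, 1.0):.3f}")  # 0.850
```

Under this simple model, the equal-power case corresponds to the SM CMP using roughly 88.5% of the superscalar's energy for the same work.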

Slight Heterogeneity in Multi-core Architecture: An Experimental & Comparative Study

There is a growing consensus that heterogeneous multicores are the future of CPUs. These processors would be composed of cores that are specifically adapted or tuned to particular types of applications and use cases, thereby increasing performance. The move from homogeneous to heterogeneous multicores causes the design space to explode, however. An architect of a heterogeneous processor must make design decisions per processor core rather than once for the entire processor as before. Currently, there are no methods for handling this design complexity to yield a processor that performs well for real workloads. As a step forward, we propose weak heterogeneity. A weakly heterogeneous processor is one whose cores are different, but not significantly so. The cores share an ISA and major microarchitectural features, differing only in minor details. Limiting the design space in this way allows us to explore the heterogeneous space without becoming overwhelmed by its size. We show preliminary results suggesting that a design space so constrained still has interesting trade-offs among performance, power consumption, and area.

A Fast System-Level Design Methodology for Heterogeneous Multi-Core Processors Using Emerging Technologies

A fast and efficient system-level design methodology is developed and validated to evaluate and optimize processors using emerging technologies at the early design stage. It includes an updated empirical cycle per instruction (CPI) model, a hierarchical memory model, and several multi-level interconnection network models. Multiple device- and system-level design parameters are simultaneously optimized to maximize the chip throughput for a given device technology and an architecture family under certain power, thermal and die size budgets. In the single-core processor analysis, a high-performance (HP) 25 mm^2 FinFET processor can provide 26% more throughput than its planar HP CMOS counterpart, and a low-power (LP) TFET processor can offer more than 2X improvement in throughput compared with its LP FinFET counterpart at the 16 nm technology node. For various technology nodes, an accurate power-law relation is observed between the throughput and the die size area for a relatively small processor. In the multi-core processor analyses, multiple device- and system-level design parameters are obtained for both HP and LP applications with symmetric and asymmetric configurations. For a heterogeneous CMOS-TFET multi-core processor, about 45% throughput improvement and 50% energy reduction are observed compared with a FinFET processor at a 5 W power budget.