Core architecture optimization for heterogeneous chip multiprocessors

Exploration of Performance and Energy Trade-offs for Heterogeneous Multicore Architectures

2019

Energy efficiency has become a major challenge in modern computer systems. To address this challenge, systems increasingly integrate heterogeneous cores so that diverse computation requirements can be met by selecting cores with suitable features. In particular, single-ISA heterogeneous multicore processors such as ARM big.LITTLE have become very attractive because they offer good opportunities for trading off performance against power consumption. While existing work has already shown that this feature can improve system energy efficiency, further gains are possible by generalizing the principle to higher levels of heterogeneity. The present paper aims to explore these gains by considering single-ISA heterogeneous multicore architectures that include three different types of cores. For this purpose, we use the Samsung Exynos Octa 5422 chip as the baseline architecture. Then, we model and evaluate Cortex A7, A9, and A15 cores using the gem5 simulation framework coupled to McPAT fo...
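Exploring performance/energy trade-offs across core types usually ends in picking the non-dominated (Pareto-optimal) points. The sketch below shows that step only, assuming each candidate has already been reduced to an execution time and an average power (for example from a gem5 + McPAT run); the class, function names, and all numbers are hypothetical illustrations, not results from the paper.

```python
from typing import NamedTuple


class Config(NamedTuple):
    name: str
    time_s: float   # execution time of the workload, in seconds
    power_w: float  # average power while running it, in watts

    @property
    def energy_j(self) -> float:
        # Energy (J) = average power (W) x execution time (s)
        return self.power_w * self.time_s


def pareto_front(configs: list[Config]) -> list[Config]:
    """Return the configurations not dominated in (time, energy), fastest first.

    A configuration is dominated if some other one is no worse in both
    time and energy and strictly better in at least one of them.
    """
    front = []
    for c in configs:
        dominated = any(
            o.time_s <= c.time_s
            and o.energy_j <= c.energy_j
            and (o.time_s < c.time_s or o.energy_j < c.energy_j)
            for o in configs
        )
        if not dominated:
            front.append(c)
    return sorted(front, key=lambda c: c.time_s)


# Hypothetical per-core-type numbers for one workload (not results from the paper).
configs = [
    Config("A7", time_s=10.0, power_w=0.5),
    Config("A9", time_s=6.0, power_w=1.0),
    Config("A15", time_s=4.0, power_w=2.5),
    Config("A9-variant", time_s=7.0, power_w=1.2),  # slower and costlier than A9
]
front = pareto_front(configs)
print([c.name for c in front])  # ['A15', 'A9', 'A7']
```

Each surviving point is a legitimate choice for some performance target; adding a third core type is worthwhile only if it contributes new points to this front.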

Single-ISA Heterogeneous Multi-Core Architectures: The Potential for Processor Power Reduction

2003

This paper proposes and evaluates single-ISA heterogeneous multi-core architectures as a mechanism to reduce processor power dissipation. Our design incorporates heterogeneous cores representing different points in the power/performance design space; during an application's execution, system software dynamically chooses the most appropriate core to meet specific performance and power requirements.
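The abstract leaves the selection policy open. One simple policy consistent with "meet specific performance and power requirements" is: among the cores that meet a deadline, run on the one with the lowest energy, and fall back to the fastest core otherwise. This is a minimal sketch assuming per-core time and power estimates are already available; the names and numbers are hypothetical, not the paper's mechanism.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CoreEstimate:
    core: str
    time_s: float   # predicted time for the next phase on this core (s)
    power_w: float  # predicted average power on this core (W)

    @property
    def energy_j(self) -> float:
        return self.power_w * self.time_s


def choose_core(estimates: list[CoreEstimate], deadline_s: float) -> str:
    """Pick the lowest-energy core that meets the deadline; if no core
    meets it, fall back to the fastest core."""
    feasible = [e for e in estimates if e.time_s <= deadline_s]
    if not feasible:
        return min(estimates, key=lambda e: e.time_s).core
    return min(feasible, key=lambda e: e.energy_j).core


# Hypothetical estimates for a three-core system.
estimates = [
    CoreEstimate("little", time_s=8.0, power_w=0.4),  # about 3.2 J
    CoreEstimate("medium", time_s=5.0, power_w=1.0),  # 5.0 J
    CoreEstimate("big", time_s=3.0, power_w=3.0),     # 9.0 J
]
for deadline in (10.0, 6.0, 4.0, 2.0):
    print(deadline, choose_core(estimates, deadline))
```

As the deadline tightens, the choice moves from little to medium to big, and an unmeetable deadline still gets the fastest core.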

Slight Heterogeneity in Multi-core Architecture: An Experimental & Comparative Study

There is a growing consensus that heterogeneous multicores are the future of CPUs. These processors would be composed of cores that are specifically adapted or tuned to particular types of applications and use cases, thereby increasing performance. The move from homogeneous to heterogeneous multicores causes the design space to explode, however. An architect of a heterogeneous processor must make design decisions per processor core rather than once for the entire processor as before. Currently, there are no methods for handling this design complexity to yield a processor that performs well for real workloads. As a step forward, we propose weak heterogeneity. A weakly heterogeneous processor is one whose cores are different, but not significantly so. The cores share an ISA and major microarchitectural features, differing only in minor details. Limiting the design space in this way allows us to explore the heterogeneous space without becoming overwhelmed by its size. We show preliminary results suggesting that a design space so constrained still has interesting trade-offs among performance, power consumption, and area.
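To make the "design space explosion" concrete: if a chip has n interchangeable cores and each may be one of k core variants, the number of distinct core-count mixes is the stars-and-bars count C(n + k - 1, k - 1); if cores are distinguishable by position, it is k^n. The sketch below counts only mixes of fixed variants (the function names are illustrative); letting each core's microarchitectural parameters vary grows the space much faster, which is what weak heterogeneity tries to contain by keeping k small.

```python
from itertools import combinations_with_replacement
from math import comb


def count_configs(n_cores: int, n_core_types: int) -> int:
    """Number of distinct core-count mixes (placement ignored):
    multisets of size n_cores drawn from n_core_types ("stars and bars")."""
    return comb(n_cores + n_core_types - 1, n_core_types - 1)


def enumerate_configs(n_cores: int, core_types: list[str]) -> list[tuple[str, ...]]:
    # Each tuple is one mix, e.g. ('A7', 'A7', ..., 'A15')
    return list(combinations_with_replacement(core_types, n_cores))


for k in (1, 2, 3, 4):
    print(k, count_configs(8, k))   # 1 -> 1, 2 -> 9, 3 -> 45, 4 -> 165
print(len(enumerate_configs(8, ["A7", "A9", "A15"])))  # 45
```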

Processor Power Reduction Via Single-ISA Heterogeneous Multi-Core Architectures

IEEE Computer Architecture Letters, 2003

This paper proposes a single-ISA heterogeneous multi-core architecture as a mechanism to reduce processor power dissipation. It assumes a single chip containing a diverse set of cores that target different performance levels and consume different levels of power. During an application's execution, system software dynamically chooses the most appropriate core to meet specific performance and power requirements. It describes an example architecture with five cores of varying performance and complexity. Initial results demonstrate a five-fold reduction in energy at a cost of only a 25% loss in performance.
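A quick way to weigh "five-fold energy reduction" against "25% performance" is the energy-delay product (EDP), a common combined metric that the abstract itself does not use. The arithmetic below assumes the 25% means performance falls to 0.75 of the baseline, so execution time grows by 1/0.75; the function name is illustrative.

```python
def relative_metrics(energy_ratio: float, perf_ratio: float) -> tuple[float, float]:
    """Return (delay ratio, energy-delay-product ratio) relative to a baseline.

    energy_ratio: new energy / baseline energy
    perf_ratio:   new performance / baseline performance
    """
    delay_ratio = 1.0 / perf_ratio           # time scales inversely with performance
    edp_ratio = energy_ratio * delay_ratio   # EDP = energy x delay
    return delay_ratio, edp_ratio


# Five-fold energy reduction, read here as performance dropping to 75% of baseline.
delay, edp = relative_metrics(energy_ratio=1 / 5, perf_ratio=0.75)
print(f"delay x{delay:.3f}, EDP x{edp:.3f}, EDP improvement {1 / edp:.2f}x")
# delay x1.333, EDP x0.267, EDP improvement 3.75x
```

Under that reading the trade still improves EDP by about 3.75×, which is why the energy saving dominates the slowdown.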

A Fast System-Level Design Methodology for Heterogeneous Multi-Core Processors Using Emerging Technologies

A fast and efficient system-level design methodology is developed and validated to evaluate and optimize processors using emerging technologies at the early design stage. It includes an updated empirical cycle per instruction (CPI) model, a hierarchical memory model, and several multi-level interconnection network models. Multiple device- and system-level design parameters are simultaneously optimized to maximize the chip throughput for a given device technology and an architecture family under certain power, thermal and die size budgets. In the single-core processor analysis, a high-performance (HP) 25 mm² FinFET processor can provide 26% more throughput than its planar HP CMOS counterpart, and a low-power (LP) TFET processor can offer more than 2× improvement in throughput compared with its LP FinFET counterpart at the 16 nm technology node. For various technology nodes, an accurate power-law relation is observed between the throughput and the die size area for a relatively small processor. In the multi-core processor analyses, multiple device- and system-level design parameters are obtained for both HP and LP applications with symmetric and asymmetric configurations. For a heterogeneous CMOS-TFET multi-core processor, about 45% throughput improvement and 50% energy reduction are observed compared with a FinFET processor at a 5 W power budget.
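A power-law relation T = a·A^b between throughput T and die area A becomes a straight line in log-log space, so a and b can be estimated with ordinary least squares on (ln A, ln T). The abstract does not report the fitted coefficients; the data below is synthetic, generated from a hypothetical law T = 2·A^0.5, just to show that the fit recovers it.

```python
import math


def fit_power_law(areas: list[float], throughputs: list[float]) -> tuple[float, float]:
    """Fit T = a * A**b by ordinary least squares on (ln A, ln T)."""
    xs = [math.log(x) for x in areas]
    ys = [math.log(t) for t in throughputs]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx                  # slope in log-log space = exponent
    a = math.exp(my - b * mx)      # intercept in log-log space = ln(a)
    return a, b


# Synthetic data generated from T = 2 * A**0.5 (hypothetical, not from the paper).
areas_mm2 = [5.0, 10.0, 15.0, 20.0, 25.0]
throughput = [2.0 * x ** 0.5 for x in areas_mm2]
coef, exponent = fit_power_law(areas_mm2, throughput)
print(f"a = {coef:.3f}, b = {exponent:.3f}")  # a = 2.000, b = 0.500
```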

Balancing Programmability and Silicon Efficiency of Heterogeneous Multicore Architectures

ACM Transactions on Embedded Computing Systems, 2012

Multicore architectures provide scalable performance with a lower hardware design effort than single-core processors. Our paper presents a design methodology and an embedded multicore architecture, focusing on reducing software design complexity and boosting performance density. First, we analyze the characteristics of task-level parallelism in modern multimedia workloads. These characteristics are used to formulate requirements for the programming model. Then, we translate the programming-model requirements into an architecture specification, including a novel low-complexity implementation of cache coherence and a hardware synchronization unit. Our evaluation demonstrates that the novel coherence mechanism substantially simplifies hardware design while reducing performance by less than 18% relative to a complex snooping technique. Compared to a single processor core, multicores have already proven to be more area- and energy-efficient. However, multicore architectures in embedded systems still compete with highly efficient function-specific hardware accelerators. In this paper we identify five architectural methods to boost the performance density of multicores: microarchitectural downscaling, asymmetric multicore architectures, multithreading, generic accelerators, and conjoining. Then, we present a novel methodology to explore multicore design spaces, including the architectural methods that improve performance density. The methodology is based on a complex formula that computes the performance of heterogeneous multicore systems. Using this design-space exploration methodology for HD and QuadHD H.264 video decoding, we estimate that the required multicore areas in 45 nm CMOS are 2.5 mm² and 8.6 mm², respectively. These results suggest that heterogeneous multicores are cost-effective for embedded applications and can provide good programmability support.
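The paper's performance formula is not given here, so the sketch below substitutes a deliberately simple model to illustrate the kind of question the exploration answers: the smallest-area mix of core types whose throughput meets a target (for example, a required decoding rate). It assumes throughput simply adds across cores and ignores coherence, interconnect, and memory effects; the core types, their area and throughput numbers, and the `min_counts` constraint (standing in for "keep at least one big core for sequential code") are all hypothetical.

```python
from itertools import product


def min_area_config(core_types, target, min_counts=None, max_per_type=16):
    """Brute-force the smallest-area mix of core types whose summed
    throughput meets `target`.

    core_types: dict name -> (area, throughput) for one core of that type
    min_counts: optional dict name -> minimum number of cores of that type
    Returns (area, {name: count}) or None if no mix up to max_per_type works.
    """
    min_counts = min_counts or {}
    names = list(core_types)
    best = None
    for counts in product(range(max_per_type + 1), repeat=len(names)):
        if any(c < min_counts.get(n, 0) for c, n in zip(counts, names)):
            continue
        thr = sum(c * core_types[n][1] for c, n in zip(counts, names))
        if thr < target:
            continue
        area = sum(c * core_types[n][0] for c, n in zip(counts, names))
        if best is None or area < best[0]:
            best = (area, dict(zip(names, counts)))
    return best


# Hypothetical core types: (area, throughput), both in arbitrary units.
cores = {"big": (4, 10), "small": (1, 3)}
print(min_area_config(cores, target=30))                         # (10, {'big': 0, 'small': 10})
print(min_area_config(cores, target=30, min_counts={"big": 1}))  # (11, {'big': 1, 'small': 7})
```

Exhaustive search is fine for a handful of core types; a real exploration with many parameters would need the kind of analytical model the abstract describes.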

Power/Performance/Thermal Design-Space Exploration for Multicore Architectures

IEEE Transactions on Parallel and Distributed Systems, 2000

Multicore architectures have come to dominate recent microprocessor design. This is due to several reasons: better performance, the thread-level parallelism available in modern applications, diminishing returns from ILP, better thermal and power scaling (many small cores dissipate less power than one large, complex core), and ease of design and design reuse. This paper presents a thorough evaluation of multicore architectures. The target architecture consists of a configurable number of cores, a memory hierarchy with private L1 caches and a shared or private L2, and a shared-bus interconnect. We consider a benchmark set composed of several parallel shared-memory applications. We explore the design space related to the number of cores, L2 cache size, and processor complexity, showing how the different configurations and applications behave with respect to performance, energy consumption, and temperature. Design trade-offs are analyzed, stressing the interdependency of the metrics and design factors. In particular, we evaluate several chip floorplans. Their power and thermal characteristics are analyzed, showing the importance of considering thermal effects at the architectural level to achieve the best design choice.
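Why many small cores can dissipate less than one large one follows from the standard CMOS switching-power model P = C·V²·f. The arithmetic below is a sketch under stated assumptions: the work splits perfectly across two cores, both cores have the same IPC and normalized switched capacitance, leakage is ignored, and halving the frequency is assumed (hypothetically) to allow running at 0.8× voltage.

```python
def dynamic_power(c_eff: float, vdd: float, freq_hz: float) -> float:
    # CMOS switching power: P = C_eff * Vdd^2 * f (activity factor folded into C_eff)
    return c_eff * vdd ** 2 * freq_hz


# One core at nominal voltage and frequency (normalized C_eff and Vdd).
p_one = dynamic_power(c_eff=1.0, vdd=1.0, freq_hz=2.0e9)
# Two identical cores at half the frequency, assumed able to run at 0.8x voltage.
p_two = 2 * dynamic_power(c_eff=1.0, vdd=0.8, freq_hz=1.0e9)

throughput_one = 2.0e9          # work per second, assuming equal IPC
throughput_two = 2 * 1.0e9      # same total, if the work splits perfectly in two
print(throughput_one == throughput_two, round(p_two / p_one, 2))  # True 0.64
```

The same throughput at about 64% of the dynamic power comes entirely from the quadratic voltage term; without voltage scaling the two configurations would draw equal dynamic power.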

Heterogeneous Chip Multiprocessors: A Survey

maqayum.yolasite.com

As computer applications become larger, more complex, and more versatile, chip multiprocessors (CMPs) have become ubiquitous. Considerable research is under way on the core architectures within the chip, and heterogeneous CMPs are leading this innovation. A heterogeneous CMP is composed of cores of varying performance and complexity. It offers better performance per unit area, higher throughput, and greater speedup, and it mitigates the Amdahl's-law bottleneck to some extent. This paper surveys recent research on the various configuration issues of heterogeneous CMPs in detail. There are four major issues, including the operating system's role in scheduling applications onto different cores, the configuration of cores, and the exploitation of Amdahl's law. Finally, some recommendations are drawn from the study.
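The survey does not give a formula for how heterogeneity "mitigates Amdahl's bottleneck". A well-known way to quantify it is Hill and Marty's multicore extension of Amdahl's law, which is swapped in here as an illustration rather than taken from the survey. A chip has a budget of n base core equivalents (BCEs); a core built from r BCEs is assumed to run sequential code perf(r) = √r times faster (their illustrative choice). The parallel fraction f and the chip size are hypothetical example values.

```python
import math


def perf(r: float) -> float:
    # Hill-Marty assumption: a core built from r BCEs runs sequential code sqrt(r) times faster
    return math.sqrt(r)


def speedup_symmetric(f: float, n: int, r: int) -> float:
    """n BCEs split into n/r identical cores of r BCEs each."""
    return 1.0 / ((1 - f) / perf(r) + f * r / (perf(r) * n))


def speedup_asymmetric(f: float, n: int, r: int) -> float:
    """One big core of r BCEs plus (n - r) single-BCE cores; all of them
    work on the parallel fraction, the big core runs the serial fraction."""
    return 1.0 / ((1 - f) / perf(r) + f / (perf(r) + n - r))


f, n = 0.9, 16   # 90% parallelizable code, 16-BCE chip budget
print(speedup_symmetric(f, n, r=1))   # 16 small cores: about 6.4
print(speedup_asymmetric(f, n, r=4))  # 1 big + 12 small: about 8.75
```

The big core shortens the serial phase while the small cores keep most of the parallel throughput, which is the sense in which an asymmetric CMP softens the Amdahl bottleneck.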