Guest Editorial: Embedded Multicore Systems and Applications

Power and energy analysis on Intel Single-Chip Cloud Computer system

2012 Proceedings of IEEE Southeastcon, 2012

Improving the computing performance of multicore and many-core systems is currently one of the primary interests of computer architecture researchers. Message Passing Interface (MPI) and multi-core techniques have converged to address this problem. As performance improves, power and energy consumption increase correspondingly. The Single-Chip Cloud Computer (SCC) is an experimental processor created by Intel Labs. This paper proposes an approach to studying power and energy consumption on the 48-core SCC many-core system and implements message passing on the SCC. First, we profile the execution time, voltage and current for each run configuration. Next, we calculate the power and energy consumption and compare them across increasing numbers of cores and varying voltage and frequency levels. Finally, we draw conclusions about the system's scalability and the relationship between power/energy consumption and system performance in terms of execution time.
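The calculation step described in this abstract (power and energy derived from profiled voltage, current and execution time) can be sketched as follows. This is a minimal illustration, not the authors' tooling: the function names and the sample readings are hypothetical, and it assumes evenly spaced samples so that average power is the plain mean of V * I.

```python
def average_power_w(voltages_v, currents_a):
    """Mean of instantaneous power P = V * I over paired samples (watts)."""
    if not voltages_v or len(voltages_v) != len(currents_a):
        raise ValueError("need equally long, non-empty voltage and current samples")
    return sum(v * i for v, i in zip(voltages_v, currents_a)) / len(voltages_v)


def energy_j(voltages_v, currents_a, execution_time_s):
    """Energy E = P_avg * t (joules), assuming evenly spaced samples."""
    return average_power_w(voltages_v, currents_a) * execution_time_s


# Hypothetical readings for one run; not measurements from the paper.
sample_voltages = [1.0, 1.0, 1.0]
sample_currents = [40.0, 42.0, 44.0]
sample_runtime_s = 10.0

run_power_w = average_power_w(sample_voltages, sample_currents)
run_energy_j = energy_j(sample_voltages, sample_currents, sample_runtime_s)
```

Repeating the same calculation for each core count and each voltage/frequency level gives the data points that the comparison in the abstract relies on.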

Energy aware execution environments and algorithms on low power multi-core architectures

2016

Proceedings of the First PhD Symposium on Sustainable Ultrascale Computing Systems (NESUS PhD 2016), Timisoara, Romania, February 8-11, 2016.

Energy consumption is a key factor in the proper functioning of today's data centers and high-performance computing, as well as in the launch of new services, because of its negative environmental impact and the increasing economic cost of energy. The energy efficiency of the applications used in these data centers could be improved, especially when system utilization is low or moderate, or when targeting memory-bound applications. In this sense, energy proportionality refers to systems whose power consumption is in line with the amount of work performed at each moment. In response to these needs, the main objective of this project is to study, design, develop and analyze experimental solutions (models, programs, tools and techniques) aware of energy proportionality for scientific and engineering applications on low-power archi...

Evaluation of Low-Power Computing when Operating on Subsets of Multicore Processors

Journal of Signal Processing Systems, 2013

Given the accelerated growth in tablet devices, smartphones, and netbooks, designers face serious challenges in meeting the needs of mobility in terms of battery life and form factor. It is vital to investigate how to deliver the best mobile experience to users while ensuring adequate levels of performance. In this paper, we present a power management evaluation of multi-core processor systems by comparing thermal power, battery life, and performance when running different types of workloads on a limited number of cores. To show the potential gains from a system power management perspective, we have assessed a mobile platform featuring the Second Generation Intel Core i5 processor, and tested it on a wide selection of workloads and benchmarks. Experimental results show significant thermal power reduction (up to 40%) in a variety of scenarios, while system performance was sustained in most cases and sacrificed only in a few uncommon situations.

Power and energy-aware processor scheduling

Proceedings of the 2nd ACM/SPEC International Conference on Performance engineering, 2011

Power consumption is a critical consideration in high-performance computing systems. We propose a novel job scheduler that optimizes the power and energy consumed by clusters when running parallel benchmarks, with minimal impact on performance. We construct accurate models for estimating power consumption. These models are based on measurements of power consumption on benchmarks with different characteristics and on systems with processors using different micro-architectures. We show that the power estimation models achieve less than 2% error versus actual measurements. We show that a job scheduler can be enhanced to make it "power-aware" and to optimize the power consumption of jobs with similar performance characteristics. The enhanced scheduler can estimate the power consumed by a particular job using the power estimation model and configure the nodes in the cluster, by suitably adjusting the processor frequency on each node, to maximize performance, minimize power, or minimize energy with a predictable impact on power, energy and performance.
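The selection step this abstract describes (pick a processor frequency to maximize performance, minimize power, or minimize energy) can be illustrated with a small sketch. The abstract does not give the form of the power estimation model, so the estimates below are supplied directly as hypothetical numbers, and `FrequencySetting` and `choose_setting` are illustrative names rather than anything from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FrequencySetting:
    """One candidate node configuration with model-estimated power and runtime."""
    freq_ghz: float
    est_power_w: float
    est_runtime_s: float

    @property
    def est_energy_j(self) -> float:
        # Energy is estimated power multiplied by estimated runtime.
        return self.est_power_w * self.est_runtime_s


def choose_setting(settings, objective):
    """Return the setting that is best for 'performance', 'power' or 'energy'."""
    keys = {
        "performance": lambda s: s.est_runtime_s,  # shortest runtime
        "power": lambda s: s.est_power_w,          # lowest power draw
        "energy": lambda s: s.est_energy_j,        # lowest power * runtime
    }
    if objective not in keys:
        raise ValueError(f"unknown objective: {objective!r}")
    return min(settings, key=keys[objective])


# Hypothetical estimates for a single job; not data from the paper.
candidates = [
    FrequencySetting(freq_ghz=1.2, est_power_w=60.0, est_runtime_s=200.0),
    FrequencySetting(freq_ghz=1.8, est_power_w=80.0, est_runtime_s=140.0),
    FrequencySetting(freq_ghz=2.4, est_power_w=120.0, est_runtime_s=110.0),
]
fastest = choose_setting(candidates, "performance")
lowest_power = choose_setting(candidates, "power")
lowest_energy = choose_setting(candidates, "energy")
```

With these made-up estimates the three objectives select three different frequencies, which is why the scheduler needs the objective as an explicit input.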

Power-aware computing: Measurement, control, and performance analysis for Intel Xeon Phi

2017 IEEE High Performance Extreme Computing Conference (HPEC)

The emergence of power efficiency as a primary constraint in processor and system designs poses new challenges concerning power and energy awareness for numerical libraries and scientific applications. Power consumption also plays a major role in the design of data centers, in particular for peta- and exascale systems. Understanding and improving the energy efficiency of numerical simulation is becoming crucial. We present a detailed study of controlling power usage and explore how different power caps affect the performance of numerical algorithms with different computational intensities, and we determine the impact on, and correlation with, the performance of scientific applications. Our analysis is performed using a set of representative kernels, as well as many widely used scientific benchmarks. We quantify a number of power and performance measurements, and draw observations and conclusions that can be viewed as a roadmap toward achieving energy-efficient computing algorithms.

Performance Analysis and Benchmarking of the Intel SCC

2011 IEEE International Conference on Cluster Computing, 2011

There has been a continuous change over the past years in CPU design and development towards both power-aware hardware architectures and many-core processors. The Intel Single-chip Cloud Computer (SCC) combines those two trends. It is an experimental prototype created by Intel Labs consisting of 48 Pentium cores. The SCC is a highly configurable many-core chip that provides unique opportunities to optimize run time, communication and memory access as well as power and energy consumption of parallel programs. The aim of this paper is to analyze and characterize, through benchmarking, the performance behavior of the chip under various power settings, mappings of processes to cores and memory controllers, and different techniques for data exchange between cores. The results are verified and interpreted by the use of analytical models as well as benchmarking kernels and a scientific application. Conclusions drawn from the results of our benchmarks confirm our architecture-derived hypothesis that data exchange based on shared memory is slower than a message passing scheme. Furthermore, contrary to popular belief, the lowest energy consumption is achieved not at the fastest execution time but rather at a medium frequency/voltage setting, depending on the program being executed. Moreover, to improve memory access behavior, it is more beneficial to increase the clock frequency of both the mesh network and the memory controllers than to increase the clock of only one of the two. In general, the results of our investigations can be used to analyze the effect of power settings and architecture properties on the performance and energy consumption of parallel programs, and to assist in choosing appropriate settings for specific workloads. Hence, our findings serve as guidance for developers on how to use the architectural characteristics of the SCC effectively.
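The finding that a medium frequency/voltage setting can minimize energy is consistent with the standard CMOS power model, in which dynamic power scales as C * V^2 * f and static power is paid for the whole runtime. The sketch below uses that textbook model, not the paper's own analytical models; the operating points, effective capacitance, static power and cycle count are all hypothetical values chosen for illustration.

```python
def run_energy_j(freq_hz, volt_v, cycles, c_eff_f, p_static_w):
    """Energy of a fixed-cycle workload under the textbook CMOS power model."""
    runtime_s = cycles / freq_hz                  # higher frequency, shorter run
    p_dynamic_w = c_eff_f * volt_v ** 2 * freq_hz  # P_dyn = C * V^2 * f
    return (p_dynamic_w + p_static_w) * runtime_s


# Hypothetical (frequency in Hz, voltage in V) pairs; higher clocks need higher voltage.
operating_points = [(0.4e9, 0.7), (0.8e9, 0.9), (1.6e9, 1.3)]

energies_j = [
    run_energy_j(f, v, cycles=8e9, c_eff_f=1e-8, p_static_w=5.0)
    for f, v in operating_points
]
best_index = min(range(len(energies_j)), key=energies_j.__getitem__)
best_point = operating_points[best_index]
```

In this model the slowest setting loses because static power accumulates over a long runtime, and the fastest setting loses because the voltage needed for the higher clock raises dynamic energy quadratically, leaving the middle point as the minimum for these particular numbers.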

A novel data partitioning algorithm for dynamic energy optimization on heterogeneous high‐performance computing platforms

Concurrency and Computation: Practice and Experience, 2020

Energy is one of the most important objectives for optimization on modern heterogeneous high-performance computing (HPC) platforms. The tight integration of multicore CPUs with accelerators such as graphical processing units (GPUs) and Xeon Phi coprocessors in these platforms presents several challenges to the optimization of multithreaded data-parallel applications for energy. In this work, we formulate the problem of optimizing data-parallel applications on heterogeneous HPC platforms for dynamic energy through workload distribution. We propose a workload partitioning algorithm to solve this problem. It employs a load-imbalancing technique to determine the workload distribution that minimizes the dynamic energy consumption of the parallel execution of an application. The inputs to the algorithm are discrete dynamic energy profiles of the individual computing devices. The profiles are constructed in practice using an approach that accurately models the energy consumed by executing a hybrid scientific data-parallel application on a heterogeneous platform containing different computing devices such as CPU, GPU, and Xeon Phi. The proposed algorithm is experimentally analyzed using two multithreaded data-parallel applications, matrix multiplication and 2D fast Fourier transform. The load-imbalanced solutions provided by the algorithm achieve significant dynamic energy reductions for the two applications (on average by 130% and 44%, respectively) compared with the load-balanced solutions.
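The problem formulated in this abstract (choose a workload distribution over devices that minimizes total dynamic energy, given a discrete energy profile per device) can be illustrated with a plain dynamic-programming search. This is a stand-in for illustration only and is not the partitioning algorithm proposed in the paper, whose details the abstract does not give; the function name and both profiles are hypothetical.

```python
def partition_min_energy(profiles, total_units):
    """Split total_units across devices to minimize summed dynamic energy.

    profiles[d][x] is the dynamic energy (J) of device d processing x units,
    for x = 0 .. total_units. Returns (units per device, total energy).
    """
    inf = float("inf")
    # best[w]: minimum energy to process w units with the devices seen so far.
    best = [0.0] + [inf] * total_units
    picks = []  # picks[d][w]: units given to device d in the optimum for w
    for profile in profiles:
        if len(profile) < total_units + 1:
            raise ValueError("each profile must cover 0..total_units")
        new_best = [inf] * (total_units + 1)
        pick = [0] * (total_units + 1)
        for w in range(total_units + 1):
            for x in range(w + 1):
                candidate = best[w - x] + profile[x]
                if candidate < new_best[w]:
                    new_best[w], pick[w] = candidate, x
        best = new_best
        picks.append(pick)
    # Walk back from the last device to recover the distribution.
    distribution, remaining = [], total_units
    for pick in reversed(picks):
        distribution.append(pick[remaining])
        remaining -= pick[remaining]
    distribution.reverse()
    return distribution, best[total_units]


# Hypothetical discrete profiles (joules for 0..4 workload units).
cpu_profile = [0, 5, 12, 21, 32]
gpu_profile = [0, 8, 10, 13, 30]
split, total_energy = partition_min_energy([cpu_profile, gpu_profile], 4)
balanced_energy = cpu_profile[2] + gpu_profile[2]
```

For these made-up profiles the minimum-energy split is uneven, and the even two-and-two split costs more, which mirrors the abstract's contrast between load-imbalanced and load-balanced solutions.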

Comparing the power and performance of Intel's SCC to state-of-the-art CPUs and GPUs

2012 IEEE International Symposium on Performance Analysis of Systems & Software, 2012

Power dissipation and energy consumption are becoming increasingly important architectural design constraints in different types of computers, from embedded systems to large-scale supercomputers. To continue the scaling of performance, it is essential that we build parallel processor chips that make the best use of exponentially increasing numbers of transistors within the power and energy budgets. Intel SCC is an appealing option for future many-core architectures. In this paper, we use various scalable applications to quantitatively compare and analyze the performance, power consumption and energy efficiency of different cutting-edge platforms that differ in architectural build. These platforms include the Intel Single-Chip Cloud Computer (SCC) many-core, the Intel Core i7 general-purpose multi-core, the Intel Atom low-power processor, and the Nvidia ION2 GPGPU. Our results show that the GPGPU has outstanding results in performance, power consumption and energy efficiency for many applications, but it requires significant programming effort and is not general enough to show the same level of efficiency for all the applications. The "light-weight" many-core presents an opportunity for better performance per watt over the "heavy-weight" multi-core, although the multi-core is still very effective for some sophisticated applications. In addition, the low-power processor is not necessarily energy-efficient, since the runtime delay effect can be greater than the power savings.

Scheduling for Better Energy Efficiency on Many-Core Chips

Many-core chips are especially attractive for data center operators providing cloud computing service models. With the advance of many-core chips in such environments, energy-conscious scheduling of independent processes or operating systems (OSes) is gaining importance. An important research question is how the scheduler of such a system should assign cores to the OSes in order to achieve better energy utilization. In this paper, we demonstrate that many-core chips offer new opportunities for extremely lightweight migration of independent processes (or OSes) running bare-metal on the many-core chip. We then show how this intra-chip migration can be utilized to achieve a better performance-per-watt ratio by implementing a hierarchical power-management scheme on top of dynamic voltage and frequency scaling (DVFS). We have implemented and tested the proposed techniques on the Intel Single Chip Cloud Computer (SCC). Combining migration with DVFS, we achieve, on average, 25–35% better performance per watt than a DVFS-only solution.