Energy-Efficient Computing for Extreme-Scale Science

High-performance, power-aware distributed computing for scientific applications

Computer, 2000

Computer models deepen our understanding of complex phenomena and indirectly improve our quality of life. Biological simulations of DNA sequencing and protein folding advance healthcare and drug discovery. Mathematical simulations of world economies guide political policy. Physical fluid-flow simulations of global climate allow more accurate forecasting and life-saving weather alerts. Nanoscale simulations enhance underlying computing technologies.

Evaluating Architectures for Application-Specific Parallel Scientific Computing Systems

2008

In this work, we examine the computational efficiency of scientific applications on three high-performance computing systems based on processors with varying degrees of specialization: an x86 server processor (the AMD Opteron); a more specialized system-on-chip solution (the BlueGene/L and BlueGene/P); and a configurable embedded core (the Tensilica Xtensa). We use the atmospheric component of the global Community Atmospheric Model to motivate our study by defining a problem that requires exascale-class computing performance, currently beyond the capabilities of existing systems. Significant advances in power efficiency are necessary to make such a system practical to field.

Designing Computational Clusters for Performance and Power

Advances in Computers, 2007

Power consumption in computational clusters has reached critical levels. High-end cluster performance improves exponentially, while the power consumed and heat dissipated increase operational costs and failure rates. Yet the demand for more powerful machines continues to grow. In this chapter, we motivate the need to reconsider the traditional performance-at-any-cost cluster design approach. We propose designs where power and performance are considered critical constraints, and we describe power-aware and low-power techniques that reduce the power profiles of parallel applications while mitigating the impact on performance. A prototype power-aware cluster (Argus, at Michigan State University) has been built, and a series of benchmarks installed on it. Following our augmentation, Argus resembles a standard Linux-based cluster running existing software packages and compiling new applications.
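A core power-aware technique in this setting is dynamic voltage and frequency scaling (DVFS). The sketch below is not from the chapter itself; it is a first-order model, assuming dynamic power follows P = C·V²·f and that runtime of a CPU-bound region scales as 1/f, to show why a lower operating point can reduce energy-to-solution even though it lengthens runtime. All numbers are illustrative.

```python
def energy_to_solution(capacitance, voltage, freq_ghz, cycles):
    """Return (runtime_s, energy_j) for a CPU-bound region.

    First-order DVFS model: dynamic power P = C * V^2 * f,
    runtime = cycles / f, energy = P * runtime.
    """
    power_w = capacitance * voltage ** 2 * freq_ghz * 1e9
    runtime_s = cycles / (freq_ghz * 1e9)
    return runtime_s, power_w * runtime_s

# Compare a nominal and a scaled-down operating point (hypothetical values).
t_hi, e_hi = energy_to_solution(1e-9, 1.2, 2.4, 1e12)  # 1.2 V @ 2.4 GHz
t_lo, e_lo = energy_to_solution(1e-9, 1.0, 1.8, 1e12)  # 1.0 V @ 1.8 GHz
# The lower V/f point runs longer (t_lo > t_hi) but consumes
# less total energy (e_lo < e_hi) because energy scales with V^2.
```

Real processors add static leakage and memory-bound phases, which shift the optimum; the model only captures the headline tradeoff the chapter explores.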

Future Computing Platforms for Science in a Power Constrained Era

Journal of Physics: Conference Series, 2015

Power consumption will be a key constraint on the future growth of Distributed High Throughput Computing (DHTC) as used by High Energy Physics (HEP). This makes performance-per-watt a crucial metric for selecting cost-efficient computing solutions. For this paper, we have done a wide survey of current and emerging architectures becoming available on the market including x86-64 variants, ARMv7 32-bit, ARMv8 64-bit, Many-Core and GPU solutions, as well as newer System-on-Chip (SoC) solutions. We compare performance and energy efficiency using an evolving set of standardized HEP-related benchmarks and power measurement techniques we have been developing. We evaluate the potential for use of such computing solutions in the context of DHTC systems, such as the Worldwide LHC Computing Grid (WLCG).
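The performance-per-watt metric named above is simple to state but is what drives the cross-architecture comparison. A minimal sketch, with hypothetical benchmark scores and power draws (the function name and numbers are assumptions, not from the paper):

```python
def perf_per_watt(benchmark_score, avg_power_w):
    """Benchmark units per second per watt of average power."""
    return benchmark_score / avg_power_w

# Hypothetical platforms: a high-score server CPU vs. a low-power SoC.
x86 = perf_per_watt(200.0, 250.0)  # score 200 at 250 W -> 0.8
arm = perf_per_watt(60.0, 30.0)    # score  60 at  30 W -> 2.0
# The SoC wins on efficiency despite a much lower absolute score,
# which is why absolute throughput alone is a misleading selector.
```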

Power-Efficient Computing: Experiences from the COSA Project

Scientific Programming

Energy consumption is today one of the most pressing issues in operating HPC systems for scientific applications. The use of unconventional computing systems is therefore of great interest for several scientific communities looking for a better tradeoff between time-to-solution and energy-to-solution. In this context, assessing the performance of processors with a high performance-per-watt ratio is necessary to understand how to build energy-efficient computing systems for scientific applications using this class of processors. Computing On SOC Architecture (COSA) is a three-year project (2015–2017) funded by the Scientific Commission V of the Italian Institute for Nuclear Physics (INFN), which aims to investigate the performance and the total cost of ownership offered by computing systems based on commodity low-power Systems on Chip (SoCs) and highly energy-efficient systems based on GP-GPUs. In this work, we present the results of the project analyzing the performance of seve...
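The total-cost-of-ownership comparison the project targets reduces to purchase price plus the lifetime energy bill. A minimal sketch, under assumed electricity pricing and a PUE factor for cooling/infrastructure overhead (all parameter values are hypothetical):

```python
def tco(capex_eur, avg_power_w, years, eur_per_kwh=0.15, pue=1.5):
    """Total cost of ownership: purchase price plus lifetime energy cost.

    PUE (power usage effectiveness) folds in cooling and other
    facility overhead on top of the IT load itself.
    """
    hours = years * 365 * 24
    energy_kwh = avg_power_w / 1000.0 * hours * pue
    return capex_eur + energy_kwh * eur_per_kwh

# A 100 W node bought for 1000 EUR and run for 3 years:
cost = tco(1000.0, 100.0, 3)
# Energy already accounts for over a third of the total here, which is
# why a lower-power SoC can win on TCO despite lower throughput.
```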

Energy Efficiency For Extreme-Scale Computer Architecture

The era of single-core processors is over. Today's chips integrate on the order of ten cores, and we are moving toward integration levels of 1,000-core processor chips. The principal obstacles we face are power and energy consumption. To build such a many-core processor chip, the whole compute stack must be rebuilt from the ground up for energy and power efficiency. Controlling power consumption is essential to reaching the extreme-scale computing milestone. It is also important to operate the processor at low voltage, because this is the point of maximum energy efficiency. Unfortunately, in such an environment we must cope with substantial process variation. Voltage regulation must therefore be designed efficiently, so that each region of the processor chip can operate at its most effective voltage and frequency levels. At the architecture level, we need simple cores organized in a hierarchy of clusters. Furthermore, we require techniques that reduce the leakage of on-chip memories and lower the voltage guard-bands of the logic. Finally, we need to minimize data movement through the use of both hardware and software techniques. With a holistic approach that cuts across many layers of the computing stack, we can attain the required energy efficiencies.
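The claim that a low operating voltage is "the point of maximum energy efficiency" can be made concrete with a toy model (not from the paper; the constants and the crude frequency model are assumptions): dynamic energy per operation falls as V², but leakage energy per operation grows at low voltage because each operation takes longer, so their sum has a minimum well below nominal voltage.

```python
def energy_per_op(v, c=1.0, leak=0.3, vth=0.3):
    """Toy energy-per-operation model at supply voltage v.

    Dynamic energy ~ C * V^2; leakage energy = leakage power * time
    per op, where frequency is crudely modeled as f ~ (V - Vth).
    """
    f = max(v - vth, 1e-6)   # frequency collapses near threshold
    e_dyn = c * v * v        # dynamic component
    e_leak = leak * v / f    # leakage power integrated over a slow op
    return e_dyn + e_leak

# Sweep supply voltage from 0.4 V to 1.2 V and find the minimum.
volts = [0.4 + 0.05 * i for i in range(17)]
v_opt = min(volts, key=energy_per_op)
# The optimum lands near-threshold (~0.6 V here), far below the
# nominal ~1.0-1.2 V point -- the regime the paper argues for.
```

The same model also hints at the paper's caveat: near the minimum, small voltage shifts from process variation move frequency a lot, which is why per-region voltage regulation matters.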

Design and Analysis of a 32-bit Embedded High-Performance Cluster Optimized for Energy and Performance

2014 Hardware-Software Co-Design for High Performance Computing, 2014

A growing number of supercomputers are being built using processors with low-power embedded ancestry, rather than traditional high-performance cores. In order to evaluate this approach we investigate the energy and performance tradeoffs found with ten different 32-bit ARM development boards while running the HPL Linpack and STREAM benchmarks. Based on these results (and other practical concerns) we chose the Raspberry Pi as a basis for a power-aware embedded cluster computing testbed. Each node of the cluster is instrumented with power measurement circuitry so that detailed cluster-wide power measurements can be obtained, enabling power / performance co-design experiments. While our cluster lags recent x86 machines in performance, the power, visualization, and thermal features make it an excellent low-cost platform for education and experimentation. * By default the L2 on the Pi belongs to the GPU, but Raspbian reconfigures it for CPU use.
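Per-node power instrumentation of the kind described enables a basic cluster-wide measurement: integrate sampled power over a benchmark run to get energy, then divide achieved performance by average power. A minimal sketch of that calculation (sample values and the sustained-GFLOPS figure are hypothetical, not the paper's measurements):

```python
def joules(samples_w, dt_s):
    """Trapezoidal integration of power samples (watts) taken dt_s apart."""
    return sum((a + b) / 2.0 * dt_s for a, b in zip(samples_w, samples_w[1:]))

def flops_per_watt(gflops, samples_w, dt_s):
    """GFLOPS per watt of average power over the sampled run."""
    runtime_s = dt_s * (len(samples_w) - 1)
    avg_power_w = joules(samples_w, dt_s) / runtime_s
    return gflops / avg_power_w

# One node sampled at 1 Hz during a short HPL-style run (made-up data):
samples = [3.0, 3.5, 4.0, 3.5, 3.0]          # watts
eff = flops_per_watt(0.2, samples, 1.0)      # 0.2 GFLOPS sustained
# avg power is 3.5 W, so efficiency is 0.2 / 3.5 GFLOPS/W
```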

The potential of the cell processor for scientific computing

Proceedings of the 3rd conference on Computing frontiers - CF '06, 2006

The slowing pace of commodity microprocessor performance improvements combined with ever-increasing chip power demands has become of utmost concern to computational scientists. As a result, the high performance computing community is examining alternative architectures that address the limitations of modern cache-based designs. In this work, we examine the potential of using the forthcoming STI Cell processor as a building block for future high-end computing systems. Our work contains several novel contributions. First, we introduce a performance model for Cell and apply it to several key scientific computing kernels: dense matrix multiply, sparse matrix vector multiply, stencil computations, and 1D/2D FFTs. The difficulty of programming Cell, which requires assembly level intrinsics for the best performance, makes this model useful as an initial step in algorithm design and evaluation. Next, we validate the accuracy of our model by comparing results against published hardware results, as well as our own implementations on the Cell full system simulator. Additionally, we compare Cell performance to benchmarks run on leading superscalar (AMD Opteron), VLIW (Intel Itanium2), and vector (Cray X1E) architectures. Our work also explores several different mappings of the kernels and demonstrates a simple and effective programming model for Cell's unique architecture. Finally, we propose modest microarchitectural modifications that could significantly increase the efficiency of double-precision calculations. Overall results demonstrate the tremendous potential of the Cell architecture for scientific computations in terms of both raw performance and power efficiency.
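The kind of performance model applied to those kernels can be sketched as a simple bandwidth/compute bound (a roofline-style simplification, not the paper's actual Cell model; machine numbers below are illustrative): a kernel's attainable rate is capped by either peak compute or by memory bandwidth times its arithmetic intensity, whichever is lower.

```python
def predicted_gflops(intensity, peak_gflops, bandwidth_gbs):
    """Upper bound on kernel throughput.

    intensity: arithmetic intensity in flops per byte moved.
    The kernel is memory-bound until intensity * bandwidth
    exceeds the compute peak.
    """
    return min(peak_gflops, intensity * bandwidth_gbs)

# Illustrative machine: 200 GFLOP/s peak, 25 GB/s memory bandwidth.
spmv = predicted_gflops(0.25, 200.0, 25.0)  # SpMV-like, low intensity
gemm = predicted_gflops(16.0, 200.0, 25.0)  # dense matmul, high intensity
# SpMV is bandwidth-bound (6.25 GFLOP/s); GEMM hits the compute
# peak -- which is why the two kernels stress such different
# aspects of an architecture like Cell.
```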