Future generation supercomputers I

A New Architectural Paradigm for High-performance Computing

Scalable Computing: Practice and Experience, 2001

At first thought, one may not realize the need for a special issue of Parallel and Distributed Computing Practices focused on Cluster Computing when our community has already invested years of research and spent a wealth of resources on traditional Big Iron supercomputers, for instance, the SGI/Cray Origin and IBM SP. Academia, industry, and national laboratories are still using, and for the foreseeable future will continue to use, supercomputers to solve both grand-challenge-scale and high-throughput applications. ...

Architecture, algorithms and applications for future generation supercomputers

1996

In this paper, we outline a hierarchical architecture for machines capable of over 100 teraOps in a 10-year time-frame. The motivating factors for the design are technological feasibility and economic viability. The envisioned architecture can be built largely from commodity components. The development costs of the machine will therefore be shared by the market. To obtain sustained performance from the machine, we propose a heterogeneous programming environment for the machine. The programming environment optimally uses the power of the hierarchy. Programming models for the stronger machine models existing at the lower levels are tuned for ease of programming. Higher levels of the hierarchy place progressively greater emphasis on locality of data reference. The envisioned machine architecture requires new algorithm design methodologies. We propose to develop hierarchical parallel algorithms and scalability metrics for evaluating such algorithms. We identify three important application areas: large-scale numerical simulations, problems in particle dynamics and boundary element methods, and emerging large-scale applications such as data mining. We briefly outline the process of hierarchical algorithm design for each of these application areas.
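As a rough illustration of the hierarchical style the authors advocate, the sketch below (an assumption of ours, not taken from the paper) performs fine-grained work at the lower, tightly coupled level of a node and leaves only a coarse-grained combination step for the higher level of the hierarchy, where locality of reference matters most.

```python
# A minimal sketch (not from the paper) of a hierarchy-aware computation:
# do as much work as possible at the lower, tightly coupled level (within
# a node), and keep the higher, commodity-network level coarse-grained.

from multiprocessing import Pool

def local_partial_sum(chunk):
    """Lower level: fine-grained work inside one node (shared memory)."""
    return sum(x * x for x in chunk)

def hierarchical_sum_of_squares(data, workers_per_node=4):
    # Lower level: split the node's data among its processors.
    chunks = [data[i::workers_per_node] for i in range(workers_per_node)]
    with Pool(workers_per_node) as pool:
        partials = pool.map(local_partial_sum, chunks)
    # Higher level: only one small partial result per node would cross the
    # network; here we simply combine the partials locally to stand in for
    # that coarse-grained step.
    return sum(partials)

if __name__ == "__main__":
    print(hierarchical_sum_of_squares(list(range(1_000_000))))
```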

Network design considerations for exascale supercomputers

2012

We consider network design optimization for Exascale-class supercomputers by altering the widely analyzed and implemented torus networks. Our alteration scheme interlaces the torus networks with bypass links of lengths 6 hops, 9 hops, 12 hops, and mixed 6 and 12 hops; these bypass lengths were found to be optimal through an exhaustive search over a massive space of possibilities. Our case study is constructed by strategically coupling 288 racks of 6 × 6 × 36 nodes into a full system of 72 × 72 × 72 nodes. The peak performance of such a system is 0.56 exaflops when CPU-GPU complexes capable of 1.5 Tflops are adopted as the node module. Our design simultaneously optimizes system performance, performance-cost ratio, and power efficiency. The network diameter and the average node-to-node network distance, regarded as the performance metrics, are reduced relative to the original 3D torus network by 83.3% and 80.4%, respectively. Similarly, the performance-cost ratio and power efficiency are increased by factors of 1.43 and 4.44, respectively.
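The headline figures can be sanity-checked with a back-of-the-envelope model. The sketch below assumes a standard k-ary 3-cube torus (the usual diameter and average-distance formulas, not taken from the paper) and simply applies the reported reduction percentages for the bypass-interlaced design rather than modeling the bypass topology itself.

```python
# Back-of-the-envelope check of the quoted figures, assuming a plain
# 3D torus model; the bypass-link topology is not reproduced here, only
# the paper's stated reduction percentages are applied at the end.

def torus_diameter(k: int, dims: int = 3) -> int:
    """Diameter of a k-ary n-cube torus: floor(k/2) hops per dimension."""
    return dims * (k // 2)

def torus_avg_distance(k: int, dims: int = 3) -> float:
    """Average node-to-node distance, roughly k/4 hops per dimension for even k."""
    return dims * (k / 4)

k = 72                                   # 72 x 72 x 72 node system
nodes = k ** 3                           # 373,248 nodes
peak_eflops = nodes * 1.5e12 / 1e18      # 1.5 Tflops per CPU-GPU node module

base_diameter = torus_diameter(k)        # 108 hops for the plain torus
base_avg = torus_avg_distance(k)         # 54 hops on average

print(f"peak performance      : {peak_eflops:.2f} exaflops")
print(f"torus diameter        : {base_diameter} -> {base_diameter * (1 - 0.833):.0f} hops")
print(f"avg node-to-node hops : {base_avg:.0f} -> {base_avg * (1 - 0.804):.1f} hops")
```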

Deploying a Top-100 Supercomputer for Large Parallel Workloads

Proceedings of the Practice and Experience in Advanced Research Computing on Rise of the Machines (learning), 2019

Niagara is currently the fastest supercomputer accessible to academics in Canada. It was deployed at the beginning of 2018 and has been serving the research community ever since. This homogeneous 60,000-core cluster, owned by the University of Toronto and operated by SciNet, was intended to enable large parallel jobs and has a measured performance of 3.02 petaflops, debuting at #53 on the June 2018 TOP500 list. It was designed to optimize throughput of a range of scientific codes running at scale, energy efficiency, and network and storage performance and capacity. It replaced two systems that SciNet operated for over 8 years, the Tightly Coupled System (TCS) and the General Purpose Cluster (GPC) [13]. In this paper we describe the transition process from these two systems, the procurement and deployment processes, as well as the unique features that make Niagara a one-of-a-kind machine in Canada.

Application Scalability and Communication Signatures on Leading Supercomputing Platforms

2006

After a decade in which supercomputing capability was dominated by the rapid pace of improvements to CPU clock frequency, the performance of the next generation of supercomputing designs is increasingly differentiated by varying interconnect designs and levels of integration. Understanding their performance tradeoffs is critical for computational scientists, architectural designers, and system procurement. Our paper examines the performance of the very latest generation of superscalar-based supercomputing platforms for a broad range of full-scale scientific applications; to our knowledge, we are the first to compare this set of architectures. Our results show that smaller-sized problems rapidly become latency-dominated at larger concurrencies on all systems except the IBM BG/L. This may have important ramifications for future systems, where smaller memory sizes per peak flop are expected to force users to large concurrencies even for the smaller problem sizes.
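The latency effect described here is commonly reasoned about with the alpha-beta (latency/bandwidth) communication model. The sketch below uses that model with purely illustrative values of alpha, beta, and problem size (assumptions, not measurements from the paper) to show why a fixed-size problem becomes latency-bound as concurrency grows.

```python
# A minimal sketch of the standard alpha-beta (latency/bandwidth) model,
# not taken from the paper: per-message payloads shrink as concurrency
# grows, so the fixed latency term alpha comes to dominate message time.

ALPHA = 2e-6      # assumed per-message latency in seconds (illustrative)
BETA  = 1e9       # assumed link bandwidth in bytes/second (illustrative)

def message_time(bytes_per_msg: float) -> float:
    """Time for one point-to-point message under the alpha-beta model."""
    return ALPHA + bytes_per_msg / BETA

total_problem_bytes = 1e8    # a "smaller sized" fixed problem (assumed)
for procs in (64, 1024, 16384, 262144):
    per_msg = total_problem_bytes / procs          # payload per process shrinks
    t = message_time(per_msg)
    latency_fraction = ALPHA / t
    print(f"{procs:>7} procs: {latency_fraction:5.1%} of message time is latency")
```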

Unlocking the performance of the BlueGene/L supercomputer

2004

The BlueGene/L supercomputer is expected to deliver new levels of application performance by providing a combination of good single-node computational performance and high scalability. To achieve good single-node performance, the BlueGene/L design includes a special dual floating-point unit on each processor and the ability to use two processors per node. BlueGene/L also includes both a torus and a tree network to achieve high scalability.
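As a rough illustration of how the dual floating-point unit and the two processors per node combine into single-node peak performance, the sketch below uses the commonly cited BlueGene/L parameters (700 MHz cores, two fused multiply-adds per cycle per dual FPU); these figures are assumptions on our part and are not quoted in the abstract.

```python
# Commonly cited BlueGene/L node parameters (assumed here, not taken from
# the paper): 700 MHz PowerPC 440 cores, a dual FPU issuing two fused
# multiply-adds per cycle, and two processors per node.

CLOCK_HZ        = 700e6   # per-core clock
FLOPS_PER_CYCLE = 4       # dual FPU: 2 FMAs/cycle = 4 floating-point ops
CORES_PER_NODE  = 2

per_core_peak = CLOCK_HZ * FLOPS_PER_CYCLE       # 2.8 Gflops per processor
per_node_peak = per_core_peak * CORES_PER_NODE   # 5.6 Gflops using both processors

print(f"per-core peak: {per_core_peak / 1e9:.1f} Gflops")
print(f"per-node peak: {per_node_peak / 1e9:.1f} Gflops")
```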

Making a Case for Efficient Supercomputing

Queue, 2003

A supercomputer evokes images of “big iron” and speed; it is the Formula 1 racecar of computing. As we venture forth into the new millennium, however, I argue that efficiency, reliability, and availability will become the dominant issues by the end of this decade, not only for supercomputing, but also for computing in general.

Paving the Road towards Pre-Exascale Supercomputing

2014

Supercomputing at scale has become the decisive challenge for users, providers, and vendors of leading supercomputer systems. Next-generation systems, approaching exascale by the end of the decade, will confront us with millions of cores and the need for massive parallelism. Beyond aggregating ever larger compute performance, the ability to hold and efficiently process drastically increasing amounts of data will also be key to enabling future leading research facilities for computational science. In this article we report on the evolving supercomputing infrastructure at Jülich Supercomputing Centre (JSC), on research and development activities on future HPC technologies and architectures, and on computational science research and collaboration with science areas that will require exascale supercomputing in the future.

Designing Computational Clusters for Performance and Power

Advances in Computers, 2007

Power consumption in computational clusters has reached critical levels. High-end cluster performance improves exponentially while the power consumed and heat dissipated increase operational costs and failure rates. Yet, the demand for more powerful machines continues to grow. In this chapter, we motivate the need to reconsider the traditional performance-at-any-cost cluster design approach. We propose designs where power and performance are considered critical constraints. We describe power-aware and low power techniques to reduce the power profiles of parallel applications and mitigate the impact on performance.
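A minimal sketch of the first-order reasoning behind power-aware techniques of this kind, with purely illustrative power, runtime, and phase-fraction numbers (not taken from the chapter): lowering frequency during phases that are not compute-bound trades a small runtime penalty for a larger energy saving.

```python
# A hedged sketch (not from the chapter) of the usual first-order argument
# behind power-aware clusters: dynamic power scales roughly with V^2 * f,
# so lowering frequency on non-critical (e.g. communication-bound) phases
# can cut energy with a modest runtime penalty. All numbers are assumed.

def energy_joules(power_watts: float, seconds: float) -> float:
    return power_watts * seconds

P_HIGH, P_LOW = 250.0, 160.0      # node power at top and reduced frequency
runtime_high  = 100.0             # seconds at full frequency

# Suppose 40% of the runtime is memory/communication bound and slows only
# slightly (10%) when the CPU frequency is lowered during those phases.
bound_fraction, slowdown = 0.4, 1.10
runtime_low = (runtime_high * (1 - bound_fraction)
               + runtime_high * bound_fraction * slowdown)

e_high = energy_joules(P_HIGH, runtime_high)
e_low = (energy_joules(P_HIGH, runtime_high * (1 - bound_fraction))
         + energy_joules(P_LOW, runtime_high * bound_fraction * slowdown))

print(f"runtime: {runtime_high:.0f}s -> {runtime_low:.0f}s")
print(f"energy : {e_high/1e3:.1f} kJ -> {e_low/1e3:.1f} kJ "
      f"({1 - e_low/e_high:.0%} saved)")
```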

Large-Memory Nodes for Energy Efficient High-Performance Computing

Proceedings of the Second International Symposium on Memory Systems, 2016

Energy consumption is by far the most important contributor to HPC cluster operational costs, and it accounts for a significant share of the total cost of ownership. Advanced energy-saving techniques in HPC components have received significant research and development effort, but a simple measure that can dramatically reduce energy consumption is often overlooked. We show that, in capacity computing, where many small to medium-sized jobs have to be solved at the lowest cost, a practical energy-saving approach is to scale-in the application on large-memory nodes. We evaluate scaling-in, i.e., decreasing the number of application processes and compute nodes (servers) used to solve a fixed-size problem, using a set of HPC applications running on a production system. Using standard-memory nodes, we obtain average energy savings of 36%, already a huge figure. We show that the main source of these energy savings is a decrease in node-hours (node-hours = number of nodes × execution time), which is a consequence of more efficient use of hardware resources. Scaling-in is limited by the per-node memory capacity. We therefore consider using large-memory nodes to enable a greater degree of scaling-in. We show that the additional energy savings, of up to 52%, mean that in many cases the investment in upgrading the hardware would be recovered within a typical system lifetime of less than five years.
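The node-hours metric defined above is easy to illustrate with a toy calculation. The numbers below, including the assumed per-node power draw, are illustrative and are not the paper's measurements.

```python
# A minimal illustration (assumed numbers, not the paper's results) of the
# node-hours metric and why scaling-in can save energy: fewer nodes used
# more efficiently can reduce node-hours even though each job runs longer.

def node_hours(nodes: int, hours: float) -> float:
    return nodes * hours

def energy_kwh(nodes: int, hours: float, node_kw: float = 0.4) -> float:
    # 0.4 kW per node is an assumed average power draw
    return nodes * hours * node_kw

# Baseline run spread over many standard-memory nodes:
base_nodes, base_hours = 16, 2.0
# Scaled-in run on fewer (larger-memory) nodes; it runs longer, but less
# than proportionally because per-node efficiency improves:
scaled_nodes, scaled_hours = 4, 5.0

for label, n, h in (("baseline ", base_nodes, base_hours),
                    ("scaled-in", scaled_nodes, scaled_hours)):
    print(f"{label}: {node_hours(n, h):5.1f} node-hours, "
          f"{energy_kwh(n, h):5.1f} kWh")
```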