Advanced program restructuring for high-performance computers with Polaris

Development of large scale high performance applications with a parallelizing compiler

High-level environments such as High Performance Fortran (HPF), which support the development of parallel applications and the porting of legacy codes to parallel architectures, have not yet gained broad acceptance and diffusion. Common objections cite the difficulty of performance tuning, the limitation of HPF to regular, data-parallel computations, and the lack of robustness of parallelizing HPF compilers in handling large codes.

Refactorings for Fortran and high-performance computing

Proceedings of the Second International Workshop on Software Engineering for High Performance Computing System Applications, 2005

Not since the advent of the integrated development environment has a development tool had the impact on programmer productivity that refactoring tools have had for object-oriented developers. However, at the present time, such tools do not exist for high-performance languages such as C and Fortran; moreover, refactorings specific to high-performance and parallel computing have not yet been adequately examined. We observe that many common refactorings for object-oriented systems have clear analogs in procedural Fortran. The Fortran language itself and the introduction of object orientation in Fortran 2003 give rise to several additional refactorings. Moreover, we conjecture that many hand optimizations common in supercomputer programming can be automated by a refactoring engine but deferred until build time in order to preserve the maintainability of the original code base. Finally, we introduce Photran, an integrated development environment that will be used to implement these transformations, and discuss the impact of such a tool on legacy code reengineering.
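To make the build-time conjecture concrete, here is a minimal sketch (in C rather than Fortran, with hypothetical names) of a hand optimization that a refactoring engine could apply automatically: the readable reduction stays in the repository, while the unrolled form is generated only at build time.

```c
/* What the programmer maintains: a straightforward reduction. */
double sum(const double *a, int n) {
    double s = 0.0;
    for (int i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* What a build-time refactoring engine might emit in its place:
 * the loop unrolled by four with independent accumulators to expose
 * instruction-level parallelism. (Note this reassociates the
 * floating-point additions, so the engine would need permission to
 * relax strict IEEE ordering.) */
double sum_unrolled(const double *a, int n) {
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    int i = 0;
    for (; i + 3 < n; i += 4) {
        s0 += a[i];
        s1 += a[i + 1];
        s2 += a[i + 2];
        s3 += a[i + 3];
    }
    double s = s0 + s1 + s2 + s3;
    for (; i < n; i++)      /* remainder iterations */
        s += a[i];
    return s;
}
```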

Automatic parallelization of fortran programs in the presence of procedure calls

Lecture Notes in Computer Science, 1986

Scientific programs must be transformed and adapted to supercomputers to be executed efficiently. Automatic parallelization has been a very active research field for the past few years and effective restructuring compilers have been developed. Although CALL statements are not properly handled by current automatic tools, it has recently been shown that parallel execution of subroutines provides good speedups on multiprocessor machines.
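The opportunity the abstract describes can be sketched as follows (in C with OpenMP rather than the paper's Fortran setting; the routine names are illustrative): once interprocedural analysis proves that a called procedure touches only its own arguments, the calls themselves can run in parallel.

```c
/* The callee reads only `in` and writes only `out`, so
 * interprocedural analysis can prove a call carries no hidden
 * dependence through globals or other side effects. */
static void smooth(const double *in, double *out, int n) {
    for (int j = 1; j < n - 1; j++)
        out[j] = (in[j - 1] + in[j] + in[j + 1]) / 3.0;
}

/* Each iteration hands smooth() a disjoint row, so the calls can
 * execute in parallel (compile with -fopenmp or equivalent). */
void smooth_rows(const double *in, double *out, int rows, int cols) {
    #pragma omp parallel for
    for (int i = 0; i < rows; i++)
        smooth(&in[i * cols], &out[i * cols], cols);
}
```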

Compiler transformations to generate reentrant C programs to assist software parallelization

2009

As we move through the multi-core era into the many-core era, it is clear that thread-based programming is here to stay. This trend in general-purpose hardware is compounded by the fact that, while writing sequential programs is already a non-trivial task, writing parallel applications that exploit the growing number of cores per processor complicates the process considerably. Parallel applications require programs and functions to be reentrant, which rules out globals and statics. Yet globals and statics are useful in certain contexts: globals offer an easy mechanism for sharing data between several functions, and statics provide the only mechanism of data hiding in C for variables that are global in scope. Writing parallel programs therefore restricts users from using globals and statics, since doing so would make the program non-reentrant. Moreover, there is a large legacy code base of sequential programs that are non-reentrant because they rely on statics and globals. Many of these programs exhibit significant data parallelism, operating on independent chunks of input data, and can therefore be converted into parallel versions that exploit multi-core processors; indeed, several such programs have been converted manually. However, manually eliminating all globals and statics to make a program reentrant is tedious, time-consuming, and error-prone. In this paper we describe a system that provides a semi-automated mechanism for users to continue using statics and globals in their programs, while letting the compiler automatically convert them into semantically equivalent reentrant versions, enabling their parallelization later.
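The transformation the paper automates can be illustrated with a minimal sketch (hypothetical names; the paper's actual rewriting rules are not reproduced here): hidden static state becomes explicit state passed in by the caller.

```c
/* Non-reentrant original: the static accumulator is shared by all
 * callers, so concurrent threads race on it. */
int running_total(int x) {
    static int total = 0;
    total += x;
    return total;
}

/* Semantically equivalent reentrant version: the static becomes a
 * field in a context struct owned by the caller, so each thread can
 * hold its own accumulator. */
typedef struct { int total; } totals_ctx;

void totals_init(totals_ctx *ctx) { ctx->total = 0; }

int running_total_r(totals_ctx *ctx, int x) {
    ctx->total += x;
    return ctx->total;
}
```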

Compiler technology for parallel scientific computation

1994

Parallel computation is becoming indispensable in solving large-scale problems in science and engineering. Yet its use is limited by the high cost of developing the needed software. There is a need for compiler technology that, given a source program, will generate efficient parallel code for different architectures with minimal user involvement.

Compiler transformations for high-performance computing

ACM Computing Surveys, 1994

In the last three decades a large number of compiler transformations for optimizing programs have been implemented. Most optimizations for uniprocessors reduce the number of instructions executed by the program using transformations based on the analysis of scalar quantities and data-flow techniques. In contrast, optimizations for high-performance superscalar, vector, and parallel processors maximize parallelism and memory locality with transformations that rely on tracking the properties of arrays using loop dependence analysis. This survey is a comprehensive overview of the important high-level program restructuring techniques for imperative languages, such as C and Fortran. Transformations for both sequential and various types of parallel architectures are covered in depth. We describe the purpose of each transformation, explain how to determine if it is legal, and give an example of its application. Programmers wishing to enhance the performance of their code can use this survey...
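One representative entry in such a catalog, sketched here in the survey's spirit of stating purpose, legality, and an example: loop interchange for memory locality.

```c
/* Loop interchange. Purpose: make the inner loop traverse memory
 * with unit stride. Legality: every iteration writes a distinct
 * a[i][j], so there are no loop-carried dependences to violate. */
#define N 1024

void scale(double a[N][N], double c) {
    /* Before: for (j) for (i) a[i][j] *= c;  -- stride-N accesses. */
    for (int i = 0; i < N; i++)         /* after interchange */
        for (int j = 0; j < N; j++)     /* unit-stride inner loop */
            a[i][j] *= c;
}
```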

A MATLAB Compiler and Restructurer for the Development of Scientific Libraries and Applications

The development of efficient numerical programs and library routines for high-performance parallel computers is a complex task requiring not only an understanding of the algorithms to be implemented, but also detailed knowledge of the target machine and the software environment. In this paper, we describe a programming environment that can utilize such knowledge for the development of high-performance numerical programs and libraries. This environment uses an existing high-level array language (MATLAB) as its source language and performs static, dynamic, and interactive analysis to generate Fortran 90 programs with directives for parallelism. It includes capabilities for interactive and automatic transformations at both the operation level and the functional or algorithm level. Preliminary experiments on a serial machine, comparing interpreted MATLAB programs with their compiled versions, show that the compiled programs ran almost three times faster.
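The operation-level step can be pictured as follows (sketched in C for uniformity with the other examples here; the actual system emits Fortran 90 with parallelism directives): a one-line MATLAB array expression such as y = 2*x + sin(x) is scalarized into an explicit loop over conformable arrays.

```c
#include <math.h>

/* Scalarized form of the array expression y = 2*x + sin(x):
 * one element-wise loop replaces the high-level operation, and that
 * loop is then a candidate for parallelism directives. */
void eval_expr(const double *x, double *y, int n) {
    for (int i = 0; i < n; i++)
        y[i] = 2.0 * x[i] + sin(x[i]);
}
```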

Automatic loop parallelization via compiler guided refactoring

2011

For many parallel applications, performance relies not on instruction-level parallelism, but on loop-level parallelism. Unfortunately, many modern applications are written in ways that obstruct automatic loop parallelization. Since we cannot identify sufficient parallelization opportunities for these codes in a static, off-line compiler, we developed an interactive compilation feedback system that guides the programmer in iteratively modifying application source, thereby improving the compiler's ability to generate loop-parallel code. We use this compilation system to modify two sequential benchmarks, finding that the code parallelized in this way runs up to 8.3 times faster on an eight-core Intel Xeon 5570 system and up to 12.5 times faster on a quad-core IBM POWER6 system. Benchmark performance varies significantly between the systems. This suggests that semi-automatic parallelization should be combined with target-specific optimizations. Furthermore, comparing the first benchmark to hand-parallelized, hand-optimized pthreads and OpenMP versions, we find that code generated using our approach typically outperforms the pthreads code (within 93-339%). It also performs competitively against the OpenMP code (within 75-111%). The second benchmark outperforms hand-parallelized and optimized OpenMP code (within 109-242%).
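The paper's diagnostics are not reproduced here, but the flavor of the guided source changes can be sketched: a loop the compiler refuses to parallelize because the pointers may alias, fixed by an annotation the feedback system could suggest.

```c
/* As originally declared, the compiler must assume dst and src may
 * alias, which blocks automatic loop parallelization:
 *
 *     void saxpy(float *dst, const float *src, float a, int n);
 *
 * Qualifying the pointers with `restrict` answers that question and
 * lets the compiler generate loop-parallel code. (Illustrative only;
 * not the paper's actual example.) */
void saxpy(float *restrict dst, const float *restrict src,
           float a, int n) {
    for (int i = 0; i < n; i++)
        dst[i] += a * src[i];
}
```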

Intermediate Code Generation for Portable, Scalable Compilers. Architecture Independent Data Parallelism: The Preliminaries

1993

This paper introduces the goals of the Portable, Scalable, Architecture Independent (PSI) Compiler Project for Data Parallel Languages at the University of Missouri-Rolla. A goal of this project is to produce a subcompiler for data-parallel scientific programming languages such as HPF (High Performance Fortran), in which the input grammar is translated to a three-address code intermediate language. Ultimately we plan to integrate our work into automated synthesis systems for scientific programming, because we feel that it should not be necessary to learn complicated programming techniques to use multiprocessor computers or networks of computers effectively. This paper shows how to compile a data-parallel language to an arbitrary multiprocessor topology or network of CPUs given the number of processors, the length of the vector registers, and the total number of components in an array, assuming a message-passing, distributed-memory paradigm of send and receive. We emphasize that this paradigm is not on...
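A minimal sketch of one mapping such a compiler must compute, assuming an even block distribution (the scheme and names are illustrative, not the PSI project's actual code): given p processors and an n-element array, find which processor owns global element g and at what local index.

```c
/* Block distribution of n elements over p processors: the first
 * n % p processors receive one extra element. Assumes 0 <= g < n. */
typedef struct { int owner; int local; } placement;

static placement block_place(int g, int n, int p) {
    int base  = n / p;                 /* minimum block size */
    int extra = n % p;                 /* processors holding base + 1 */
    int cut   = extra * (base + 1);    /* elements in the larger blocks */
    placement r;
    if (g < cut) {
        r.owner = g / (base + 1);
        r.local = g % (base + 1);
    } else {
        r.owner = extra + (g - cut) / base;
        r.local = (g - cut) % base;
    }
    return r;
}
```

Under this scheme, the send/receive pairs of the message-passing paradigm follow directly from which owners hold the operands of each assignment.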