Reinventing explicit parallel programming for improved engineering of high performance computing software

Abstractions for Parallelism: Patterns, Performance and Correctness

cs.manchester.ac.uk

Despite rapid advances in parallel hardware performance, the software community is not exploiting the full potential of this processing power, for one clear reason: the difficulty of designing efficient and effective parallel applications. Identifying sub-tasks within an application, designing parallel algorithms, and balancing load among the processing units are daunting tasks for novice programmers, and even experienced programmers are often trapped by design decisions that fall short of potential peak performance. Design patterns have been used as a notation to capture how experts in a given domain think about and approach their work. Over the last decade there have been several efforts to identify common patterns that recur in the parallel software design process. Documenting these design patterns helps programmers by providing definitions, solutions and guidelines for common parallelization problems. A convenient way to further raise the level of abstraction and make it easier for programmers to write legible code is the philosophy of 'Separation of Concerns'. The Aspect Oriented Programming (AOP) paradigm achieves this separation by allowing programmers to specify concerns independently and letting the compiler 'weave' them (AOP terminology for the unification of modules) at compile time. However, abstraction by its very nature often produces unoptimized code, since it frames the solution to a problem without much regard for the underlying machine architecture. Indeed, in the current phase of the multicore era, where chip manufacturers are continuously experimenting with processor architectures, an optimization on one architecture might yield no benefit on another from a different manufacturer. Using an auto-tuner, one can automatically explore the optimization space for a particular computational kernel on a given processor architecture. The last relevant concern in this project is the formal specification and verification of properties of parallel programs. It is well known that parallel programs are particularly prone to insidious defects such as deadlocks and race conditions arising from shared variables and locking. Using tools from formal verification, it is nevertheless possible to guarantee certain safety properties (such as deadlock and data-race freedom) while refining successive abstractions down to code level. This project report considers the interplay of abstractions, auto-tuning and correctness in the context of parallel software development.
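
Although the report itself is not reproduced here, the auto-tuning idea it describes is easy to make concrete. The following Java sketch (all names invented for illustration) times a toy kernel over a small space of candidate block sizes and keeps the fastest, which is the essence of empirically searching an optimization space on a given machine:

```java
import java.util.List;

/** Minimal auto-tuning sketch: empirically pick the best block size
 *  for a toy array-summation kernel on the current machine. */
public class AutoTuneDemo {
    static double kernel(double[] a, int blockSize) {
        double sum = 0.0;
        for (int start = 0; start < a.length; start += blockSize) {
            int end = Math.min(start + blockSize, a.length);
            for (int i = start; i < end; i++) sum += a[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        double[] data = new double[1 << 22];
        java.util.Arrays.fill(data, 1.0);
        long bestTime = Long.MAX_VALUE;
        int bestBlock = -1;
        // The "optimization space": candidate block sizes to explore.
        for (int block : List.of(64, 256, 1024, 4096, 16384)) {
            long t0 = System.nanoTime();
            kernel(data, block);
            long elapsed = System.nanoTime() - t0;
            if (elapsed < bestTime) { bestTime = elapsed; bestBlock = block; }
        }
        System.out.printf("best block size: %d (%.2f ms)%n",
                          bestBlock, bestTime / 1e6);
    }
}
```

A production auto-tuner would repeat and warm up the measurements and search a far larger space of transformations, but the selection loop has this shape.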

Parallel programming with message passing and directives

Computing in Science and Engineering, 2001

Parallel application developers today face the problem of how to integrate the dominant parallel processing models into one source code. Most high-performance systems use the Distributed Memory Parallel (DMP) and Shared Memory Parallel (SMP; also known as Symmetric MultiProcessor) models, and many applications can benefit from support for multiple parallelism modes. Here we show how to integrate both modes into high-performance parallel applications.
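
The following Java sketch is a toy analogue of this hybrid style, not the paper's code: two "ranks" are simulated as threads that exchange a partial result over a queue (standing in for the DMP layer), while each rank computes its local part with shared-memory parallelism (the SMP layer). All names are invented.

```java
import java.util.concurrent.*;

/** Toy analogue of hybrid DMP+SMP: each "rank" exchanges messages
 *  over a queue and parallelizes its own local work in shared memory. */
public class HybridDemo {
    public static void main(String[] args) throws Exception {
        BlockingQueue<Double> channel = new LinkedBlockingQueue<>();
        Thread rank0 = new Thread(() -> {
            double partial = localSum(0, 500_000);
            channel.add(partial);                 // "send" to rank 1
        });
        Thread rank1 = new Thread(() -> {
            double partial = localSum(500_000, 1_000_000);
            try {
                double remote = channel.take();   // "receive" from rank 0
                System.out.println("total = " + (partial + remote));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        rank0.start(); rank1.start();
        rank0.join(); rank1.join();
    }

    /** Shared-memory parallelism inside a rank: a parallel range sum
     *  over the common fork/join pool. */
    static double localSum(int lo, int hi) {
        return java.util.stream.IntStream.range(lo, hi)
                .parallel().asDoubleStream().sum();
    }
}
```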

Object-oriented analysis and design of the Message Passing Interface

Concurrency and Computation: Practice and Experience, 2001

The major contribution of this paper is the application of modern analysis techniques to the important Message Passing Interface standard, work done in order to obtain information useful in designing both application programmer interfaces for object-oriented languages and message passing systems. Recognition of "Design Patterns" within MPI is an important discernment of this work. A further contribution is a comparative discussion of the design and evolution of three actual object-oriented designs for the Message Passing Interface (MPI-1) application programmer interface (API), two of which have influenced the standardization of C++ explicit parallel programming with MPI-2, and which strongly indicate the value of a priori object-oriented design and analysis of such APIs. Knowledge of design patterns is assumed herein. The discussion provided here includes systems developed at Mississippi State University (MPI++), the University of Notre Dame (OOMPI), and the merger of these systems that results in a standard binding within the MPI-2 standard. Commentary concerning additional opportunities for further object-oriented analysis and design of message passing systems and APIs, such as MPI-2 and MPI/RT, is provided in conclusion. The connection of modern software design and engineering principles to High Performance Computing programming approaches is a new and important further contribution of this work.
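
The paper's designs (MPI++, OOMPI) are C++; purely as an illustration of what an object-oriented message-passing facade looks like, here is an invented Java analogue in which the communicator is an object and send/receive are its methods, with ranks simulated as threads over per-rank mailboxes:

```java
import java.util.concurrent.*;

/** Invented OO message-passing facade in the spirit of the MPI-2 C++
 *  binding: the communicator is an object; ranks are simulated as
 *  threads with per-rank mailboxes. */
public class OoMessagingDemo {
    static class Communicator {
        private final BlockingQueue<Object>[] mailboxes;
        private final int rank;
        Communicator(int rank, BlockingQueue<Object>[] mailboxes) {
            this.rank = rank;
            this.mailboxes = mailboxes;
        }
        int rank() { return rank; }
        int size() { return mailboxes.length; }
        void send(Object payload, int dest) { mailboxes[dest].add(payload); }
        Object recv() throws InterruptedException { return mailboxes[rank].take(); }
    }

    public static void main(String[] args) {
        int size = 3;
        @SuppressWarnings("unchecked")
        BlockingQueue<Object>[] boxes = new BlockingQueue[size];
        for (int i = 0; i < size; i++) boxes[i] = new LinkedBlockingQueue<>();
        for (int r = 0; r < size; r++) {
            Communicator comm = new Communicator(r, boxes);
            new Thread(() -> {
                try {
                    if (comm.rank() == 0) {
                        // Rank 0 greets every other rank.
                        for (int d = 1; d < comm.size(); d++) comm.send("hello " + d, d);
                    } else {
                        System.out.println(comm.recv());
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }).start();
        }
    }
}
```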

P3L: A structured high-level parallel language, and its structured support

Concurrency: Practice and Experience, 1995

This paper presents a parallel programming methodology that ensures easy programming, efficiency, and portability of programs to different machines belonging to the class of general-purpose, distributed-memory, MIMD architectures. The methodology is based on the definition of a new, high-level, explicitly parallel language, called P3L, and of a set of static tools that automatically adapt the program features for each target architecture. P3L does not require programmers to specify process activations, the actual parallelism degree, scheduling, or interprocess communications, i.e. all those features that need to be adjusted to harness each specific target machine. Parallelism is, on the other hand, expressed in a structured and qualitative way, by hierarchical composition of a restricted set of language constructs, corresponding to those forms of parallelism that are frequently encountered in parallel applications and that can efficiently be implemented. The efficient portability of P3L applications is guaranteed by the compiler along with the novel structure of the support. The compiler automatically adapts the program features for each specific architecture, accessing the costs (in terms of performance) of the low-level mechanisms exported by the architecture itself. In our methodology, these costs, along with other features of the architecture, are viewed through an abstract machine, whose mechanism interface is used by the compiler to produce the final object code.
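
P3L itself is not shown here, but its central idea, expressing parallelism by composing a fixed set of constructs, can be sketched in plain Java. The Farm class below (names invented) plays the role of a task-farm construct: the programmer supplies only the worker function, while the degree of parallelism and the scheduling stay inside the skeleton:

```java
import java.util.*;
import java.util.concurrent.*;
import java.util.function.Function;

/** A task-farm skeleton: the programmer supplies the worker function;
 *  parallelism degree and scheduling are the skeleton's concern. */
public class Farm<I, O> {
    private final Function<I, O> worker;
    private final int degree;

    Farm(Function<I, O> worker, int degree) {
        this.worker = worker;
        this.degree = degree;
    }

    List<O> apply(List<I> inputs) throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(degree);
        try {
            List<Future<O>> futures = new ArrayList<>();
            for (I in : inputs) futures.add(pool.submit(() -> worker.apply(in)));
            List<O> results = new ArrayList<>();
            for (Future<O> f : futures) results.add(f.get());
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        Farm<Integer, Integer> squares = new Farm<>(x -> x * x, 4);
        System.out.println(squares.apply(List.of(1, 2, 3, 4, 5)));  // [1, 4, 9, 16, 25]
    }
}
```

In P3L the compiler, not a runtime thread pool, chooses the parallelism degree per target architecture; the sketch only shows the division of responsibility between programmer and construct.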

Reflective Parallel Programming: Extensible and High-Level Control of Runtime, Compiler, and Application Interaction

Thread support in most languages is opaque and low-level. Primitives like wait and signal do not allow users to determine the relative ordering of statements in different threads in advance. In this paper, we extend the reflection and meta-programming facilities of object-oriented languages to cover parallel program schedules. The user can then access objects representing the extant threads or other parallel tasks. These objects can be used to modify or query happens-before relations, locks, and other high-level scheduling information. These high-level models enable users to design their own parallel abstractions, visualizers, safety checks, and other tools in ways that are not possible today. We discuss one implementation of this technique, the intervals library, and show how the presence of a first-class, queryable program schedule allows us to support a flexible data race protection scheme. The scheme supports both static and dynamic checks and also permits users to def...
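
A first-class, queryable schedule is easy to picture with a small sketch. The Java below is loosely inspired by that idea (it is not the intervals library's actual API; all names are invented): tasks are ordinary objects, happens-before edges can be declared between them, and the relation can be queried before anything runs, which is what a race check would build on:

```java
import java.util.*;

/** Sketch of a first-class, queryable schedule: tasks are objects,
 *  and happens-before edges can be added and queried up front. */
class Task {
    final String name;
    final Set<Task> successors = new HashSet<>();
    Task(String name) { this.name = name; }

    /** Declare that this task must complete before 'later' starts. */
    void happensBefore(Task later) { successors.add(later); }

    /** Query the relation: is this task transitively ordered before t? */
    boolean isOrderedBefore(Task t) {
        Deque<Task> stack = new ArrayDeque<>(successors);
        Set<Task> seen = new HashSet<>();
        while (!stack.isEmpty()) {
            Task cur = stack.pop();
            if (cur == t) return true;
            if (seen.add(cur)) stack.addAll(cur.successors);
        }
        return false;
    }

    public static void main(String[] args) {
        Task read = new Task("read"), compute = new Task("compute"),
             write = new Task("write");
        read.happensBefore(compute);
        compute.happensBefore(write);
        // A race check could use this query: unordered tasks touching
        // the same data are potential data races.
        System.out.println(read.isOrderedBefore(write));   // true
        System.out.println(write.isOrderedBefore(read));   // false
    }
}
```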

An object-passing model for parallel programming

Proceedings 27th Annual International Computer Software and Applications Conference (COMPSAC 2003), 2003

This paper introduces an object-passing model for parallel and distributed application development. Object passing provides the object-oriented application developer with powerful yet simple methods to distribute and exchange data and logic (objects) among processes. The model extends message passing, while exploiting the advantages of the object-oriented paradigm. In addition, the model provides a portable framework for executing applications across multiple platforms, thus effectively exploiting available resources to gain more processing power. A number of advantageous aspects of adopting object passing are discussed, in addition to highlighting the differences between message passing, represented by MPI, and object passing. Another advantage is the model's suitability for heterogeneous systems. When implemented with a portable language like Java, it can support parallel and distributed applications spanning a collection of heterogeneous platforms. This form of execution will eventually allow for full utilization of available resources for any given application written using this model.
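
The contrast with plain message passing can be made concrete: an object-passing message carries behavior as well as data. In the hypothetical Java sketch below (all class names invented), a task object is serialized through a pipe, standing in for the network, and its logic is executed on the receiving side:

```java
import java.io.*;

/** Sketch of object passing: unlike a plain message, the transmitted
 *  object carries both data and the logic that acts on it. */
public class ObjectPassingDemo {
    /** A self-describing task: data plus behavior, both serialized. */
    static class SquareTask implements Serializable, Runnable {
        private final int value;
        SquareTask(int value) { this.value = value; }
        public void run() { System.out.println(value * value); }
    }

    public static void main(String[] args) throws Exception {
        PipedOutputStream out = new PipedOutputStream();
        PipedInputStream in = new PipedInputStream(out);

        // "Sender" side: pass the whole object, not just its fields.
        ObjectOutputStream sender = new ObjectOutputStream(out);
        sender.writeObject(new SquareTask(7));
        sender.flush();

        // "Receiver" side: deserialize and run the shipped logic.
        Runnable task = (Runnable) new ObjectInputStream(in).readObject();
        task.run();   // prints 49
    }
}
```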

Describing the Semantics of Parallel Programming Languages using Shared Data Abstractions

Programming language semantics can be defined in a variety of ways, one of which is to use an information structure model based on abstract data types. We have previously used this technique to provide definitions of semantic aspects of sequential languages and have demonstrated how it lends itself to the automatic generation of prototype language implementations from the formal model. However, difficulties arise when any attempt is made to describe a parallel programming language using this technique, because of the need to regulate access to abstract data types in a parallel environment. This motivates the consideration of alternative techniques for the definition of the information structures used in the underlying model. The notion of shared data abstractions provides such an alternative, and this paper explores some of the issues in using shared data abstractions in the definition of the semantics of parallel programming languages.
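
As a minimal illustration of what regulated access to an abstract data type means in practice, the Java sketch below (names invented) wraps a counter in a monitor so every operation is atomic; a shared data abstraction in a semantic model plays an analogous role at the specification level:

```java
/** Sketch of a shared data abstraction: an abstract data type whose
 *  operations regulate concurrent access internally, so its behavior
 *  stays well defined in a parallel setting. */
class SharedCounter {
    private long value = 0;

    /** Each operation is atomic with respect to the others. */
    synchronized void increment() { value++; }
    synchronized long read() { return value; }

    public static void main(String[] args) throws InterruptedException {
        SharedCounter c = new SharedCounter();
        Thread[] ts = new Thread[4];
        for (int i = 0; i < ts.length; i++) {
            ts[i] = new Thread(() -> {
                for (int k = 0; k < 100_000; k++) c.increment();
            });
            ts[i].start();
        }
        for (Thread t : ts) t.join();
        System.out.println(c.read());   // always 400000, by construction
    }
}
```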

PAL: towards a new approach to high level parallel programming

2006

We present a new programming model based on user annotations that can be used to transform plain Java programs into suitable parallel code that can be run on workstation clusters, networks and grids. The user's only responsibility is to decorate the methods that will eventually be executed in parallel with standard Java 1.5 annotations. These annotations are then automatically processed and parallel bytecode is derived. When the annotated program is started, it automatically retrieves information about the executing platform and evaluates the information specified inside the annotations to transform the bytecode into a semantically equivalent multithreaded/multitask version. The results returned by the annotated methods, when invoked, are futures with a wait-by-necessity semantics. A PAL prototype has been implemented in Java, using JJPF as the parallel framework. The experiments made with the prototype are encouraging: the design of parallel applications has been greatly simplified, and the performance obtained is the same as that of an application written directly in JJPF.
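
PAL performs its transformation at the bytecode level; the reflective Java sketch below only illustrates the programming model it exposes: a hypothetical @Parallel annotation, asynchronous dispatch, and a Future whose get() gives the wait-by-necessity behavior. All names are invented and do not reflect PAL's actual annotations.

```java
import java.lang.annotation.*;
import java.lang.reflect.Method;
import java.util.concurrent.*;

/** Sketch of an annotation-driven parallel programming model:
 *  methods marked @Parallel are dispatched asynchronously and
 *  their results come back as futures. */
public class AnnotationDemo {
    @Retention(RetentionPolicy.RUNTIME)
    @interface Parallel { }

    @Parallel
    public static int fib(int n) {
        return n < 2 ? n : fib(n - 1) + fib(n - 2);
    }

    /** Run a static method asynchronously iff it carries @Parallel. */
    static Future<Object> invoke(Method m, Object... args) {
        if (!m.isAnnotationPresent(Parallel.class))
            throw new IllegalArgumentException(m + " is not @Parallel");
        return ForkJoinPool.commonPool().submit(() -> m.invoke(null, args));
    }

    public static void main(String[] args) throws Exception {
        Method fib = AnnotationDemo.class.getMethod("fib", int.class);
        Future<Object> result = invoke(fib, 30);  // returns immediately
        System.out.println(result.get());         // wait-by-necessity: block on use
    }
}
```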

Experiences toward an Object-Oriented Approach to Structured Parallel Programming

1995

In parallel programming, communication patterns are rarely arbitrary and unstructured. Instead, parallel applications tend to employ predetermined patterns of communication between their components. If the most commonly used patterns, such as pipelines, farms and trees, are identified (both in terms of their components and their communication), an environment can make them available as high-level abstractions for use in writing applications. This can yield a structured approach to parallel programming. The paper shows how this structured approach can be accommodated within an object-oriented language. A class library provides the most commonly used patterns, and programmers can exploit inheritance to define new patterns. Several examples illustrate the approach and show that it can be efficiently implemented.
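
The class-library approach can be sketched directly in Java (names invented): an abstract Pattern class fixes the interface, a Pipeline subclass provides one common pattern, and a user could derive further patterns by inheritance. For brevity the stages here run sequentially; a real library would hide a parallel implementation behind the same interface:

```java
import java.util.List;
import java.util.function.UnaryOperator;

/** Base class of the pattern library: subclasses fix the
 *  communication structure, users supply the behavior. */
abstract class Pattern {
    abstract List<Integer> run(List<Integer> input);
}

/** A pipeline: each stage's output feeds the next stage. */
class Pipeline extends Pattern {
    private final List<UnaryOperator<Integer>> stages;
    Pipeline(List<UnaryOperator<Integer>> stages) { this.stages = stages; }

    @Override
    List<Integer> run(List<Integer> input) {
        List<Integer> current = input;
        for (UnaryOperator<Integer> stage : stages) {
            current = current.stream().map(stage).toList();
        }
        return current;
    }

    public static void main(String[] args) {
        Pattern p = new Pipeline(List.of(x -> x + 1, x -> x * 2));
        System.out.println(p.run(List.of(1, 2, 3)));   // [4, 6, 8]
    }
}
```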

A Component Model for High Level and Efficient Parallel Programming on Distributed Architectures

iadis.net

The computer science community has called for parallel languages and models with a higher level of abstraction and modularity, without performance penalties, that could be used in conjunction with advanced software engineering techniques and that are suitable for large-scale programs. This paper presents general aspects of the # parallel programming model and its associated programming environment, designed to address these issues.