Franco-British N+N Meeting on Data-Parallel Languages and Compilers for Portable Parallel Computing

P3L: A structured high-level parallel language, and its structured support

Concurrency: Practice and Experience, 1995

This paper presents a parallel programming methodology that ensures easy programming, efficiency, and portability of programs to different machines belonging to the class of general-purpose, distributed-memory, MIMD architectures. The methodology is based on the definition of a new, high-level, explicitly parallel language, called P3L, and of a set of static tools that automatically adapt the program features for each target architecture. P3L does not require programmers to specify process activations, the actual parallelism degree, scheduling, or interprocess communications, i.e. all those features that need to be adjusted to harness each specific target machine. Parallelism is, on the other hand, expressed in a structured and qualitative way, by hierarchical composition of a restricted set of language constructs, corresponding to those forms of parallelism that are frequently encountered in parallel applications, and that can be implemented efficiently. The efficient portability of P3L applications is guaranteed by the compiler along with the novel structure of the support. The compiler automatically adapts the program features for each specific architecture, accessing the costs (in terms of performance) of the low-level mechanisms exported by the architecture itself. In our methodology, these costs, along with other features of the architecture, are viewed through an abstract machine, whose mechanism interface is used by the compiler to produce the final object code.
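As a rough illustration of the programming model (not P3L's actual syntax; the stage functions are hypothetical), the following C sketch shows the kind of hierarchical composition the abstract describes: a pipeline whose middle stage would be a farm of workers. The nesting of constructs states only where parallelism may be exploited; the degree of parallelism, scheduling, and communications are left to the compiler, so the sketch executes the same structure sequentially.

    #include <stdio.h>

    /* Hypothetical stage functions; P3L's concrete syntax differs. */
    static int  source(int i)    { return i; }               /* first pipe stage */
    static int  worker(int task) { return task * task; }     /* farm worker      */
    static void sink(int result) { printf("%d\n", result); } /* last pipe stage  */

    int main(void) {
        /* Structure: pipe(source, farm(worker), sink). The composition
           fixes the parallel structure; how many worker instances run,
           and how the stages communicate, is the compiler's decision.
           Here the same composition is simply run sequentially. */
        for (int i = 0; i < 8; i++)
            sink(worker(source(i)));
        return 0;
    }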

Languages for vector and parallel processors

Computer Physics Communications, 1982

This paper first considers the major developments which have occurred in the design of high-level languages for sequential machines. These developments illustrate how languages that were independent of the underlying hardware eventually evolved. Two major types of language for vector and parallel processors have evolved: detection-of-parallelism languages and expression-of-machine-parallelism languages. The advantages and disadvantages of each type are examined. A third type of language is also considered, which reflects neither the compiler's detection mechanism nor the underlying hardware. Its syntax enables the parallel nature of a problem to be expressed directly, making the language appropriate for both vector and array processors.
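To make the taxonomy concrete, here is a minimal C sketch; the loop is an illustrative example of my own, not taken from the paper, and the other two styles are noted in comments because C itself offers neither machine-vector syntax nor whole-array expressions.

    /* Detection-of-parallelism style: the program is ordinary sequential
       code, and the compiler must prove the iterations independent
       before it can vectorize them. */
    void add(float *c, const float *a, const float *b, int n) {
        for (int i = 0; i < n; i++)
            c[i] = a[i] + b[i];
    }

    /* Expression-of-machine-parallelism style: the source mirrors the
       hardware, e.g. loops strip-mined to the machine's vector length.

       The third type expresses the problem's parallelism directly, as
       whole-array syntax such as "C = A + B" (Fortran-90 style), tied
       neither to the compiler's detection mechanism nor to the machine;
       C has no such form, so it appears here only as a comment. */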

Compilation Techniques for Multimedia Processors

2000

The huge processing power needed by multimedia applications has led to multimedia extensions in the instruction sets of microprocessors which exploit subword parallelism. Examples of these extended instruction sets are the Visual Instruction Set (VIS) of the UltraSPARC processor, the AltiVec instruction set of the PowerPC processor, the MMX and SSE extensions of the Pentium processors, and the MAX-2 instruction set of the HP PA-RISC processor. Currently, these extensions can only be used by programs written in assembly language, through system libraries, or by calling specialized macros in a high-level language. Therefore, these instructions are not used by most applications. We propose two code generation techniques to produce native code using these multimedia extensions for programs written in a high-level language: classical vectorization and vectorization by unrolling. Vectorization by unrolling is simpler than classical vectorization, since data dependence analysis is reduced to acyclic control-flow-graph analysis. Furthermore, we address the problem of unaligned memory accesses, which can be handled both by static analysis and by dynamic runtime checking. Preliminary experimental results for a code generator for the UltraSPARC VIS instruction set show that speedups of up to a factor of 4.8 are possible, and that vectorization by unrolling is much simpler than, but as effective as, classical vectorization.
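A minimal sketch of vectorization by unrolling, written with x86 SSE2 intrinsics as a stand-in for the UltraSPARC VIS instructions the paper actually targets: the loop is unrolled by the subword count and the unrolled body is collapsed into one SIMD instruction, with unaligned accesses handled at run time via the unaligned load/store forms.

    #include <emmintrin.h>   /* SSE2 intrinsics; stands in for VIS here */
    #include <stdint.h>
    #include <stdio.h>

    /* c[i] = a[i] + b[i] over 16-bit subwords. The loop is unrolled
       8 times (128 bits / 16-bit elements) and the unrolled body is
       replaced by a single SIMD add. Unaligned memory accesses are
       handled dynamically by using the unaligned load/store forms,
       one of the two strategies the abstract mentions. */
    static void add_simd(int16_t *c, const int16_t *a, const int16_t *b, int n) {
        int i = 0;
        for (; i + 8 <= n; i += 8) {
            __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
            __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
            _mm_storeu_si128((__m128i *)(c + i), _mm_add_epi16(va, vb));
        }
        for (; i < n; i++)   /* scalar epilogue for the remainder */
            c[i] = a[i] + b[i];
    }

    int main(void) {
        int16_t a[10] = {1,2,3,4,5,6,7,8,9,10};
        int16_t b[10] = {10,9,8,7,6,5,4,3,2,1};
        int16_t c[10];
        add_simd(c, a, b, 10);
        for (int i = 0; i < 10; i++)
            printf("%d ", c[i]);   /* prints ten 11s */
        putchar('\n');
        return 0;
    }

Note that the data dependence question here reduces to checking that the acyclic, unrolled body has no conflicting accesses, which is what makes this approach simpler than classical loop vectorization.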

Towards a High-Level Implementation of Execution Primitives for Unrestricted, Independent And-Parallelism

2008

Most efficient implementations of parallel logic programming rely on complex low-level machinery which is arguably difficult to implement and modify. We explore an alternative approach aimed at taming that complexity by raising core parts of the implementation to the source-language level, for the particular case of and-parallelism. We handle a significant portion of the parallel implementation at the Prolog level with the help of a comparatively small number of concurrency-related primitives which take care of lower-level tasks such as locking, thread management, stack-set management, etc. The approach does not eliminate modifications to the abstract machine altogether, but it greatly simplifies them and also facilitates experimenting with different alternatives. We show how this approach allows implementing both restricted and unrestricted (i.e., non-fork-join) parallelism. Preliminary experiments show that the performance sacrificed is reasonable, although granularity control is required in some cases. We also observe that the availability of unrestricted parallelism contributes to better observed speedups.
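The paper works at the Prolog level, but the essential shape of its publish/join primitives can be hinted at with a C/pthreads analogue; the "goals" below are plain C functions and the mapping to the paper's primitives is my own loose paraphrase, not its API.

    #include <pthread.h>
    #include <stdio.h>

    /* Two independent "goals"; in the paper these would be Prolog
       goals scheduled by and-parallel primitives. */
    static void *goal_a(void *arg) { (void)arg; puts("goal a solved"); return NULL; }
    static void  goal_b(void)      { puts("goal b solved"); }

    int main(void) {
        pthread_t h;   /* plays the role of a goal handle */
        /* Publish goal_a for parallel execution, run goal_b locally,
           then wait on the handle. Because the join is a separate
           primitive rather than the implicit end of a fork-join block,
           it can be placed anywhere later in the computation, which is
           what yields unrestricted (non-fork-join) parallelism. */
        pthread_create(&h, NULL, goal_a, NULL);
        goal_b();
        pthread_join(h, NULL);
        puts("continuation runs after both goals");
        return 0;
    }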

Two Alternative Implementations of Automatic Parallelisation

This paper describes the recent parallelising compilers from our group at the University of Glasgow. Our group is part of the Computer Vision and Graphics research group, and we have for some years been developing array compilers because we think these are a good tool both for expressing graphics algorithms and for exploiting the parallelism that computer vision applications require. We shall describe the implementation of two different languages on two different platforms and compare the performance of these with reference C implementations running on the same platforms. Finally, we shall draw conclusions both about the viability of the array-language approach as compared to the other approaches used in the challenge, and about the strengths and weaknesses of the two very different processor architectures we used.