A Parallel Functional Language Compiler for Message-Passing Multicomputers

A Simple Parallelising Compiler

Writing parallel programs is neither as simple nor as common as traditional sequential programming. A programmer must have in-depth knowledge of the program at hand, as well as of the resources available when the program is executed. A compiler that can automatically parallelise sequentially written programs is therefore of great benefit. Current efforts concentrate mainly on fine-grained parallelism. With the move towards using clusters of workstations as platforms for parallel processing, coarse-grained parallelism is becoming a very important research issue. This paper reports on the development of a simple parallelising compiler exploiting coarse-grained parallelism in sequential programs. In particular, it presents three structured algorithms used to divide the parallelisation process into manageable phases.

A parallelizing compiler for multicore systems

Proceedings of the 17th International Workshop on Software and Compilers for Embedded Systems - SCOPES '14, 2014

This manuscript summarizes the main ideas introduced in [1]. We propose a compiler that automatically transforms a sequential application into a parallel counterpart for multicore processors. It is based on an intermediate representation, named KIR, which exposes multiple levels of parallelism and hides the complexity of the implementation details thanks to domain-independent kernels (e.g., assignment, reduction). The effectiveness and performance of our approach, built on top of GCC, have been tested with a large variety of codes.
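
As a purely illustrative sketch of the kind of kernel the abstract mentions, the following C fragment shows a sequential loop that matches a reduction kernel together with a hand-written multicore counterpart; the OpenMP mapping, the function names, and the problem size are assumptions of this example, not details taken from [1].

```c
/* Illustrative only: a loop whose body matches a "reduction" kernel
 * (sum += f(a[i])) and a hand-written parallel counterpart of the kind
 * a multicore back end might emit.  The OpenMP mapping is an assumption
 * of this sketch, not a claim about the KIR-based compiler itself. */
#include <stdio.h>

#define N 1000000

/* Sequential form: the loop-carried dependence on `sum` is the
 * reduction pattern a kernel-recognition pass would look for. */
static double reduce_seq(const double *a, int n) {
    double sum = 0.0;
    for (int i = 0; i < n; i++)
        sum += a[i] * a[i];
    return sum;
}

/* Parallel counterpart: the same kernel expressed so each core keeps a
 * private partial sum that is combined at the end. */
static double reduce_par(const double *a, int n) {
    double sum = 0.0;
#pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < n; i++)
        sum += a[i] * a[i];
    return sum;
}

int main(void) {
    static double a[N];
    for (int i = 0; i < N; i++)
        a[i] = 1.0 / (i + 1);
    printf("seq=%f par=%f\n", reduce_seq(a, N), reduce_par(a, N));
    return 0;
}
```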

Compiling parallel programs by optimizing performance

The Journal of Supercomputing, 1988

This paper describes how Crystal, a language based on familiar mathematical notation and lambda calculus, addresses the issues of programmability and performance for parallel supercomputers. Some scientific programmers and theoreticians may ask, "What is new about Crystal?" or "How is it different from existing functional languages?" The answers lie in its model of parallel computation and a theory of parallel program optimization, which we examine in the text to follow. We illustrate the power of our approach with benchmarks of compiled parallel code from Crystal source. The target machines are hypercube multiprocessors with distributed memory, on which it is considered difficult for functional programs to achieve high efficiency.

Languages and compilers for parallel computing : 12th International Workshop, LCPC'99, La Jolla, CA, USA, August 4-6, 1999 : proceedings

2000

Java: High Performance Numerical Computing in Java: Language and Compiler Issues; Instruction Scheduling in the Presence of Java's Runtime Exceptions; Dependence Analysis for Java.
Low-Level Transformations A: Comprehensive Redundant Load Elimination for the IA-64 Architecture; Minimum Register Instruction Scheduling: A New Approach for Dynamic Instruction Issue Processors; Unroll-Based Copy Elimination for Enhanced Pipeline Scheduling.
Data Distribution: A Linear Algebra Formulation for Optimising Replication in Data Parallel Programs; Accurate Data and Context Management in Message-Passing Programs; An Automatic Iteration/Data Distribution Method Based on Access Descriptors for DSMM.
High-Level Transformations: Inter-array Data Regrouping; Iteration Space Slicing for Locality; A Compiler Framework for Tiling Imperfectly-Nested Loops.
Models: Parallel Programming with Interacting Processes; Application of the Polytope Model to Functional Programs.
Multilingua...

Automatic Parallelizing Compiler for Distributed Memory Parallel Computers: New Algorithms to Improve the Performance of the Inspector/Executor

1995

The SPMD (Single-Program Multiple-Data Stream) model has been widely adopted as the base of parallelizing compilers and parallel programming languages for scientific programs [1]. This model will work well not only for shared memory machines but also for distributed memory multicomputers, provided that:
■ data are allocated appropriately by the programmer and/or the compiler itself,
■ the compiler distributes parallel computations to processors so that interprocessor communication costs are minimized, and
■ codes for communication are inserted, only when necessary, at the point adequate for minimizing communication latency.
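
To make those three conditions concrete, here is a minimal, self-contained SPMD sketch in C; the use of MPI, the block size NLOCAL, and the relaxation kernel are assumptions chosen for illustration and are not taken from the paper.

```c
/* Minimal SPMD sketch (MPI is used here purely for illustration):
 * each process owns a contiguous block of the array, computes only on
 * the indices it owns, and exchanges just the boundary elements its
 * neighbours need before each sweep. */
#include <mpi.h>
#include <stdio.h>

#define NLOCAL 1000   /* owned elements per process (assumed equal) */
#define STEPS  100

int main(int argc, char **argv) {
    int rank, size;
    /* local block with one halo cell at each end */
    static double u[NLOCAL + 2], unew[NLOCAL + 2];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int left  = (rank > 0)        ? rank - 1 : MPI_PROC_NULL;
    int right = (rank < size - 1) ? rank + 1 : MPI_PROC_NULL;

    for (int i = 0; i < NLOCAL + 2; i++)
        u[i] = (double)(rank * NLOCAL + i);   /* block data placement */

    for (int step = 0; step < STEPS; step++) {
        /* communication inserted only at block boundaries */
        MPI_Sendrecv(&u[1], 1, MPI_DOUBLE, left, 0,
                     &u[NLOCAL + 1], 1, MPI_DOUBLE, right, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        MPI_Sendrecv(&u[NLOCAL], 1, MPI_DOUBLE, right, 1,
                     &u[0], 1, MPI_DOUBLE, left, 1,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);

        /* owner-computes: each process updates only its own block */
        for (int i = 1; i <= NLOCAL; i++)
            unew[i] = 0.5 * (u[i - 1] + u[i + 1]);
        for (int i = 1; i <= NLOCAL; i++)
            u[i] = unew[i];
    }

    if (rank == 0)
        printf("done on %d processes\n", size);
    MPI_Finalize();
    return 0;
}
```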

Languages and Compilers for Parallel Computing

Lecture Notes in Computer Science, 2000

The topics covered include languages and language extensions for parallel computing - a status report on CONSUL, a future-based parallel language for a general-purpose high-parallel computer; COOL, blackboard programming in shared Prolog, refined C, the XYZ ...

SUIF: An Infrastructure for Research on Parallelizing and Optimizing Compilers

ACM Sigplan …, 1994

Compiler infrastructures that support experimental research are crucial to the advancement of high-performance computing. New compiler technology must be implemented and evaluated in the context of a complete compiler, but developing such an infrastructure requires a huge investment in time and resources. We have spent a number of years building the SUIF compiler into a powerful, flexible system, and we would now like to share the results of our efforts.

Automatic parallelization in the paralax compiler

2011

The efficient development of multi-threaded software has, for many years, been an unsolved problem in computer science. Finding a solution to this problem has become urgent with the advent of multi-core processors. Furthermore, the problem has become more complicated because multi-cores are everywhere (desktop, laptop, embedded system). As such, they execute generic programs which exhibit very different characteristics than the scientific applications that have been the focus of parallel computing in the past.

Compiling for massively parallel architectures: a perspective

Microprocessing and Microprogramming, 1995

The problem of automatically generating programs for massively parallel computers is a very complicated one, mainly because there are many architectures, each of them seeming to pose its own particular compilation problem. The purpose of this paper is to propose a framework in which to discuss the compilation process, and to show that the features which affect it are few and generate a small number of combinations. The paper is oriented toward fine-grained parallelization of static control programs, with emphasis on dataflow analysis, scheduling and placement. When going from there to more general programs and to coarser parallelism, one encounters new problems, some of which are discussed in the conclusion.

Functional Languages: Compiler Technology and Parallelism (Dagstuhl Seminar 9213)

1992

The seminar emphasized four issues:
o static program analysis
o extensions for programmer control of parallelism
o functional+logic languages and constraints
o implementation of functional languages
There were two formal discussion sessions, which addressed the problems of input/output in functional languages and the utility of static program analysis. It is gratifying that the first Dagstuhl seminar in this area (Functional Languages: Optimization for Parallelism) had stimulated many developments which were reported at this one. A particular feature of this seminar was the large number of prototypes which were demonstrated and which vividly illustrated the issues raised in discussions and presentations.
Static Program Analysis
Static program analysis has been thoroughly investigated for optimising sequential implementations, but parallel ones offer new problems. Discovering properties of synchronisation, for example, requires richer domains than those used in the sequential setting, leading to a combinatorial explosion in cost. Current sequential analyses operate at or beyond the limits of today's algorithm technology. The most expensive aspect is computing fixpoints, which requires a convergence test and therefore a decision procedure for equality of abstract values. New work reported here helps reduce the need for convergence tests, and, using the algebraic properties of the operators in the resulting straight-line code, partial evaluation can be used to generate very efficient parallel code with a high degree of processor utilization. This is exemplified by the well-known problem of parallel evaluation of expressions (and, more generally, straight-line code) over a semi-ring. In this context, partial evaluation is evaluation over the induced polynomial semi-ring that can be executed in parallel by tree contraction.
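
As a hypothetical worked example of that last point (not code from the seminar), the recurrence x_i = a_i*x_{i-1} + b_i can be read as straight-line code over composable affine maps; because composition is associative, the maps can be contracted pairwise in log2(n) rounds, and the compositions within a round are mutually independent, which is what makes tree contraction parallelizable.

```c
/* Hypothetical illustration: evaluating the straight-line recurrence
 * x_i = a_i*x_{i-1} + b_i by tree contraction.  Each step is an affine
 * map f_i(x) = a_i*x + b_i; composing two maps gives another affine
 * map, and composition is associative, so the maps can be combined
 * pairwise, level by level, in log2(n) rounds. */
#include <stdio.h>

typedef struct { double a, b; } Affine;   /* f(x) = a*x + b */

/* g after f: (g o f)(x) = g.a*(f.a*x + f.b) + g.b */
static Affine compose(Affine g, Affine f) {
    Affine h = { g.a * f.a, g.a * f.b + g.b };
    return h;
}

int main(void) {
    enum { N = 8 };                 /* power of two for a clean tree */
    Affine f[N];
    double x0 = 1.0;

    for (int i = 0; i < N; i++) {   /* x_i = 2*x_{i-1} + 1 */
        f[i].a = 2.0;
        f[i].b = 1.0;
    }

    /* Sequential reference. */
    double x = x0;
    for (int i = 0; i < N; i++)
        x = f[i].a * x + f[i].b;

    /* Tree contraction: halve the number of maps each round.  The
     * compositions inside one round do not depend on each other. */
    int m = N;
    while (m > 1) {
        for (int i = 0; i < m / 2; i++)
            f[i] = compose(f[2 * i + 1], f[2 * i]);
        m /= 2;
    }
    double y = f[0].a * x0 + f[0].b;

    printf("sequential=%g  contracted=%g\n", x, y);   /* both 511 */
    return 0;
}
```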

Intermediate Code Generation for Portable, Scalable Compilers. Architecture Independent Data Parallelism: The Preliminaries

1993

This paper introduces the goals of the Portable, Scalable, Architecture Independent (PSI) Compiler Project for Data Parallel Languages at the University of Missouri-Rolla. A goal of this project is to produce a subcompiler for data parallel scientific programming languages such as HPF (High Performance Fortran), where the input grammar is translated to a three-address code intermediate language. Ultimately we plan to integrate our work into automated synthesis systems for scientific programming because we feel that it should not be necessary to learn complicated programming techniques to use multiprocessor computers or networks of computers effectively. This paper shows how to compile a data parallel language to an arbitrary multiprocessor topology or network of CPUs given the number of processors, the length of vector registers, and the total number of components in an array, assuming a message-passing, distributed-memory paradigm of send and receive. We emphasize that this paradigm is not on...
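
A rough sketch of the index bookkeeping such a translation relies on is given below; the helper names and the block-distribution formulae are assumptions of this illustration rather than the PSI compiler's actual scheme.

```c
/* Sketch only: the index arithmetic a message-passing code generator
 * needs for a block distribution.  N, P, and the helper names are
 * assumptions of this example.  Assumes N >= P. */
#include <stdio.h>

/* Block distribution of N components over P processors; the first
 * N % P processors get one extra element. */
static int block_size(int p, int N, int P)  { return N / P + (p < N % P); }
static int block_lower(int p, int N, int P) {
    return p * (N / P) + (p < N % P ? p : N % P);
}
/* Which processor owns global index i, under the same distribution. */
static int owner(int i, int N, int P) {
    int big = N % P, chunk = N / P + 1;
    if (i < big * chunk) return i / chunk;
    return big + (i - big * chunk) / (N / P);
}

int main(void) {
    int N = 10, P = 4;
    for (int p = 0; p < P; p++)
        printf("proc %d owns [%d, %d)\n", p, block_lower(p, N, P),
               block_lower(p, N, P) + block_size(p, N, P));
    for (int i = 0; i < N; i++)
        printf("index %d -> proc %d\n", i, owner(i, N, P));
    return 0;
}
```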

Parallel architecture and compilation techniques

ACM SIGARCH Computer Architecture News

In this issue, we present a selection of papers from several workshops held in September 2001 in Barcelona, Spain. The workshops were hosted within the PACT (Parallel Architecture and Compilation Techniques) Conference [1], [2]. Advances in technology are improving the processing power and the computing speed of systems. As addressed by the keynote speakers, the time has never been so propitious to explore the potential of compilers on the architecture and vice versa, due to the strong demand for advances in the interaction of these two areas. The increasing interest is also shown by the record number of attendees this year. This is also due to the high-quality workshops focused on hot topics in the compiler and computer architecture research areas. This year, 2001, five different workshops covered hot research themes: the Compilers and Operating Systems for Low Power (COLP) workshop, the European Workshop on OpenMP (EWOMP), the MEmory DEcoupling Architecture workshop (MEDEA), the Ubiquitous Computing and Communication (UCC) workshop, and the Workshop on Binary Translation (WBT). For copyright reasons, we cannot include

PACWON: A parallelizing compiler for workstations on a network

The current influx of networked workstations has prompted people to use this platform as a multiprocessing environment. In addition, tools like the Parallel Virtual Machine (PVM) have fuelled the growth even further. In this work we present the design and some possible future strategies for automatically parallelizing sequential programs using a compilation tool called PACWON for a network of workstations (NOW). The sequential programs are written using a subset of C, without pointers and structures. The target language is C embedded with PVM library calls. The automatically parallelized programs are run on a NOW environment.
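
The following is a minimal illustration of that target idiom (C with genuine PVM calls), written by hand for this summary rather than produced by PACWON; the executable name "sumloop", the worker count, and the summation loop are all assumed.

```c
/* Illustration of the target idiom (C plus PVM calls), not output
 * actually produced by PACWON.  A single executable spawns copies of
 * itself over the NOW: the parent partitions a summation loop and the
 * children return partial sums. */
#include <stdio.h>
#include <pvm3.h>

#define NWORK 4
#define N     1000000

int main(void) {
    pvm_mytid();                         /* enrol this task in PVM */

    if (pvm_parent() == PvmNoParent) {
        /* master: spawn workers and distribute loop bounds */
        int tids[NWORK];
        pvm_spawn("sumloop", (char **)0, PvmTaskDefault, "", NWORK, tids);

        for (int w = 0; w < NWORK; w++) {
            int lo = w * (N / NWORK);
            int hi = (w == NWORK - 1) ? N : lo + N / NWORK;
            pvm_initsend(PvmDataDefault);
            pvm_pkint(&lo, 1, 1);
            pvm_pkint(&hi, 1, 1);
            pvm_send(tids[w], 1);
        }

        double total = 0.0, part;
        for (int w = 0; w < NWORK; w++) {
            pvm_recv(-1, 2);             /* any worker, tag 2 */
            pvm_upkdouble(&part, 1, 1);
            total += part;
        }
        printf("sum = %f\n", total);
    } else {
        /* worker: receive bounds, compute a partial sum */
        int lo, hi;
        pvm_recv(pvm_parent(), 1);
        pvm_upkint(&lo, 1, 1);
        pvm_upkint(&hi, 1, 1);

        double part = 0.0;
        for (int i = lo; i < hi; i++)
            part += 1.0 / (i + 1);

        pvm_initsend(PvmDataDefault);
        pvm_pkdouble(&part, 1, 1);
        pvm_send(pvm_parent(), 2);
    }

    pvm_exit();
    return 0;
}
```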

Development of large scale high performance applications with a parallelizing compiler

Abstract: High-level environments such as High Performance Fortran (HPF), which support the development of parallel applications and the porting of legacy codes to parallel architectures, have not yet gained broad acceptance and diffusion. Common objections cite the difficulty of performance tuning, the limitation of their application to regular, data-parallel computations, and the lack of robustness of parallelizing HPF compilers in handling large codes.

Advanced Compilers, Architectures and Parallel Systems

1994

Abstract: Multithreaded node architectures have been proposed for future multiprocessor systems. However, some open issues remain: can efficient multithreading support be provided in a multiprocessor machine such that it is capable of tolerating the synchronization and communication latencies, without intruding on the performance of sequentially-executed code?

Enabling Primitives for Compiling Parallel Languages

Languages, Compilers and Run-Time Systems for Scalable Computers, 1996

This paper presents three novel language implementation primitives (lazy threads, stacklets, and synchronizers) and shows how they combine to provide a parallel call at nearly the efficiency of a sequential call. The central idea is to transform parallel calls into parallel-ready sequential calls. Excess parallelism degrades into sequential calls with the attendant efficient stack management and direct transfer of control and data, unless a call truly needs to execute in parallel, in which case it gets its own thread of control. We show how these techniques can be applied to distribute work efficiently on multiprocessors.
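
The sketch below conveys only the flavour of a parallel-ready call, using plain pthreads and a depth cutoff; it is not the paper's mechanism, which avoids committing a heavyweight thread up front, but it shows how excess parallelism can degrade into ordinary sequential calls.

```c
/* Sketch of the idea only: a "parallel-ready" call site that usually
 * degrades into a plain sequential call.  The real primitives (lazy
 * threads, stacklets, synchronizers) avoid the thread-creation cost
 * this approximation still pays when it does fork. */
#include <pthread.h>
#include <stdio.h>

#define CUTOFF 4   /* below this depth, never fork */

typedef struct { int n; int depth; long result; } Task;

static long fib_seq(int n) {
    return n < 2 ? n : fib_seq(n - 1) + fib_seq(n - 2);
}

static void *fib_task(void *arg);

static long fib_par(int n, int depth) {
    if (n < 2)
        return n;
    if (depth >= CUTOFF)              /* potential parallelism degrades */
        return fib_seq(n);            /* into an ordinary call          */

    Task child = { n - 1, depth + 1, 0 };
    pthread_t tid;
    /* The call truly runs in parallel only when we decide to fork. */
    if (pthread_create(&tid, NULL, fib_task, &child) != 0)
        return fib_seq(n);            /* fall back if no thread available */

    long right = fib_par(n - 2, depth + 1);
    pthread_join(tid, NULL);
    return child.result + right;
}

static void *fib_task(void *arg) {
    Task *t = (Task *)arg;
    t->result = fib_par(t->n, t->depth);
    return NULL;
}

int main(void) {
    printf("fib(30) = %ld\n", fib_par(30, 0));
    return 0;
}
```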

Parallelising large irregular programs: an experience with Naira

Information Sciences, 2002

Naira is a compiler for Haskell, written in Glasgow parallel Haskell. It exhibits modest, but irregular, parallelism that is determined by properties of the program being compiled, e.g. the complexity of the types and of the pattern matching. We report four experiments into Naira's parallel behaviour using a set of realistic inputs: namely the 18 Haskell modules of Naira itself. The issues investigated are:
· Does increasing input size improve sequential efficiency and speedup?
· To what extent do high communications latencies reduce average parallelism and speedup?
· Does migrating running threads between processors improve average parallelism and speedup at all latencies?