Matt Rosing - Academia.edu

Papers by Matt Rosing

User Defined Compiler Support for Constructing Distributed Arrays

Languages, Compilers and Run-Time Systems for Scalable Computers, 1996

This paper describes a preprocessor developed for supporting distributed arrays on parallel machines. The goal is to support Fortran90 like array operations on arrays that have been distributed using general, user defined mappings. To support both general mapping functions and efficient implementation of array operations, the user programs applications at two distinct levels. There is both a high level view where the programmer does most of the programming, and a low level where the user defines how the high level code is to be implemented on a specific target machine. The key to generating efficient runtime code while keeping flexible support for different types of data distributions is to incorporate the low level code into the high level code at compile time. This paper describes the operations performed by the preprocessor and how the user defines mapping functions.

The DINO User's Manual

Flexible Language Constructs for Large Parallel Programs

Scientific Programming, 1994

The goal of the research described in this article is to develop flexible language constructs for writing large data parallel numerical programs for distributed memory (multiple instruction multiple data [MIMD]) multiprocessors. Previously, several models have been developed to support synchronization and communication. Models for global synchronization include single instruction multiple data (SIMD), single program multiple data (SPMD), and sequential programs annotated with data distribution statements. The two primary models for communication include implicit communication based on shared memory and explicit communication based on messages. None of these models by themselves seem sufficient to permit the natural and efficient expression of the variety of algorithms that occur in large scientific computations. In this article, we give an overview of a new language that combines many of these programming models in a clean manner. This is done in a modular fashion such that differen...

Low Latency Messages on Distributed Memory Multiprocessors

Scientific Programming, 1995

This article describes many of the issues in developing an efficient interface for communication on distributed memory machines. Although the hardware component of message latency is less than 1 μs on many distributed memory machines, the software latency associated with sending and receiving typed messages is on the order of 50 μs. The reason for this imbalance is that the software interface does not match the hardware. By changing the interface to match the hardware more closely, applications with fine grained communication can be put on these machines. This article describes several tests performed and many of the issues involved in supporting low latency messages on distributed memory machines.

A programmable preprocessor for parallelizing Fortran-90

Proceedings of the 1999 ACM/IEEE conference on Supercomputing

A programmable preprocessor that generates portable and efficient parallel Fortran-90 code has been successfully used in the development of a variety of environmental transport simulators for the Department of Energy. The tool provides the basic functionality of a traditional preprocessor where directives are embedded in a serial Fortran program and interpreted by the preprocessor to produce parallel Fortran code with MPI calls. The unique aspect of this work is that the user can make additions to, or modify, these directives. The directives reside in a preprocessor library and changes to this library can range from small changes to customize an existing library, to larger changes for porting a library, to completely replacing the library. The preprocessor is programmed with a library of directives written in a C-like language, called DL, that has added support for manipulating Fortran code fragments. The primary benefits to the user are twofold: It is fairly easy for any user to generate efficient, parallel code from Fortran-90 with embedded directives, and the long term viability of the user's software is guaranteed. This is because the source code will always run on a serial machine (the directives are transparent to standard Fortran compilers), and the preprocessor library can be modified to work with different hardware and software environments. A 4000 line preprocessor library has been written and used to parallelize roughly 50,000 lines of groundwater modeling code. The programs have been ported to a wide range of parallel architectures. Performance of these programs is similar to programs explicitly written for a parallel machine. Binaries of the preprocessor core, as well as the preprocessor library source code used in our groundwater modeling codes are currently available.
