Hubert Ritzdorf - Academia.edu

Papers by Hubert Ritzdorf

Adaptive parallel multigrid for Euler and incompressible Navier-Stokes equations

OSTI OAI (U.S. Department of Energy Office of Scientific and Technical Information), Dec 31, 1996

Benchmark and Large Scale Examples

Several test calculations, benchmarks and large scale applications have been carried out to demonstrate the success of the parallelization approach chosen in the POPINDA project and to investigate the potential of the parallelization of a real application code. In this section we summarize these results. First we consider some relatively simple test cases and discuss the influence of the communication system on the observed speed-ups. Moreover, we compare the relative performance of various architectures for two particular test problems. In the second part of this section, we consider various really large scale examples with up to more than 6 million grid points which can be solved within 1 to 3 hours on suitable parallel systems.

Parallelization and Benchmarking

Notes on Numerical Fluid Mechanics (NNFM), 1999

In this section we discuss the approach of using unified block structures as the basis for parallelization. For this purpose, we will resume certain aspects already introduced briefly in Section 1.1 and discuss them in more detail. In particular, we will have to regard the considerations which are important from the point of view of the users and the developers of the application codes. Correspondingly, we start the discussion with the requirements for the parallelization of large CFD codes, give a survey on parallelization strategies and describe the parallelization approach used in POPINDA. This approach is essentially based on grid partitioning utilizing the concept of block-structured grids and on message passing to exchange information between adjacent blocks. Finally, we make some remarks on the standardization of the production codes.
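As a rough illustration of grid partitioning with overlap cells, the sketch below splits a 1D grid into blocks, copies edge values between adjacent blocks before every sweep, and checks that the partitioned Jacobi iteration reproduces the single-grid one. It is a toy model and not code from POPINDA: the 1D Laplace problem, the function names and the in-process copy that stands in for message passing are all assumptions made for this example.

```python
def jacobi_global(u, sweeps):
    """Reference: Jacobi sweeps for u'' = 0 on the whole grid (end values fixed)."""
    u = list(u)
    for _ in range(sweeps):
        new = list(u)
        for i in range(1, len(u) - 1):
            new[i] = 0.5 * (u[i - 1] + u[i + 1])
        u = new
    return u


def split(u, nblocks):
    """Cut the interior points into contiguous blocks, each padded with
    one overlap cell on either side."""
    n_interior = len(u) - 2
    size = n_interior // nblocks
    blocks = []
    for b in range(nblocks):
        start = b * size
        end = n_interior if b == nblocks - 1 else start + size
        # owned points are u[start+1 .. end]; overlap cells are u[start], u[end+1]
        blocks.append(u[start:end + 2])
    return blocks


def exchange(blocks):
    """Stand-in for message passing: each block receives the neighbouring
    block's edge value into its overlap cell."""
    for b in range(len(blocks) - 1):
        left, right = blocks[b], blocks[b + 1]
        left[-1] = right[1]   # first owned point of the right neighbour
        right[0] = left[-2]   # last owned point of the left neighbour


def jacobi_partitioned(u, nblocks, sweeps):
    blocks = split(list(u), nblocks)
    for _ in range(sweeps):
        exchange(blocks)      # refresh overlap cells before each sweep
        for k, blk in enumerate(blocks):
            new = list(blk)
            for i in range(1, len(blk) - 1):
                new[i] = 0.5 * (blk[i - 1] + blk[i + 1])
            blocks[k] = new
    # gather the owned points back into one grid
    result = [u[0]]
    for blk in blocks:
        result.extend(blk[1:-1])
    result.append(u[-1])
    return result


u0 = [0.0] * 9 + [1.0]
same = jacobi_partitioned(u0, 3, 25) == jacobi_global(u0, 25)
```

Because every block sees up-to-date neighbour values in its overlap cells, the partitioned sweep performs the same arithmetic as the global one; in a real code the `exchange` step would be a send/receive pair between processes.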

A Robust Parallel Solver for 3D Fluid Flow Problems Using a High-Level Communications Library

Experiences with a parallel multiblock multigrid solution technique for the Euler equations

Notes on Numerical Fluid Mechanics (NNFM), 1995

The parallel solution of 2D steady compressible Euler equations with a multigrid method is investigated. The parallelization technique used is the grid partitioning strategy. The influence of splitting into many blocks on multigrid convergence rates is reduced with an extra interior boundary relaxation and an extra update of the overlap region. The finite volume discretization of the equations is based on the Godunov upwind approach, with Osher’s flux difference splitting for the convective terms. Second order accuracy is obtained with defect correction. Solution times of the multigrid algorithms are presented for several parallel MIMD computers.
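The defect correction idea mentioned in this abstract can be sketched on a much simpler model problem: iterate with a first-order upwind operator L1 that is easy to solve, while measuring the defect with a second-order central operator L2, so that the iterate converges to the second-order solution. Everything below is an assumption chosen for illustration (the 1D equation -eps u'' + u' = 1 with zero boundary values, the grid size, the helper names); it is not the Euler/Osher discretization of the paper.

```python
def thomas(lower, diag, upper, rhs):
    """Solve a tridiagonal system; lower[0] and upper[-1] are ignored."""
    n = len(diag)
    c = [0.0] * n
    d = [0.0] * n
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def apply_tridiag(lower, diag, upper, u):
    """Multiply a tridiagonal matrix by the vector u."""
    n = len(u)
    out = []
    for i in range(n):
        v = diag[i] * u[i]
        if i > 0:
            v += lower[i] * u[i - 1]
        if i < n - 1:
            v += upper[i] * u[i + 1]
        out.append(v)
    return out


def operators(n, eps):
    """Upwind (L1) and central (L2) discretizations of -eps u'' + u'."""
    h = 1.0 / (n + 1)
    # L1: u' ~ (u_i - u_{i-1}) / h, first order
    l1 = ([-eps / h**2 - 1.0 / h] * n,
          [2 * eps / h**2 + 1.0 / h] * n,
          [-eps / h**2] * n)
    # L2: u' ~ (u_{i+1} - u_{i-1}) / (2h), second order
    l2 = ([-eps / h**2 - 0.5 / h] * n,
          [2 * eps / h**2] * n,
          [-eps / h**2 + 0.5 / h] * n)
    return l1, l2


def defect_correction(n=15, eps=1.0, sweeps=20):
    l1, l2 = operators(n, eps)
    f = [1.0] * n
    u = thomas(*l1, f)                      # first-order starting guess
    for _ in range(sweeps):
        defect = [fi - ri for fi, ri in zip(f, apply_tridiag(*l2, u))]
        du = thomas(*l1, defect)            # correct with the cheap operator
        u = [ui + di for ui, di in zip(u, du)]
    residual = max(abs(fi - ri) for fi, ri in zip(f, apply_tridiag(*l2, u)))
    return u, residual


u, residual = defect_correction()
```

After the sweeps, `residual` measures how well `u` satisfies the second-order equations even though only the first-order operator was ever inverted.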

Adaptive Local Refinements

Notes on Numerical Fluid Mechanics (NNFM), 1999

The applications under consideration in this book are extremely challenging with respect to computing time and memory. The parallelization of the flow solvers and the use of multigrid solvers have enabled the computation of viscous (steady-state) flows around full aircraft within a few hours. For future even more complex applications, however, the computing times have to be reduced further. A very promising approach is to use adaptive grids. In this section we describe the idea of this approach and show that it can be combined with multigrid in a natural and straightforward way. The problem of defining appropriate refinement criteria and the problems of the parallelization of block-structured multigrid are discussed briefly.

The GMD Communications Library for Grid-Oriented Problems

The PRISM Coupling and I/O System: OASIS4

The aim of the PRISM coupling and I/O system, developed in the framework of the EU funded PRISM project, is to provide a portable, efficient and easy-to-use open source software package, which includes a concise application programmer interface (API) to manage the coupling of arbitrary climate component models as well as the I/O of each individual component. In this article we will focus on the way the PRISM coupler drives the whole coupled model, ensuring the synchronization of the different component models, the exchange of the coupling fields directly between the components or via additional transformation processes, and I/O actions from/to files.

Flux Difference Splitting for Three‐Dimensional Steady Incompressible Navier–Stokes Equations in Curvilinear Co‐Ordinates

International Journal for Numerical Methods in Fluids, 1996

A collocated discretization of the 3D steady incompressible Navier-Stokes equations based on a flux-difference-splitting formulation is presented. The discretization employs primitive variables of Cartesian velocity components and pressure. The splitting used here is a polynomial splitting of Roe type introduced by Dick and Linden. Second-order accuracy is obtained with the defect correction approach in which the state vector is interpolated...

The Compilers and MPI Library for SX-9

The SX-9 provides FORTRAN90/SX and C++/SX, a Fortran compiler and a C/C++ compiler respectively, both of which feature excellent optimization, vectorization and parallelization functions. HPF/SX V2 (a compiler for High Performance Fortran (HPF), the de facto standard language for distributed parallel processing) and MPI/SX and MPI2/SX (fully compliant with the MPI-1.3 and MPI-2.1 specifications for distributed parallel processing interfaces) are also provided. This paper is intended to introduce the functions and features of the speed-up technology adopted in these programming interfaces for the SX-9.

Investigating High Performance RMA Interfaces

The MPI-2 Standard, released in 1997, defined an interface for one-sided communication, also known as remote memory access (RMA). It was designed with the goal that it should permit efficient implementations on multiple platforms and networking technologies, and also in heterogeneous environments and non-cache-coherent systems. Nonetheless, even 12 years after its release, the MPI-2 RMA interface remains scarcely used for a number of reasons. This paper discusses the limitations of the MPI-2 RMA specification, outlines the goals and requirements for a new RMA API that would better meet the needs of both users and implementers, and presents a strawman proposal for such an API. We also study the tradeoffs facing the design of this new API and discuss how it may be implemented efficiently on both cache-coherent and non-cache-coherent systems.

Improving generic non-contiguous file access for MPI-IO

We present a fundamental improvement of the generic techniques for non-contiguous file access in MPI-IO. The improvement consists in the replacement of the conventional data management algorithms based on a representation of the non-contiguous fileview as a list of 〈offset, length〉 tuples. The improvement is termed listless i/o as it instead makes use of space- and time-efficient datatype handling functionality that is completely free of lists for processing non-contiguous data in the file or in memory. Listless i/o has been implemented for both independent and collective file accesses and improves access performance by increasing the data throughput between user buffers and file buffers. Additionally, it reduces the memory footprint of the process performing non-contiguous I/O. In this paper we give results for a synthetic benchmark on a PC cluster using different file systems. We demonstrate improvements in I/O bandwidth that exceed a factor of 10.
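To make the contrast between the two representations concrete, the sketch below describes a strided file view once as an explicit list of (offset, length) tuples and once as a compact (count, blocklen, stride) description from which offsets are computed directly. It is a simplified illustration only: the `Vector` type and both lookup functions are hypothetical names invented here, and the paper's actual datatype handling is more general than a single strided vector.

```python
from typing import List, NamedTuple, Tuple


class Vector(NamedTuple):
    """Hypothetical strided view: `count` blocks of `blocklen` bytes,
    with block starts `stride` bytes apart."""
    count: int
    blocklen: int
    stride: int


def flatten(v: Vector, disp: int = 0) -> List[Tuple[int, int]]:
    """List-based representation: one (offset, length) tuple per block.
    Memory use grows with the number of blocks."""
    return [(disp + i * v.stride, v.blocklen) for i in range(v.count)]


def offset_via_list(ol_list: List[Tuple[int, int]], k: int) -> int:
    """File offset of the k-th data byte, found by walking the list."""
    for off, length in ol_list:
        if k < length:
            return off + k
        k -= length
    raise IndexError("byte index outside the view")


def offset_direct(v: Vector, k: int, disp: int = 0) -> int:
    """File offset of the k-th data byte, computed from the compact
    description in constant time and without any list."""
    if not 0 <= k < v.count * v.blocklen:
        raise IndexError("byte index outside the view")
    block, within = divmod(k, v.blocklen)
    return disp + block * v.stride + within


view = Vector(count=4, blocklen=3, stride=10)
tuples = flatten(view)
agree = all(offset_via_list(tuples, k) == offset_direct(view, k)
            for k in range(view.count * view.blocklen))
```

The list version needs storage proportional to the number of blocks and a linear walk per lookup, whereas the direct version needs only the three integers of the description, which is the kind of saving the abstract attributes to avoiding lists.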

The Implementation of MPI-2 One-Sided Communication for the NEC SX-5

We describe the MPI/SX implementation of the MPI-2 standard for one-sided communication (Remote Memory Access) for the NEC SX-5 vector supercomputer. MPI/SX is a non-threaded implementation of the full MPI-2 standard. Essential features of the implementation are presented, including the synchronization mechanisms, the handling of communication windows in global shared and in process local memory, as well as the handling of MPI derived datatypes. In comparative benchmarks the data transfer operations for one-sided communication and point-to-point message passing show very similar performance, both when data reside in global shared and when in process local memory. Derived datatypes, which are of particular importance for applications using one-sided communications, impose only a modest overhead and can be used without any significant loss of performance. Thus, the MPI/SX programmer can freely choose either the message passing or the one-sided communication model, whichever i...

Adaptive Multigrid on Distributed Memory Computers

A general software package has been developed for solving systems of partial differential equations with adaptive multigrid methods (MLAT) on distributed memory computers. The package supports the dynamic mapping of refinement levels. The general strategy is described and results are reported on compute-intensive problems as well as on some simple problems representing worst-case situations from a parallel efficiency point of view. Inherent limitations of the parallel efficiency will be discussed.

Efficient Message Passing Interface implementations for NEC parallel computers: Toward reality in scientific simulations: NEC's 21st Century Odyssey

NEC Research Development, 1998

Since its publication in 1994, MPI-1 (Message Passing Interface) has been the standard interface for message passing in parallel applications. At the C&C Research Laboratory at Sankt Augustin, Germany, efficient MPI implementations are under development, including the product version for the SX-4 vector supercomputer. In the beginning, highest priority was given to the optimization of latency and throughput for point-to-point communication, followed by the development of shared-memory based collective operations. Now, with customers asking for heterogeneous couplings between different parallel systems, the provision of MPI implementations for such configurations has become a requirement. At the same time, work is starting on the extended MPI-2 standard, the definition of which was published in 1997. This article presents the current status of MPI library software at NEC and gives an outlook on future activities.

MPI/SX for Multi-Node SX-5

The MPI/SX implementation of MPI for NEC's SX-6 and other NEC platforms

NEC Research Development, 2003

MPI is the standard communication interface for programming parallel applications in the message passing paradigm. MPI/SX is a dedicated, efficient and highly optimized implementation of the full MPI-2 standard for the NEC SX-series of parallel ...

Block-Structured Multigrid for the Navier-Stokes Equations: Experiences and Scalability Questions

Communication for the NEC SX-5

Portable parallelization of the Navier-Stokes code NSFLEX

Parallel Computational Fluid Dynamics 1995, 1996
