Hubert Ritzdorf - Academia.edu
Papers by Hubert Ritzdorf
OSTI OAI (U.S. Department of Energy Office of Scientific and Technical Information), Dec 31, 1996
Several test calculations, benchmarks and large-scale applications have been carried out to demonstrate the success of the parallelization approach chosen in the POPINDA project and to investigate the potential of the parallelization of a real application code. In this section we summarize these results. First we consider some relatively simple test cases and discuss the influence of the communication system on the observed speed-ups. Moreover, we compare the relative performance of various architectures for two particular test problems. In the second part of this section, we consider several very large examples, the largest with more than 6 million grid points, which can be solved within 1 to 3 hours on suitable parallel systems.
Notes on Numerical Fluid Mechanics (NNFM), 1999
In this section we discuss the approach of using unified block structures as the basis for parallelization. For this purpose, we return to certain aspects already introduced briefly in Section 1.1 and discuss them in more detail. In particular, we have to take into account the considerations that are important from the point of view of the users and the developers of the application codes. Correspondingly, we start the discussion with the requirements for the parallelization of large CFD codes, give a survey of parallelization strategies and describe the parallelization approach used in POPINDA. This approach is essentially based on grid partitioning utilizing the concept of block-structured grids and on message passing to exchange information between adjacent blocks. Finally, we make some remarks on the standardization of the production codes.
Notes on Numerical Fluid Mechanics (NNFM), 1995
The parallel solution of 2D steady compressible Euler equations with a multigrid method is investigated. The parallelization technique used is the grid partitioning strategy. The influence of splitting into many blocks on multigrid convergence rates is reduced with an extra interior boundary relaxation and an extra update of the overlap region. The finite volume discretization of the equations is based on the Godunov upwind approach, with Osher’s flux difference splitting for the convective terms. Second-order accuracy is obtained with defect correction. Solution times of the multigrid algorithms are presented for several parallel MIMD computers.
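The grid partitioning idea in this abstract can be illustrated with a much simpler problem. The sketch below is not the authors' solver: it assumes a 1D Poisson problem with Jacobi relaxation instead of the 2D Euler equations with multigrid, and all function names are hypothetical. It splits the grid into two blocks with a one-point overlap and copies the overlap values after every sweep, which in a parallel code would be the message exchange between adjacent blocks.

```python
def jacobi_sweep(u, f, h):
    """One Jacobi sweep for -u'' = f; the first and last entries are kept fixed."""
    new = u[:]
    for i in range(1, len(u) - 1):
        new[i] = 0.5 * (u[i - 1] + u[i + 1] + h * h * f[i])
    return new


def solve_single(n, sweeps):
    """Reference: relax on one undivided grid with n points on [0, 1]."""
    h = 1.0 / (n - 1)
    u = [0.0] * n
    f = [1.0] * n
    for _ in range(sweeps):
        u = jacobi_sweep(u, f, h)
    return u


def solve_partitioned(n, sweeps):
    """Same relaxation on two blocks that share a one-point overlap region."""
    h = 1.0 / (n - 1)
    mid = n // 2
    # Block a holds global points 0..mid; its last entry is an overlap point.
    # Block b holds global points mid-1..n-1; its first entry is an overlap point.
    a = [0.0] * (mid + 1)
    b = [0.0] * (n - mid + 1)
    fa = [1.0] * len(a)
    fb = [1.0] * len(b)
    for _ in range(sweeps):
        a = jacobi_sweep(a, fa, h)
        b = jacobi_sweep(b, fb, h)
        # Overlap update (stands in for message passing between the blocks).
        a[-1] = b[1]   # global point mid is owned by block b
        b[0] = a[-2]   # global point mid-1 is owned by block a
    # Assemble the global solution from the points each block owns.
    return a[:-1] + b[1:]
```

For plain Jacobi relaxation the partitioned sweep reproduces the single-grid iterates, because each block only needs neighbour values from the previous sweep. With other smoothers or with multigrid this equivalence no longer holds automatically, which is presumably why the abstract mentions an extra interior boundary relaxation and an extra overlap update.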
Notes on Numerical Fluid Mechanics (NNFM), 1999
The applications under consideration in this book are extremely challenging with respect to computing time and memory. The parallelization of the flow solvers and the use of multigrid solvers have enabled the computation of viscous (steady-state) flows around full aircraft within a few hours. For future, even more complex applications, however, the computing times have to be reduced further. A very promising approach is to use adaptive grids. In this section we describe the idea of this approach and show that it can be combined with multigrid in a natural and straightforward way. The problem of defining appropriate refinement criteria and the problems of the parallelization of block-structured multigrid are discussed briefly.
The aim of the PRISM coupling and I/O system, developed in the framework of the EU-funded PRISM project, is to provide a portable, efficient and easy-to-use open source software package, which includes a concise application programmer interface (API) to manage the coupling of arbitrary climate component models as well as the I/O of each individual component. In this article we focus on the way the PRISM coupler drives the whole coupled model, ensuring the synchronization of the different component models, the exchange of the coupling fields directly between the components or via additional transformation processes, and I/O actions from/to files.
International Journal for Numerical Methods in Fluids, 1996
A collocated discretization of the 3D steady incompressible Navier-Stokes equations based on a flux-difference-splitting formulation is presented. The discretization employs primitive variables of Cartesian velocity components and pressure. The splitting used here is a polynomial splitting of Roe type introduced by Dick and Linden. Second-order accuracy is obtained with the defect correction approach in which the state vector is interpolated
The SX-9 provides FORTRAN90/SX and C++/SX, a Fortran compiler and a C/C++ compiler respectively, both of which feature excellent optimization, vectorization and parallelization functions. Also provided are HPF/SX V2, a compiler for HPF (High Performance Fortran), which is the de facto standard language for distributed parallel processing, and MPI/SX and MPI2/SX, which are fully compliant with the MPI-1.3 and MPI-2.1 specifications of the distributed parallel processing interface. This paper introduces the functions and features of the speed-up technology adopted in these programming interfaces for the SX-9.
The MPI-2 Standard, released in 1997, defined an interface for one-sided communication, also known as remote memory access (RMA). It was designed with the goal that it should permit efficient implementations on multiple platforms and networking technologies, and also in heterogeneous environments and non-cache-coherent systems. Nonetheless, even 12 years after its release, the MPI-2 RMA interface remains scarcely used for a number of reasons. This paper discusses the limitations of the MPI-2 RMA specification, outlines the goals and requirements for a new RMA API that would better meet the needs of both users and implementers, and presents a strawman proposal for such an API. We also study the tradeoffs facing the design of this new API and discuss how it may be implemented efficiently on both cache-coherent and non-cache-coherent systems.
We present a fundamental improvement of the generic techniques for non-contiguous file access in MPI-IO. The improvement consists in the replacement of the conventional data management algorithms based on a representation of the non-contiguous fileview as a list of 〈offset, length〉 tuples. The improvement is termed listless i/o as it instead makes use of space- and time-efficient datatype handling functionality that is completely free of lists for processing non-contiguous data in the file or in memory. Listless i/o has been implemented for both independent and collective file accesses and improves access performance by increasing the data throughput between user buffers and file buffers. Additionally, it reduces the memory footprint of the process performing non-contiguous I/O. In this paper we give results for a synthetic benchmark on a PC cluster using different file systems. We demonstrate improvements in I/O bandwidth that exceed a factor of 10.
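The contrast between a list of 〈offset, length〉 tuples and a list-free description can be sketched as follows. This is only an illustration under simplifying assumptions, not the paper's algorithm and not an MPI-IO API: the view is a single regular strided pattern, and `StridedView`, `gather_with_list` and `gather_listless` are hypothetical names. The list-based variant materializes one tuple per block, while the list-free variant computes each file offset directly from the compact descriptor.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StridedView:
    """Hypothetical compact descriptor: `count` blocks of `blocklen` bytes,
    whose starting offsets are `stride` bytes apart in the file."""
    count: int
    blocklen: int
    stride: int

    def flatten(self):
        """Conventional representation: an explicit (offset, length) list."""
        return [(k * self.stride, self.blocklen) for k in range(self.count)]

    def file_offset(self, pos):
        """Map a position in the packed data stream to a file offset
        without building any list."""
        block, within = divmod(pos, self.blocklen)
        if not 0 <= block < self.count:
            raise IndexError("position outside the view")
        return block * self.stride + within


def gather_with_list(data, view):
    """Pack the viewed bytes by walking the flattened tuple list."""
    out = bytearray()
    for offset, length in view.flatten():
        out += data[offset:offset + length]
    return bytes(out)


def gather_listless(data, view):
    """Pack the same bytes using offsets computed on the fly."""
    out = bytearray()
    total = view.count * view.blocklen
    for pos in range(0, total, view.blocklen):
        offset = view.file_offset(pos)
        out += data[offset:offset + view.blocklen]
    return bytes(out)
```

Both functions return the same packed bytes; the difference is that the memory used by the flattened list grows with the number of blocks, whereas the descriptor stays constant in size. Real MPI derived datatypes can be nested and irregular, so an actual implementation needs considerably more machinery than this single-level pattern.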
We describe the MPI/SX implementation of the MPI-2 standard for one-sided communication (Remote Memory Access) for the NEC SX-5 vector supercomputer. MPI/SX is a non-threaded implementation of the full MPI-2 standard. Essential features of the implementation are presented, including the synchronization mechanisms, the handling of communication windows in global shared and in process local memory, as well as the handling of MPI derived datatypes. In comparative benchmarks the data transfer operations for one-sided communication and point-to-point message passing show very similar performance, both when data reside in global shared and when in process local memory. Derived datatypes, which are of particular importance for applications using one-sided communications, impose only a modest overhead and can be used without any significant loss of performance. Thus, the MPI/SX programmer can freely choose either the message passing or the one-sided communication model, whichever i...
A general software package has been developed for solving systems of partial differential equations with adaptive multigrid methods (MLAT) on distributed memory computers. The package supports the dynamic mapping of refinement levels. The general strategy is described and results are reported on compute-intensive problems as well as on some simple problems representing worst-case situations from a parallel efficiency point of view. Inherent limitations of the parallel efficiency will be discussed.
NEC Research & Development, 1998
Since its publication in 1994, MPI-1 (Message Passing Interface) has been the standard interface for message passing in parallel applications. At the C&C Research Laboratory at Sankt Augustin, Germany, efficient MPI implementations are under development, including the product version for the SX-4 vector supercomputer. In the beginning, highest priority was given to the optimization of latency and throughput for point-to-point communication, followed by the development of shared-memory based collective operations. Now, with customers asking for heterogeneous couplings between different parallel systems, the provision of MPI implementations for such configurations has become a requirement. At the same time, work is starting on the extended MPI-2 standard, the definition of which was published in 1997. This article presents the current status of MPI library software at NEC and gives an outlook on future activities.
NEC Research & Development, 2003
MPI is the standard communication interface for programming parallel applications in the message passing paradigm. MPI/SX is a dedicated, efficient and highly optimized implementation of the full MPI-2 standard for the NEC SX-series of parallel ...
Parallel Computational Fluid Dynamics 1995, 1996