W. Gropp | University of Illinois at Urbana-Champaign
Papers by W. Gropp
Torrellas, Josep; Gropp, Bill; Sarkar, Vivek; Moreno, Jaime; Olukotun, Kunle; University of Illinois at Urbana-Champaign.
An extreme scale system is one that is one thousand times more capable than a current comparable system, with the same power and physical footprint. Intuitively, this means that the power consumption and physical footprint of a current departmental server should be enough to deliver petascale performance, and that a single, commodity chip should deliver terascale performance. In this panel, we will discuss the resulting challenges in energy/power efficiency, concurrency and locality, resiliency and programmability, and the research opportunities that may take us to extreme scale systems.
IEEE Computer Society eBooks, 2002
This collection of 45 papers and 13 posters from the September 2002 conference focuses on the software and hardware that will enable cluster computing. The researchers discuss task management, network hardware, programming clusters, and scalable clusters. Among the topics are experience in offloading ...
27th Annual IEEE Conference on Local Computer Networks, 2002. Proceedings. LCN 2002.
Lecture Notes in Computer Science, 2002
The Message Passing Interface (MPI) standard for programming parallel computers is widely used for building both programs and libraries. Two of the strengths of MPI are its support for libraries and the existence of multiple implementations on many platforms. These two strengths are in conflict, however, when an application wants to use libraries built with different MPI implementations. This paper describes several solutions to this problem, based on minor changes to the API. These solutions also suggest design considerations for other standards, particularly those that expect to have multiple implementations and to be used in concert with other libraries.
We present a simple auto-tuning method to improve the performance of sparse matrix-vector multiply (SpMV) on a GPU. The sparse matrix, stored in CSR format, is sorted in increasing order of the number of nonzero elements per row and partitioned into several ranges. The number of GPU threads per row (TPR) is then assigned for different ranges of the matrix rows to balance the workload for the GPU threads. Tests show that the method provides good performance for most of the matrices tested, compared to the NVIDIA sparse package. The auto-tuning approach is easy to implement, the tuning process is fast, and it is not necessary to convert the matrices into different formats and try them one by one to determine the best format for the matrix, as in some other approaches for this problem.
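The row-sorting and threads-per-row assignment described in this abstract can be sketched in plain Python. This is an illustrative reconstruction, not the paper's code: the power-of-two TPR choice and the cap of 32 (one warp on NVIDIA GPUs) are assumptions standing in for the paper's tuned ranges.

```python
def nnz_per_row(row_ptr):
    """Number of nonzeros in each row of a CSR matrix, from its row-pointer array."""
    return [row_ptr[i + 1] - row_ptr[i] for i in range(len(row_ptr) - 1)]

def assign_threads_per_row(row_ptr, max_tpr=32):
    """Sort rows by nonzero count, then pick a threads-per-row (TPR) value
    for each row: the smallest power of two covering the row's nonzeros,
    capped at max_tpr. Returns (sorted_rows, tpr), where tpr[i] is the
    TPR assigned to row sorted_rows[i]."""
    counts = nnz_per_row(row_ptr)
    order = sorted(range(len(counts)), key=lambda r: counts[r])
    tpr = []
    for r in order:
        t = 1
        while t < counts[r] and t < max_tpr:
            t *= 2
        tpr.append(t)
    return order, tpr
```

After sorting, rows with similar nonzero counts sit next to each other, so consecutive GPU thread groups do similar amounts of work, which is the load-balancing effect the abstract describes.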
Computers & Mathematics with Applications, 1997
BOOK REPORTS: Improving policies without measuring merits (P. Dayan and S.P. Singh). Memory-based stochastic optimization (A.W. Moore and J. Schneider). Temporal difference learning in continuous time and space (K. Doya). Reinforcement learning by probability matching (P.N. Sabes and M.I. Jordan). Author index. Keyword index.
Computers & Mathematics with Applications, 1995
Mathematics and Computer Science …
The recent discovery of superconductivity in a class of copper-oxide compounds (the cuprate superconductors) at liquid-nitrogen temperatures has generated renewed interest in the magnetic properties of type-II superconductors. In our work, we are investigating these properties using the phenomenological time-dependent Ginzburg-Landau (TDGL) equation, and in this paper we describe the parallelization of this equation. The nonequilibrium superconducting state is described by the TDGL equations governing the behavior of the complex order parameter ψ(r, t) and the magnetic vector potential A(r, t). Starting with a dimensionless, gauge-invariant free-energy functional, the equation of motion for ψ(r, t) is based on a variational principle, and the gauge field is given by ... (This work was supported by the Office of Scientific Computing, U.S. Department of Energy, under Contract W-31-109-Eng-38.)
The MIT Press eBooks, 1999
Proceedings of Scalable Parallel Libraries Conference
This provides an MPI implementation on a variety of parallel machines directly (IBM, Intel, CM-5) while supporting others (KSR, nCUBE, Sequent), along with networks of workstations, via p4. A schematic diagram is shown in Figure 1.
Lecture Notes in Computer Science
Formal verification of programs often requires creating a model of the program and running it through a model-checking tool. However, this model-creation step is itself error prone, tedious, and difficult to do for someone not familiar with formal verification. In this paper, we describe a tool for verifying correctness of MPI programs that does not require the creation of a model and instead works directly on the MPI program. Such a tool is useful in the hands of average MPI programmers. Our tool uses the MPI profiling interface, PMPI, to trap MPI calls and hand over control of the MPI function execution to a scheduler. The scheduler verifies correctness of the program by executing all "relevant" interleavings of the program. The scheduler records an initial trace and replays its interleaving variants by using dynamic partial-order reduction. We describe the design and implementation of the tool and compare it with our previous work based on model checking.
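What "executing all relevant interleavings" means can be illustrated with a toy enumerator in Python. This is not the tool's scheduler: it naively enumerates every interleaving of per-process operation sequences, whereas the tool described above uses dynamic partial-order reduction to replay only one representative of each equivalence class.

```python
def interleavings(procs):
    """Yield every interleaving of the operation sequences in `procs`
    (one list of operations per process), preserving per-process order.
    A DPOR-based checker would prune interleavings that differ only in
    the order of independent operations."""
    if all(not p for p in procs):
        yield []
        return
    for i, p in enumerate(procs):
        if p:
            rest = [q[:] for q in procs]   # copy so each branch is independent
            op = rest[i].pop(0)            # schedule process i's next operation
            for tail in interleavings(rest):
                yield [op] + tail
```

Even for two processes with one and two operations, this yields three schedules; the combinatorial growth of this set is exactly why partial-order reduction matters for a practical MPI checker.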
Lecture Notes in Computer Science, 2008
MPI-2 introduced many new capabilities, including dynamic process management, one-sided communication, and parallel I/O. Implementations of these features are becoming widespread. This tutorial shows how to use these features by showing all of the steps involved in designing, coding, and tuning solutions to specific problems. The problems are chosen for their practical use in applications as well as for their ability to illustrate specific MPI-2 topics. Complete examples that illustrate the use of MPI one-sided communication, MPI parallel I/O, and hybrid programming with MPI and threads will be discussed and full source code will be made available to the attendees. Each example will include a hands-on lab session; these sessions will also introduce the use of performance and correctness debugging tools that are available for the MPI environment. Guidance on tuning MPI programs will be included, with examples and data from MPI implementations on a variety of parallel systems, including Sun, IBM, SGI, and clusters. Examples in C, Fortran, and C++ will be included. Familiarity with basic MPI usage will be assumed.