David Padua - Academia.edu

Papers by David Padua

Speculative run-time parallelization of loops

Languages and Compilers for Parallel Computing

Lecture Notes in Computer Science, 2013

Design and use of htalib–a library for hierarchically tiled arrays

A Run-Time Technique for DOALL Loop Identification and Array Privatization

Parallel programming with Polaris

Hierarchically Tiled Array Vs. Intel Thread Building Blocks for Multicore Systems Programming

Multicore systems are becoming common, while programmers cannot rely on growing clock rates to speed up their applications. Thus, software developers are increasingly exposed to the complexity associated with programming parallel shared-memory environments. Intel Threading Building Blocks (TBB) is a library that facilitates the programming of this kind of system. The key notion is to separate logical task patterns, which are easy to understand, from physical threads, and to delegate the scheduling of the tasks to the system. On the other hand, Hierarchically Tiled Arrays (HTAs) are data structures that facilitate locality and parallelism of array-intensive computations with a block-recursive nature. The model underlying HTAs provides programmers with a single-threaded view of the execution. The HTA implementation in C++ has recently been extended to support multicore machines. In this work we implement several algorithms using both libraries in order to compare the ease of programming and the relative performance of both approaches.

Task-Parallel versus Data-Parallel Library-Based Programming in Multicore Systems

2009 17th Euromicro International Conference on Parallel, Distributed and Network-based Processing, 2009

Multicore machines are becoming common. There are many languages, language extensions and libraries devoted to improving the programmability and performance of these machines. In this paper we compare two libraries that approach the problem of programming multicores from two different perspectives: task parallelism and data parallelism. The Intel Threading Building Blocks (TBB) library separates logical task patterns, which are easy to understand, from physical threads, and delegates the scheduling of the tasks to the system. On the other hand, Hierarchically Tiled Arrays (HTAs) are data structures that facilitate locality and parallelism of array-intensive computations with a block-recursive nature, following a data-parallel paradigm. Our comparison considers both ease of programming and the performance obtained using both approaches. In our experience, HTA programs tend to be shorter than or as long as TBB programs, while the performance of both approaches is very similar.

Retrospective: the Cedar system

International Symposium on Computer Architecture, 1998

Advanced program restructuring for high-performance computers with Polaris

Multiprocessor computers are rapidly becoming the norm. Parallel workstations are widely available today and it is likely that most PCs in the near future will also be parallel. To accommodate these changes, some classes of applications must be developed in explicitly parallel form. Yet, in order to avoid a substantial increase in software development costs, compilers that translate conventional programs into efficient parallel form clearly will be necessary. In the ideal case, multiprocessor parallelism should be as transparent to programmers as functional level parallelism is to programmers of today's superscalar machines. However, compiling for multiprocessors is substantially more complex than compiling for functional unit parallelism, in part because successful parallelization often requires a very accurate analysis of long sections of code. This article discusses recent experience at Illinois on the automatic parallelization of scientific codes using the Polaris restructurer. Also, the article presents several new analysis techniques that we have developed in recent years based on an extensive analysis of the characteristics of real Fortran codes. These techniques, which are based on both static and dynamic parallelization strategies, have been incorporated in the Polaris restructurer. Preliminary results on parallel workstations are encouraging and, once the implementation of the new techniques is complete, we expect that Polaris will be able to obtain good speedups for most scientific codes on parallel workstations.

Programming for parallelism and locality with hierarchically tiled arrays

Proceedings of the eleventh ACM SIGPLAN symposium on Principles and practice of parallel programming - PPoPP '06, 2006

Tiling has proven to be an effective mechanism to develop high performance implementations of algorithms. Tiling can be used to organize computations so that communication costs in parallel programs are reduced and locality in sequential codes or sequential components of parallel programs is enhanced.
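As a rough illustration of the locality benefit of tiling described above, the following is a minimal C++ sketch of a tiled matrix multiplication next to an untiled reference. It does not use the HTA library or reproduce its API; the names `Matrix`, `matmul_naive`, `matmul_tiled` and the `tile` parameter are assumptions introduced only for this example.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper type: a row-major n x n matrix in a flat vector.
using Matrix = std::vector<double>;

// Untiled reference: a plain triple loop over the whole matrices.
Matrix matmul_naive(const Matrix& a, const Matrix& b, std::size_t n) {
    Matrix c(n * n, 0.0);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t k = 0; k < n; ++k)
            for (std::size_t j = 0; j < n; ++j)
                c[i * n + j] += a[i * n + k] * b[k * n + j];
    return c;
}

// Tiled version: the iteration space is cut into tile x tile blocks so
// that each block of a, b and c is reused while it is still in cache.
// Assumes tile > 0; edge tiles are clipped with std::min.
Matrix matmul_tiled(const Matrix& a, const Matrix& b,
                    std::size_t n, std::size_t tile) {
    Matrix c(n * n, 0.0);
    for (std::size_t ii = 0; ii < n; ii += tile) {
        const std::size_t i_end = std::min(ii + tile, n);
        for (std::size_t kk = 0; kk < n; kk += tile) {
            const std::size_t k_end = std::min(kk + tile, n);
            for (std::size_t jj = 0; jj < n; jj += tile) {
                const std::size_t j_end = std::min(jj + tile, n);
                // Work confined to one block of each operand.
                for (std::size_t i = ii; i < i_end; ++i)
                    for (std::size_t k = kk; k < k_end; ++k)
                        for (std::size_t j = jj; j < j_end; ++j)
                            c[i * n + j] += a[i * n + k] * b[k * n + j];
            }
        }
    }
    return c;
}
```

Both functions compute the same product; only the order in which the iteration space is traversed changes. A suitable `tile` value depends on the cache sizes of the target machine and would normally be chosen by measurement.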

Run-Time Methods for Parallelizing DO Loops
