Luiz Ramos - Academia.edu
Papers by Luiz Ramos
Concurrency and Computation: Practice and Experience, 2015
The use of Graphics Processing Units (GPUs) for high-performance computing has gained growing momentum in recent years. Unfortunately, GPU programming platforms such as CUDA are complex and user-unfriendly, and they make high-performance parallel applications harder to develop. In addition, runtime systems that execute those applications often fail to fully utilize the parallelism of modern CPU-GPU systems. Typically, parallel kernels run entirely on the most powerful device available, leaving other devices idle. These observations sparked research in two directions: (1) high-level approaches to software development for GPUs, which strike a balance between performance and ease of programming; and (2) task partitioning to fully utilize the available devices. In this paper, we propose a framework, called PSkel, that provides a single high-level abstraction for stencil programming on heterogeneous CPU-GPU systems, while allowing the programmer to partition and assign data and computation to both CPU and GPU. Our current implementation uses parallel skeletons to transparently leverage Intel TBB and NVIDIA CUDA. In our experiments, we observed that parallel applications with task partitioning can improve average performance by up to 76% and 28% compared to CPU-only and GPU-only parallel applications, respectively.