Accelerator Codesign as Non-Linear Optimization

Strategies to Improve the Performance and Energy Efficiency of Stencil Computations for NVIDIA GPUs

Jairo Panetta

Anais do Workshop em Desempenho de Sistemas Computacionais e de Comunicação (WPerformance), 2018

Stencil computation optimization and auto-tuning on state-of-the-art multicore architectures

Leonid Oliker

2008

Evaluating optimizations that reduce global memory accesses of stencil computations in GPGPUs

Jairo Panetta

Concurrency and Computation: Practice and Experience, 2018

L. Gan, H. Fu, W. Xue, Y. Xu, C. Yang, X. Wang, Z. Lv, Yang You, G. Yang, and K. Ou. Scaling and Analyzing the Stencil Performance on Multi-Core and Many-Core Architectures. The 20th IEEE International Conference on Parallel and Distributed Systems (ICPADS 2014)

Yang You

An analytical GPU performance model for 3D stencil computations from the angle of data traffic

Xing Cai

The Journal of Supercomputing, 2015

An auto-tuning framework for parallel multicore stencil computations

Leonid Oliker

2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS), 2010

Simple, Accurate, Analytical Time Modeling and Optimal Tile Size Selection for GPGPU Stencils

Waruna Ranasinghe

Proceedings of the 22nd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2017

Autotuning stencil-based computations on GPUs

Boyana Norris

Auto-tuning Stencil Computations on Multicore and Accelerators

J. Shalf

PSkel: A stencil programming framework for CPU-GPU systems

Luiz Ramos

Concurrency and Computation: Practice and Experience, 2015

Optimization and Performance Modeling of Stencil Computations on Modern Microprocessors

Leonid Oliker

SIAM Review, 2009

Exploration of automatic optimization for CUDA programming

Ayaz H Khan

… and Grid Computing (PDGC), 2012 2nd …, 2012

Understanding stencil code performance on multicore architectures

Ahmed Qasem

Proceedings of the 8th ACM International …, 2011

Software technologies coping with memory hierarchy of GPGPU clusters for stencil computations

Guanghao Jin

2014 IEEE International Conference on Cluster Computing (CLUSTER), 2014

Understanding the performance of stencil computations on Intel's Xeon Phi

Roy Campbell

2013 IEEE International Conference on Cluster Computing (CLUSTER), 2013

Automatic Partitioning of Stencil Computations on Heterogeneous Systems

Rodrigo Rocha

2017 International Symposium on Computer Architecture and High Performance Computing Workshops (SBAC-PADW), 2017

Mint: Realizing CUDA performance in 3D stencil methods with annotated C

Xing Cai

Proceedings of the International Conference on Supercomputing, 2011

Resource Conscious Reuse-Driven Tiling for GPUs

Changwan Hong

Proceedings of the 2016 International Conference on Parallel Architectures and Compilation, 2016

Effective resource management for enhancing performance of 2D and 3D stencils on GPUs

Changwan Hong

Proceedings of the 9th Annual Workshop on General Purpose Processing using Graphics Processing Unit, 2016

Efficient Acceleration of Stencil Applications through In-Memory Computing

Ahmed Eltawil

Micromachines

Implicit and explicit optimizations for stencil computations

Leonid Oliker

Proceedings of the 2006 workshop on Memory system performance and correctness - MSPC '06, 2006

Stencil-Aware GPU Optimization of Iterative Solvers

Daniel Lowell, Chekuri Choudary, Jeswin Godwin, J. Holewinski

SIAM Journal on Scientific Computing, 2013

A parallel optimization method for stencil computation on the domain that is bigger than memory capacity of GPUs

Guanghao Jin

2013 IEEE International Conference on Cluster Computing (CLUSTER), 2013

Application-independent Autotuning for GPUs

M. Tillmann, Thomas Karcher, Walter Tichy

A Hybrid Circular Queue Method for Iterative Stencil Computations on GPUs

Jingling Xue

Journal of Computer Science and Technology, 2012

A performance study for iterative stencil loops on GPUs with ghost zone optimizations

Jiayuan Meng

International Journal of Parallel Programming, 2011

Performance modeling and automatic ghost zone optimization for iterative stencil loops on GPUs

Jiayuan Meng

Proceedings of the 23rd international conference …, 2009

Towards a MultiLevel Cache Performance Model for 3D Stencil Computation

Raul Araya

Procedia Computer Science, 2011

Autotuning CUDA compiler parameters for heterogeneous applications using the OpenTuner framework

Marcos Amaris González

Concurrency and Computation: Practice and Experience, 2017

Tuned and wildly asynchronous stencil kernels for hybrid CPU/GPU systems

Richard (Rich) Vuduc

2009
