Accelerator Codesign as Non-Linear Optimization