Implicit and explicit optimizations for stencil computations