Implicit and explicit optimizations for stencil computations

SIAM Review, 2009

Impact of modern memory subsystems on cache optimizations for stencil computations

Proceedings of the 2005 workshop on Memory system performance - MSP '05, 2005

Understanding stencil code performance on multicore architectures

Ahmed Qasem

Proceedings of the 8th ACM International …, 2011

Towards a MultiLevel Cache Performance Model for 3D Stencil Computation

Raul Araya

Procedia Computer Science, 2011

Cache oblivious parallelograms in iterative stencil computations

Proceedings of the 24th ACM International Conference on Supercomputing - ICS '10, 2010

NUMA Aware Iterative Stencil Computations on Many-Core Systems

2012 IEEE 26th International Parallel and Distributed Processing Symposium, 2012

Stencil computation optimization and auto-tuning on state-of-the-art multicore architectures

2008

Impact of System and Cache Bandwidth on Stencil Computations Across Multiple Processor Generations

mpi-inf.mpg.de

Scaling and Analyzing the Stencil Performance on Multi-Core and Many-Core Architectures

L. Gan, H. Fu, W. Xue, Y. Xu, C. Yang, X. Wang, Z. Lv, Yang You, G. Yang, and K. Ou

The 20th IEEE International Conference on Parallel and Distributed Systems (ICPADS 2014), 2014

Evaluating optimizations that reduce global memory accesses of stencil computations in GPGPUs

Jairo Panetta

Concurrency and Computation: Practice and Experience, 2018

Efficient Acceleration of Stencil Applications through In-Memory Computing

Ahmed Eltawil

Micromachines

Reducing redundancy in data organization and arithmetic calculation for stencil computations

Liang Yuan

Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, 2021

3.5-D blocking optimization for stencil computations on modern CPUs and GPUs

Pradeep Dubey

Auto-tuning Stencil Computations on Multicore and Accelerators

J. Shalf

Exploiting memory customization in FPGA for 3D stencil computations

Nacho Amir

2009

Effective automatic parallelization of stencil computations

Uday Bondhugula

2007

Parallel data-locality aware stencil computations on modern micro-architectures

Helmar Burkhart

2009 IEEE International Symposium on Parallel & Distributed Processing, 2009

Memory Interface Design for 3D Stencil Kernels on a Massively Parallel Memory System

Jason Bakos

ACM Transactions on Reconfigurable Technology and Systems, 2015

Automatic code generation and tuning for stencil kernels on modern shared memory architectures

Helmar Burkhart

Computer Science - Research and Development, 2011

PATUS: A Code Generation and Autotuning Framework for Parallel Iterative Stencil Computations on Modern Microarchitectures

Helmar Burkhart

2011 IEEE International Parallel & Distributed Processing Symposium, 2011

Understanding the performance of stencil computations on Intel's Xeon Phi

Roy Campbell

2013 IEEE International Conference on Cluster Computing (CLUSTER), 2013

Locality aware concurrent start for stencil applications

A. Marquez

Strategies to Improve the Performance and Energy Efficiency of Stencil Computations for NVIDIA GPUs

Jairo Panetta

Proceedings of the Workshop on Performance of Computer and Communication Systems (WPerformance), 2018

Performance Tradeoffs in Shared-memory Platform Portable Implementations of a Stencil Kernel

Talita Perciano

2021

Cache Accurate Time Skewing in Iterative Stencil Computations

2011 International Conference on Parallel Processing, 2011

Automatic Partitioning of Stencil Computations on Heterogeneous Systems

Rodrigo Rocha

2017 International Symposium on Computer Architecture and High Performance Computing Workshops (SBAC-PADW), 2017

An auto-tuning framework for parallel multicore stencil computations

2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS), 2010

Unleashing the performance of ccNUMA multiprocessor architectures in heterogeneous stencil computations

Roman Wyrzykowski

The Journal of Supercomputing

Locally-Oriented Programming: A Simple Programming Model for Stencil-Based Computations on Multi-Level Distributed Memory Architectures

Craig Rasmussen

2015

A generalized framework for auto-tuning stencil computations

Cy P Chan

Proceedings of the …, 2009