Implicit and explicit optimizations for stencil computations

Optimization and Performance Modeling of Stencil Computations on Modern Microprocessors

Leonid Oliker

SIAM Review, 2009

Impact of modern memory subsystems on cache optimizations for stencil computations

Leonid Oliker

Proceedings of the 2005 workshop on Memory system performance - MSP '05, 2005

Understanding stencil code performance on multicore architectures

Ahmed Qasem

Proceedings of the 8th ACM International …, 2011

Towards a MultiLevel Cache Performance Model for 3D Stencil Computation

Raul Araya

Procedia Computer Science, 2011

Cache oblivious parallelograms in iterative stencil computations

Mohammed Shaheen

Proceedings of the 24th ACM International Conference on Supercomputing - ICS '10, 2010

NUMA Aware Iterative Stencil Computations on Many-Core Systems

Mohammed Shaheen

2012 IEEE 26th International Parallel and Distributed Processing Symposium, 2012

Stencil computation optimization and auto-tuning on state-of-the-art multicore architectures

Leonid Oliker

2008

Impact of System and Cache Bandwidth on Stencil Computations Across Multiple Processor Generations

Mohammed Shaheen

mpi-inf.mpg.de

L. Gan, H. Fu, W. Xue, Y. Xu, C. Yang, X. Wang, Z. Lv, Yang You, G. Yang, and K. Ou. Scaling and Analyzing the Stencil Performance on Multi-Core and Many-Core Architectures. The 20th IEEE International Conference on Parallel and Distributed Systems (ICPADS 2014)

Yang You

Evaluating optimizations that reduce global memory accesses of stencil computations in GPGPUs

Jairo Panetta

Concurrency and Computation: Practice and Experience, 2018

Efficient Acceleration of Stencil Applications through In-Memory Computing

Ahmed Eltawil

Micromachines

Reducing redundancy in data organization and arithmetic calculation for stencil computations

Liang Yuan

Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, 2021

3.5-D blocking optimization for stencil computations on modern CPUs and GPUs

Pradeep Dubey

Auto-tuning Stencil Computations on Multicore and Accelerators

J. Shalf

Exploiting memory customization in FPGA for 3D stencil computations

Nacho Amir

2009

Effective automatic parallelization of stencil computations

Uday Bondhugula

2007

Parallel data-locality aware stencil computations on modern micro-architectures

Helmar Burkhart

2009 IEEE International Symposium on Parallel & Distributed Processing, 2009

Memory Interface Design for 3D Stencil Kernels on a Massively Parallel Memory System

Jason Bakos

ACM Transactions on Reconfigurable Technology and Systems, 2015

Automatic code generation and tuning for stencil kernels on modern shared memory architectures

Helmar Burkhart

Computer Science - Research and Development, 2011

PATUS: A Code Generation and Autotuning Framework for Parallel Iterative Stencil Computations on Modern Microarchitectures

Helmar Burkhart

2011 IEEE International Parallel & Distributed Processing Symposium, 2011

Understanding the performance of stencil computations on Intel's Xeon Phi

Roy Campbell

2013 IEEE International Conference on Cluster Computing (CLUSTER), 2013

Locality aware concurrent start for stencil applications

A. Marquez

Strategies to Improve the Performance and Energy Efficiency of Stencil Computations for NVIDIA GPUs

Jairo Panetta

Anais do Workshop em Desempenho de Sistemas Computacionais e de Comunicação (WPerformance), 2018

Performance Tradeoffs in Shared-memory Platform Portable Implementations of a Stencil Kernel

Talita Perciano

2021

Cache Accurate Time Skewing in Iterative Stencil Computations

Mohammed Shaheen

2011 International Conference on Parallel Processing, 2011

Automatic Partitioning of Stencil Computations on Heterogeneous Systems

Rodrigo Rocha

2017 International Symposium on Computer Architecture and High Performance Computing Workshops (SBAC-PADW), 2017

An auto-tuning framework for parallel multicore stencil computations

Leonid Oliker

2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS), 2010

Unleashing the performance of ccNUMA multiprocessor architectures in heterogeneous stencil computations

Roman Wyrzykowski

The Journal of Supercomputing

Locally-Oriented Programming: A Simple Programming Model for Stencil-Based Computations on Multi-Level Distributed Memory Architectures

Craig Rasmussen

2015

A generalized framework for auto-tuning stencil computations

Cy P Chan

Proceedings of the …, 2009
