Optimization and Performance Modeling of Stencil Computations on Modern Microprocessors

Impact of modern memory subsystems on cache optimizations for stencil computations

Leonid Oliker

Proceedings of the 2005 workshop on Memory system performance - MSP '05, 2005

Implicit and explicit optimizations for stencil computations

Leonid Oliker

Proceedings of the 2006 workshop on Memory system performance and correctness - MSPC '06, 2006

Understanding stencil code performance on multicore architectures

Ahmed Qasem

Proceedings of the 8th ACM International …, 2011

Towards a MultiLevel Cache Performance Model for 3D Stencil Computation

Raul Araya

Procedia Computer Science, 2011

Impact of System and Cache Bandwidth on Stencil Computations Across Multiple Processor Generations

Mohammed Shaheen

mpi-inf.mpg.de

NUMA Aware Iterative Stencil Computations on Many-Core Systems

Mohammed Shaheen

2012 IEEE 26th International Parallel and Distributed Processing Symposium, 2012

L. Gan, H. Fu, W. Xue, Y. Xu, C. Yang, X. Wang, Z. Lv, Yang You, G. Yang, and K. Ou. Scaling and Analyzing the Stencil Performance on Multi-Core and Many-Core Architectures. The 20th IEEE International Conference on Parallel and Distributed Systems (ICPADS 2014).

Yang You

Stencil computation optimization and auto-tuning on state-of-the-art multicore architectures

Leonid Oliker

2008

Understanding the performance of stencil computations on Intel's Xeon Phi

Roy Campbell

2013 IEEE International Conference on Cluster Computing (CLUSTER), 2013

Cache oblivious parallelograms in iterative stencil computations

Mohammed Shaheen

Proceedings of the 24th ACM International Conference on Supercomputing - ICS '10, 2010

Evaluating optimizations that reduce global memory accesses of stencil computations in GPGPUs

Jairo Panetta

Concurrency and Computation: Practice and Experience, 2018

Automatic code generation and tuning for stencil kernels on modern shared memory architectures

Helmar Burkhart

Computer Science - Research and Development, 2011

L2 Cache Modeling for Scientific Applications on Chip Multi-Processors

Shirley Moore

2007 International Conference on Parallel Processing (ICPP 2007), 2007

Locality aware concurrent start for stencil applications

A. Marquez

Performance Tradeoffs in Shared-memory Platform Portable Implementations of a Stencil Kernel

Talita Perciano

2021

Discovering Cache Partitioning Optimizations for the K Computer

Swann Perarnau

Efficient Acceleration of Stencil Applications through In-Memory Computing

Ahmed Eltawil

Micromachines

Strategies to Improve the Performance and Energy Efficiency of Stencil Computations for NVIDIA GPUs

Jairo Panetta

Anais do Workshop em Desempenho de Sistemas Computacionais e de Comunicação (WPerformance), 2018

Memory Interface Design for 3D Stencil Kernels on a Massively Parallel Memory System

Jason Bakos

ACM Transactions on Reconfigurable Technology and Systems, 2015

Auto-tuning Stencil Computations on Multicore and Accelerators

J. Shalf

Automatic Partitioning of Stencil Computations on Heterogeneous Systems

Rodrigo Rocha

2017 International Symposium on Computer Architecture and High Performance Computing Workshops (SBAC-PADW), 2017

An auto-tuning framework for parallel multicore stencil computations

Leonid Oliker

2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS), 2010

Reducing redundancy in data organization and arithmetic calculation for stencil computations

Liang Yuan

Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, 2021

3.5-D blocking optimization for stencil computations on modern CPUs and GPUs

Pradeep Dubey

PATUS: A Code Generation and Autotuning Framework for Parallel Iterative Stencil Computations on Modern Microarchitectures

Helmar Burkhart

2011 IEEE International Parallel & Distributed Processing Symposium, 2011

Modeling of L2 cache behavior for thread-parallel scientific programs on Chip Multi-Processors

Shirley Moore

2006

Analysis of non-uniform cache architecture policies for chip-multiprocessors using the PARSEC benchmark suite

Antonio Gonzalez

PERI-Auto-tuning Memory Intensive Kernels for Multicore

J. Shalf

Effective automatic parallelization of stencil computations

Uday Bondhugula

2007
