Optimization and Performance Modeling of Stencil Computations on Modern Microprocessors