Reducing overhead in the Uintah framework to support short-lived tasks on GPU-heterogeneous architectures