Enabling predictable parallelism in single-GPU systems with persistent CUDA threads

Paolo Burgio

2023, arXiv (Cornell University)
