Enabling predictable parallelism in single-GPU systems with persistent CUDA threads

Paolo Burgio

2023, arXiv (Cornell University)

CUDA and Applications to Task-based Programming

Bernhard Kerbl

2021

A study of Persistent Threads style GPU programming for GPGPU workloads

Kshitij Gupta

2012 Innovative Parallel Computing (InPar), 2012

NVIDIA's Next-Generation CUDA Compute and Graphics Architecture, Code-Named Fermi, Adds Muscle for Parallel Processing

Gazmend Bojaj

2009

Efficient parallel processing by improved CPU-GPU interaction

Harsh Khatter

2014 International Conference on Issues and Challenges in Intelligent Computing Techniques (ICICT), 2014

Fast heterogeneous computing with CUDA compatible Tesla GPU computing processor (personal supercomputing)

Mohammed Qadeer

2010

Multithreading for Compute Accelerators Through Distributed Shared Memory Design

Rafael Garibotti

2014

Reducing overhead in the Uintah framework to support short-lived tasks on GPU-heterogeneous architectures

Brad Peterson

Proceedings of the 5th International Workshop on Domain-Specific Languages and High-Level Frameworks for High Performance Computing, 2015

Memory Performance and Bottlenecks in Multicore and GPU Architectures

Luiz Guilherme Fernandes

2019 27th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP), 2019

MGSim + MGMark: A Framework for Multi-GPU System Research

David Kaeli

arXiv (Cornell University), 2018

An Intermediate Library for Multi-GPUs Computing Skeletons

Huu Nguyen

2012 IEEE RIVF International Conference on Computing & Communication Technologies, Research, Innovation, and Vision for the Future, 2012

A complete and efficient CUDA-sharing solution for HPC clusters

Rafael Mayo

Parallel Computing, 2014

Current Trends in Parallel Computing

FIROJ ALI SK

International Journal of Computer Applications, 2012

Datacenter-Scale Analysis and Optimization of GPU Machine Learning Workloads

حفصة خمقاني

IEEE Micro, 2021

Parallel Computing Experiences with CUDA

Joshua Anderson

IEEE Micro, 2000

Towards a methodology for creating time-critical, cloud-based CUDA applications

Matej Cigale

2018

Productivity of GPUs under different programming paradigms

Maria Malik

Concurrency and Computation: Practice and Experience, 2012

Econometrics on GPUs

Sonik Mandal

2012

XeroZerox: Analysis and Optimization of GPU Memory Management for High-Integrity Autonomous Systems

Alejandro Calderón

IEEE access, 2024

A combined GPGPU-FPGA high-performance desktop

An Braeken

Real-Time Computing on Multicore Processors

Rodolfo Pellizzoni

Computer, 2016

Enabling Extremely Fine-grained Parallelism via Scalable Concurrent Queues on Modern Many-core Architectures

Poornima Karri

2021 29th International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS), 2021

Analyzing CUDA workloads using a detailed GPU simulator

Wilson Fung

IEEE ISPASS, 2009

IRJET-ACCELERATE EXECUTION OF CUDA PROGRAMS FOR NON GPU USERS USING GPU IN THE CLOUD

IRJET Journal

Early Experiences Migrating CUDA codes to oneAPI

Manuel Costanzo

ArXiv, 2021

Automating CUDA Synchronization via Program Transformation

Shin Hwei Tan

2019 34th IEEE/ACM International Conference on Automated Software Engineering (ASE)

Towards extending the SWITCH platform for time-critical, cloud-based CUDA applications: Job scheduling parameters influencing performance

Matej Cigale

Future Generation Computer Systems, 2019

The GPU used as a math co-processor in real time applications

Paulo Pagliosa

Proceedings of the VI …, 2007

Parallel Computer Architectural Schemes

P M Chawan

2012

SWOT Analysis of Parallel Processing APIs - CUDA, OpenCL, OpenMP and MPI and their Usage in Various Companies

SRINIVASA RAO KUNTE

International journal of applied engineering and management letters, 2023

DeNovo: Rethinking the Memory Hierarchy for Disciplined Parallelism

Hyojin Sung

2011 International Conference on Parallel Architectures and Compilation Techniques, 2011

Effective multi-GPU communication using multiple CUDA streams and threads

Xing Cai

2014 20th IEEE International Conference on Parallel and Distributed Systems (ICPADS), 2014

Profiling general purpose GPU applications

Rafael Sachetto

Proceedings - Symposium on Computer Architecture and High Performance Computing, 2009

Scheduling Parallel Iterative Applications on Volatile Resources

Henry Casanova

2011 IEEE International Parallel & Distributed Processing Symposium, 2011

Source-to-Source Code Translator: OpenMP C to CUDA

C. Jaillet

2011 IEEE International Conference on High Performance Computing and Communications, 2011

Towards efficient GPU sharing on multicore processors

Tarek El-ghazawi

Proceedings of the second international workshop on Performance modeling, benchmarking and simulation of high performance computing systems - PMBS '11, 2011

Engineering, Aerospace Engineering, Computer Science, Parallel Computing, Computer Security, Computation, Graphics, CUDA, Exploit, Programming language, Graphics processing unit