What is a Cooperative Thread Array? | GPU Glossary

A cooperative thread array (CTA) is a collection of threads scheduled onto the same Streaming Multiprocessor (SM). CTAs are the PTX/SASS implementation of the CUDA programming model's thread blocks. CTAs are composed of one or more warps.

Programmers can direct threads within a CTA to coordinate with each other. The programmer-managed shared memory, in the L1 data cache of the SMs, makes this coordination fast. Threads in different CTAs cannot coordinate with each other via barriers, unlike threads within a CTA, and instead must coordinate via global memory, e.g. via atomic update instructions. Due to driver control over the scheduling of CTAs at runtime, CTA execution order is indeterminate and blocking a CTA on another CTA can easily lead to deadlock.
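As a rough illustration of the two kinds of coordination, the sketch below sums an array. Within each CTA, threads cooperate through shared memory and `__syncthreads()` barriers; across CTAs, the only coordination is one atomic update to global memory per CTA, which never waits on another CTA. The names `sum_kernel`, `sum_on_gpu`, and `kThreadsPerCta` are hypothetical, and the CTA size of 256 (a power of two, which the reduction loop relies on) is an assumption made for the example.

```cuda
#include <cuda_runtime.h>
#include <vector>

constexpr int kThreadsPerCta = 256;  // assumed CTA size; must be a power of two here

// Each CTA sums its slice of `in` in shared memory, then publishes its
// partial sum to global memory with a single atomic update.
__global__ void sum_kernel(const int* in, int n, int* total) {
    __shared__ int partial[kThreadsPerCta];  // visible only to threads of this CTA

    int tid = threadIdx.x;
    int idx = blockIdx.x * blockDim.x + tid;
    partial[tid] = (idx < n) ? in[idx] : 0;
    __syncthreads();  // barrier over the threads of this CTA only

    // Tree reduction in shared memory; every thread reaches every barrier.
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (tid < stride) partial[tid] += partial[tid + stride];
        __syncthreads();
    }

    // Cross-CTA coordination goes through global memory. The atomic add does
    // not block on any other CTA, so the result does not depend on CTA order.
    if (tid == 0) atomicAdd(total, partial[0]);
}

// Hypothetical host helper: writes the sum of `host_in` to `*result`.
// Returns false if any CUDA call fails.
bool sum_on_gpu(const std::vector<int>& host_in, int* result) {
    int n = static_cast<int>(host_in.size());
    if (n == 0) { *result = 0; return true; }  // a grid of zero CTAs is not launchable

    int* d_in = nullptr;
    int* d_total = nullptr;
    if (cudaMalloc(&d_in, n * sizeof(int)) != cudaSuccess) return false;
    if (cudaMalloc(&d_total, sizeof(int)) != cudaSuccess) { cudaFree(d_in); return false; }

    bool ok = cudaMemcpy(d_in, host_in.data(), n * sizeof(int),
                         cudaMemcpyHostToDevice) == cudaSuccess;
    ok = ok && cudaMemset(d_total, 0, sizeof(int)) == cudaSuccess;

    if (ok) {
        int ctas = (n + kThreadsPerCta - 1) / kThreadsPerCta;  // one CTA per slice
        sum_kernel<<<ctas, kThreadsPerCta>>>(d_in, n, d_total);
        ok = cudaGetLastError() == cudaSuccess;
    }
    // This blocking copy also waits for the kernel to finish.
    ok = ok && cudaMemcpy(result, d_total, sizeof(int),
                          cudaMemcpyDeviceToHost) == cudaSuccess;

    cudaFree(d_in);
    cudaFree(d_total);
    return ok;
}
```

Calling `sum_on_gpu` on a vector of 1000 ones should yield 1000 whichever order the four CTAs run in. A design in which one CTA spins until another CTA sets a flag would not have that property: if the waiting CTA occupies the resources the other one needs in order to be scheduled, the kernel can hang.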

The number of CTAs that can be scheduled onto a single SM sets the achievable occupancy and depends on a number of factors. Fundamentally, the SM has a limited set of resources (lines in the register file, "slots" for warps, and bytes of shared memory in the L1 data cache), and each CTA uses a certain amount of those resources (as calculated at compile time) when scheduled onto an SM.
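One way to see these limits in practice is to ask the CUDA runtime how many CTAs of a given kernel fit on one SM. The sketch below uses `cudaOccupancyMaxActiveBlocksPerMultiprocessor` for that and converts the answer into a fraction of the SM's warp slots. The kernel `scale_kernel` and the helpers `resident_ctas_per_sm` and `occupancy` are hypothetical names, and device 0 is assumed. The dynamic shared memory size passed as an argument is the one input here that is fixed at launch rather than at compile time.

```cuda
#include <cuda_runtime.h>
#include <cstddef>

// Hypothetical kernel, present only so the runtime has a concrete resource
// footprint (registers, shared memory) to reason about. If launched, it
// needs blockDim.x * sizeof(float) bytes of dynamic shared memory.
__global__ void scale_kernel(float* data, int n, float factor) {
    extern __shared__ float tile[];
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n) {
        tile[threadIdx.x] = data[idx] * factor;
        data[idx] = tile[threadIdx.x];
    }
}

// Number of CTAs of scale_kernel that can be resident on one SM for the
// given CTA size and dynamic shared memory request, or -1 on error.
int resident_ctas_per_sm(int threads_per_cta, size_t dynamic_smem_bytes) {
    int ctas = 0;
    cudaError_t err = cudaOccupancyMaxActiveBlocksPerMultiprocessor(
        &ctas, scale_kernel, threads_per_cta, dynamic_smem_bytes);
    return err == cudaSuccess ? ctas : -1;
}

// Resident warps divided by the SM's warp slots, or -1.0f on error.
// Queries device 0 (an assumption of this sketch).
float occupancy(int threads_per_cta, size_t dynamic_smem_bytes) {
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) return -1.0f;

    int ctas = resident_ctas_per_sm(threads_per_cta, dynamic_smem_bytes);
    if (ctas < 0) return -1.0f;

    // A CTA occupies whole warps, so round its thread count up.
    int warps_per_cta = (threads_per_cta + prop.warpSize - 1) / prop.warpSize;
    int warp_slots = prop.maxThreadsPerMultiProcessor / prop.warpSize;
    return static_cast<float>(ctas * warps_per_cta) / warp_slots;
}
```

Comparing, say, `resident_ctas_per_sm(256, 0)` with `resident_ctas_per_sm(256, 32 * 1024)` shows the shared memory limit at work: requesting more bytes per CTA can only keep the resident CTA count the same or lower it. The exact numbers depend on the GPU and on how many registers the compiler assigns to the kernel.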