What is cuDNN? | GPU Glossary

NVIDIA's cuDNN (CUDA Deep Neural Network) is a library of primitives for building GPU-accelerated deep neural networks.

cuDNN provides highly optimized kernels for operations that arise frequently in neural networks. These include convolution, self-attention (including fused scaled dot-product attention kernels in the style of "Flash Attention"), matrix multiplication, various normalizations, and pooling.
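To make concrete what one of these primitives computes, here is a plain-Python reference for scaled dot-product attention, softmax(QKᵀ/√d)·V. It is only a readable sketch of the math, not cuDNN code: the function names are hypothetical, and it materializes every score explicitly, which is exactly what "Flash Attention"-style fused kernels are designed to avoid doing in global memory.

```python
import math


def softmax(row):
    # Subtract the row maximum before exponentiating, for numerical stability.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(q, k, v):
    """Unfused reference SDPA: softmax(Q K^T / sqrt(d)) V.

    q: list of n query vectors, each of length d
    k: list of m key vectors, each of length d
    v: list of m value vectors, each of length d_v
    """
    d = len(q[0])
    scale = 1.0 / math.sqrt(d)
    out = []
    for qi in q:
        # Scaled attention scores of this query against every key.
        scores = [scale * sum(a * b for a, b in zip(qi, kj)) for kj in k]
        weights = softmax(scores)
        # Output row is the attention-weighted sum of the value vectors.
        out.append([sum(w * vj[c] for w, vj in zip(weights, v))
                    for c in range(len(v[0]))])
    return out
```

When all keys are identical the weights are uniform, so the output is simply the mean of the value vectors, e.g. `[[2.0, 3.0]]` for values `[[1.0, 2.0], [3.0, 4.0]]`.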

cuDNN is a key library at the application layer of the CUDA software platform, alongside its sibling library, cuBLAS. Deep learning frameworks like PyTorch typically leverage cuBLAS for general-purpose linear algebra, such as the matrix multiplications that form the core of dense (fully-connected) layers. They rely on cuDNN for more specialized primitives like convolutional layers, normalization routines, and attention mechanisms.

In modern cuDNN code, computations are expressed as operation graphs, which can be constructed using the open source Python and C++ frontend APIs via the declarative Graph API (not to be confused with CUDA Graphs).
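The difference between a declarative graph and eager execution can be sketched in a few lines of plain Python. The `OpGraph` class and its `add`, `mul`, `relu`, and `execute` methods are hypothetical and are not the cuDNN frontend API; they only illustrate the pattern of recording operations first and running them later, which is what lets a library inspect the whole graph before anything executes.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    op: str       # operation name, e.g. "add" or "relu"
    inputs: list  # names of the tensors this node reads
    output: str   # name of the tensor this node produces


@dataclass
class OpGraph:
    """Hypothetical declarative graph: method calls record nodes, nothing runs yet."""
    nodes: list = field(default_factory=list)
    _count: int = 0

    def _emit(self, op, *inputs):
        # Record a node and return a symbolic name for its output tensor.
        self._count += 1
        name = f"t{self._count}"
        self.nodes.append(Node(op, list(inputs), name))
        return name

    def add(self, a, b):
        return self._emit("add", a, b)

    def mul(self, a, b):
        return self._emit("mul", a, b)

    def relu(self, a):
        return self._emit("relu", a)

    def execute(self, bindings):
        """Run the recorded nodes in order; bindings maps input names to lists."""
        env = dict(bindings)
        for n in self.nodes:
            args = [env[i] for i in n.inputs]
            if n.op == "add":
                env[n.output] = [x + y for x, y in zip(*args)]
            elif n.op == "mul":
                env[n.output] = [x * y for x, y in zip(*args)]
            elif n.op == "relu":
                env[n.output] = [max(x, 0.0) for x in args[0]]
            else:
                raise ValueError(f"unknown op {n.op}")
        return env
```

Usage: `g = OpGraph(); y = g.relu(g.add(g.mul("x", "w"), "b"))` only records three nodes; calling `g.execute({"x": [1.0, -2.0], "w": [3.0, 3.0], "b": [0.5, 0.5]})[y]` then computes `[3.5, 0.0]`.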

This API allows the developer to define a sequence of operations as a graph, which cuDNN can then analyze to perform optimizations, most importantly operation fusion. In operation fusion, a sequence of operations like Convolution + Bias + ReLU is merged ("fused") into a single operation that runs as a single kernel. Operation fusion reduces demand on memory bandwidth by keeping program intermediates in shared memory throughout the sequence of operations, instead of writing each intermediate out to global memory and reading it back in the next kernel.
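A toy, CPU-only illustration of why fusion saves bandwidth, assuming a 1-D "valid" convolution and counting only element reads and writes of intermediate and output buffers (input reads are identical in both versions, so they are left out). The functions `unfused` and `fused` are hypothetical; on a GPU the fused kernel would keep its per-element intermediates on-chip (registers or shared memory) rather than in Python locals.

```python
def conv1d(x, w):
    # "Valid" 1-D convolution (cross-correlation, as deep learning frameworks define it).
    n = len(x) - len(w) + 1
    return [sum(x[i + j] * w[j] for j in range(len(w))) for i in range(n)]


def unfused(x, w, b):
    """Three separate 'kernels', each round-tripping a full buffer through memory."""
    y1 = conv1d(x, w)
    traffic = len(y1)                 # conv writes y1
    y2 = [v + b for v in y1]
    traffic += 2 * len(y1)            # bias reads y1, writes y2
    y3 = [max(v, 0.0) for v in y2]
    traffic += 2 * len(y2)            # ReLU reads y2, writes y3
    return y3, traffic


def fused(x, w, b):
    """One 'kernel': each output element goes from conv to ReLU without leaving local storage."""
    n = len(x) - len(w) + 1
    out = []
    for i in range(n):
        acc = sum(x[i + j] * w[j] for j in range(len(w)))  # convolution
        out.append(max(acc + b, 0.0))                       # bias + ReLU, no round trip
    return out, n                     # only the final output is written
```

Both versions return the same values, but for an output of length n the unfused path moves 5n elements through memory while the fused path moves n.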

The frontends interact with a lower-level, closed source C backend, which also exposes its own API for legacy use cases and for direct calls through a C foreign function interface (FFI).

For any given operation, cuDNN maintains multiple underlying implementations and uses opaque (undocumented) internal heuristics to select the most performant one for the target Streaming Multiprocessor (SM) architecture, data types, and input sizes.
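Because cuDNN's actual heuristics are not public, the following is only a hypothetical sketch of the general pattern: several interchangeable implementations of one operation, a selection rule keyed on SM architecture, data type, and problem size, and a cache so the decision is made once per configuration. The implementation names, the `choose_impl` rule, and its thresholds are all made up for illustration.

```python
from functools import lru_cache


def conv_direct(x, w):
    # Direct 1-D convolution: slide the filter and accumulate.
    n = len(x) - len(w) + 1
    return [sum(x[i + j] * w[j] for j in range(len(w))) for i in range(n)]


def conv_im2col(x, w):
    # Unroll the sliding windows into rows of a matrix, then do one matrix-vector product.
    k = len(w)
    rows = [x[i:i + k] for i in range(len(x) - k + 1)]
    return [sum(a * b for a, b in zip(row, w)) for row in rows]


IMPLS = {"direct": conv_direct, "im2col": conv_im2col}


@lru_cache(maxsize=None)
def choose_impl(sm: int, dtype: str, n: int, k: int) -> str:
    """Hypothetical heuristic; the rules and thresholds are illustrative only."""
    # Pretend the GEMM-style path wins for half precision on SM 8.0 and newer...
    if sm >= 80 and dtype == "fp16":
        return "im2col"
    # ...and for large problems everywhere else.
    if n * k >= 4096:
        return "im2col"
    return "direct"


def conv(x, w, *, sm: int, dtype: str):
    # Dispatch to whichever implementation the (cached) heuristic picks.
    name = choose_impl(sm, dtype, len(x), len(w))
    return IMPLS[name](x, w)
```

Every implementation must return the same result, so the choice affects only speed. The `lru_cache` stands in for the idea that the selection is made once per (architecture, dtype, shape) combination rather than on every call.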

cuDNN's initial claim to fame was accelerating convolutional neural networks on Ampere SM architecture GPUs. For Transformer neural networks on Hopper and especially Blackwell SM architectures, NVIDIA has tended to place more emphasis on the CUTLASS library.

For more information on cuDNN, see the official cuDNN documentation and the open source frontend APIs.