NVIDIA cuSPARSELt: A High-Performance CUDA Library for Sparse Matrix-Matrix Multiplication
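NVIDIA cuSPARSELt is a high-performance CUDA library dedicated to general matrix-matrix operations in which at least one operand is a sparse matrix:
$$D = \mathrm{Activation}(\alpha \, op(A) \cdot op(B) + \beta \, op(C) + bias) \cdot scale$$
where $op(A)$ and $op(B)$ refer to in-place operations such as transpose/non-transpose, and $\alpha$, $\beta$, and $scale$ are scalars or vectors.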
The cuSPARSELt APIs allow flexibility in the algorithm/operation selection, epilogue, and matrix characteristics, including memory layout, alignment, and data types.
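To make the shape of the fused operation concrete, the following is a minimal CPU reference sketch, not cuSPARSELt code. It assumes the operation has the form D = Activation(alpha * op(A) * op(B) + beta * op(C) + bias) * scale, and further assumes no transposition, ReLU as the activation, scalar alpha, beta, and scale, row-major storage, and one bias value per output row. The function name `reference_matmul` is hypothetical and chosen only for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU reference, not part of cuSPARSELt.
// Computes D = ReLU(alpha * A * B + beta * C + bias) * scale for row-major
// matrices: A is m x k, B is k x n, C and D are m x n.
// Assumption: bias holds one value per output row (m entries).
std::vector<float> reference_matmul(const std::vector<float>& A,
                                    const std::vector<float>& B,
                                    const std::vector<float>& C,
                                    const std::vector<float>& bias,
                                    std::size_t m, std::size_t n, std::size_t k,
                                    float alpha, float beta, float scale) {
    assert(A.size() == m * k);
    assert(B.size() == k * n);
    assert(C.size() == m * n);
    assert(bias.size() == m);

    std::vector<float> D(m * n, 0.0f);
    for (std::size_t i = 0; i < m; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            // Dot product of row i of A with column j of B.
            float acc = 0.0f;
            for (std::size_t p = 0; p < k; ++p) {
                acc += A[i * k + p] * B[p * n + j];
            }
            // Epilogue: scaling by alpha, accumulation of beta * C, bias.
            float v = alpha * acc + beta * C[i * n + j] + bias[i];
            // Activation (ReLU here), followed by output scaling.
            D[i * n + j] = std::max(v, 0.0f) * scale;
        }
    }
    return D;
}
```

A dense reference like this is mainly useful for checking the results of a GPU run on small inputs; it makes no attempt to exploit sparsity.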
Download: developer.nvidia.com/cusparselt/downloads
Provide Feedback: Math-Libs-Feedback@nvidia.com
Examples: cuSPARSELt Example 1, cuSPARSELt Example 2
Blog posts:
- Exploiting NVIDIA Ampere Structured Sparsity with cuSPARSELt
- Structured Sparsity in the NVIDIA Ampere Architecture and Applications in Search Engines
- Making the Most of Structured Sparsity in the NVIDIA Ampere Architecture
Key Features
- NVIDIA Sparse MMA tensor core support
- Mixed-precision computation support:
- Matrix pruning and compression functionalities
- Activation functions, bias vector, and output scaling
- Batched computation (multiple matrices in a single run)
- GEMM Split-K mode
- Auto-tuning functionality (see cusparseLtMatmulSearch())
- NVTX ranging and logging functionalities
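The pruning feature listed above can be illustrated with a small host-side sketch. It assumes the 2:4 pattern associated with Ampere structured sparsity, in which at most two of every four consecutive values in a row are nonzero, and it keeps the two largest-magnitude values in each group. The helper name `prune_2_4` is hypothetical; the library performs pruning on the GPU through its own routines, and this sketch does not reproduce their selection strategy or the compressed storage format.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of cuSPARSELt.
// Returns a copy of a row-major rows x cols matrix in which, for every group
// of four consecutive elements in a row, the two smallest-magnitude values
// are set to zero. Ties are broken in favor of dropping the earlier element.
std::vector<float> prune_2_4(const std::vector<float>& in,
                             std::size_t rows, std::size_t cols) {
    assert(cols % 4 == 0);            // groups must not straddle two rows
    assert(in.size() == rows * cols);

    std::vector<float> out(in);
    for (std::size_t base = 0; base < out.size(); base += 4) {
        bool dropped[4] = {false, false, false, false};
        // Two passes: each pass zeroes the smallest remaining magnitude.
        for (int pass = 0; pass < 2; ++pass) {
            int victim = -1;
            for (int i = 0; i < 4; ++i) {
                if (dropped[i]) continue;
                if (victim < 0 ||
                    std::fabs(out[base + i]) < std::fabs(out[base + victim])) {
                    victim = i;
                }
            }
            dropped[victim] = true;
            out[base + victim] = 0.0f;
        }
    }
    return out;
}
```

Magnitude-based selection is only one possible criterion; it is used here because it is simple to verify by hand on a few values.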
Support
- Supported SM Architectures: SM 8.0, SM 8.6, SM 8.7, SM 8.9, SM 9.0, SM 10.0, SM 12.0
- Supported CPU architectures and operating systems: