Release CUTLASS 2.3 · NVIDIA/cutlass
CUTLASS 2.3
- NVIDIA Ampere Architecture features
- Sparse Tensor Core GEMM kernels:
* Direct access to Sparse Tensor Cores and maximum performance via mma.sp.sync
- Fast SGEMM targeting GeForce RTX 30-series CUDA Cores
- Minor Features:
- Activation functions such as GeLU and Sigmoid
- Small matrix and quaternion template classes in device code
- Floating-point constants
- NVIDIA Ampere GPU Architecture examples and documentation:
- Tensor Float 32
- Sparse Tensor Cores
- Documentation added on CUTLASS efficient row-major epilogue
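
For readers unfamiliar with the activation functions named under Minor Features, the sketch below gives the standard mathematical definitions of Sigmoid and GeLU (exact, erf-based form) as plain host-side C++. It is an illustration only: the names `sigmoid_ref` and `gelu_ref` are hypothetical and are not the CUTLASS functors, whose class names and signatures are not stated in these notes.

```cpp
#include <cmath>

// Illustrative host-side reference functions (hypothetical names).
// These are NOT the CUTLASS device-side implementations.

// Logistic sigmoid: 1 / (1 + e^(-x)).
inline float sigmoid_ref(float x) {
    return 1.0f / (1.0f + std::exp(-x));
}

// GeLU in its exact form: 0.5 * x * (1 + erf(x / sqrt(2))).
inline float gelu_ref(float x) {
    const float kInvSqrt2 = 0.70710678118654752f;  // 1 / sqrt(2)
    return 0.5f * x * (1.0f + std::erf(x * kInvSqrt2));
}
```

Reference functions like these can serve as a host-side check when validating the output of a GEMM whose epilogue applies an activation, assuming the kernel uses the same (exact rather than tanh-approximated) GeLU form.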