Quick Start Guide — NVIDIA CUTLASS Documentation

The CUTLASS DSL 4.4 release currently supports only Linux and Python 3.10 through 3.14. To install the CUTLASS DSLs (currently limited to the CuTe DSL), follow the steps below.
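Before installing, you can confirm that a machine falls inside this support window with a small standard-library check. This is a sketch and is not part of CUTLASS. The function name `preflight_problems` is hypothetical, and the version bounds are copied from the sentence above.

```python
import platform
import sys

# Support window stated in this guide for the CUTLASS DSL 4.4 release.
SUPPORTED_OS = "Linux"
MIN_PYTHON = (3, 10)
MAX_PYTHON = (3, 14)


def preflight_problems(system=None, version=None):
    """Return a list of human-readable problems (hypothetical helper).

    An empty list means the host matches the stated support window.
    """
    system = platform.system() if system is None else system
    version = tuple(sys.version_info[:2]) if version is None else tuple(version[:2])
    problems = []
    if system != SUPPORTED_OS:
        problems.append(f"unsupported OS {system!r}; only {SUPPORTED_OS} is supported")
    # Tuple comparison checks (major, minor) lexicographically.
    if not (MIN_PYTHON <= version <= MAX_PYTHON):
        problems.append(
            f"unsupported Python {version[0]}.{version[1]}; need "
            f"{MIN_PYTHON[0]}.{MIN_PYTHON[1]} through {MAX_PYTHON[0]}.{MAX_PYTHON[1]}"
        )
    return problems


if __name__ == "__main__":
    issues = preflight_problems()
    print("OK" if not issues else "\n".join(issues))
```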

Installation

Before installing the latest version, uninstall any previous CUTLASS DSL installation:

pip uninstall nvidia-cutlass-dsl nvidia-cutlass-dsl-libs-base nvidia-cutlass-dsl-libs-cu13 -y
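To see which of these packages are present, either beforehand or to confirm that the uninstall worked, you can query installed distributions with the standard-library `importlib.metadata` module. The helper `installed_versions` below is a hypothetical illustration. It checks only the three distribution names from the command above.

```python
from importlib import metadata

# Distribution names taken from the pip uninstall command above.
CUTLASS_DSL_DISTS = (
    "nvidia-cutlass-dsl",
    "nvidia-cutlass-dsl-libs-base",
    "nvidia-cutlass-dsl-libs-cu13",
)


def installed_versions(names=CUTLASS_DSL_DISTS):
    """Map each installed distribution name to its version; skip missing ones."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            pass  # not installed, which is what we want after uninstalling
    return found


if __name__ == "__main__":
    leftovers = installed_versions()
    if leftovers:
        for name, version in leftovers.items():
            print(f"still installed: {name}=={version}")
    else:
        print("no previous CUTLASS DSL packages found")
```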

To ensure compatibility with the examples and code on GitHub, use the setup.sh script from the same repository commit as the examples you are running. Clone the repository, then run setup.sh for your CUDA Toolkit version:

git clone https://github.com/NVIDIA/cutlass.git

For CUDA Toolkit 12.9:

./cutlass/python/CuTeDSL/setup.sh --cu12

For CUDA Toolkit 13.1:

./cutlass/python/CuTeDSL/setup.sh --cu13
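If you are unsure which flag applies, you can derive it from the `release X.Y` line that `nvcc --version` prints. This sketch assumes that `nvcc` from the toolkit you intend to use is on `PATH`. The helpers `cuda_release` and `setup_flag` are hypothetical. The check looks only at the major version, so confirm separately that your toolkit is 12.9 or 13.1, as listed above.

```python
import re
import shutil
import subprocess

# setup.sh flags named in this guide: --cu12 (CUDA 12.9) and --cu13 (CUDA 13.1).
SETUP_FLAGS = {12: "--cu12", 13: "--cu13"}


def cuda_release(nvcc_output):
    """Extract (major, minor) from `nvcc --version` output, e.g. 'release 12.9'."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        raise ValueError("could not find a CUDA release in nvcc output")
    return int(match.group(1)), int(match.group(2))


def setup_flag(nvcc_output):
    """Pick the setup.sh flag from the CUDA major version (hypothetical helper)."""
    major, _minor = cuda_release(nvcc_output)
    if major not in SETUP_FLAGS:
        raise ValueError(f"CUDA {major}.x is not covered by this guide")
    return SETUP_FLAGS[major]


if __name__ == "__main__":
    if shutil.which("nvcc") is None:
        print("nvcc not found on PATH")
    else:
        out = subprocess.run(["nvcc", "--version"], capture_output=True, text=True).stdout
        try:
            print("./cutlass/python/CuTeDSL/setup.sh", setup_flag(out))
        except ValueError as err:
            print(err)
```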

If you just want to try the most recent stable release of the CUTLASS DSL (which may not be compatible with the latest examples and code on GitHub), install it with pip:

For CUDA Toolkit 12.9:

pip install nvidia-cutlass-dsl

For CUDA Toolkit 13.1:

pip install "nvidia-cutlass-dsl[cu13]"
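The same CUDA-version choice can be expressed as a pip command built in Python. This is a minimal sketch that assumes you want the wheel installed into the interpreter running the script. `pip_install_command` is a hypothetical helper, and the requirement strings are copied from the two commands above.

```python
import shlex
import sys

# Requirement strings from this guide: the plain wheel for CUDA 12.9,
# the [cu13] extra for CUDA 13.1.
REQUIREMENTS = {12: "nvidia-cutlass-dsl", 13: "nvidia-cutlass-dsl[cu13]"}


def pip_install_command(cuda_major):
    """Build a pip argv list for the given CUDA major version (hypothetical helper)."""
    if cuda_major not in REQUIREMENTS:
        raise ValueError(f"no CUTLASS DSL wheel listed for CUDA {cuda_major}")
    # Invoke pip through the current interpreter so the wheel lands in this environment.
    return [sys.executable, "-m", "pip", "install", REQUIREMENTS[cuda_major]]


if __name__ == "__main__":
    # shlex.join quotes the brackets, which shells such as zsh would otherwise glob.
    print(shlex.join(pip_install_command(13)))
```

Passing the argv list directly to `subprocess.run` avoids shell quoting altogether. `shlex.join` is only needed when you want a string to paste into a terminal.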

The nvidia-cutlass-dsl wheel includes everything needed to generate GPU kernels. It requires the same NVIDIA driver version as the corresponding CUDA Toolkit (CUDA Toolkit 12.9 or CUDA Toolkit 13.1).
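One way to check the driver side of this requirement is to read the `CUDA Version` field in the `nvidia-smi` header. That field reports the highest CUDA version the installed driver supports. The sketch below assumes that "the driver supports at least CUDA 12.9 (or 13.1)" is an adequate reading of the requirement. The helper names are hypothetical.

```python
import re
import shutil
import subprocess

# Toolkit versions named in this guide, keyed by install variant.
REQUIRED_CUDA = {"cu12": (12, 9), "cu13": (13, 1)}


def driver_cuda_version(smi_output):
    """Parse the highest CUDA version the driver supports from nvidia-smi's header."""
    match = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", smi_output)
    if match is None:
        raise ValueError("no 'CUDA Version' field in nvidia-smi output")
    return int(match.group(1)), int(match.group(2))


def driver_supports(smi_output, variant):
    """True if the driver's CUDA version is at least the variant's toolkit version."""
    return driver_cuda_version(smi_output) >= REQUIRED_CUDA[variant]


if __name__ == "__main__":
    if shutil.which("nvidia-smi") is None:
        print("nvidia-smi not found; is the NVIDIA driver installed?")
    else:
        out = subprocess.run(["nvidia-smi"], capture_output=True, text=True).stdout
        try:
            supported = driver_cuda_version(out)
        except ValueError as err:
            print(err)
        else:
            for variant, needed in REQUIRED_CUDA.items():
                print(variant, "OK" if supported >= needed else "driver too old")
```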

To run the examples and begin development, we also recommend installing PyTorch and Jupyter:

pip install torch jupyter
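After installing, a short check can confirm that PyTorch imports and can see a GPU, and that the `jupyter` command is on `PATH`. `torch_cuda_status` is a hypothetical helper built only on public `torch` attributes (`torch.__version__`, `torch.cuda.is_available`, `torch.version.cuda`, `torch.cuda.get_device_name`). It returns a message instead of failing when torch is missing.

```python
import shutil


def torch_cuda_status():
    """Summarize whether PyTorch is installed and can see a CUDA device."""
    try:
        import torch
    except (ImportError, OSError):  # OSError covers broken native library installs
        return "torch is not installed or failed to load"
    if not torch.cuda.is_available():
        return f"torch {torch.__version__} is installed, but no CUDA device is visible"
    return (
        f"torch {torch.__version__} (built for CUDA {torch.version.cuda}) "
        f"sees {torch.cuda.get_device_name(0)}"
    )


if __name__ == "__main__":
    print(torch_cuda_status())
    print("jupyter on PATH:", shutil.which("jupyter") is not None)
```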

To run the JAX examples, we recommend installing JAX 0.8.1 with CUDA support.
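To confirm which JAX version is installed and which backend it uses by default, you can combine `importlib.metadata` with `jax.default_backend()`. This sketch assumes the recommended version should match exactly. `jax_status` and `jax_backend` are hypothetical names.

```python
from importlib import metadata

RECOMMENDED_JAX = "0.8.1"  # version recommended in this guide


def jax_status(expected=RECOMMENDED_JAX):
    """Return (installed_version_or_None, matches_expected)."""
    try:
        installed = metadata.version("jax")
    except metadata.PackageNotFoundError:
        return None, False
    return installed, installed == expected


def jax_backend():
    """Name of JAX's default backend, or None if JAX cannot be imported."""
    try:
        import jax
    except ImportError:
        return None
    return jax.default_backend()


if __name__ == "__main__":
    version, ok = jax_status()
    if version is None:
        print("jax is not installed")
    else:
        note = "matches" if ok else f"differs from recommended {RECOMMENDED_JAX}"
        print(f"jax {version} ({note}); default backend: {jax_backend()}")
```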

When running Jupyter notebooks, we recommend setting the following environment variable so that Python output is written immediately instead of being buffered:

export PYTHONUNBUFFERED=1
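`PYTHONUNBUFFERED` is read when a Python interpreter starts. Assigning it through `os.environ` inside an already-running notebook kernel therefore does not change that kernel's buffering. Instead, export it in the shell before starting Jupyter, or pass it to the process you launch. The sketch below shows the second approach. `unbuffered_env` and `launch_jupyter_notebook` are hypothetical helpers, and the launcher assumes the `jupyter notebook` command installed above is on `PATH`.

```python
import os
import subprocess


def unbuffered_env(base=None):
    """Return a copy of the environment with PYTHONUNBUFFERED=1 added."""
    env = dict(os.environ if base is None else base)
    env["PYTHONUNBUFFERED"] = "1"
    return env


def launch_jupyter_notebook():
    # Kernels started by the Jupyter server inherit the server's environment.
    return subprocess.Popen(["jupyter", "notebook"], env=unbuffered_env())


if __name__ == "__main__":
    print("PYTHONUNBUFFERED =", unbuffered_env()["PYTHONUNBUFFERED"])
```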