cuSPARSELt Functions (NVIDIA cuSPARSELt)

Library Management Functions#

cusparseLtInit#

cusparseStatus_t cusparseLtInit(cusparseLtHandle_t* handle)

The function initializes the cuSPARSELt library handle (cusparseLtHandle_t), which holds the cuSPARSELt library context. It allocates light hardware resources on the host and must be called before any other cuSPARSELt library call. Calling any cusparseLt function that uses cusparseLtHandle_t without a prior call to cusparseLtInit() returns an error.

The cuSPARSELt library context is tied to the current CUDA device. To use the library on multiple devices, one cuSPARSELt handle should be created for each device.

See cusparseStatus_t for the description of the return status.


cusparseLtDestroy#

cusparseStatus_t cusparseLtDestroy(const cusparseLtHandle_t* handle)

The function releases the hardware resources used by the cuSPARSELt library. It must be the last call made to the cuSPARSELt library with a given handle.

Calling any cusparseLt function which uses cusparseLtHandle_t after cusparseLtDestroy() will return an error.

See cusparseStatus_t for the description of the return status.


cusparseLtGetVersion#

cusparseStatus_t cusparseLtGetVersion(const cusparseLtHandle_t* handle, int* version)

This function returns the version number of the cuSPARSELt library.

See cusparseStatus_t for the description of the return status.


cusparseLtGetProperty#

cusparseStatus_t cusparseLtGetProperty(libraryPropertyType propertyType, int* value)

The function returns the value of the requested property. Refer to libraryPropertyType for supported types.

libraryPropertyType (defined in library_types.h): MAJOR_VERSION, MINOR_VERSION, PATCH_LEVEL.

See cusparseStatus_t for the description of the return status.


Matrix Descriptor Functions#

cusparseLtDenseDescriptorInit#

cusparseStatus_t cusparseLtDenseDescriptorInit(const cusparseLtHandle_t* handle, cusparseLtMatDescriptor_t* matDescr, int64_t rows, int64_t cols, int64_t ld, uint32_t alignment, cudaDataType valueType, cusparseOrder_t order)

The function initializes the descriptor of a dense matrix.

Constraints:

See cusparseStatus_t for the description of the return status.
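
The rows, cols, ld, and order parameters follow the usual dense-matrix addressing convention, in which ld is the distance in elements between consecutive rows (row-major) or columns (column-major). The host-side sketch below shows how an element offset and a minimum buffer length follow from those values. The names order_t, elem_offset, and min_elems are hypothetical helpers for illustration and are not part of cuSPARSELt.

```c
#include <stdint.h>

/* Hypothetical stand-in for the two layouts named by cusparseOrder_t. */
typedef enum { ORDER_COL, ORDER_ROW } order_t;

/* Linear offset (in elements) of element (i, j) for a leading dimension ld. */
static int64_t elem_offset(int64_t i, int64_t j, int64_t ld, order_t order)
{
    return order == ORDER_ROW ? i * ld + j : j * ld + i;
}

/* Smallest number of elements a buffer must hold for a rows x cols matrix. */
static int64_t min_elems(int64_t rows, int64_t cols, int64_t ld, order_t order)
{
    return order == ORDER_ROW ? (rows - 1) * ld + cols
                              : (cols - 1) * ld + rows;
}
```

For example, a 3 x 4 row-major matrix with ld = 8 places element (1, 2) at offset 10 and needs at least 20 elements of storage.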


cusparseLtStructuredDescriptorInit#

cusparseStatus_t cusparseLtStructuredDescriptorInit(const cusparseLtHandle_t* handle, cusparseLtMatDescriptor_t* matDescr, int64_t rows, int64_t cols, int64_t ld, uint32_t alignment, cudaDataType valueType, cusparseOrder_t order, cusparseLtSparsity_t sparsity)

The function initializes the descriptor of a structured matrix.

Constraints:

See cusparseStatus_t for the description of the return status.


cusparseLtMatDescriptorDestroy#

cusparseStatus_t cusparseLtMatDescriptorDestroy(const cusparseLtMatDescriptor_t* matDescr)

The function releases the resources used by an instance of a matrix descriptor. After this call, the matrix descriptor, the matmul descriptor, and the plan can no longer be used.

See cusparseStatus_t for the description of the return status.


cusparseLtMatDescSetAttribute#

cusparseStatus_t cusparseLtMatDescSetAttribute(const cusparseLtHandle_t* handle, cusparseLtMatDescriptor_t* matmulDescr, cusparseLtMatDescAttribute_t matAttribute, const void* data, size_t dataSize)

The function sets the value of the specified attribute of the matrix descriptor, such as the number of batches and their stride.

See cusparseStatus_t for the description of the return status.


cusparseLtMatDescGetAttribute#

cusparseStatus_t cusparseLtMatDescGetAttribute(const cusparseLtHandle_t* handle, const cusparseLtMatDescriptor_t* matmulDescr, cusparseLtMatDescAttribute_t matAttribute, void* data, size_t dataSize)

The function gets the value of the specified attribute of the matrix descriptor, such as the number of batches and their stride.

See cusparseStatus_t for the description of the return status.


Matmul Descriptor Functions#

cusparseLtMatmulDescriptorInit#

cusparseStatus_t cusparseLtMatmulDescriptorInit(const cusparseLtHandle_t* handle, cusparseLtMatmulDescriptor_t* matmulDescr, cusparseOperation_t opA, cusparseOperation_t opB, const cusparseLtMatDescriptor_t* matA, const cusparseLtMatDescriptor_t* matB, const cusparseLtMatDescriptor_t* matC, const cusparseLtMatDescriptor_t* matD, cusparseComputeType computeType)

The function initializes the matrix multiplication descriptor.

The structured matrix descriptor can be used for matA or matB, but not both.

Constraints:

See cusparseStatus_t for the description of the return status.


cusparseLtMatmulDescSetAttribute#

cusparseStatus_t cusparseLtMatmulDescSetAttribute(const cusparseLtHandle_t* handle, cusparseLtMatmulDescriptor_t* matmulDescr, cusparseLtMatmulDescAttribute_t matmulAttribute, const void* data, size_t dataSize)

The function sets the value of the specified attribute of the matmul descriptor, such as the activation function and bias.

See cusparseStatus_t for the description of the return status.


cusparseLtMatmulDescGetAttribute#

cusparseStatus_t cusparseLtMatmulDescGetAttribute(const cusparseLtHandle_t* handle, const cusparseLtMatmulDescriptor_t* matmulDescr, cusparseLtMatmulDescAttribute_t matmulAttribute, void* data, size_t dataSize)

The function gets the value of the specified attribute of the matmul descriptor, such as the activation function and bias.

See cusparseStatus_t for the description of the return status.


Matmul Algorithm Functions#

cusparseLtMatmulAlgSelectionInit#

cusparseStatus_t cusparseLtMatmulAlgSelectionInit(const cusparseLtHandle_t* handle, cusparseLtMatmulAlgSelection_t* algSelection, const cusparseLtMatmulDescriptor_t* matmulDescr, cusparseLtMatmulAlg_t alg)

The function initializes the algorithm selection descriptor.

See cusparseStatus_t for the description of the return status.


cusparseLtMatmulAlgSetAttribute#

cusparseStatus_t cusparseLtMatmulAlgSetAttribute(const cusparseLtHandle_t* handle, cusparseLtMatmulAlgSelection_t* algSelection, cusparseLtMatmulAlgAttribute_t attribute, const void* data, size_t dataSize)

The function sets the value of the specified attribute of the algorithm selection descriptor.

See cusparseStatus_t for the description of the return status.


cusparseLtMatmulAlgGetAttribute#

cusparseStatus_t cusparseLtMatmulAlgGetAttribute(const cusparseLtHandle_t* handle, const cusparseLtMatmulAlgSelection_t* algSelection, cusparseLtMatmulAlgAttribute_t attribute, void* data, size_t dataSize)

The function returns the value of the queried attribute of the algorithm selection descriptor.

See cusparseStatus_t for the description of the return status.


Matmul Functions#

cusparseLtMatmulGetWorkspace#

cusparseStatus_t cusparseLtMatmulGetWorkspace(const cusparseLtHandle_t* handle, const cusparseLtMatmulPlan_t* plan, size_t* workspaceSize)

The function determines the workspace size required by the selected algorithm.

See cusparseStatus_t for the description of the return status.


cusparseLtMatmulPlanInit#

cusparseStatus_t cusparseLtMatmulPlanInit(const cusparseLtHandle_t* handle, cusparseLtMatmulPlan_t* plan, const cusparseLtMatmulDescriptor_t* matmulDescr, const cusparseLtMatmulAlgSelection_t* algSelection)

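The function initializes the matrix multiplication plan from the matmul descriptor and the algorithm selection.

See cusparseStatus_t for the description of the return status.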


cusparseLtMatmulPlanDestroy#

cusparseStatus_t cusparseLtMatmulPlanDestroy(const cusparseLtMatmulPlan_t* plan)

The function releases the resources used by an instance of the matrix multiplication plan. It must be the last call made with a given plan instance.

Calling any cusparseLt function which uses cusparseLtMatmulPlan_t after cusparseLtMatmulPlanDestroy() will return an error.

See cusparseStatus_t for the description of the return status.


cusparseLtMatmul#

cusparseStatus_t cusparseLtMatmul(const cusparseLtHandle_t* handle, const cusparseLtMatmulPlan_t* plan, const void* alpha, const void* d_A, const void* d_B, const void* beta, const void* d_C, void* d_D, void* workspace, cudaStream_t* streams, int32_t numStreams)

The function computes the matrix multiplication of matrices A and B to produce the output matrix D, according to the following operation:

D = Activation(\alpha op(A) \cdot op(B) + \beta op(C) + bias)

where A, B, and C are input matrices, and \alpha and \beta are input scalars or vectors of scalars (device-side pointers).

As described in cusparseLtMatmulDescriptorInit(), exactly one of the input matrices A or B must have structured sparsity, and the corresponding pointer (d_A or d_B) must be the output of cusparseLtSpMMACompress() or cusparseLtSpMMACompress2().

Note: The function currently supports only the case where D has the same shape as C.

Supported data types:

For a detailed list of which GPU compute capabilities support which data type combinations, see Key Features.

Constraints:

Properties

cusparseLtMatmul supports the following optimizations:

See cusparseStatus_t for the description of the return status.
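
The operation computed by cusparseLtMatmul() can be modeled on the host for checking results on small inputs. The sketch below is a CPU reference, not the library's implementation. It assumes row-major dense matrices, non-transposed operands, scalar alpha and beta, one bias value per row of D, and ReLU as the activation. The name ref_matmul is hypothetical.

```c
#include <stddef.h>

/* Hypothetical host reference: D = ReLU(alpha * A * B + beta * C + bias).
 * A is m x k, B is k x n, C and D are m x n, all dense and row-major.
 * bias holds m entries, one per row of D (an assumption of this sketch). */
static void ref_matmul(size_t m, size_t n, size_t k,
                       float alpha, const float *A, const float *B,
                       float beta, const float *C, const float *bias,
                       float *D)
{
    for (size_t i = 0; i < m; ++i) {
        for (size_t j = 0; j < n; ++j) {
            float acc = 0.0f;
            for (size_t p = 0; p < k; ++p)
                acc += A[i * k + p] * B[p * n + j];
            /* C is read before D is written, so C and D may alias. */
            float v = alpha * acc + beta * C[i * n + j] + bias[i];
            D[i * n + j] = v > 0.0f ? v : 0.0f; /* ReLU activation */
        }
    }
}
```

Comparing the device result against such a reference, with a tolerance suited to the data type, is one way to validate a descriptor and plan setup before tuning.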


cusparseLtMatmulSearch#

cusparseStatus_t cusparseLtMatmulSearch(const cusparseLtHandle_t* handle, cusparseLtMatmulPlan_t* plan, const void* alpha, const void* d_A, const void* d_B, const void* beta, const void* d_C, void* d_D, void* workspace, cudaStream_t* streams, int32_t numStreams)

The function evaluates all available algorithms for the matrix multiplication described by plan and automatically updates the cusparseLtMatmulAlgSelection_t used to initialize the plan by selecting the fastest one. The functionality is intended to be used for auto-tuning purposes when the same operation is repeated multiple times over different inputs.

The function behaves similarly to cusparseLtMatmul(), with the difference that d_D values may accumulate if the operation is performed in-place (d_C=d_D).
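
The in-place accumulation can be illustrated with a scalar model. If the operation is evaluated several times with d_C and d_D pointing to the same buffer and beta is nonzero, each evaluation reads the previous result. The sketch below is a hypothetical host-side illustration and makes no claim about how many evaluations the search actually performs.

```c
/* Hypothetical scalar model: evaluate d = alpha * ab + beta * d `runs` times,
 * as happens when C and D share storage. `ab` stands for one entry of
 * op(A) * op(B). */
static float repeat_in_place(float alpha, float ab, float beta,
                             float d, int runs)
{
    for (int i = 0; i < runs; ++i)
        d = alpha * ab + beta * d; /* the previous D is reused as C */
    return d;
}
```

With alpha = 1, ab = 2, beta = 1, and an initial value of 10, one evaluation yields 12 while three evaluations yield 16. Using separate C and D buffers, or restoring D after the search, avoids the drift.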


Helper Functions#

cusparseLtSpMMAPrune#

cusparseStatus_t cusparseLtSpMMAPrune(const cusparseLtHandle_t* handle, const cusparseLtMatmulDescriptor_t* matmulDescr, const void* d_in, void* d_out, cusparseLtPruneAlg_t pruneAlg, cudaStream_t stream)

The function prunes a dense matrix d_in according to the specified algorithm pruneAlg.

Properties

cusparseLtSpMMAPrune() supports the following optimizations:

See cusparseStatus_t for the description of the return status.
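
As a rough mental model of pruning, the host-side sketch below applies a 2:4 pattern along each row: in every group of four consecutive values it keeps the two with the largest magnitude and zeroes the rest. This is an illustration under assumptions (50% structured sparsity, row-major float data, cols a multiple of 4). It does not reproduce the tile-based algorithm, and prune_2_4_rows is a hypothetical name.

```c
#include <stddef.h>

static float magnitude(float x) { return x < 0.0f ? -x : x; }

/* Hypothetical host model of 2:4 pruning along rows. `out` may alias `in`. */
static void prune_2_4_rows(const float *in, float *out,
                           size_t rows, size_t cols)
{
    for (size_t r = 0; r < rows; ++r) {
        for (size_t c = 0; c + 4 <= cols; c += 4) {
            const float *g = in + r * cols + c;
            float *o = out + r * cols + c;
            /* Track the positions of the two largest magnitudes. */
            size_t first = 0, second = 1;
            if (magnitude(g[1]) > magnitude(g[0])) { first = 1; second = 0; }
            for (size_t i = 2; i < 4; ++i) {
                if (magnitude(g[i]) > magnitude(g[first])) {
                    second = first;
                    first = i;
                } else if (magnitude(g[i]) > magnitude(g[second])) {
                    second = i;
                }
            }
            /* Keep the two selected values and zero the other two. */
            for (size_t i = 0; i < 4; ++i)
                o[i] = (i == first || i == second) ? g[i] : 0.0f;
        }
    }
}
```

For the row {1, -5, 3, 0.5} the sketch keeps -5 and 3 and produces {0, -5, 3, 0}.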


cusparseLtSpMMAPrune2 [DEPRECATED]#

cusparseStatus_t cusparseLtSpMMAPrune2(const cusparseLtHandle_t* handle, const cusparseLtMatDescriptor_t* sparseMatDescr, int isSparseA, cusparseOperation_t op, const void* d_in, void* d_out, cusparseLtPruneAlg_t pruneAlg, cudaStream_t stream)

The function prunes a dense matrix d_in according to the specified algorithm pruneAlg.

If CUSPARSELT_PRUNE_SPMMA_TILE is used, isSparseA and op are not relevant.

The function has the same properties as cusparseLtSpMMAPrune().


cusparseLtSpMMAPruneCheck#

cusparseStatus_t cusparseLtSpMMAPruneCheck(const cusparseLtHandle_t* handle, const cusparseLtMatmulDescriptor_t* matmulDescr, const void* d_in, int* d_valid, cudaStream_t stream)

The function checks the correctness of the pruning structure for a given matrix. Data pruned with cusparseLtSpMMAPrune() is guaranteed to be correct and this function can be skipped.

See cusparseStatus_t for the description of the return status.
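
For data pruned by a custom function, the kind of property being verified can be sketched on the host as follows. The sketch assumes a 2:4 pattern along rows of a row-major float matrix and returns 0 for a valid structure and 1 otherwise. Both the return convention and the name check_2_4_rows are assumptions made for illustration.

```c
#include <stddef.h>

/* Hypothetical host check: returns 0 when every group of four consecutive
 * values in each row holds at most two nonzeros, and 1 otherwise. */
static int check_2_4_rows(const float *m, size_t rows, size_t cols)
{
    for (size_t r = 0; r < rows; ++r) {
        for (size_t c = 0; c + 4 <= cols; c += 4) {
            int nnz = 0;
            for (size_t i = 0; i < 4; ++i)
                nnz += (m[r * cols + c + i] != 0.0f);
            if (nnz > 2)
                return 1; /* more than two nonzeros in a group of four */
        }
    }
    return 0;
}
```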


cusparseLtSpMMAPruneCheck2 [DEPRECATED]#

cusparseStatus_t cusparseLtSpMMAPruneCheck2(const cusparseLtHandle_t* handle, const cusparseLtMatDescriptor_t* sparseMatDescr, int isSparseA, cusparseOperation_t op, const void* d_in, int* d_valid, cudaStream_t stream)

The function checks the correctness of the pruning structure for a given matrix.

The function has the same properties as cusparseLtSpMMAPruneCheck().


cusparseLtSpMMACompressedSize#

cusparseStatus_t cusparseLtSpMMACompressedSize(const cusparseLtHandle_t* handle, const cusparseLtMatmulPlan_t* plan, size_t* compressedSize, size_t* compressBufferSize)

The function provides the size of the compressed matrix to be allocated before calling cusparseLtSpMMACompress().

See cusparseStatus_t for the description of the return status.


cusparseLtSpMMACompressedSize2 [DEPRECATED]#

cusparseStatus_t cusparseLtSpMMACompressedSize2(const cusparseLtHandle_t* handle, const cusparseLtMatDescriptor_t* sparseMatDescr, size_t* compressedSize, size_t* compressBufferSize)

The function provides the size of the compressed matrix to be allocated before calling cusparseLtSpMMACompress2(). It has to be called after cusparseLtMatmulPlanInit.

The function has the same properties as cusparseLtSpMMACompressedSize().


cusparseLtSpMMACompress#

cusparseStatus_t cusparseLtSpMMACompress(const cusparseLtHandle_t* handle, const cusparseLtMatmulPlan_t* plan, const void* d_dense, void* d_compressed, void* d_compressed_buffer, cudaStream_t stream)

The function compresses a dense matrix d_dense. The compressed matrix is intended to be used as the first/second operand A/B in the cusparseLtMatmul() or cusparseLtMatmulSearch() function.

The input matrix d_dense must be pruned either with cusparseLtSpMMAPrune() or with a custom function. The pruned data must respect the following constraints, which depend on the operation applied to this matrix in cusparseLtMatmul(), as defined by the cusparseLtMatmulDescriptor_t created with cusparseLtMatmulDescriptorInit():

int8, e4m3, and e5m2 kernels should run at high SM clocks to maximize performance.

The correctness of the pruning result (matrix A/B) can be checked with cusparseLtSpMMAPruneCheck(). Note that pruning with cusparseLtSpMMAPrune() is guaranteed to be correct.

Properties

cusparseLtSpMMACompress() has to be called each time the algorithm ID is updated with cusparseLtMatmulAlgSetAttribute().

See cusparseStatus_t for the description of the return status.
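
Conceptually, compressing a 2:4 pruned matrix means storing only the kept values together with small position indices. The host-side sketch below illustrates that idea for one row. The layout the library actually produces is opaque and may differ, so buffers for the real call should always be sized with cusparseLtSpMMACompressedSize(). The name compress_2_4_row and the one-byte-per-group metadata encoding are assumptions of this sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* Conceptual 2:4 compression of one pruned row: for each group of four
 * values, store two values plus their 2-bit positions packed in one byte.
 * Returns the number of values written (two per complete group). */
static size_t compress_2_4_row(const float *row, size_t cols,
                               float *vals, uint8_t *meta)
{
    size_t nv = 0;
    for (size_t c = 0; c + 4 <= cols; c += 4) {
        uint8_t idx[2];
        int kept = 0;
        /* Take up to two nonzero positions first. */
        for (uint8_t i = 0; i < 4 && kept < 2; ++i)
            if (row[c + i] != 0.0f) idx[kept++] = i;
        /* Pad with zero-valued positions if fewer than two were found. */
        for (uint8_t i = 0; i < 4 && kept < 2; ++i)
            if (row[c + i] == 0.0f) idx[kept++] = i;
        vals[nv]     = row[c + idx[0]];
        vals[nv + 1] = row[c + idx[1]];
        meta[c / 4]  = (uint8_t)(idx[0] | (idx[1] << 2));
        nv += 2;
    }
    return nv;
}
```

The sketch shows why the compressed buffer is roughly half the size of the dense one plus a small amount of metadata.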


cusparseLtSpMMACompress2 [DEPRECATED]#

cusparseStatus_t cusparseLtSpMMACompress2(const cusparseLtHandle_t* handle, const cusparseLtMatDescriptor_t* sparseMatDescr, int isSparseA, cusparseOperation_t op, const void* d_dense, void* d_compressed, void* d_compressed_buffer, cudaStream_t stream)

The function has the same properties as cusparseLtSpMMACompress().