cuTENSOR Functions

Helper Functions#

The helper functions initialize cuTENSOR, create tensor descriptors, check error codes, and retrieve library and CUDA runtime versions.


cutensorCreate()#

cutensorStatus_t cutensorCreate(cutensorHandle_t *handle)#

Initializes the cuTENSOR library and allocates the memory for the library context.

The device associated with a particular cuTENSOR handle is assumed to remain unchanged after the cutensorCreate call. For cuTENSOR to use a different device, the application must first select that device by calling cudaSetDevice and then create another cuTENSOR handle, associated with the new device, by calling cutensorCreate again.

Moreover, each handle by default has a plan cache that caches cutensorPlan_t objects and evicts the least recently used entry when full; its default capacity is 64 plans, but it can be resized via cutensorHandleResizePlanCache if that is not enough. See the Plan Cache Guide for more information.

The user is responsible for calling cutensorDestroy to free the resources associated with the handle.

Parameters:

handle[out] Pointer to cutensorHandle_t

Return values:

CUTENSOR_STATUS_SUCCESS – on success and an error code otherwise
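
A minimal lifecycle sketch in C, using only calls documented in this section (the single-GPU setting and the printf-based error handling are illustrative assumptions):

#include <stdio.h>
#include <stdlib.h>
#include <cutensor.h>

int main(void)
{
    cutensorHandle_t handle;
    cutensorStatus_t status = cutensorCreate(&handle);   // allocate the library context
    if (status != CUTENSOR_STATUS_SUCCESS) {
        printf("cutensorCreate failed: %s\n", cutensorGetErrorString(status));
        return EXIT_FAILURE;
    }

    /* ... create tensor descriptors, operation descriptors, and plans here ... */

    cutensorDestroy(handle);                              // release all handle resources
    return EXIT_SUCCESS;
}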


cutensorDestroy()#

cutensorStatus_t cutensorDestroy(cutensorHandle_t handle)#

Frees all resources related to the provided library handle.

Parameters:

handle[inout] The cutensorHandle_t object that will be deallocated.

Return values:

CUTENSOR_STATUS_SUCCESS – on success and an error code otherwise


cutensorCreateTensorDescriptor()#

cutensorStatus_t cutensorCreateTensorDescriptor(
    const cutensorHandle_t handle,
    cutensorTensorDescriptor_t *desc,
    const uint32_t numModes,
    const int64_t extent[],
    const int64_t stride[],
    cutensorDataType_t dataType,
    uint32_t alignmentRequirement)#

Creates a tensor descriptor.

This allocates a small amount of host memory.

The user is responsible for calling cutensorDestroyTensorDescriptor() to free the associated resources once the tensor descriptor is no longer used.

Parameters:

Return values:

Pre:

extent and stride arrays must each contain at least sizeof(int64_t) * numModes bytes
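
A sketch of creating a descriptor for a dense, packed 3-mode float tensor. The variable handle is assumed to come from cutensorCreate, the 128-byte alignment is an assumption that matches pointers returned by cudaMalloc, and error checking is omitted for brevity:

int64_t extent[] = {32, 64, 128};        // extents of modes i, j, k
int64_t stride[] = {1, 32, 32 * 64};     // packed, generalized column-major layout
uint32_t numModes = 3;

cutensorTensorDescriptor_t desc;
cutensorCreateTensorDescriptor(handle, &desc, numModes, extent, stride,
                               CUTENSOR_R_32F,   // 32-bit real (float) elements
                               128);             // alignment (in bytes) of the data pointer

/* ... use desc to build operation descriptors ... */

cutensorDestroyTensorDescriptor(desc);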


cutensorDestroyTensorDescriptor()#

cutensorStatus_t cutensorDestroyTensorDescriptor(

cutensorTensorDescriptor_t desc,

)#

Frees all resources related to the provided tensor descriptor.

Parameters:

desc[inout] The cutensorTensorDescriptor_t object that will be deallocated.

Return values:

CUTENSOR_STATUS_SUCCESS – on success and an error code otherwise


cutensorGetErrorString()#

const char *cutensorGetErrorString(const cutensorStatus_t error)#

Returns the description string for an error code.

Parameters:

error[in] Error code to convert to string.

Return values:

The null-terminated error string.
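
A common error-checking pattern built on this function; the CHECK_CUTENSOR macro is not part of the cuTENSOR API, just a convenience sketch that later examples in this section reuse:

#include <stdio.h>
#include <stdlib.h>
#include <cutensor.h>

// Wrap each cuTENSOR call so that failures are reported with a readable message.
#define CHECK_CUTENSOR(call)                                          \
    do {                                                              \
        const cutensorStatus_t s_ = (call);                           \
        if (s_ != CUTENSOR_STATUS_SUCCESS) {                          \
            fprintf(stderr, "%s:%d cuTENSOR error: %s\n",             \
                    __FILE__, __LINE__, cutensorGetErrorString(s_));  \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)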


cutensorGetVersion()#

size_t cutensorGetVersion()#

Returns the version number of the cuTENSOR library.


cutensorGetCudartVersion()#

size_t cutensorGetCudartVersion()#

Returns version number of the CUDA runtime that cuTENSOR was compiled against.

Can be compared against the CUDA runtime version from cudaRuntimeGetVersion().
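
A small sketch that prints both library versions next to the installed CUDA runtime version (cudaRuntimeGetVersion belongs to the CUDA runtime API, not to cuTENSOR):

#include <stdio.h>
#include <cuda_runtime.h>
#include <cutensor.h>

int main(void)
{
    int runtimeVersion = 0;
    cudaRuntimeGetVersion(&runtimeVersion);   // version of the installed CUDA runtime

    printf("cuTENSOR version:                  %zu\n", cutensorGetVersion());
    printf("CUDA runtime compiled against:     %zu\n", cutensorGetCudartVersion());
    printf("CUDA runtime currently installed:  %d\n", runtimeVersion);
    return 0;
}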


Element-wise Operations#

The following functions perform element-wise operations between tensors.


cutensorCreateElementwiseTrinary()#

cutensorStatus_t cutensorCreateElementwiseTrinary(
    const cutensorHandle_t handle,
    cutensorOperationDescriptor_t *desc,
    const cutensorTensorDescriptor_t descA,
    const int32_t modeA[],
    cutensorOperator_t opA,
    const cutensorTensorDescriptor_t descB,
    const int32_t modeB[],
    cutensorOperator_t opB,
    const cutensorTensorDescriptor_t descC,
    const int32_t modeC[],
    cutensorOperator_t opC,
    const cutensorTensorDescriptor_t descD,
    const int32_t modeD[],
    cutensorOperator_t opAB,
    cutensorOperator_t opABC,
    const cutensorComputeDescriptor_t descCompute)#

This function creates an operation descriptor that encodes an elementwise trinary operation.

Said trinary operation has the following general form:

\[ D_{\Pi^C(i_0,i_1,...,i_n)} = \Phi_{ABC}(\Phi_{AB}(\alpha op_A(A_{\Pi^A(i_0,i_1,...,i_n)}), \beta op_B(B_{\Pi^B(i_0,i_1,...,i_n)})), \gamma op_C(C_{\Pi^C(i_0,i_1,...,i_n)})) \]

Where \( \alpha, \beta, \gamma \) are scalars; \( op_A, op_B, op_C \) are the unary element-wise operators opA, opB, and opC applied to A, B, and C; \( \Phi_{AB} \) and \( \Phi_{ABC} \) are the binary element-wise operators opAB and opABC; and \( \Pi^A, \Pi^B, \Pi^C \) denote the mode orderings given by modeA, modeB, and modeC.

Notice that the broadcasting (of a mode) can be achieved by simply omitting that mode from the respective tensor.

Moreover, modes may appear in any order, giving users greater flexibility. The only restrictions are:

Input tensors may be read even if the value of the corresponding scalar is zero.

Examples:

Call cutensorElementwiseTrinaryExecute to perform the actual operation.

Please use cutensorDestroyOperationDescriptor to deallocate the descriptor once it is no longer used.

Supported data-type combinations are:

Parameters:

Return values:


cutensorElementwiseTrinaryExecute()#

cutensorStatus_t cutensorElementwiseTrinaryExecute(
    const cutensorHandle_t handle,
    const cutensorPlan_t plan,
    const void *alpha,
    const void *A,
    const void *beta,
    const void *B,
    const void *gamma,
    const void *C,
    void *D,
    cudaStream_t stream)#

Performs an element-wise tensor operation for three input tensors (see cutensorCreateElementwiseTrinary).

This function performs an element-wise tensor operation of the form:

\[ D_{\Pi^C(i_0,i_1,...,i_n)} = \Phi_{ABC}(\Phi_{AB}(\alpha op_A(A_{\Pi^A(i_0,i_1,...,i_n)}), \beta op_B(B_{\Pi^B(i_0,i_1,...,i_n)})), \gamma op_C(C_{\Pi^C(i_0,i_1,...,i_n)})) \]

See cutensorCreateElementwiseTrinary() for details.

Parameters:

Return values:
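
A sketch that ties creation and execution together. It assumes a handle, float tensor descriptors descA, descB, descC whose extents are consistent with the mode labels below, device buffers A_d, B_d, C_d, D_d, and the CHECK_CUTENSOR macro sketched under cutensorGetErrorString; the mode labels, operators, and compute descriptor are illustrative choices:

// D = (alpha * A_permuted + beta * B) + gamma * C, with A stored in (c,b,a) order.
int32_t modeA[]   = {'c', 'b', 'a'};
int32_t modeBCD[] = {'a', 'b', 'c'};   // B, C, and D share this mode order

cutensorOperationDescriptor_t op;
CHECK_CUTENSOR(cutensorCreateElementwiseTrinary(handle, &op,
    descA, modeA,   CUTENSOR_OP_IDENTITY,
    descB, modeBCD, CUTENSOR_OP_IDENTITY,
    descC, modeBCD, CUTENSOR_OP_IDENTITY,
    descC, modeBCD,                      // D reuses C's descriptor and modes
    CUTENSOR_OP_ADD, CUTENSOR_OP_ADD,    // opAB and opABC
    CUTENSOR_COMPUTE_DESC_32F));

cutensorPlanPreference_t pref;
CHECK_CUTENSOR(cutensorCreatePlanPreference(handle, &pref,
    CUTENSOR_ALGO_DEFAULT, CUTENSOR_JIT_MODE_NONE));

cutensorPlan_t plan;
CHECK_CUTENSOR(cutensorCreatePlan(handle, &plan, op, pref, 0 /* workspaceSizeLimit */));

float alpha = 1.0f, beta = 1.0f, gamma = 1.0f;
CHECK_CUTENSOR(cutensorElementwiseTrinaryExecute(handle, plan,
    &alpha, A_d, &beta, B_d, &gamma, C_d, D_d, 0 /* default stream */));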


cutensorCreateElementwiseBinary()#

cutensorStatus_t cutensorCreateElementwiseBinary(
    const cutensorHandle_t handle,
    cutensorOperationDescriptor_t *desc,
    const cutensorTensorDescriptor_t descA,
    const int32_t modeA[],
    cutensorOperator_t opA,
    const cutensorTensorDescriptor_t descC,
    const int32_t modeC[],
    cutensorOperator_t opC,
    const cutensorTensorDescriptor_t descD,
    const int32_t modeD[],
    cutensorOperator_t opAC,
    const cutensorComputeDescriptor_t descCompute)#

This function creates an operation descriptor for an elementwise binary operation.

The binary operation has the following general form:

\[ D_{\Pi^C(i_0,i_1,...,i_n)} = \Phi_{AC}(\alpha \Psi_A(A_{\Pi^A(i_0,i_1,...,i_n)}), \gamma \Psi_C(C_{\Pi^C(i_0,i_1,...,i_n)})) \]

Call cutensorElementwiseBinaryExecute to perform the actual operation.

Supported data-type combinations are:

Parameters:

Return values:


cutensorElementwiseBinaryExecute()#

cutensorStatus_t cutensorElementwiseBinaryExecute(
    const cutensorHandle_t handle,
    const cutensorPlan_t plan,
    const void *alpha,
    const void *A,
    const void *gamma,
    const void *C,
    void *D,
    cudaStream_t stream)#

Performs an element-wise tensor operation for two input tensors (see cutensorCreateElementwiseBinary).

This function performs an element-wise tensor operation of the form:

\[ D_{\Pi^C(i_0,i_1,...,i_n)} = \Phi_{AC}(\alpha \Psi_A(A_{\Pi^A(i_0,i_1,...,i_n)}), \gamma \Psi_C(C_{\Pi^C(i_0,i_1,...,i_n)})) \]

See cutensorCreateElementwiseBinary() for details.

Parameters:

Return values:
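
The workflow mirrors the trinary case above: create the operation descriptor, a plan preference, and a plan, then execute. A condensed sketch under the same assumptions (pre-existing handle, descriptors, mode arrays, device buffers, plan, and the CHECK_CUTENSOR macro):

// D = alpha * A_permuted + gamma * C
cutensorOperationDescriptor_t op;
CHECK_CUTENSOR(cutensorCreateElementwiseBinary(handle, &op,
    descA, modeA, CUTENSOR_OP_IDENTITY,
    descC, modeC, CUTENSOR_OP_IDENTITY,
    descC, modeC,                        // D reuses C's descriptor and modes
    CUTENSOR_OP_ADD,                     // opAC
    CUTENSOR_COMPUTE_DESC_32F));

// ... create the plan preference and the plan exactly as in the trinary example, then:
float alpha = 1.0f, gamma = 1.0f;
CHECK_CUTENSOR(cutensorElementwiseBinaryExecute(handle, plan,
    &alpha, A_d, &gamma, C_d, D_d, 0 /* default stream */));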


cutensorCreatePermutation()#

cutensorStatus_t cutensorCreatePermutation(
    const cutensorHandle_t handle,
    cutensorOperationDescriptor_t *desc,
    const cutensorTensorDescriptor_t descA,
    const int32_t modeA[],
    cutensorOperator_t opA,
    const cutensorTensorDescriptor_t descB,
    const int32_t modeB[],
    const cutensorComputeDescriptor_t descCompute)#

This function creates an operation descriptor for a tensor permutation.

The tensor permutation has the following general form:

\[ B_{\Pi^B(i_0,i_1,...,i_n)} = \alpha op_A(A_{\Pi^A(i_0,i_1,...,i_n)}) \]

Consequently, this function performs an out-of-place tensor permutation and is a specialization of cutensorCreateElementwiseBinary.

Where \( \alpha \) is a scalar, \( op_A \) is the unary element-wise operator opA, and \( \Pi^A, \Pi^B \) denote the mode orderings given by modeA and modeB.

Broadcasting (of a mode) can be achieved by simply omitting that mode from the respective tensor.

Modes may appear in any order. The only restrictions are:

Supported data-type combinations are:

Parameters:

Return values:


cutensorPermute()#

cutensorStatus_t cutensorPermute(
    const cutensorHandle_t handle,
    const cutensorPlan_t plan,
    const void *alpha,
    const void *A,
    void *B,
    const cudaStream_t stream)#

Performs the tensor permutation that is encoded by plan (see cutensorCreatePermutation).

This function performs an elementwise tensor operation of the form:

\[ B_{\Pi^B(i_0,i_1,...,i_n)} = \alpha \Psi(A_{\Pi^A(i_0,i_1,...,i_n)}) \]

Consequently, this function performs an out-of-place tensor permutation.

Where \( \alpha \) is a scalar, \( \Psi \) is the unary element-wise operator that was selected (via opA) when the operation descriptor was created, and \( \Pi^A, \Pi^B \) denote the mode orderings of A and B.

Parameters:

Return values:
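
A sketch of a transpose-like permutation B(j,i) = alpha * A(i,j), assuming a handle, matching float descriptors descA and descB, device buffers A_d and B_d, and the CHECK_CUTENSOR macro from earlier:

int32_t modeA[] = {'i', 'j'};
int32_t modeB[] = {'j', 'i'};            // B stores the transposed order

cutensorOperationDescriptor_t op;
CHECK_CUTENSOR(cutensorCreatePermutation(handle, &op,
    descA, modeA, CUTENSOR_OP_IDENTITY,
    descB, modeB,
    CUTENSOR_COMPUTE_DESC_32F));

cutensorPlanPreference_t pref;
CHECK_CUTENSOR(cutensorCreatePlanPreference(handle, &pref,
    CUTENSOR_ALGO_DEFAULT, CUTENSOR_JIT_MODE_NONE));

cutensorPlan_t plan;
CHECK_CUTENSOR(cutensorCreatePlan(handle, &plan, op, pref, 0 /* workspaceSizeLimit */));

float alpha = 1.0f;
CHECK_CUTENSOR(cutensorPermute(handle, plan, &alpha, A_d, B_d, 0 /* default stream */));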


Contraction Operations#

The following functions perform contractions between tensors.


cutensorCreateContraction()#

cutensorStatus_t cutensorCreateContraction(
    const cutensorHandle_t handle,
    cutensorOperationDescriptor_t *desc,
    const cutensorTensorDescriptor_t descA,
    const int32_t modeA[],
    cutensorOperator_t opA,
    const cutensorTensorDescriptor_t descB,
    const int32_t modeB[],
    cutensorOperator_t opB,
    const cutensorTensorDescriptor_t descC,
    const int32_t modeC[],
    cutensorOperator_t opC,
    const cutensorTensorDescriptor_t descD,
    const int32_t modeD[],
    const cutensorComputeDescriptor_t descCompute)#

This function allocates a cutensorOperationDescriptor_t object that encodes a tensor contraction of the form \( D = \alpha \mathcal{A} \mathcal{B} + \beta \mathcal{C} \).

Allocates data for desc to be used to perform a tensor contraction of the form

\[ \mathcal{D}_{{modes}_\mathcal{D}} \gets \alpha op_\mathcal{A}(\mathcal{A}_{{modes}_\mathcal{A}}) op_\mathcal{B}(\mathcal{B}_{{modes}_\mathcal{B}}) + \beta op_\mathcal{C}(\mathcal{C}_{{modes}_\mathcal{C}}). \]

See cutensorCreatePlan (or cutensorCreatePlanAutotuned) to create the plan (i.e., to select the kernel) followed by a call to cutensorContract to perform the actual contraction.

The user is responsible for calling cutensorDestroyOperationDescriptor to free the resources associated with the descriptor.

Supported data-type combinations are:

Parameters:

Return values:


cutensorContract()#

cutensorStatus_t cutensorContract(
    const cutensorHandle_t handle,
    const cutensorPlan_t plan,
    const void *alpha,
    const void *A,
    const void *B,
    const void *beta,
    const void *C,
    void *D,
    void *workspace,
    uint64_t workspaceSize,
    cudaStream_t stream)#

This routine computes the tensor contraction \( D = \alpha A B + \beta C \).

\[ \mathcal{D}_{{modes}_\mathcal{D}} \gets \alpha \mathcal{A}_{{modes}_\mathcal{A}} \mathcal{B}_{{modes}_\mathcal{B}} + \beta \mathcal{C}_{{modes}_\mathcal{C}} \]

The active CUDA device must match the CUDA device that was active at the time at which the plan was created.

Example:

See NVIDIA/CUDALibrarySamples for a concrete example.

Parameters:

Return values:
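
An end-to-end sketch of C(m,n) = alpha * A(m,k) * B(k,n) + beta * C(m,n). It assumes a handle, float descriptors descA, descB, descC with matching extents, device buffers A_d, B_d, C_d, and the CHECK_CUTENSOR macro; the compute descriptor, algorithm, and workspace preference are illustrative defaults:

int32_t modeA[] = {'m', 'k'};
int32_t modeB[] = {'k', 'n'};
int32_t modeC[] = {'m', 'n'};

cutensorOperationDescriptor_t op;
CHECK_CUTENSOR(cutensorCreateContraction(handle, &op,
    descA, modeA, CUTENSOR_OP_IDENTITY,
    descB, modeB, CUTENSOR_OP_IDENTITY,
    descC, modeC, CUTENSOR_OP_IDENTITY,
    descC, modeC,                        // D reuses C's descriptor and modes
    CUTENSOR_COMPUTE_DESC_32F));

cutensorPlanPreference_t pref;
CHECK_CUTENSOR(cutensorCreatePlanPreference(handle, &pref,
    CUTENSOR_ALGO_DEFAULT, CUTENSOR_JIT_MODE_NONE));

uint64_t workspaceSize = 0;
CHECK_CUTENSOR(cutensorEstimateWorkspaceSize(handle, op, pref,
    CUTENSOR_WORKSPACE_DEFAULT, &workspaceSize));

cutensorPlan_t plan;
CHECK_CUTENSOR(cutensorCreatePlan(handle, &plan, op, pref, workspaceSize));

void *work = NULL;
if (workspaceSize > 0)
    cudaMalloc(&work, workspaceSize);

float alpha = 1.0f, beta = 1.0f;
CHECK_CUTENSOR(cutensorContract(handle, plan,
    &alpha, A_d, B_d, &beta, C_d, C_d,   // D == C: accumulate into C in place
    work, workspaceSize, 0 /* default stream */));

cudaFree(work);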


cutensorCreateContractionTrinary()#

cutensorStatus_t cutensorCreateContractionTrinary(
    const cutensorHandle_t handle,
    cutensorOperationDescriptor_t *desc,
    const cutensorTensorDescriptor_t descA,
    const int32_t modeA[],
    cutensorOperator_t opA,
    const cutensorTensorDescriptor_t descB,
    const int32_t modeB[],
    cutensorOperator_t opB,
    const cutensorTensorDescriptor_t descC,
    const int32_t modeC[],
    cutensorOperator_t opC,
    const cutensorTensorDescriptor_t descD,
    const int32_t modeD[],
    cutensorOperator_t opD,
    const cutensorTensorDescriptor_t descE,
    const int32_t modeE[],
    const cutensorComputeDescriptor_t descCompute)#

This function allocates a cutensorOperationDescriptor_t object that encodes a tensor contraction of the form \( \mathcal{E} = \alpha \mathcal{A} \mathcal{B} \mathcal{C} + \beta \mathcal{D} \).

Allocates data for desc to be used to perform a tensor contraction of the form

\[ \mathcal{E}_{{modes}_\mathcal{E}} \gets \alpha op_\mathcal{A}(\mathcal{A}_{{modes}_\mathcal{A}}) op_\mathcal{B}(\mathcal{B}_{{modes}_\mathcal{B}}) op_\mathcal{C}(\mathcal{C}_{{modes}_\mathcal{C}}) + \beta op_\mathcal{D}(\mathcal{D}_{{modes}_\mathcal{D}}). \]

See cutensorCreatePlan (or cutensorCreatePlanAutotuned) to create the plan (i.e., to select the kernel) followed by a call to cutensorContractTrinary to perform the actual contraction.

The user is responsible for calling cutensorDestroyOperationDescriptor to free the resources associated with the descriptor.

The performance improvements from this API are currently most pronounced when the data resides in host memory (i.e., out-of-core), in particular on Grace-based systems.

Supported data-type combinations are:

Parameters:

Return values:


cutensorContractTrinary()#

cutensorStatus_t cutensorContractTrinary(
    const cutensorHandle_t handle,
    const cutensorPlan_t plan,
    const void *alpha,
    const void *A,
    const void *B,
    const void *C,
    const void *beta,
    const void *D,
    void *E,
    void *workspace,
    uint64_t workspaceSize,
    cudaStream_t stream)#

This routine computes the tensor contraction \( E = \alpha A B C + \beta D \).

\[ \mathcal{E}_{{modes}_\mathcal{E}} \gets \alpha \mathcal{A}_{{modes}_\mathcal{A}} \mathcal{B}_{{modes}_\mathcal{B}} \mathcal{C}_{{modes}_\mathcal{C}} + \beta \mathcal{D}_{{modes}_\mathcal{D}} \]

The active CUDA device must match the CUDA device that was active at the time at which the plan was created.

Example:

See NVIDIA/CUDALibrarySamples for a concrete example.

Parameters:

Return values:
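
The call sequence mirrors the pairwise contraction example above; only the operation creation and the execute call differ. A condensed sketch under the same assumptions (pre-existing handle, descriptors, mode arrays, device buffers, plan, workspace, and the CHECK_CUTENSOR macro):

// E(m,n) = alpha * A(m,k) * B(k,l) * C(l,n) + beta * D(m,n)
cutensorOperationDescriptor_t op;
CHECK_CUTENSOR(cutensorCreateContractionTrinary(handle, &op,
    descA, modeA, CUTENSOR_OP_IDENTITY,
    descB, modeB, CUTENSOR_OP_IDENTITY,
    descC, modeC, CUTENSOR_OP_IDENTITY,
    descD, modeD, CUTENSOR_OP_IDENTITY,
    descD, modeD,                        // E reuses D's descriptor and modes
    CUTENSOR_COMPUTE_DESC_32F));

// ... create the plan preference, estimate the workspace, and create the plan as before, then:
float alpha = 1.0f, beta = 0.0f;
CHECK_CUTENSOR(cutensorContractTrinary(handle, plan,
    &alpha, A_d, B_d, C_d, &beta, D_d, E_d,
    work, workspaceSize, 0 /* default stream */));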


Reduction Operations#

The following functions perform tensor reductions.


cutensorCreateReduction()#

cutensorStatus_t cutensorCreateReduction(
    const cutensorHandle_t handle,
    cutensorOperationDescriptor_t *desc,
    const cutensorTensorDescriptor_t descA,
    const int32_t modeA[],
    cutensorOperator_t opA,
    const cutensorTensorDescriptor_t descC,
    const int32_t modeC[],
    cutensorOperator_t opC,
    const cutensorTensorDescriptor_t descD,
    const int32_t modeD[],
    cutensorOperator_t opReduce,
    const cutensorComputeDescriptor_t descCompute)#

Creates a cutensorOperationDescriptor_t object that encodes a tensor reduction of the form \( D = \alpha \, \mathrm{opReduce}(\mathrm{opA}(A)) + \beta \, \mathrm{opC}(C) \).

For example, this function enables users to reduce an entire tensor to a scalar: C[] = alpha * A[i,j,k];

This function is also able to perform partial reductions; for instance: C[i,j] = alpha * A[k,j,i]; in this case only elements along the k-mode are contracted.

The binary opReduce operator provides extra control over the kind of reduction to be performed. For instance, setting opReduce to CUTENSOR_OP_ADD reduces the elements of A via summation, while CUTENSOR_OP_MAX finds the largest element in A.

Supported data-type combinations are:

Parameters:

Return values:


cutensorReduce()#

cutensorStatus_t cutensorReduce(
    const cutensorHandle_t handle,
    const cutensorPlan_t plan,
    const void *alpha,
    const void *A,
    const void *beta,
    const void *C,
    void *D,
    void *workspace,
    uint64_t workspaceSize,
    cudaStream_t stream)#

Performs the tensor reduction that is encoded by plan (see cutensorCreateReduction).

Parameters:

Return values:

CUTENSOR_STATUS_SUCCESS – The operation completed successfully.
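
A sketch of the partial reduction C(i,j) = alpha * sum_k A(k,j,i) discussed above. It assumes a handle, float descriptors descA (3 modes) and descC (2 modes), device buffers A_d and C_d, and the CHECK_CUTENSOR macro; D aliases C:

int32_t modeA[] = {'k', 'j', 'i'};
int32_t modeC[] = {'i', 'j'};            // 'k' is omitted, so it is reduced over

cutensorOperationDescriptor_t op;
CHECK_CUTENSOR(cutensorCreateReduction(handle, &op,
    descA, modeA, CUTENSOR_OP_IDENTITY,
    descC, modeC, CUTENSOR_OP_IDENTITY,
    descC, modeC,
    CUTENSOR_OP_ADD,                     // opReduce: summation
    CUTENSOR_COMPUTE_DESC_32F));

cutensorPlanPreference_t pref;
CHECK_CUTENSOR(cutensorCreatePlanPreference(handle, &pref,
    CUTENSOR_ALGO_DEFAULT, CUTENSOR_JIT_MODE_NONE));

uint64_t workspaceSize = 0;
CHECK_CUTENSOR(cutensorEstimateWorkspaceSize(handle, op, pref,
    CUTENSOR_WORKSPACE_DEFAULT, &workspaceSize));

cutensorPlan_t plan;
CHECK_CUTENSOR(cutensorCreatePlan(handle, &plan, op, pref, workspaceSize));

void *work = NULL;
if (workspaceSize > 0)
    cudaMalloc(&work, workspaceSize);

float alpha = 1.0f, beta = 0.0f;
CHECK_CUTENSOR(cutensorReduce(handle, plan,
    &alpha, A_d, &beta, C_d, C_d, work, workspaceSize, 0 /* default stream */));

cudaFree(work);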


Generic Operation Functions#

The following functions are generic and work with all the different operations.


cutensorDestroyOperationDescriptor()#

cutensorStatus_t cutensorDestroyOperationDescriptor(
    cutensorOperationDescriptor_t desc)#

Frees all resources related to the provided descriptor.

Parameters:

desc[inout] The cutensorOperationDescriptor_t object that will be deallocated.

Return values:

CUTENSOR_STATUS_SUCCESS – on success and an error code otherwise


cutensorOperationDescriptorGetAttribute()#

cutensorStatus_t cutensorOperationDescriptorGetAttribute(
    const cutensorHandle_t handle,
    cutensorOperationDescriptor_t desc,
    cutensorOperationDescriptorAttribute_t attr,
    void *buf,
    size_t sizeInBytes)#

This function retrieves an attribute of the provided cutensorOperationDescriptor_t object (see cutensorOperationDescriptorAttribute_t).

Parameters:

Return values:
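
A sketch of querying which data type the scalars alpha and beta must have for an operation. It assumes a handle, an operation descriptor op, and the CHECK_CUTENSOR macro; CUTENSOR_OPERATION_DESCRIPTOR_SCALAR_TYPE is assumed to be the relevant value of cutensorOperationDescriptorAttribute_t (see that enum for the full list):

cutensorDataType_t scalarType;
CHECK_CUTENSOR(cutensorOperationDescriptorGetAttribute(handle, op,
    CUTENSOR_OPERATION_DESCRIPTOR_SCALAR_TYPE,
    &scalarType, sizeof(scalarType)));

if (scalarType == CUTENSOR_R_32F) {
    // alpha and beta must be passed as 32-bit floats for this operation.
    float alpha = 1.0f, beta = 0.0f;
    /* ... */
}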


cutensorOperationDescriptorSetAttribute()#

cutensorStatus_t cutensorOperationDescriptorSetAttribute(
    const cutensorHandle_t handle,
    cutensorOperationDescriptor_t desc,
    cutensorOperationDescriptorAttribute_t attr,
    const void *buf,
    size_t sizeInBytes)#

Sets an attribute of a cutensorOperationDescriptor_t object.

Parameters:

Return values:


cutensorCreatePlanPreference()#

cutensorStatus_t cutensorCreatePlanPreference(
    const cutensorHandle_t handle,
    cutensorPlanPreference_t *pref,
    cutensorAlgo_t algo,
    cutensorJitMode_t jitMode)#

Allocates the cutensorPlanPreference_t, enabling users to limit the applicable kernels for a given plan/operation.

Parameters:


cutensorDestroyPlanPreference()#

cutensorStatus_t cutensorDestroyPlanPreference(
    cutensorPlanPreference_t pref)#

Frees all resources related to the provided preference.

Parameters:

pref[inout] The cutensorPlanPreference_t object that will be deallocated.

Return values:

CUTENSOR_STATUS_SUCCESS – on success and an error code otherwise


cutensorPlanPreferenceSetAttribute()#

cutensorStatus_t cutensorPlanPreferenceSetAttribute(
    const cutensorHandle_t handle,
    cutensorPlanPreference_t pref,
    cutensorPlanPreferenceAttribute_t attr,
    const void *buf,
    size_t sizeInBytes)#

Sets an attribute of a cutensorPlanPreference_t object.

Parameters:

Return values:


cutensorEstimateWorkspaceSize()#

cutensorStatus_t cutensorEstimateWorkspaceSize(
    const cutensorHandle_t handle,
    const cutensorOperationDescriptor_t desc,
    const cutensorPlanPreference_t planPref,
    const cutensorWorksizePreference_t workspacePref,
    uint64_t *workspaceSizeEstimate)#

Determines the required workspaceSize for the given operation encoded by desc.

Parameters:

Return values:


cutensorCreatePlan()#

cutensorStatus_t cutensorCreatePlan(
    const cutensorHandle_t handle,
    cutensorPlan_t *plan,
    const cutensorOperationDescriptor_t desc,
    const cutensorPlanPreference_t pref,
    uint64_t workspaceSizeLimit)#

This function allocates a cutensorPlan_t object, selects an appropriate kernel for a given operation (encoded by desc) and prepares a plan that encodes the execution.

This function applies cuTENSOR’s heuristic to select a candidate/kernel for a given operation (created by either cutensorCreateContraction, cutensorCreateReduction, cutensorCreatePermutation, cutensorCreateElementwiseBinary, cutensorCreateElementwiseTrinary, or cutensorCreateContractionTrinary). The created plan can then be passed to cutensorContract, cutensorReduce, cutensorPermute, cutensorElementwiseBinaryExecute, cutensorElementwiseTrinaryExecute, or cutensorContractTrinary to perform the actual operation.

The plan is created for the active CUDA device.

Note: cutensorCreatePlan must not be captured via CUDA graphs if Just-In-Time compilation is enabled (i.e., cutensorJitMode_t is not CUTENSOR_JIT_MODE_NONE).

Parameters:

Return values:


cutensorDestroyPlan()#

cutensorStatus_t cutensorDestroyPlan(cutensorPlan_t plan)#

Frees all resources related to the provided plan.

Parameters:

plan[inout] The cutensorPlan_t object that will be deallocated.

Return values:

CUTENSOR_STATUS_SUCCESS – on success and an error code otherwise


cutensorPlanGetAttribute()#

cutensorStatus_t cutensorPlanGetAttribute(
    const cutensorHandle_t handle,
    const cutensorPlan_t plan,
    cutensorPlanAttribute_t attr,
    void *buf,
    size_t sizeInBytes)#

Retrieves information about an already-created plan (see cutensorPlanAttribute_t).

Parameters:

Return values:
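
A sketch of querying how much workspace the selected kernel actually requires, which may be less than the limit passed to cutensorCreatePlan. It assumes a handle, a plan, and the CHECK_CUTENSOR macro; CUTENSOR_PLAN_REQUIRED_WORKSPACE is assumed to be the relevant value of cutensorPlanAttribute_t:

uint64_t requiredWorkspace = 0;
CHECK_CUTENSOR(cutensorPlanGetAttribute(handle, plan,
    CUTENSOR_PLAN_REQUIRED_WORKSPACE,
    &requiredWorkspace, sizeof(requiredWorkspace)));

// Allocating only what the plan actually needs can save device memory.
void *work = NULL;
if (requiredWorkspace > 0)
    cudaMalloc(&work, requiredWorkspace);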


Logger Functions#

cutensorLoggerSetCallback()#

cutensorStatus_t cutensorLoggerSetCallback(
    cutensorLoggerCallback_t callback)#

This function sets the logging callback routine.

Parameters:

callback[in] Pointer to a callback function. Check cutensorLoggerCallback_t.


cutensorLoggerSetFile()#

cutensorStatus_t cutensorLoggerSetFile(FILE *file)#

This function sets the logging output file.

Parameters:

file[in] An open file with write permission.


cutensorLoggerOpenFile()#

cutensorStatus_t cutensorLoggerOpenFile(const char *logFile)#

This function opens a logging output file in the given path.

Parameters:

logFile[in] Path to the logging output file.


cutensorLoggerSetLevel()#

cutensorStatus_t cutensorLoggerSetLevel(int32_t level)#

This function sets the value of the logging level.

Parameters:

level[in] Log level, should be one of the following:


cutensorLoggerSetMask()#

cutensorStatus_t cutensorLoggerSetMask(int32_t mask)#

This function sets the value of the log mask.

Parameters:

mask[in] Log mask, the bitwise OR of the following:


cutensorLoggerForceDisable()#

cutensorStatus_t cutensorLoggerForceDisable()#

This function disables logging for the entire run.
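
A minimal logging setup sketch, reusing the CHECK_CUTENSOR macro from earlier; the file name is arbitrary and the level value 5 is an assumption chosen for illustration (consult the level list of cutensorLoggerSetLevel for the exact semantics):

// Route cuTENSOR log output to a file and raise the verbosity.
CHECK_CUTENSOR(cutensorLoggerOpenFile("cutensor_log.txt"));
CHECK_CUTENSOR(cutensorLoggerSetLevel(5));   // assumed to be a verbose level

/* ... run cuTENSOR operations; their log records are written to cutensor_log.txt ... */

// Logging can also be switched off for the remainder of the run:
CHECK_CUTENSOR(cutensorLoggerForceDisable());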