gpucoder.matrixMatrixKernel - Optimized GPU implementation of functions containing matrix-matrix operations - MATLAB ([original](https://www.mathworks.com/help/gpucoder/ref/gpucoder.matrixmatrixkernel.html))
Optimized GPU implementation of functions containing matrix-matrix operations
Syntax
Description
[C](#mw%5F51423813-2f02-4906-8d4f-e9387d3327d0) = gpucoder.matrixMatrixKernel([fun](#mw%5F2bf6f8ab-cdee-4123-bd53-1ee2f40255e6),[A](#mw%5F9b1e5a56-30f0-4603-82eb-5f7c93469cc2),[B](#mw%5F9b1e5a56-30f0-4603-82eb-5f7c93469cc2))
generates kernels from functions that contain GEMM-like operations. For example, you can match feature points between two images by using:
- The sum of absolute differences (SAD) — F = @(a,b)abs(a-b)
- The sum of squared differences (SSD) — F = @(a,b)(a-b).*(a-b)
[C](#mw%5F51423813-2f02-4906-8d4f-e9387d3327d0) = gpucoder.matrixMatrixKernel(___,[orientation](#mw%5F5f6c507c-a539-43f5-8473-64ff4f2e866e))
has the optional argument orientation, which specifies the orientation of the A and B matrices.
[C](#mw%5F51423813-2f02-4906-8d4f-e9387d3327d0) = gpucoder.matrixMatrixKernel(___,[vectorizedSim](#mw%5Fac8695c4-dd0c-4824-89a0-23bc9956da6d))
has the optional argument vectorizedSim, which specifies the use of vectorized operations during MATLAB® simulation and CPU code generation. The function handle fun must support vector inputs, taking one row or column from A and one column or row from B, and output a vector equivalent to arrayfun(FUN, A, B).
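The semantics above can be modeled outside MATLAB. The following NumPy sketch (the function name matrix_matrix_kernel and the sample data are illustrative, not part of GPU Coder) shows the pattern: fun is applied to one row of A and one column of B, and the resulting vector is summed to produce a single scalar of C. It uses the SAD handle from the description.

```python
import numpy as np

def matrix_matrix_kernel(fun, A, B):
    """Illustrative model of the gpucoder.matrixMatrixKernel pattern:
    fun is applied elementwise to row i of A and column j of B, and the
    resulting vector is summed to produce the scalar C[i, j]."""
    m, k = A.shape
    k2, n = B.shape
    assert k == k2, "inner dimensions of A and B must agree"
    C = np.empty((m, n), dtype=A.dtype)
    for i in range(m):
        for j in range(n):
            C[i, j] = np.sum(fun(A[i, :], B[:, j]))
    return C

# Sum of absolute differences (SAD), as in the description above
sad = lambda a, b: np.abs(a - b)
A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.array([[1.0, 0.0], [2.0, 5.0]])
C = matrix_matrix_kernel(sad, A, B)
```

On the GPU, each C[i, j] maps naturally to one thread or thread block, which is what makes this pattern amenable to kernel generation.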
Examples
Matrix-Matrix Multiplication
This example performs a simple matrix-matrix multiplication and uses the matrixMatrixKernel design pattern to generate CUDA® code.
In one file, write an entry-point function matMul_nn that accepts two matrix inputs, f1 and f2. Use the MATLAB function @times to multiply f1 and f2 element by element. The sign @ creates a handle to the function times. Insert the gpucoder.matrixMatrixKernel() statement. The input matrices are not transposed, so use the 'nn' option.

```matlab
function scores = matMul_nn(f1, f2)
scores = gpucoder.matrixMatrixKernel(@times, f1, f2, 'nn', true);
end
```
Use the codegen function to generate a CUDA MEX function.

```matlab
codegen -config coder.gpuConfig('mex') ...
    -args {ones(1024,1024,'double'),ones(1024,1024,'double')} ...
    -report matMul_nn
```
The generated CUDA code contains two kernels: matMul_nn_kernel1, which initializes the output matrix scores, and matrixMatrixKernel, which performs the times operation. The following is a snippet of the generated code.

```cpp
cudaMemcpy(*gpu_f2, cpu_f2, 8388608UL, cudaMemcpyHostToDevice);
matMul_nn_kernel1<<<dim3(2048U, 1U, 1U), dim3(512U, 1U, 1U)>>>(*gpu_f2, *gpu_B);
cudaMemcpy(*gpu_f1, cpu_f1, 8388608UL, cudaMemcpyHostToDevice);
matrixMatrixKernel<<<1024U, 64U>>>(*gpu_f1, *gpu_B, *gpu_scores);
cudaMemcpy(cpu_scores, *gpu_scores, 8388608UL, cudaMemcpyDeviceToHost);
```
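As a sanity check on what this example computes: with @times as fun and the 'nn' orientation, each entry of scores is the sum of elementwise products of a row of f1 with a column of f2, which is exactly an ordinary matrix product. A NumPy sketch of that identity (the dimensions here are chosen arbitrarily, not taken from the example):

```python
import numpy as np

# With fun = elementwise multiplication and 'nn' orientation, the
# kernel pattern reduces to ordinary matrix multiplication:
# scores[i, j] = sum(f1(i, :) .* f2(:, j))
rng = np.random.default_rng(0)
f1 = rng.random((4, 3))
f2 = rng.random((3, 5))

scores = np.empty((4, 5))
for i in range(4):
    for j in range(5):
        scores[i, j] = np.sum(f1[i, :] * f2[:, j])

# The loop above agrees with the built-in matrix product
assert np.allclose(scores, f1 @ f2)
```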
Input Arguments
fun — Function to apply
function handle
Function to apply to the elements of the input arrays, specified as a function handle. fun is a handle to a user-defined function. It takes one row or column from matrix A and one row or column from matrix B and outputs a vector with the same type as the input. The output vector is then summed to compute a single scalar value in C.
Data Types: function_handle
A, B — Input array
array
Numeric inputs A and B must be either of the same size or have sizes that are compatible. For example, if A is an M-by-K matrix and B is a K-by-N matrix, then C is an M-by-N matrix.
Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64 | logical | half
orientation — Operation performed on input matrices
'nn' (default) | character vector | string
Character vector or string composed of two characters, indicating the operation performed on the matrices A and B prior to matrix multiplication. Possible values are normal ('N'), transposed ('T'), or complex conjugate transpose ('C').
Possible values are:
- 'nn' - Matrices A and B are normal.
- 'nt' - Matrix B is transposed.
- 'tn' - Matrix A is transposed.
- 'tt' - Both matrices A and B are transposed.
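The flags can be read as one character per input: the first character controls A, the second controls B. A short NumPy sketch (paired_vectors is a hypothetical helper for illustration, not a GPU Coder function) of which vector pair fun would receive for output entry (i, j) under each flag:

```python
import numpy as np

def paired_vectors(orientation, A, B, i, j):
    """Illustrative sketch (not the GPU Coder implementation): select
    the vector pair that fun would receive for output entry (i, j)
    under a two-character orientation flag."""
    a = A[i, :] if orientation[0] == 'n' else A[:, i]  # 't': column i of A
    b = B[:, j] if orientation[1] == 'n' else B[j, :]  # 't': row j of B
    return a, b

A = np.arange(6.0).reshape(2, 3)  # 2-by-3
B = np.arange(6.0).reshape(3, 2)  # 3-by-2
a, b = paired_vectors('nn', A, B, 0, 1)  # row 0 of A, column 1 of B
```

Under 'nn' the pairing matches ordinary GEMM; 'tn', 'nt', and 'tt' correspond to transposing A, B, or both before the multiplication.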
vectorizedSim — Use vectorized operation
false (default) | true
Specify whether to use vectorized operations during MATLAB simulation and CPU code generation.
Output Arguments
C — Output array
scalar | vector | matrix
Product, returned as a scalar, vector, or matrix. Array C has the same number of rows as input A and the same number of columns as input B.
Version History
Introduced in R2017b
See Also
Functions
- codegen | coder.gpu.kernel | coder.gpu.kernelfun | gpucoder.stencilKernel | coder.gpu.constantMemory | coder.gpu.nokernel | gpucoder.batchedMatrixMultiply | gpucoder.stridedMatrixMultiply | gpucoder.batchedMatrixMultiplyAdd | gpucoder.stridedMatrixMultiplyAdd