gpucoder.matrixMatrixKernel - Optimized GPU implementation of functions containing matrix-matrix operations - MATLAB ([original](https://www.mathworks.com/help/gpucoder/ref/gpucoder.matrixmatrixkernel.html))

Optimized GPU implementation of functions containing matrix-matrix operations

Syntax

    C = gpucoder.matrixMatrixKernel(fun,A,B)
    C = gpucoder.matrixMatrixKernel(___,orientation)
    C = gpucoder.matrixMatrixKernel(___,vectorizedSim)

Description

[C](#mw%5F51423813-2f02-4906-8d4f-e9387d3327d0) = gpucoder.matrixMatrixKernel([fun](#mw%5F2bf6f8ab-cdee-4123-bd53-1ee2f40255e6),[A](#mw%5F9b1e5a56-30f0-4603-82eb-5f7c93469cc2),[B](#mw%5F9b1e5a56-30f0-4603-82eb-5f7c93469cc2)) generates kernels from functions that contain GEMM-like operations. One example of such an operation is matching feature points between two images.
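To make the feature-matching use case concrete, the following is a minimal plain-MATLAB sketch that needs no GPU Coder. It assumes two cost functions often used for this kind of matching, the sum of absolute differences (SAD) and the sum of squared differences (SSD). The data and the variable names (`f1`, `f2`, `sadScores`, `ssdScores`, `bestMatch`) are illustrative, and the nested loops only emulate the row-by-column pattern that the design pattern is described as mapping to a GPU kernel.

```matlab
% Hypothetical feature-matching sketch in plain MATLAB (no GPU Coder required).
% Each row of f1 and each column of f2 is one K-element feature descriptor.
f1 = [0 0 0; 1 1 1; 5 5 5];        % 3 descriptors stored as rows (M-by-K)
f2 = [5 0 1; 5 0 1; 5 0 2];        % 3 descriptors stored as columns (K-by-N)

sad = @(a, b) abs(a - b);          % summed later: sum of absolute differences
ssd = @(a, b) (a - b) .* (a - b);  % summed later: sum of squared differences

M = size(f1, 1);
N = size(f2, 2);
sadScores = zeros(M, N);
ssdScores = zeros(M, N);
for i = 1:M
    for j = 1:N
        % Pair row i of f1 with column j of f2 and reduce to one score.
        sadScores(i, j) = sum(sad(f1(i, :).', f2(:, j)));
        ssdScores(i, j) = sum(ssd(f1(i, :).', f2(:, j)));
    end
end

% The best match for each descriptor in f1 is the column with the lowest score.
[~, bestMatch] = min(sadScores, [], 2);
```

Under these assumptions, passing `sad` or `ssd` as `fun` in place of the loops would express the same computation through the design pattern.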

[C](#mw%5F51423813-2f02-4906-8d4f-e9387d3327d0) = gpucoder.matrixMatrixKernel(___,[orientation](#mw%5F5f6c507c-a539-43f5-8473-64ff4f2e866e)) has the optional argument orientation that specifies the orientation of A and B matrices.

[C](#mw%5F51423813-2f02-4906-8d4f-e9387d3327d0) = gpucoder.matrixMatrixKernel(___,[vectorizedSim](#mw%5Fac8695c4-dd0c-4824-89a0-23bc9956da6d)) has the optional argument vectorizedSim that specifies use of vectorized operations during MATLAB® simulation and CPU code generation. The function handle fun must support vector inputs, take one row or column from A and one column or row from B, and output a vector equivalent to arrayfun(fun, A, B).


Examples


Matrix-Matrix Multiplication

This example performs a simple matrix-matrix multiplication and uses the matrixMatrixKernel design pattern to generate CUDA® code.

In one file, write an entry-point function matMul_nn that accepts two matrix inputs f1 and f2. Use the MATLAB function @times to multiply f1 and f2 element by element. The @ symbol creates a handle to the function times. Insert the gpucoder.matrixMatrixKernel() statement. The input matrices are not transposed, so use the 'nn' option.

    function scores = matMul_nn(f1, f2)
        scores = gpucoder.matrixMatrixKernel(@times, f1, f2, 'nn',true);
    end

Use the codegen function to generate a CUDA MEX function.

    codegen -config coder.gpuConfig('mex') ...
        -args {ones(1024,1024,'double'),ones(1024,1024,'double')} ...
        -report matMul_nn

The generated CUDA code contains two kernels: matMul_nn_kernel1 for initializing the output matrix scores and matrixMatrixKernel that performs the times operation. The following is a snippet of the generated code.

    cudaMemcpy(*gpu_f2, cpu_f2, 8388608UL, cudaMemcpyHostToDevice);
    matMul_nn_kernel1<<<dim3(2048U, 1U, 1U), dim3(512U, 1U, 1U)>>>(*gpu_f2, *gpu_B);
    cudaMemcpy(*gpu_f1, cpu_f1, 8388608UL, cudaMemcpyHostToDevice);
    matrixMatrixKernel<<<1024U, 64U>>>(*gpu_f1, *gpu_B, *gpu_scores);
    cudaMemcpy(cpu_scores, *gpu_scores, 8388608UL, cudaMemcpyDeviceToHost);
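The constants in the snippet follow from the 1024-by-1024 double inputs passed to codegen. As a quick check of the arithmetic, the sketch below recomputes them in MATLAB; the variable names are illustrative only, and reading the launch configuration as one thread per element is an interpretation rather than something the generated code states.

```matlab
% Sizes taken from the codegen command: two 1024-by-1024 double inputs.
numElements    = 1024 * 1024;                  % elements per matrix
bytesPerValue  = 8;                            % a double occupies 8 bytes
bytesPerMatrix = numElements * bytesPerValue;  % 8388608, the cudaMemcpy size

% matMul_nn_kernel1 is launched with 2048 blocks of 512 threads each.
threadsKernel1 = 2048 * 512;                   % 1048576 threads in total

% The thread count equals the element count, which is consistent with
% (but does not prove) one thread handling one matrix element.
oneThreadPerElement = (threadsKernel1 == numElements);
```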

Input Arguments


fun — Function to apply

function handle

Function to apply to the elements of the input arrays, specified as a function handle. fun is a handle to a user-defined function. It takes one row or column from matrix A and one row or column from matrix B and outputs a vector with the same type as the input. The output vector is then summed to compute a single scalar value in C.
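The contract described above can be emulated in plain MATLAB without GPU Coder. The following is a reference sketch, not the generated implementation: the loop structure and the names `v` and `matchesMtimes` are assumptions made for illustration. It pairs each row of A with each column of B, applies `fun`, and sums the resulting vector into one element of C.

```matlab
% Plain-MATLAB reference for the described semantics (no GPU Coder needed).
A = [1 2 3; 4 5 6];        % M-by-K, here 2-by-3
B = [1 0; 0 1; 2 2];       % K-by-N, here 3-by-2
fun = @times;              % element-wise product, as in the matMul_nn example

[M, K] = size(A);
N = size(B, 2);
C = zeros(M, N);
for i = 1:M
    for j = 1:N
        % fun receives one row of A and one column of B as K-by-1 vectors
        % and returns a K-by-1 vector of the same type.
        v = fun(A(i, :).', B(:, j));
        C(i, j) = sum(v);  % the output vector is summed into one scalar
    end
end

% With @times, the emulated result equals the ordinary matrix product.
matchesMtimes = isequal(C, A * B);
```

Swapping `@times` for another vector-in, vector-out function changes only the per-pair computation; the row-by-column pairing and the final sum stay the same.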

Data Types: function_handle

A, B — Input array

array

Numeric inputs A and B must be either of the same size or have sizes that are compatible. For example, if A is an M-by-K matrix and B is a K-by-N matrix, then C is an M-by-N matrix.

Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64 | logical | half

orientation — Operation performed on input matrices

'NN' (default) | character vector | string

Character vector or string composed of two characters, indicating the operation performed on the matrices A and B prior to matrix multiplication. The first character applies to A and the second to B. Possible values for each character are normal ('N'), transposed ('T'), or complex conjugate transpose ('C').
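As an illustration of how the two orientation characters can be read, the sketch below applies them to small matrices in plain MATLAB. The lookup structure `ops` and the variables `opA` and `opB` are hypothetical names, not part of the gpucoder API, and the example assumes the first character applies to A and the second to B.

```matlab
% Hypothetical illustration of the orientation characters in plain MATLAB.
A = [1 2; 3 4; 5 6];             % 3-by-2
B = [1 0 1; 0 1 1; 1 1 0];       % 3-by-3
orientation = 'TN';              % transpose A, leave B unchanged

% Map each orientation character to the operation it selects.
ops = struct('N', @(X) X, ...    % normal: use the matrix as is
             'T', @(X) X.', ...  % transposed
             'C', @(X) X');      % complex conjugate transpose

opFcnA = ops.(upper(orientation(1)));
opFcnB = ops.(upper(orientation(2)));
opA = opFcnA(A);                 % 2-by-3
opB = opFcnB(B);                 % 3-by-3

C = opA * opB;                   % 2-by-3 result, the same as A.' * B
```

For real-valued inputs, 'T' and 'C' give the same result; they differ only when the matrices are complex.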

vectorizedSim — Use vectorized operation

false (default) | true

Specify whether to use vectorized operation during MATLAB simulation and CPU code generation.
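A function is suitable for vectorized simulation when calling it once on whole vectors gives the same result as calling it on each pair of elements. The following sketch checks that property for a hypothetical example function; the names `a`, `b`, `vectorized`, and `elementwise` are illustrative.

```matlab
a = [1; 2; 3];                 % one row of A, taken as a column vector
b = [4; 5; 6];                 % one column of B
fun = @(x, y) abs(x - y);      % hypothetical example function

vectorized  = fun(a, b);             % one call on whole vectors
elementwise = arrayfun(fun, a, b);   % one call per pair of elements

% A function that supports vector inputs gives identical results either way.
sameResult = isequal(vectorized, elementwise);
```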

Output Arguments


C — Output Array

scalar | vector | matrix

Product, returned as a scalar, vector, or matrix. Array C has the same number of rows as input A and the same number of columns as input B.

Version History

Introduced in R2017b
