Call Custom CUDA Kernels from the Generated Code - MATLAB & Simulink (original) (raw)
From within your MATLAB® code, you can directly call external CUDA® kernels, also called custom code or legacy code. To call CUDA kernels, use coder.ceval. The code generator integrates your CUDA kernel into the CUDA code generated from MATLAB. Integrate code when there are external libraries, optimized code, or object files developed using CUDA that you want to use with your generated code.
The external CUDA kernel must use the __global__
qualifier to execute the function (kernels) on the GPU device and to call the function from the host or from the device. Functions with the __device__
qualifier are called device functions. The device functions are different from global functions in that they can only be called from other device or global functions. For information on integrating custom device functions, see Call Custom CUDA Device Function from the Generated Code.
Note
Use coder.ceval only in MATLAB code intended for code generation. coder.ceval generates an error in uncompiled MATLAB code. To determine if a MATLAB function is executing in MATLAB, use coder.target. If the function is executing in MATLAB, call the MATLAB version of the CUDA kernel.
Call Custom CUDA Kernel
This example shows how to integrate a simple CUDA kernel with MATLAB code by using coder.ceval. Consider the MATLAB function, saxpy
:
function y = saxpy(a,x,y) y = a*x + y; end
For this example, suppose that you want to implement the a x plus y operation by using external CUDA kernel. Consider the CUDA kernel, saxpy_kernel
, implemented in the file saxpy.cu
:
#include "saxpy.h"
global void saxpy_kernel(uint32_T n, real32_T a, real32_T x, real32_T y) { int i = blockIdx.xblockDim.x + threadIdx.x; if (i < n) y[i] = ax[i] + y[i]; }
To integrate saxpy_kernel
with your MATLAB code, you need a header file that contains the function prototype. See the file saxpy.h
:
#ifndef real32_T #define real32_T float #define uint32_T unsigned int #endif
#define saxpy(grid,block,n,a,x,y) saxpy_kernel<<<grid,block>>>(n,a,x,y)
global void saxpy_kernel(uint32_T n, real32_T a, real32_T *x, real32_T *y);
This example generates CUDA MEX, uint32_T
and real32_T
are custom types used in the generated MEX code. The code generator produces data types in CUDA code that correspond to the data types that you use in your MATLAB code. The data types that are generated depend on the target platform and compiler. The code generator can produce either built-in CUDA/C++ data types, such as short, long, int, and so on, or custom data types defined by using typedef statements. By default, the code generator produces built-in types for standalone code (lib, dll, or exe) and custom types for MEX code. For more information, see Mapping MATLAB Types to Types in Generated Code.
Entry-Point Function
Use the coder.ceval
command to call the CUDA kernel in the saxpyRef.m
entry-point function. Use coder.ceval
only in MATLAB code intended for code generation. The coder.rref
and coder.ref
commands instruct the code generator to pass pointers to the arrays, rather than copy them.
function y = saxpyRef(a,x,y) % saxpyRef Entry-point function for computing single-precision % (A*X) Plus Y
% Copyright 2022 The MathWorks, Inc. coder.gpu.kernelfun;
if coder.target('MATLAB') y = a*x + y; else coder.ceval('saxpy', uint32(floor((numel(x)+255)/256)), ... uint32(256),uint32(numel(x)), single(a), ... coder.rref(x,'gpu'),coder.ref(y,'gpu')); end end
Generate CUDA Code
To generate CUDA code, create a GPU code configuration object. Specify the location of the custom CUDA files by setting custom code properties on the configuration object. For more information, see Configure Build for External C/C++ Code.
cfg = coder.gpuConfig("mex"); cfg.GenerateReport = true; cfg.CustomSource = "saxpy.cu"; cfg.CustomInclude = pwd; cfg.CustomSourceCode = '#include "saxpy.h"';
aType = coder.newtype('single', [1 1], [0 0]); xType = coder.newtype('single', [4096 256], [0 0]); yType = coder.newtype('single', [4096 256], [0 0]); inputArgs = {aType,xType,yType};
codegen -config cfg saxpyRef -args inputArgs
Code generation successful: View report
Generated Code
To compare your generated CUDA code to the original MATLAB code, open the CUDA file, saxyRef.cu
in the work
\codegen\mex\saxpyRef
folder.
#include "saxpy.h" // Function Definitions void saxpyRef(real32_T a, const real32_T x[1048576], real32_T y[1048576]) { real32_T(*gpu_x)[1048576]; real32_T(gpu_y)[1048576]; cudaMalloc(&gpu_y, 4194304UL); cudaMalloc(&gpu_x, 4194304UL); // saxpyRef Entry-point function for computing single-precision (AX) Plus // Y // Copyright 2022 The MathWorks, Inc. cudaMemcpy(*gpu_x, x, 4194304UL, cudaMemcpyHostToDevice); cudaMemcpy(*gpu_y, y, 4194304UL, cudaMemcpyHostToDevice); saxpy(4096U, 256U, 1048576U, a, &(*gpu_x)[0], &(*gpu_y)[0]); cudaMemcpy(y, *gpu_y, 4194304UL, cudaMemcpyDeviceToHost); cudaFree(*gpu_x); cudaFree(*gpu_y); }
Run Generated MEX
Run the generated MEX with random inputs and compare the results with MATLAB simulation.
a = single(15); x = randi(10,4096,256,'single'); y = zeros(4096,256,'single');
yMATLAB = saxpyRef(a,x,y); yGPU = saxpyRef_mex(a,x,y);
if (yGPU - yMATLAB == 0) fprintf('\nMATLAB simulation matches GPU execution.\n'); end
MATLAB simulation matches GPU execution.