gpucoder.stencilKernel - (Not recommended) Create CUDA code for stencil functions - MATLAB
(Not recommended) Create CUDA code for stencil functions
Syntax
B = gpucoder.stencilKernel(FUN,A,[M N],shape,param1,param2...)
Description
B = gpucoder.stencilKernel(FUN,A,[M N],shape,param1,param2...) applies the function FUN to each [M,N] sliding window of the input A. FUN is called for each [M,N] submatrix of A and computes one element of the output B. The index of this element corresponds to the center of the [M,N] window.
FUN is the handle to a user-defined function that returns a scalar output of the same type as the input.
C = FUN(X,param1,param2,...)
X is the [M,N] submatrix of the original input A. X can be zero-padded when necessary, for instance at the boundaries of the input A. X and the window can also be 1-D.
C is the scalar-valued output of FUN. It is the output computed for the center element of the [M,N] array X and is assigned to the corresponding element of the output array B.
param1,param2 are optional arguments. Pass these arguments if FUN requires parameters in addition to the input window.
The window [M,N] must be less than or equal to the size of A, with the same shape as A.
If A is a 1-D row vector, the window must be [1,N].
If A is a 1-D column vector, the window must be [N,1].
shape determines the size of the output array B. It can have one of three possible values:
- 'same' - Returns output B that is the same size as A.
- 'full' - (default) Returns the full output. B is larger than A: if A is of size (x,y), the size of B is [x + floor(M/2), y + floor(N/2)].
- 'valid' - Returns only those parts of the output that are computed without the zero-padded edges of A. If A is of size (x,y), the size of B is [x - floor(M/2), y - floor(N/2)].
The input A must be a vector or matrix with a numeric type supported by FUN. The class of B is the same as the class of A.
Code generation is supported only for fixed size outputs. Shape and window must be compile-time constants because they determine the size of the output.
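The sliding-window behavior described above can be illustrated in plain MATLAB, without GPU Coder. The following is a minimal sketch, not the implementation used by gpucoder.stencilKernel: the helper name stencilSameSketch is hypothetical, and the sketch covers only the 'same' shape with odd window sizes, where the center of the window is unambiguous.

```matlab
% Usage: a 3x3 window sum over a small matrix, checked against conv2.
A = magic(4);
B = stencilSameSketch(@(X) sum(X(:)), A, [3 3]);
assert(isequal(B, conv2(A, ones(3), 'same')));

function B = stencilSameSketch(fun, A, win)
% stencilSameSketch  Hypothetical plain-MATLAB reference for the 'same'
% shape with an odd-sized window. This is not a GPU Coder function.
    M = win(1);
    N = win(2);
    assert(mod(M,2) == 1 && mod(N,2) == 1, 'Sketch assumes odd window sizes.');
    [x, y] = size(A);
    rowPad = floor(M/2);
    colPad = floor(N/2);

    % Zero-pad A so that every window centered on an element of A is full size.
    padded = zeros(x + 2*rowPad, y + 2*colPad, 'like', A);
    padded(rowPad+1:rowPad+x, colPad+1:colPad+y) = A;

    B = zeros(x, y, 'like', A);
    for c = 1:y
        for r = 1:x
            X = padded(r:r+M-1, c:c+N-1);   % [M,N] window centered on A(r,c)
            B(r,c) = fun(X);                % scalar result for the center element
        end
    end
end
```

Because the summing window is symmetric, the result can be compared directly with conv2. A 1-D window such as [1 3] also works with this sketch, since the row padding is then zero.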
Examples
Mean Filter Using Stencil Kernel
This example shows how to use gpucoder.stencilKernel to generate CUDA® kernels that filter an image by using stencil operations.
This example performs mean filtering of a 2-D image. In one file, write the entry-point function meanImgFilt that accepts an image matrix A. Create a subfunction my_mean that computes the mean of the 3x3 submatrix.
    function B = meanImgFilt(A) %#codegen
        B = gpucoder.stencilKernel(@my_mean,A,[3 3],'same');

        function out = my_mean(A)
            out = cast(mean(A(:)), class(A));
        end
    end
Set up the test input image for the meanImgFilt function.
inImage = im2double(imread('cameraman.tif'));
Use the codegen function to generate a CUDA MEX function.
codegen -config coder.gpuConfig('mex') -args {inImage} -report meanImgFilt
GPU Coder creates three kernels: meanImgFilt_kernel1 for initializing memory, meanImgFilt_kernel2 for optimizing the input memory structure, and meanImgFilt_kernel3 for the mean filtering operation. The following is a snippet of the generated code.
    cudaMalloc(&gpu_B, 524288ULL);
    cudaMalloc(&gpu_A, 524288ULL);
    cudaMalloc(&gpu_expanded, 532512ULL);
    meanImgFilt_kernel1<<<dim3(131U, 1U, 1U), dim3(512U, 1U, 1U)>>>(gpu_expanded);
    cudaMemcpy((void *)gpu_A, (void *)&A[0], 524288ULL, cudaMemcpyHostToDevice);
    meanImgFilt_kernel2<<<dim3(128U, 1U, 1U), dim3(512U, 1U, 1U)>>>(gpu_A, gpu_expanded);
    meanImgFilt_kernel3<<<dim3(8U, 8U, 1U), dim3(32U, 32U, 1U)>>>(gpu_expanded, gpu_B);
    cudaMemcpy((void *)&B[0], (void *)gpu_B, 524288ULL, cudaMemcpyDeviceToHost);
meanImgFilt_kernel3 uses shared memory (__shared__ qualifier) to improve memory bandwidth and data locality.
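To check the generated MEX function against MATLAB, one option is to compare its output with a reference computed on the CPU. The sketch below assumes that the codegen command succeeded, that it produced meanImgFilt_mex (the default MEX name) in the current folder, and that a supported GPU is available. The reference relies on the zero-padding described earlier: a 3x3 mean over zero-padded windows equals a 'same' convolution with ones(3)/9. The variable names and the tolerance are illustrative choices, not values from the documentation.

```matlab
% Run the generated CUDA MEX function on the test image.
inImage = im2double(imread('cameraman.tif'));
outGpu = meanImgFilt_mex(inImage);

% CPU reference: mean of each zero-padded 3x3 window.
outRef = conv2(inImage, ones(3)/9, 'same');

% Summation order can differ between the two paths, so compare with a
% small tolerance (illustrative value) instead of testing exact equality.
maxErr = max(abs(outGpu(:) - outRef(:)));
fprintf('Maximum absolute difference: %g\n', maxErr);
assert(maxErr < 1e-12);
```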
Limitations
- For very large input sizes, the gpucoder.stencilKernel function may produce CUDA code that does not numerically match the MATLAB® simulation. In such cases, consider reducing the size of the input to produce accurate results.
Version History
Introduced in R2017b
R2022b: gpucoder.stencilKernel is not recommended
gpucoder.stencilKernel is not recommended. Use stencilfun instead.
Starting in R2022b, generate CUDA kernels for stencil-like operations by using the stencilfun function.
This table shows typical usages of gpucoder.stencilKernel and how to update your code to use stencilfun instead.
Not Recommended | Recommended |
---|---|
Convolution using gpucoder.stencilKernel: function Out = myconv(In) Out = gpucoder.stencilKernel(@stencilFcn, In, [5 5], 'same'); end function y = stencilFcn(X) W = rand(5); y = 0; for j = 1:5 for i = 1:5 y = y + X(i,j) * W(i,j); end end end | Convolution using stencilfun: function Out = myconv(In) fh = @(X) stencilFcn(X); Out = stencilfun(fh, In, [5 5], Shape = 'same'); end function y = stencilFcn(X) W = rand(5); y = 0; for j = 1:5 for i = 1:5 y = y + X(i,j) * W(i,j); end end end |
Passing extra arguments to the stencil function. weights = rand(5); In = rand(100); Out = gpucoder.stencilKernel(@stencilFcn, In, [5 5], 'same', weights); function y = stencilFcn(X, weights) y = 0; for i = 1:5 for j = 1:5 y = y + X(j,i) * weights(j,i); end end end | Use an anonymous function to pass extra arguments to the stencil function. weights = rand(5); fh = @(X) stencilFcn(X, weights); In = rand(100); Out = stencilfun(fh, In, [5 5], Shape = 'same'); function y = stencilFcn(X, weights) y = 0; for i = 1:5 for j = 1:5 y = y + X(j,i) * weights(j,i); end end end |
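
Following the same pattern, the mean filter from the example on this page could be rewritten with stencilfun roughly as shown here. This is a sketch under assumptions: the function name meanImgFiltStencilfun is hypothetical, the call form (function handle, input, window, Shape name-value argument) is copied from the table above, and the boundary handling of stencilfun has not been checked against gpucoder.stencilKernel.

```matlab
function B = meanImgFiltStencilfun(A) %#codegen
    % Hypothetical stencilfun counterpart of meanImgFilt. The call form
    % follows the table above; requires R2022b or later.
    fh = @(X) my_mean(X);
    B = stencilfun(fh, A, [3 3], Shape = 'same');
end

function out = my_mean(X)
    % Mean of the 3x3 window, cast back to the class of the input.
    out = cast(mean(X(:)), class(X));
end
```

Comparing the output of this function with the output of meanImgFilt on the same image is a reasonable first check before switching over.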
See Also
Apps
Functions
- codegen | coder.gpu.kernel | gpucoder.matrixMatrixKernel | coder.gpu.constantMemory | gpucoder.reduce | gpucoder.sort | coder.gpu.nokernel