gpucoder.stencilKernel - (Not recommended) Create CUDA code for stencil functions - MATLAB

(Not recommended) Create CUDA code for stencil functions

Syntax

B = gpucoder.stencilKernel(FUN,A,[M N],shape,param1,param2...)

Description

B = gpucoder.stencilKernel(FUN,A,[M N],shape,param1,param2...) applies the function FUN to each [M,N] sliding window of the input A. Function FUN is called for each [M,N] submatrix of A and computes an element of output B. The index of this element corresponds to the center of the [M,N] window.

FUN is the handle to a user-defined function that returns a scalar output of the same type as the input.

C = FUN(X,param1,param2,...)

X is the [M,N] submatrix of the original input A. X can be zero-padded when necessary, for instance at the boundaries of input A. X and the window can also be 1-D.

C is a scalar-valued output of FUN. It is the output computed for the center element of the [M,N] array X and is assigned to the corresponding element of the output array B.

param1,param2 are optional arguments. Pass these arguments if FUN requires parameters in addition to the input window.
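The windowing behavior described above can be pictured with a plain MATLAB loop. The following is only a minimal sketch, not the GPU Coder implementation: the name slidingWindowSketch is hypothetical, and the sketch assumes the 'same' shape, an odd-sized window, and zero padding at the boundaries of A.

```matlab
function B = slidingWindowSketch(fun, A, win, varargin)
% Hypothetical reference loop (not part of GPU Coder).
% Assumes: 'same' output shape, odd window size, zero padding at the edges.
    assert(all(mod(win, 2) == 1), 'This sketch handles odd window sizes only.');
    half = floor(win / 2);                       % padding rows/columns per side
    P = zeros(size(A) + 2*half, class(A));       % zero-padded copy of A
    P(half(1)+1:half(1)+size(A,1), half(2)+1:half(2)+size(A,2)) = A;
    B = zeros(size(A), class(A));                % B has the class of A
    for r = 1:size(A,1)
        for c = 1:size(A,2)
            X = P(r:r+win(1)-1, c:c+win(2)-1);   % window centered on A(r,c)
            B(r,c) = fun(X, varargin{:});        % extra parameters pass through
        end
    end
end
```

For example, slidingWindowSketch(@(X) sum(X(:)), magic(3), [3 3]) sums each zero-padded 3-by-3 neighborhood, which matches conv2(magic(3), ones(3), 'same').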

The window [M,N] must be less than or equal to the size of A, with the same shape as A.

If A is a 1-D row vector, the window must be [1,N].

If A is a 1-D column vector, the window must be [N,1].

shape determines the size of the output array B. It can have one of three possible values: 'same', 'full', or 'valid'.
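As an illustration of the 1-D case, the sketch below applies a three-element window to a row vector. The function names movingSum3 and sum3 are hypothetical, and the entry point assumes that A is a 1-by-N row vector, so the window is [1 3].

```matlab
function B = movingSum3(A) %#codegen
% Hypothetical entry point: A is assumed to be a 1-by-N row vector,
% so the window must be [1,N]; here it is [1 3].
    B = gpucoder.stencilKernel(@sum3, A, [1 3], 'same');
end

function s = sum3(X)
% X is the 1-by-3 window around the current element of A.
% At the ends of A, the missing neighbors are zero-padded.
    s = X(1) + X(2) + X(3);
end
```

For a column vector, the same pattern would use the window [3 1] instead.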

The input A must be a vector or matrix with a numeric type supported by FUN. The class of B is the same as the class of A.

Code generation is supported only for fixed size outputs. Shape and window must be compile-time constants because they determine the size of the output.


Examples


Mean Filter Using Stencil Kernel

This example shows how to use gpucoder.stencilKernel to generate CUDA® kernels that filter an image by using stencil operations.

This example performs mean filtering of a 2-D image. In one file, write the entry-point function meanImgFilt that accepts an image matrix A. Create a subfunction my_mean that computes the mean of the 3x3 submatrix.

function B = meanImgFilt(A) %#codegen
B = gpucoder.stencilKernel(@my_mean,A,[3 3],'same');

function out = my_mean(A)
out = cast(mean(A(:)), class(A));
end
end

Set up the test input image for the meanImgFilt function.

inImage = im2double(imread('cameraman.tif'));

Use the codegen function to generate a CUDA MEX function.

codegen -config coder.gpuConfig('mex') -args {inImage} -report meanImgFilt

GPU Coder creates three kernels: meanImgFilt_kernel1 for initializing memory, meanImgFilt_kernel2 for optimizing the input memory structure, and meanImgFilt_kernel3 for the mean filtering operation. The following is a snippet of the generated code.

cudaMalloc(&gpu_B, 524288ULL);
cudaMalloc(&gpu_A, 524288ULL);
cudaMalloc(&gpu_expanded, 532512ULL);
meanImgFilt_kernel1<<<dim3(131U, 1U, 1U), dim3(512U, 1U, 1U)>>>(gpu_expanded);
cudaMemcpy((void *)gpu_A, (void *)&A[0], 524288ULL, cudaMemcpyHostToDevice);
meanImgFilt_kernel2<<<dim3(128U, 1U, 1U), dim3(512U, 1U, 1U)>>>(gpu_A, gpu_expanded);
meanImgFilt_kernel3<<<dim3(8U, 8U, 1U), dim3(32U, 32U, 1U)>>>(gpu_expanded, gpu_B);
cudaMemcpy((void *)&B[0], (void *)gpu_B, 524288ULL, cudaMemcpyDeviceToHost);

meanImgFilt_kernel3 uses shared memory (__shared__ qualifier) to improve memory bandwidth and data locality.
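The buffer sizes and launch dimensions in the snippet can be cross-checked with a little arithmetic. The sketch below assumes that the input is the 256-by-256 cameraman.tif image stored as double (8 bytes per element) and that gpu_expanded holds a copy of the image padded by one element on each side for the 3x3 window. The padding interpretation is an inference from the numbers, not something the generated code states.

```matlab
% Assumptions: 256-by-256 double image, 3x3 window, one padding element per side.
imgSize      = [256 256];
win          = [3 3];
bytesPerElem = 8;                                 % double precision

bytesA        = prod(imgSize) * bytesPerElem;     % size of gpu_A and gpu_B
paddedSize    = imgSize + 2*floor(win/2);         % assumed size of gpu_expanded
bytesExpanded = prod(paddedSize) * bytesPerElem;

% Blocks needed at 512 threads per block (kernel1 and kernel2)
blocksKernel1 = ceil(prod(paddedSize) / 512);     % one thread per padded element
blocksKernel2 = ceil(prod(imgSize) / 512);        % one thread per input element

% 2-D grid of 32-by-32 thread blocks (kernel3)
gridKernel3 = ceil(imgSize / 32);

fprintf('%d %d %d %d [%d %d]\n', bytesA, bytesExpanded, ...
    blocksKernel1, blocksKernel2, gridKernel3);
```

Under these assumptions the values come out to 524288 and 532512 bytes, 131 and 128 blocks, and an 8-by-8 grid, which agree with the arguments in the generated calls.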

Limitations

Version History

Introduced in R2017b


gpucoder.stencilKernel is not recommended. Use stencilfun instead.

Starting in R2022b, generate CUDA kernels for stencil-like operations by using the stencilfun function.

This table shows typical usages of gpucoder.stencilKernel and how to update your code to use stencilfun instead.

Not Recommended: Convolution using gpucoder.stencilKernel:

function Out = myconv(In)
Out = gpucoder.stencilKernel(@stencilFcn, In, [5 5], 'same');
end

function y = stencilFcn(X)
W = rand(5);
y = 0;
for j = 1:5
    for i = 1:5
        y = y + X(i,j) * W(i,j);
    end
end
end

Recommended: Convolution using stencilfun:

function Out = myconv(In)
fh = @(X) stencilFcn(X);
Out = stencilfun(fh, In, [5 5], Shape = 'same');
end

function y = stencilFcn(X)
W = rand(5);
y = 0;
for j = 1:5
    for i = 1:5
        y = y + X(i,j) * W(i,j);
    end
end
end

Not Recommended: Passing extra arguments to the stencil function:

weights = rand(5);
In = rand(100);
Out = gpucoder.stencilKernel(@stencilFcn, In, [5 5], 'same', weights);

function y = stencilFcn(X, weights)
y = 0;
for i = 1:5
    for j = 1:5
        y = y + X(j,i) * weights(j,i);
    end
end
end

Recommended: Use an anonymous function to pass extra arguments to the stencil function:

weights = rand(5);
fh = @(X) stencilFcn(X, weights);
In = rand(100);
Out = stencilfun(fh, In, [5 5], Shape = 'same');

function y = stencilFcn(X, weights)
y = 0;
for i = 1:5
    for j = 1:5
        y = y + X(j,i) * weights(j,i);
    end
end
end
