Use Dynamically Allocated C++ Arrays in Generated Function Interfaces - MATLAB & Simulink
In most cases, when you generate code for a MATLAB® function that accepts or returns an array, there is an array at the interface of the generated CUDA® function. For an array size that is unknown at compile time, or whose bound exceeds a predefined threshold, the memory for the generated array is dynamically allocated.
By default, the dynamically allocated array is implemented by using the C style emxArray data structure in the generated code. Alternatively, dynamically allocated arrays can be implemented as a class template called coder::gpu_array in the generated code. coder::gpu_array offers several advantages over emxArray style data structures:
- The generated code is exception safe.
- Generated code is easier to read.
- Integration with C++ code is better because initializing the input data and working with the output data is easier.
- Because coder::gpu_array is defined in a header file that ships with MATLAB, you can write the interface code before generating the code.
To use dynamically allocated arrays in your custom CUDA code that you integrate with the generated CUDA C++ functions, learn to use the coder::gpu_array template.
Change Interface Generation
By default, the generated CUDA code uses the C style emxArray data structure to implement dynamically allocated arrays. Instead, you can choose to generate CUDA code that uses the coder::gpu_array template to implement dynamically allocated arrays. To generate the coder::gpu_array template, do one of the following:
- In a code configuration object (coder.MexCodeConfig, coder.CodeConfig, or coder.EmbeddedCodeConfig), set the DynamicMemoryAllocationInterface parameter to 'C++'.
- In the GPU Coder™ app settings, on the Memory tab, set Dynamic memory allocation interface to Use C++ coder::array.
Using the coder::gpu_array Class Template
When you generate CUDA code for your MATLAB functions, the code generator produces the header files coder_gpu_array.h and coder_array.h in the build folder. The coder_gpu_array.h header file contains the definition of the class template gpu_array in the namespace coder and the definitions for the function templates arrayCopyCpuToGpu and arrayCopyGpuToCpu. The coder::gpu_array template implements the dynamically allocated arrays in the generated code. The declaration for this template is:
template <typename T, int32_T N> class gpu_array
The array contains elements of type T and has N dimensions. For example, to declare a two-dimensional dynamic array myArray that contains elements of type int32_T in your custom CUDA code, use:
coder::gpu_array<int32_T, 2> myArray;
The function templates arrayCopyCpuToGpu and arrayCopyGpuToCpu implement data transfers between the CPU and GPU memory. On the CPU, the dynamically allocated arrays are implemented by using the coder::array template. For more information on the APIs you use to create and interact with dynamic arrays in your custom code, see Use Dynamically Allocated C++ Arrays in Generated Function Interfaces.
To use dynamically allocated arrays in your custom CUDA code that you want to integrate with the generated code (for example, a custom main function), include the coder_gpu_array.h and coder_array.h header files in your custom .cu files.
Generate C++ Code That Accepts and Returns a Variable-Size Numeric Array
This example shows how to customize the generated example main function to use the coder::gpu_array and coder::array class templates in your project.
Your goal is to generate a CUDA executable for xTest1
that can accept and return an array of int32_T
elements. You want the first dimension of the array to be singleton and the second dimension to be unbounded.
1. Define a MATLAB function xTest1 that accepts an array X, adds the scalar A to each of its elements, and returns the resulting array Y.

function Y = xTest1(X, A)
Y = X;
for i = 1:numel(X)
    Y(i) = X(i) + A;
end

2. Generate initial source code for xTest1 and move xTest1.h from the code generation folder to your current folder. Use the following commands:
cfg = coder.gpuConfig('lib');
cfg.DynamicMemoryAllocationInterface = 'C++';
cfg.GenerateReport = true;
inputs = {coder.typeof(int32(0), [1 inf]), int32(0)};
codegen -config cfg -args inputs xTest1.m
The function prototype for xTest1 in the generated code is shown here:
extern void xTest1(const coder::array<int, 2U> &X, int A,
coder::array<int, 2U> &Y);
Interface with the generated code by providing input and output arrays that are compatible with this function prototype.
3. Define a CUDA main function in the file xTest1_main.cu in your current working folder.
This main function includes the header files coder_gpu_array.h and coder_array.h that contain the coder::gpu_array and coder::array class template definitions, respectively. The main function performs these actions:
- Declare myArray and myResult as two-dimensional coder::array dynamic arrays of int32_T elements.
- Dynamically set the sizes of the two dimensions of myArray to 1 and 100 by using the set_size method.
- Access the size vector of myResult by using myResult.size.
#include <coder_gpu_array.h>
#include <coder_array.h>
#include <xTest1.h>
#include <iostream>

int main(int argc, char *argv[])
{
  static_cast<void>(argc);
  static_cast<void>(argv);
  // Instantiate the input variable by using the coder::array template
  coder::array<int32_T, 2> myArray;
  // Allocate initial memory for the array
  myArray.set_size(1, 100);
  // Access the array with standard C++ indexing
  for (int i = 0; i < myArray.size(1); i++) {
    myArray[i] = i;
  }
  // Instantiate the result variable by using the coder::array template
  coder::array<int32_T, 2> myResult;
  // Pass the input and result arrays to the generated function
  xTest1(myArray, 1000, myResult);
  // Print the result, ten values per line
  for (int i = 0; i < myResult.size(1); i++) {
    if (i % 10 != 0) std::cout << " ";
    std::cout << myResult[i];
    if (((i + 1) % 10) == 0) std::cout << std::endl;
  }
  std::cout << std::endl;
  return 0;
}
4. Generate code by running this script:
cfg = coder.gpuConfig('exe');
cfg.DynamicMemoryAllocationInterface = 'C++';
cfg.GenerateReport = true;
cfg.CustomSource = 'xTest1_main.cu';
cfg.CustomInclude = '.';
codegen -config cfg -args inputs xTest1_main.cu xTest1.m
5. The code generator produces an executable file xTest1
in your current working folder. Run the executable using the following commands:
if ispc
!xtest1.exe
else
!./xTest1
end
1000 1001 1002 1003 1004 1005 1006 1007 1008 1009
1010 1011 1012 1013 1014 1015 1016 1017 1018 1019
1020 1021 1022 1023 1024 1025 1026 1027 1028 1029
1030 1031 1032 1033 1034 1035 1036 1037 1038 1039
1040 1041 1042 1043 1044 1045 1046 1047 1048 1049
1050 1051 1052 1053 1054 1055 1056 1057 1058 1059
1060 1061 1062 1063 1064 1065 1066 1067 1068 1069
1070 1071 1072 1073 1074 1075 1076 1077 1078 1079
1080 1081 1082 1083 1084 1085 1086 1087 1088 1089
1090 1091 1092 1093 1094 1095 1096 1097 1098 1099
Limitations
- To generate CUDA code that uses coder::gpu_array, the GPU memory allocation mode must be set to discrete. To change the memory allocation mode in the GPU Coder app settings, in the GPU Code section, use the Malloc mode parameter. When using the command-line interface, use the MallocMode build configuration property, which accepts either 'discrete' or 'unified'.
- GPU Coder does not support coder::gpu_array in Simulink®.