Code Generation for Deep Learning Networks by Using TensorRT
With GPU Coder™, you can generate optimized code for prediction of a variety of trained deep learning networks from Deep Learning Toolbox™. The generated code implements the deep convolutional neural network (CNN) by using the architecture, the layers, and parameters that you specify in the input SeriesNetwork (Deep Learning Toolbox) or DAGNetwork (Deep Learning Toolbox) object. You can configure the code generator to take advantage of the NVIDIA® TensorRT™ high performance inference library for NVIDIA GPUs. TensorRT provides improved latency, throughput, and memory efficiency by combining network layers and optimizing kernel selection. You can also configure the code generator to take advantage of TensorRT's precision modes (FP32, FP16, or INT8) to further improve performance and reduce memory requirements. The generated code can be integrated into your project as source code, static or dynamic libraries, or executables that you can deploy to a variety of NVIDIA GPU platforms.
Note
The TensorRT workflow is not supported on MATLAB® Online™.
Generate code for convolutional networks by using one of these methods:
- The standard codegen function that generates CUDA® code from a MATLAB entry-point function.
- The GPU Coder app that generates CUDA code from a MATLAB entry-point function.
Note
In previous releases, you could target the TensorRT library by using the cnncodegen function. From R2021a onwards, the cnncodegen function generates C++ code and makefiles for only the ARM® Mali GPU processor.
Generate Code and Classify Images by Using GoogLeNet
In this example, you use GPU Coder to generate CUDA code for the pretrained googlenet (Deep Learning Toolbox) deep convolutional neural network and classify an image. GoogLeNet has been trained on over a million images and can classify images into 1000 object categories (such as keyboard, coffee mug, pencil, and animals). The network has learned rich feature representations for a wide range of images. The network takes an image as input, and then outputs a label for the object in the image with the probabilities for each of the object categories. This example shows you how to generate code for the pretrained network by using the codegen command and the GPU Coder app.
This example uses 32-bit floats (default value) as the precision for the tensor inputs. To learn more about using 8-bit integer precision for the tensors, see the Deep Learning Prediction with NVIDIA TensorRT Library example.
Requirements
Required
This example generates CUDA MEX that has the following additional requirements.
- Deep Learning Toolbox.
- Deep Learning Toolbox Model for GoogLeNet Network support package.
- GPU Coder Interface for Deep Learning support package.
- CUDA-enabled NVIDIA GPU and a compatible driver. For 8-bit integer precision, the CUDA GPU must have a compute capability of 6.1, 7.0, or higher. Half-precision requires a CUDA GPU with a minimum compute capability of 5.3, 6.0, 6.2, or higher.
Optional
For non-MEX builds such as static libraries, dynamic libraries, or executables, this example has the following additional requirements.
- CUDA Toolkit, cuDNN, and TensorRT libraries. For information on the supported versions of the compilers and libraries, see Installing Prerequisite Products.
- Environment variables for the compilers and libraries. For more information, see Environment Variables.
Load Pretrained Network
- Load the pretrained GoogLeNet network. You can choose to load a different pretrained network for image classification. If you do not have the required support packages installed, the software provides a download link.
- The object net contains the DAGNetwork object. Use the analyzeNetwork (Deep Learning Toolbox) function to display an interactive visualization of the network architecture, to detect errors and issues in the network, and to display detailed information about the network layers. The layer information includes the sizes of layer activations and learnable parameters, the total number of learnable parameters, and the sizes of state parameters of recurrent layers.
- The image that you want to classify must have the same size as the input size of the network. For GoogLeNet, the size of the imageInputLayer (Deep Learning Toolbox) is 224-by-224-by-3. The Classes property of the output ClassificationOutputLayer (Deep Learning Toolbox) contains the names of the classes learned by the network. View 10 random class names out of the total of 1000.
classNames = net.Layers(end).Classes;
numClasses = numel(classNames);
disp(classNames(randperm(numClasses,10)))
'speedboat'
'window screen'
'isopod'
'wooden spoon'
'lipstick'
'drake'
'hyena'
'dumbbell'
'strawberry'
'custard apple'
For more information, see List of Deep Learning Layers (Deep Learning Toolbox).
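The class-name snippet above assumes that the variable net already holds the pretrained network. A minimal sketch of the loading and inspection steps described in this section, assuming that Deep Learning Toolbox and the GoogLeNet support package are installed (the variable name inputSize is chosen here for illustration):

```matlab
% Load the pretrained GoogLeNet network as a DAGNetwork object.
% Requires the Deep Learning Toolbox Model for GoogLeNet Network
% support package.
net = googlenet;

% Query the input size expected by the image input layer. The text
% above states that this is 224-by-224-by-3 for GoogLeNet.
inputSize = net.Layers(1).InputSize;
disp(inputSize)

% Open the interactive analyzer to inspect layers, activation sizes,
% and learnable parameters.
analyzeNetwork(net)
```

Run these commands before the class-name snippet so that net exists in the workspace.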
Create an Entry-Point Function
- Write an entry-point function in MATLAB that:
  - Uses the coder.loadDeepLearningNetwork function to load a deep learning model and to construct and set up a CNN class. For more information, see Load Pretrained Networks for Code Generation.
  - Calls predict (Deep Learning Toolbox) to predict the responses.
For example:
function out = googlenet_predict(in) %#codegen
% Load the network once and reuse it on subsequent calls.
persistent mynet;
if isempty(mynet)
    mynet = coder.loadDeepLearningNetwork('googlenet');
end
% pass in input
out = predict(mynet,in);
A persistent object mynet loads the DAGNetwork object. At the first call to the entry-point function, the persistent object is constructed and set up. On subsequent calls to the function, the same object is reused to call predict on inputs, avoiding reconstructing and reloading the network object.
Note
Code generation requires the network to be loaded into a persistent object.
- You can also use the activations (Deep Learning Toolbox) method to compute network activations for a specific layer. For example, the following line of code returns the network activations for the layer specified in layerIdx.
out = activations(mynet,in,layerIdx,'OutputAs','Channels');
- You can also use the classify (Deep Learning Toolbox) method to predict class labels for the image data in in using the trained network, mynet.
[out,scores] = classify(mynet,in);
For LSTM networks, you can also use the predictAndUpdateState (Deep Learning Toolbox) and resetState (Deep Learning Toolbox) methods. For usage notes and limitations of these methods, see the corresponding entry in the Supported Functions table.
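The classify call mentioned above can be wrapped in its own entry-point function that follows the same persistent-object pattern as googlenet_predict. The following is a sketch only: the function name googlenet_classify and the output names label and scores are hypothetical and chosen for illustration.

```matlab
function [label,scores] = googlenet_classify(in) %#codegen
% Hypothetical entry-point function that returns a class label and the
% per-class scores instead of the raw prediction output.

% Load the network once; later calls reuse the same object.
persistent mynet;
if isempty(mynet)
    mynet = coder.loadDeepLearningNetwork('googlenet');
end

% Predict the class label and the scores for the input image.
[label,scores] = classify(mynet,in);
end
```

Save the function in a file with the same name as the function before passing it to the code generator.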
Code Generation by Using codegen
- To configure build settings such as output file name, location, and type, you create coder configuration objects. To create the objects, use the coder.gpuConfig function. For example, when generating CUDA MEX by using the codegen command, use cfg = coder.gpuConfig('mex');
Other available options are:
  - cfg = coder.gpuConfig('lib');, to create a code generation configuration object for use with codegen when generating a CUDA C/C++ static library.
  - cfg = coder.gpuConfig('dll');, to create a code generation configuration object for use with codegen when generating a CUDA C/C++ dynamic library.
  - cfg = coder.gpuConfig('exe');, to create a code generation configuration object for use with codegen when generating a CUDA C/C++ executable.
- To specify code generation parameters for TensorRT, set the DeepLearningConfig property to a coder.TensorRTConfig object that you create by using coder.DeepLearningConfig.
cfg = coder.gpuConfig('mex');
cfg.TargetLang = 'C++';
cfg.DeepLearningConfig = coder.DeepLearningConfig('tensorrt');
cfg.DeepLearningConfig.DataType = 'fp32';
Specify the precision of the inference computations in supported layers by using the DataType property. When performing inference in 32-bit floats, use 'fp32'. For half-precision, use 'fp16'. For 8-bit integer, use 'int8'. The default value is 'fp32'. INT8 precision requires a CUDA GPU with a compute capability of 6.1, 7.0, or higher. FP16 precision requires a CUDA GPU with a compute capability of 5.3, 6.0, 6.2, or higher. Use the ComputeCapability property of the GpuConfig object to set the compute capability value.
When you select the 'int8' option, TensorRT quantizes the floating-point data to int8. The calibration is performed with a reduced set of the calibration data. The calibration data must be present in the image data location specified by DataPath. Preprocessing of the images must be performed before calibration and the preprocessing steps must be included in the entry-point file before code generation.
Code generation by using the NVIDIA TensorRT Library with inference computation in 8-bit integer precision supports these additional networks:
  - Object detector networks such as YOLOv2 and SSD.
  - Regression and semantic segmentation networks. For semantic segmentation networks, the recalibration images must be in a format supported by the imread function.
See the Deep Learning Prediction with NVIDIA TensorRT Library example for 8-bit integer prediction for a logo classification network by using TensorRT.
- Run the codegen command. The codegen command generates CUDA code from the googlenet_predict.m MATLAB entry-point function.
codegen -config cfg googlenet_predict -args {ones(224,224,3)} -report
  - The -report option instructs codegen to generate a code generation report that you can use to debug your MATLAB code.
  - The -args option instructs codegen to compile the file googlenet_predict.m by using the class, size, and complexity specified for the input in. The value (224,224,3) corresponds to the input layer size of the GoogLeNet network.
  - The -config option instructs codegen to use the specified configuration object for code generation.
Note
You can specify half-precision inputs for code generation. However, the code generator type casts the inputs to single-precision. The Deep Learning Toolbox uses single-precision, floating-point arithmetic for computations in MATLAB. During code generation, you can enable inference with half-precision (16-bit floating-point) inputs by specifying the DataType property of coder.TensorRTConfig as 'fp16'.
The code generator uses column-major layout by default. To use row-major layout, pass the -rowmajor option to the codegen command. Alternatively, configure your code for row-major layout by modifying the cfg.RowMajor parameter in the code generation configuration object.
- When code generation is successful, you can view the resulting code generation report by clicking View Report in the MATLAB Command Window. The report is displayed in the Report Viewer window. If the code generator detects errors or warnings during code generation, the report describes the issues and provides links to the problematic MATLAB code. See Code Generation Reports.
Code generation successful: View report
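The configuration steps above can be combined to select a reduced precision mode. The following sketch is illustrative only and assumes a GPU that meets the compute capability requirements stated earlier. The compute capability value '7.0', the folder name 'calibration', and the batch count 50 are placeholder values, and the variable names cfgHalf and cfgInt8 are hypothetical.

```matlab
% Half-precision (FP16) inference in a CUDA MEX target.
cfgHalf = coder.gpuConfig('mex');
cfgHalf.TargetLang = 'C++';
cfgHalf.GpuConfig.ComputeCapability = '7.0';   % assumption: match your GPU
cfgHalf.DeepLearningConfig = coder.DeepLearningConfig('tensorrt');
cfgHalf.DeepLearningConfig.DataType = 'fp16';

codegen -config cfgHalf googlenet_predict -args {ones(224,224,3)} -report

% 8-bit integer (INT8) inference. TensorRT calibrates by using the
% images found in DataPath, so the folder must exist and contain
% calibration images before you run codegen.
cfgInt8 = coder.gpuConfig('mex');
cfgInt8.TargetLang = 'C++';
cfgInt8.DeepLearningConfig = coder.DeepLearningConfig('tensorrt');
cfgInt8.DeepLearningConfig.DataType = 'int8';
cfgInt8.DeepLearningConfig.DataPath = 'calibration';      % hypothetical folder
cfgInt8.DeepLearningConfig.NumCalibrationBatches = 50;    % placeholder value

codegen -config cfgInt8 googlenet_predict -args {ones(224,224,3)} -report
```

For the INT8 case, remember that any image preprocessing must already be part of the entry-point function, as noted in the configuration step.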
Generated Code
The DAG network is generated as a C++ class containing an array of 144 layer classes. A snippet of the class declaration from the googlenet_predict_types.h file is shown.
googlenet_predict_types.h File
class b_googlenet_0
{
public:
  void presetup();
  void allocate();
  void postsetup();
  b_googlenet_0();
  void setup();
  void deallocate();
  void predict();
  void cleanup();
  real32_T *getLayerOutput(int32_T layerIndex, int32_T portIndex);
  ~b_googlenet_0();
  int32_T batchSize;
  int32_T numLayers;
  real32_T *getInputDataPointer();
  real32_T *getOutputDataPointer();
  MWCNNLayer *layers[144];
private:
  MWTargetNetworkImpl *targetImpl;
};
- The setup() method of the class sets up handles and allocates memory for each layer of the network object.
- The predict() method invokes prediction for each of the 144 layers in the network.
- The DeepLearningNetwork.cu file contains the definitions of the object functions for the b_googlenet_0 class.
Binary files are exported for layers with parameters such as fully connected and convolution layers in the network. For instance, files cnn_googlenet_conv*_w and cnn_googlenet_conv*_b correspond to weights and bias parameters for the convolutional layers in the network. The code generator places these binary files in the codegen folder.
By default, the generated application looks for the weight files in the codegen folder. If you are relocating the generated application and weight files to a different location such as an embedded board, create an environment variable called CODER_DATA_PATH, whose value is the location of the relocated weight files. The generated application will then look for the weight files in this location.
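As a minimal sketch of the relocation step, the environment variable can be set from within MATLAB before you launch the generated application from the same session. The folder name googlenet_weights is hypothetical. On an embedded board, you would typically set the variable in the shell that starts the application instead.

```matlab
% Hypothetical folder that holds the relocated weight files.
weightsDir = fullfile(tempdir, 'googlenet_weights');
if ~isfolder(weightsDir)
    mkdir(weightsDir);
end

% Make the location visible to this MATLAB session and to any process
% started from it, for example through system().
setenv('CODER_DATA_PATH', weightsDir);

% Display the value that is now stored in the environment variable.
disp(getenv('CODER_DATA_PATH'))
```

The variable set by setenv lasts only for the current MATLAB session, so set it again after a restart.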
Note
On Windows® systems, some antivirus software such as Bit Defender can incorrectly identify some weight files as infected and delete them. These cases are false positives and the files can be marked as safe in your antivirus program.
In the generated code file googlenet_predict.cu, the entry-point function googlenet_predict() constructs a static object of b_googlenet_0 class type and invokes setup and predict on this network object.
/* Include files */
#include "googlenet_predict.h"
#include "DeepLearningNetwork.h"
#include "predict.h"
#include "rt_nonfinite.h"

/* Variable Definitions */
static b_googlenet_0 mynet;
static boolean_T mynet_not_empty;

/* Function Definitions */
void googlenet_predict(const real_T in[150528], real32_T out[1000])
{
  if (!mynet_not_empty) {
    DeepLearningNetwork_setup(&mynet);
    mynet_not_empty = true;
  }

  DeepLearningNetwork_predict(&mynet, in, out);
}

void googlenet_predict_init()
{
  mynet_not_empty = false;
}
Generate Code by Using the App
To specify the entry-point function and the input types, complete the procedure in the app. See Generate Code by Using the GPU Coder App.
In the Generate Code step:
- Set the Build type to MEX.
- Click More Settings. In the Deep Learning pane, set Target library to TensorRT.
- Close the settings window. To generate CUDA code, click Generate.
Generated Makefile
For 'lib', 'dll', and 'exe' targets, the code generator creates the *_rtw.mk makefile in the codegen folder. In this makefile, the location of the generated code is specified by using the START_DIR variable found in the MACROS section. By default, this variable points to the path of the current working folder where the code is generated. If you plan to move the generated files and use the makefile to build, replace the generated value of START_DIR with the appropriate path location.
Run the Generated MEX
- The image that you want to classify must have the same size as the input size of the network. Read the image that you want to classify and resize it to the input size of the network. This resizing slightly changes the aspect ratio of the image.
im = imread("peppers.png");
inputLayerSize = net.Layers(1).InputSize;
im = imresize(im,inputLayerSize(1:2));
- Call GoogLeNet predict on the input image. The MEX function was generated for a double-precision input (-args {ones(224,224,3)}), so cast the uint8 image to double before the call.
predict_scores = googlenet_predict_mex(double(im));
- Display the top five predicted labels and their associated probabilities as a histogram. Because the network classifies images into so many object categories, and many categories are similar, it is common to consider the top-five accuracy when evaluating networks. The network classifies the image as a bell pepper with a high probability.
[scores,indx] = sort(predict_scores, 'descend');
classNamesTop = classNames(indx(1:5));
h = figure;
h.Position(3) = 2*h.Position(3);
ax1 = subplot(1,2,1);
ax2 = subplot(1,2,2);
image(ax1,im);
barh(ax2,scores(5:-1:1))
xlabel(ax2,'Probability')
yticklabels(ax2,classNamesTop(5:-1:1))
ax2.YAxisLocation = 'right';
sgtitle('Top 5 predictions using GoogLeNet')
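The top-five accuracy mentioned above counts a prediction as correct when the true class appears among the k highest-scoring classes. The following sketch illustrates the calculation on a small synthetic score matrix. The scores and labels are made up for illustration and are not output from GoogLeNet, and the variable names are hypothetical.

```matlab
% Synthetic scores: each row is one image, each column is one class.
scores = [0.10 0.05 0.40 0.20 0.15 0.10;
          0.30 0.25 0.06 0.15 0.20 0.04;
          0.02 0.03 0.05 0.10 0.30 0.50];

% Index of the true class for each image (made-up labels).
trueIdx = [3; 6; 4];

% Number of top predictions to consider.
k = 5;

% Sort each row in descending order and keep the k best class indices.
[~, order] = sort(scores, 2, 'descend');
topK = order(:, 1:k);

% A row is a hit when its true class is among its top-k classes.
hit = any(topK == trueIdx, 2);

% Top-k accuracy is the fraction of hits.
topKAccuracy = mean(hit);
disp(topKAccuracy)
```

With real data, scores would hold the outputs collected from the generated MEX function for a labeled test set.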
See Also
Related Topics
- Supported Networks, Layers, and Classes
- Load Pretrained Networks for Code Generation
- Code Generation for Deep Learning Networks by Using cuDNN
- Deep Learning Prediction with NVIDIA TensorRT Library
- Code Generation for Deep Learning Networks
- Code Generation for Object Detection by Using YOLO v2
- Deployment and Classification of Webcam Images on NVIDIA Jetson TX2 Platform