Classify Images on FPGA by Using Quantized GoogLeNet Network

This example shows how to use the Deep Learning HDL Toolbox™ to deploy a quantized GoogLeNet network to classify an image. The example uses the pretrained GoogLeNet network to demonstrate transfer learning, quantization, and deployment of the quantized network. Quantization reduces the memory requirement of a deep neural network by converting the weights, biases, and activations of the network layers to 8-bit scaled integer data types. Use MATLAB® to retrieve the prediction results.

Deploy the quantized GoogLeNet network by creating a dlhdl.Workflow object. Use the dlhdl.Workflow object to:

Generate a list of instructions, weights and biases by using the compile method.
Generate a programming file for the FPGA by using the deploy method.
Retrieve the network prediction results and performance by using the predict method.

GoogLeNet has been trained on over a million images and can classify images into 1000 object categories (such as keyboard, coffee mug, pencil, and many animals). The network has learned rich feature representations for a wide range of images. The network takes an image as input, and then outputs a label for the object in the image together with the probabilities for each of the object categories.

Prerequisites

Deep Learning Toolbox™
Deep Learning HDL Toolbox™
Deep Learning Toolbox Model for GoogLeNet Network
Deep Learning HDL Toolbox™ Support Package for Intel FPGA and SoC
Image Processing Toolbox™
Intel Arria10 SoC development kit
Deep Learning Toolbox™ Model Quantization Library support package
MATLAB Coder Interface for Deep Learning Libraries

Transfer Learning Using GoogLeNet

To perform classification on a new set of images, you fine-tune a pretrained GoogLeNet convolutional neural network by transfer learning. In transfer learning, you can take a pretrained network and use it as a starting point to learn a new task. Fine-tuning a network with transfer learning is usually much faster and easier than training a network with randomly initialized weights from scratch. You can quickly transfer learned features to a new task using a smaller number of training images.

Load Pretrained DAG Network

Load the pretrained DAG network, GoogLeNet.

net = googlenet;

Use the analyzeNetwork function to obtain information about the network layers.

analyzeNetwork(net)

The first layer, the image input layer, requires input images of size 224-by-224-by-3, where 3 is the number of color channels.

inputSize = net.Layers(1).InputSize

inputSize = 1×3

224 224 3

Define Training and Validation Data Sets

This example uses the MathWorks MerchData data set. This is a small data set containing 75 images of MathWorks merchandise, belonging to five different classes (cap, cube, playing cards, screwdriver, and torch).

unzip('MerchData.zip');
imds = imageDatastore('MerchData', ...
    'IncludeSubfolders',true, ...
    'LabelSource','foldernames');

Divide the data into training and validation data sets. Use 70% of the images for training and 30% for validation. The splitEachLabel function splits the image datastore into two new datastores.

[imdsTrain,imdsValidation] = splitEachLabel(imds,0.7,'randomized');

The data set is now split into 55 training images and 20 validation images. Display some sample images.

numTrainImages = numel(imdsTrain.Labels);
idx = randperm(numTrainImages,16);
figure
for i = 1:16
    subplot(4,4,i)
    I = readimage(imdsTrain,idx(i));
    imshow(I)
end
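The 55/20 split reported above can be checked with a little arithmetic. The sketch below assumes that the 75 images are spread evenly over the five classes and that the per-label proportion is rounded to the nearest whole image. Both are assumptions made for illustration, and the variable names are hypothetical.

```matlab
% Assumption: 75 images spread evenly over 5 classes, and the 70% share
% of each class is rounded to the nearest whole image.
nImages = 75;
nLabels = 5;
perLabel      = nImages / nLabels;        % 15 images per class
trainPerLabel = round(0.7 * perLabel);    % round(10.5) = 11
nTrain = trainPerLabel * nLabels;         % 55 training images
nVal   = nImages - nTrain;                % 20 validation images
fprintf('%d training images, %d validation images\n', nTrain, nVal);
```

Under these assumptions, each class contributes 11 training images, which is also why a mini-batch size of 11 divides the training set evenly later in the example.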

Replace Final Layers

The fully connected layer and classification layer of the pretrained network net are configured for 1000 classes. These two layers, loss3-classifier and output in GoogLeNet, contain information on how to combine the features that the network extracts into class probabilities, a loss value, and predicted labels. To retrain a pretrained network to classify new images, replace these two layers with new layers adapted to the new data set.

Extract the layer graph from the trained network.

lgraph = layerGraph(net)

lgraph = LayerGraph with properties:

     Layers: [144×1 nnet.cnn.layer.Layer]
Connections: [170×2 table]
 InputNames: {'data'}
OutputNames: {'output'}

Replace the fully connected layer with a new fully connected layer that has number of outputs equal to the number of classes. To make learning faster in the new layers than in the transferred layers, increase the WeightLearnRateFactor and BiasLearnRateFactor values of the fully connected layer.

numClasses = numel(categories(imdsTrain.Labels))

Remove 'loss3-classifier', 'prob' and 'output' layers from the lgraph.

layers = net.SortedLayers;
for i = 0:2
    lgraph = removeLayers(lgraph,layers(end-i).Name);
end

Create three new layers and add them to the lgraph. Ensure the transferred and new layers are properly connected together in the lgraph.

newLayers = [
    fullyConnectedLayer(numClasses,'WeightLearnRateFactor',20,'BiasLearnRateFactor',20,'Name','newFC')
    softmaxLayer('Name','newProb')
    classificationLayer('Name','newClassOutput',"Classes","auto")];

lgraph = addLayers(lgraph,newLayers);
lgraph = connectLayers(lgraph,layers(end-3).Name,'newFC');

Train Network

The network requires input images of size 224-by-224-by-3, but the images in the image datastores have different sizes. Use an augmented image datastore to automatically resize the training images. Specify additional augmentation operations to perform on the training images: randomly flip the training images along the vertical axis, and randomly translate them up to 30 pixels horizontally and vertically. Data augmentation helps prevent the network from over-fitting and memorizing the exact details of the training images.

pixelRange = [-30 30];
imageAugmenter = imageDataAugmenter( ...
    'RandXReflection',true, ...
    'RandXTranslation',pixelRange, ...
    'RandYTranslation',pixelRange);
augimdsTrain = augmentedImageDatastore(inputSize(1:2),imdsTrain, ...
    'DataAugmentation',imageAugmenter);

To automatically resize the validation images without performing further data augmentation, use an augmented image datastore without specifying any additional preprocessing operations.

augimdsValidation = augmentedImageDatastore(inputSize(1:2),imdsValidation);

Specify the training options. For transfer learning, keep the features from the early layers of the pretrained network (the transferred layer weights). To slow down learning in the transferred layers, set the initial learning rate to a small value. In the previous step, the learning rate factors were increased for the fully connected layer to speed up learning in the new final layers. This combination of learning rate settings results in fast learning only in the new layers and slower learning in the other layers. When performing transfer learning, you do not need to train for as many epochs. An epoch is a full training cycle on the entire training data set. Specify the mini-batch size to be 11. The software validates the network every ValidationFrequency iterations during training.

options = trainingOptions('sgdm', ...
    'MiniBatchSize',11, ...
    'MaxEpochs',5, ...
    'InitialLearnRate',2e-4, ...
    'Shuffle','every-epoch', ...
    'ValidationData',augimdsValidation, ...
    'ValidationFrequency',3, ...
    'Verbose',false, ...
    'Plots','training-progress');
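To see what these settings mean for the length of training, the following sketch works out the iteration counts from the numbers used in this example (55 training images, mini-batch size 11, 5 epochs, validation every 3 iterations). It counts only the periodic validations. The software may also validate at other points, such as the end of training, so treat the count as a lower bound. The variable names are hypothetical.

```matlab
% Values taken from this example.
nTrain              = 55;   % training images
miniBatchSize       = 11;
maxEpochs           = 5;
validationFrequency = 3;    % iterations between validations

itersPerEpoch = floor(nTrain / miniBatchSize);   % 5 iterations per epoch
totalIters    = itersPerEpoch * maxEpochs;       % 25 iterations in total

% Iterations at which a periodic validation is due.
validationIters = validationFrequency:validationFrequency:totalIters;
fprintf('%d iterations, %d periodic validations\n', ...
    totalIters, numel(validationIters));
```

Because 11 divides 55 evenly, every mini-batch is full and no training images are left over at the end of an epoch.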

Train the network that consists of the transferred and new layers. By default, trainNetwork uses a GPU if one is available (requires Parallel Computing Toolbox™ and a supported GPU device). Otherwise, trainNetwork uses a CPU (requires MATLAB Coder Interface for Deep Learning Libraries™). You can also specify the execution environment by using the 'ExecutionEnvironment' name-value argument of trainingOptions.

netTransfer = trainNetwork(augimdsTrain,lgraph,options);

Create dlquantizer Object

Create a quantized network by using the dlquantizer object. Set the target execution environment to FPGA.

dlQuantObj = dlquantizer(netTransfer,'ExecutionEnvironment','FPGA');

Calibrate Quantized Network

Use the calibrate function to exercise the network by using sample inputs to collect the range information. The calibrate function exercises the network and collects the dynamic ranges for the learnable parameters of the convolution and fully connected layers of the network.

For best quantization results, the calibration data must be representative of the actual inputs that the network will predict on.

dlQuantObj.calibrate(augimdsTrain);
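To make the role of the collected ranges more concrete, here is a minimal sketch of how a dynamic range could be mapped to an 8-bit scaled integer representation. It assumes a power-of-two scale chosen so that the largest magnitude fits into the int8 range. The weights are made-up values and the scheme is illustrative only. It is not necessarily the algorithm that dlquantizer applies internally.

```matlab
% Hypothetical layer weights; in practice the range comes from calibration.
w = single([-0.82 0.11 0.47 1.93 -1.25]);
maxAbs = max(abs(w));                          % observed dynamic range

% Assumed scheme: power-of-two scale so that maxAbs maps inside [-128, 127].
exponent = ceil(log2(double(maxAbs) / 127));   % scale exponent
scale    = 2^exponent;

q    = int8(round(double(w) / scale));         % 8-bit values, 1 byte each
wHat = single(double(q) * scale);              % dequantized approximation

% Rounding to the nearest step bounds the error by half a step.
maxErr = max(abs(double(w) - double(wHat)));
fprintf('scale = 2^%d, max error = %.5f\n', exponent, maxErr);
```

A range that is too narrow would saturate large values, while a range that is too wide would waste resolution on values that never occur, which is why representative calibration data matters.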

Set Up Intel Quartus Prime Standard

Set the synthesis tool path to point to an installed Intel® Quartus® Prime Standard Edition 20.1 executable file. You must have already installed Altera® Quartus II.

% hdlsetuptoolpath('ToolName','Altera Quartus II','ToolPath','C:\intel\20.1\quartus\bin\quartus.exe');

Create Target Object

Create a target object with a custom name for your target device and an interface to connect your target device to the host computer. Interface options are JTAG and Ethernet.

hTarget = dlhdl.Target('Intel','Interface','JTAG');

Generate Bitstream to Run Network

The GoogLeNet network contains multiple Cross Channel Normalization layers. To support this layer type on hardware, the 'LRNBlockGeneration' property of the conv module must be turned on in the bitstream used for FPGA inference. The shipping arria10soc_int8 bitstream does not have the 'LRNBlockGeneration' property turned on. You can generate a new bitstream by using the following lines of code, and then use the generated bitstream with a workflow object for inference.

Update the processor configuration with the 'LRNBlockGeneration' property turned on and the 'SegmentationBlockGeneration' property turned off. Turning off 'SegmentationBlockGeneration' helps the Deep Learning IP fit on the FPGA and avoids overutilization of resources.

% hPC = dlhdl.ProcessorConfig('Bitstream', 'arria10soc_int8');
% hPC.setModuleProperty('conv', 'LRNBlockGeneration', 'on');
% hPC.setModuleProperty('conv', 'SegmentationBlockGeneration', 'off');
% dlhdl.buildProcessor(hPC)

To learn how to use the generated bitstream file, see Generate Custom Bitstream.

Create Workflow Object

Create an object of the dlhdl.Workflow class. Specify dlQuantObj as the network. Make sure to use the generated bitstream, which enables processing of Cross Channel Normalization layers on the FPGA. In this example, the target FPGA board is the Intel Arria10 SoC board and the generated bitstream uses the int8 data type.

hW = dlhdl.Workflow('network', dlQuantObj, 'Bitstream', 'dlprocessor.sof','Target',hTarget);

Compile Workflow Object

To compile the GoogLeNet network, run the compile function of the dlhdl.Workflow object.

Compiling network for Deep Learning FPGA prototyping ...

Targeting FPGA bitstream arria10soc_int8.

The network includes the following layers:

 1   'data'                           Image Input                   224×224×3 images with 'zerocenter' normalization                       (SW Layer)
 2   'conv1-7x7_s2'                   Convolution                   64 7×7×3 convolutions with stride [2  2] and padding [3  3  3  3]      (HW Layer)
 3   'conv1-relu_7x7'                 ReLU                          ReLU                                                                   (HW Layer)
 4   'pool1-3x3_s2'                   Max Pooling                   3×3 max pooling with stride [2  2] and padding [0  1  0  1]            (HW Layer)
 5   'pool1-norm1'                    Cross Channel Normalization   cross channel normalization with 5 channels per element                (HW Layer)
 6   'conv2-3x3_reduce'               Convolution                   64 1×1×64 convolutions with stride [1  1] and padding [0  0  0  0]     (HW Layer)
 7   'conv2-relu_3x3_reduce'          ReLU                          ReLU                                                                   (HW Layer)
 8   'conv2-3x3'                      Convolution                   192 3×3×64 convolutions with stride [1  1] and padding [1  1  1  1]    (HW Layer)
 9   'conv2-relu_3x3'                 ReLU                          ReLU                                                                   (HW Layer)
10   'conv2-norm2'                    Cross Channel Normalization   cross channel normalization with 5 channels per element                (HW Layer)
11   'pool2-3x3_s2'                   Max Pooling                   3×3 max pooling with stride [2  2] and padding [0  1  0  1]            (HW Layer)
12   'inception_3a-1x1'               Convolution                   64 1×1×192 convolutions with stride [1  1] and padding [0  0  0  0]    (HW Layer)
13   'inception_3a-relu_1x1'          ReLU                          ReLU                                                                   (HW Layer)
14   'inception_3a-3x3_reduce'        Convolution                   96 1×1×192 convolutions with stride [1  1] and padding [0  0  0  0]    (HW Layer)
15   'inception_3a-relu_3x3_reduce'   ReLU                          ReLU                                                                   (HW Layer)
16   'inception_3a-3x3'               Convolution                   128 3×3×96 convolutions with stride [1  1] and padding [1  1  1  1]    (HW Layer)
17   'inception_3a-relu_3x3'          ReLU                          ReLU                                                                   (HW Layer)
18   'inception_3a-5x5_reduce'        Convolution                   16 1×1×192 convolutions with stride [1  1] and padding [0  0  0  0]    (HW Layer)
19   'inception_3a-relu_5x5_reduce'   ReLU                          ReLU                                                                   (HW Layer)
20   'inception_3a-5x5'               Convolution                   32 5×5×16 convolutions with stride [1  1] and padding [2  2  2  2]     (HW Layer)
21   'inception_3a-relu_5x5'          ReLU                          ReLU                                                                   (HW Layer)
22   'inception_3a-pool'              Max Pooling                   3×3 max pooling with stride [1  1] and padding [1  1  1  1]            (HW Layer)
23   'inception_3a-pool_proj'         Convolution                   32 1×1×192 convolutions with stride [1  1] and padding [0  0  0  0]    (HW Layer)
24   'inception_3a-relu_pool_proj'    ReLU                          ReLU                                                                   (HW Layer)
25   'inception_3a-output'            Depth concatenation           Depth concatenation of 4 inputs                                        (HW Layer)
26   'inception_3b-1x1'               Convolution                   128 1×1×256 convolutions with stride [1  1] and padding [0  0  0  0]   (HW Layer)
27   'inception_3b-relu_1x1'          ReLU                          ReLU                                                                   (HW Layer)
28   'inception_3b-3x3_reduce'        Convolution                   128 1×1×256 convolutions with stride [1  1] and padding [0  0  0  0]   (HW Layer)
29   'inception_3b-relu_3x3_reduce'   ReLU                          ReLU                                                                   (HW Layer)
30   'inception_3b-3x3'               Convolution                   192 3×3×128 convolutions with stride [1  1] and padding [1  1  1  1]   (HW Layer)
31   'inception_3b-relu_3x3'          ReLU                          ReLU                                                                   (HW Layer)
32   'inception_3b-5x5_reduce'        Convolution                   32 1×1×256 convolutions with stride [1  1] and padding [0  0  0  0]    (HW Layer)
33   'inception_3b-relu_5x5_reduce'   ReLU                          ReLU                                                                   (HW Layer)
34   'inception_3b-5x5'               Convolution                   96 5×5×32 convolutions with stride [1  1] and padding [2  2  2  2]     (HW Layer)
35   'inception_3b-relu_5x5'          ReLU                          ReLU                                                                   (HW Layer)
36   'inception_3b-pool'              Max Pooling                   3×3 max pooling with stride [1  1] and padding [1  1  1  1]            (HW Layer)
37   'inception_3b-pool_proj'         Convolution                   64 1×1×256 convolutions with stride [1  1] and padding [0  0  0  0]    (HW Layer)
38   'inception_3b-relu_pool_proj'    ReLU                          ReLU                                                                   (HW Layer)
39   'inception_3b-output'            Depth concatenation           Depth concatenation of 4 inputs                                        (HW Layer)
40   'pool3-3x3_s2'                   Max Pooling                   3×3 max pooling with stride [2  2] and padding [0  1  0  1]            (HW Layer)
41   'inception_4a-1x1'               Convolution                   192 1×1×480 convolutions with stride [1  1] and padding [0  0  0  0]   (HW Layer)
42   'inception_4a-relu_1x1'          ReLU                          ReLU                                                                   (HW Layer)
43   'inception_4a-3x3_reduce'        Convolution                   96 1×1×480 convolutions with stride [1  1] and padding [0  0  0  0]    (HW Layer)
44   'inception_4a-relu_3x3_reduce'   ReLU                          ReLU                                                                   (HW Layer)
45   'inception_4a-3x3'               Convolution                   208 3×3×96 convolutions with stride [1  1] and padding [1  1  1  1]    (HW Layer)
46   'inception_4a-relu_3x3'          ReLU                          ReLU                                                                   (HW Layer)
47   'inception_4a-5x5_reduce'        Convolution                   16 1×1×480 convolutions with stride [1  1] and padding [0  0  0  0]    (HW Layer)
48   'inception_4a-relu_5x5_reduce'   ReLU                          ReLU                                                                   (HW Layer)
49   'inception_4a-5x5'               Convolution                   48 5×5×16 convolutions with stride [1  1] and padding [2  2  2  2]     (HW Layer)
50   'inception_4a-relu_5x5'          ReLU                          ReLU                                                                   (HW Layer)
51   'inception_4a-pool'              Max Pooling                   3×3 max pooling with stride [1  1] and padding [1  1  1  1]            (HW Layer)
52   'inception_4a-pool_proj'         Convolution                   64 1×1×480 convolutions with stride [1  1] and padding [0  0  0  0]    (HW Layer)
53   'inception_4a-relu_pool_proj'    ReLU                          ReLU                                                                   (HW Layer)
54   'inception_4a-output'            Depth concatenation           Depth concatenation of 4 inputs                                        (HW Layer)
55   'inception_4b-1x1'               Convolution                   160 1×1×512 convolutions with stride [1  1] and padding [0  0  0  0]   (HW Layer)
56   'inception_4b-relu_1x1'          ReLU                          ReLU                                                                   (HW Layer)
57   'inception_4b-3x3_reduce'        Convolution                   112 1×1×512 convolutions with stride [1  1] and padding [0  0  0  0]   (HW Layer)
58   'inception_4b-relu_3x3_reduce'   ReLU                          ReLU                                                                   (HW Layer)
59   'inception_4b-3x3'               Convolution                   224 3×3×112 convolutions with stride [1  1] and padding [1  1  1  1]   (HW Layer)
60   'inception_4b-relu_3x3'          ReLU                          ReLU                                                                   (HW Layer)
61   'inception_4b-5x5_reduce'        Convolution                   24 1×1×512 convolutions with stride [1  1] and padding [0  0  0  0]    (HW Layer)
62   'inception_4b-relu_5x5_reduce'   ReLU                          ReLU                                                                   (HW Layer)
63   'inception_4b-5x5'               Convolution                   64 5×5×24 convolutions with stride [1  1] and padding [2  2  2  2]     (HW Layer)
64   'inception_4b-relu_5x5'          ReLU                          ReLU                                                                   (HW Layer)
65   'inception_4b-pool'              Max Pooling                   3×3 max pooling with stride [1  1] and padding [1  1  1  1]            (HW Layer)
66   'inception_4b-pool_proj'         Convolution                   64 1×1×512 convolutions with stride [1  1] and padding [0  0  0  0]    (HW Layer)
67   'inception_4b-relu_pool_proj'    ReLU                          ReLU                                                                   (HW Layer)
68   'inception_4b-output'            Depth concatenation           Depth concatenation of 4 inputs                                        (HW Layer)
69   'inception_4c-1x1'               Convolution                   128 1×1×512 convolutions with stride [1  1] and padding [0  0  0  0]   (HW Layer)
70   'inception_4c-relu_1x1'          ReLU                          ReLU                                                                   (HW Layer)
71   'inception_4c-3x3_reduce'        Convolution                   128 1×1×512 convolutions with stride [1  1] and padding [0  0  0  0]   (HW Layer)
72   'inception_4c-relu_3x3_reduce'   ReLU                          ReLU                                                                   (HW Layer)
73   'inception_4c-3x3'               Convolution                   256 3×3×128 convolutions with stride [1  1] and padding [1  1  1  1]   (HW Layer)
74   'inception_4c-relu_3x3'          ReLU                          ReLU                                                                   (HW Layer)
75   'inception_4c-5x5_reduce'        Convolution                   24 1×1×512 convolutions with stride [1  1] and padding [0  0  0  0]    (HW Layer)
76   'inception_4c-relu_5x5_reduce'   ReLU                          ReLU                                                                   (HW Layer)
77   'inception_4c-5x5'               Convolution                   64 5×5×24 convolutions with stride [1  1] and padding [2  2  2  2]     (HW Layer)
78   'inception_4c-relu_5x5'          ReLU                          ReLU                                                                   (HW Layer)
79   'inception_4c-pool'              Max Pooling                   3×3 max pooling with stride [1  1] and padding [1  1  1  1]            (HW Layer)
80   'inception_4c-pool_proj'         Convolution                   64 1×1×512 convolutions with stride [1  1] and padding [0  0  0  0]    (HW Layer)
81   'inception_4c-relu_pool_proj'    ReLU                          ReLU                                                                   (HW Layer)
82   'inception_4c-output'            Depth concatenation           Depth concatenation of 4 inputs                                        (HW Layer)
83   'inception_4d-1x1'               Convolution                   112 1×1×512 convolutions with stride [1  1] and padding [0  0  0  0]   (HW Layer)
84   'inception_4d-relu_1x1'          ReLU                          ReLU                                                                   (HW Layer)
85   'inception_4d-3x3_reduce'        Convolution                   144 1×1×512 convolutions with stride [1  1] and padding [0  0  0  0]   (HW Layer)
86   'inception_4d-relu_3x3_reduce'   ReLU                          ReLU                                                                   (HW Layer)
87   'inception_4d-3x3'               Convolution                   288 3×3×144 convolutions with stride [1  1] and padding [1  1  1  1]   (HW Layer)
88   'inception_4d-relu_3x3'          ReLU                          ReLU                                                                   (HW Layer)
89   'inception_4d-5x5_reduce'        Convolution                   32 1×1×512 convolutions with stride [1  1] and padding [0  0  0  0]    (HW Layer)
90   'inception_4d-relu_5x5_reduce'   ReLU                          ReLU                                                                   (HW Layer)
91   'inception_4d-5x5'               Convolution                   64 5×5×32 convolutions with stride [1  1] and padding [2  2  2  2]     (HW Layer)
92   'inception_4d-relu_5x5'          ReLU                          ReLU                                                                   (HW Layer)
93   'inception_4d-pool'              Max Pooling                   3×3 max pooling with stride [1  1] and padding [1  1  1  1]            (HW Layer)
94   'inception_4d-pool_proj'         Convolution                   64 1×1×512 convolutions with stride [1  1] and padding [0  0  0  0]    (HW Layer)
95   'inception_4d-relu_pool_proj'    ReLU                          ReLU                                                                   (HW Layer)
96   'inception_4d-output'            Depth concatenation           Depth concatenation of 4 inputs                                        (HW Layer)
97   'inception_4e-1x1'               Convolution                   256 1×1×528 convolutions with stride [1  1] and padding [0  0  0  0]   (HW Layer)
98   'inception_4e-relu_1x1'          ReLU                          ReLU                                                                   (HW Layer)
99   'inception_4e-3x3_reduce'        Convolution                   160 1×1×528 convolutions with stride [1  1] and padding [0  0  0  0]   (HW Layer)
100   'inception_4e-relu_3x3_reduce'   ReLU                          ReLU                                                                  (HW Layer)
101   'inception_4e-3x3'               Convolution                   320 3×3×160 convolutions with stride [1  1] and padding [1  1  1  1]  (HW Layer)
102   'inception_4e-relu_3x3'          ReLU                          ReLU                                                                  (HW Layer)
103   'inception_4e-5x5_reduce'        Convolution                   32 1×1×528 convolutions with stride [1  1] and padding [0  0  0  0]   (HW Layer)
104   'inception_4e-relu_5x5_reduce'   ReLU                          ReLU                                                                  (HW Layer)
105   'inception_4e-5x5'               Convolution                   128 5×5×32 convolutions with stride [1  1] and padding [2  2  2  2]   (HW Layer)
106   'inception_4e-relu_5x5'          ReLU                          ReLU                                                                  (HW Layer)
107   'inception_4e-pool'              Max Pooling                   3×3 max pooling with stride [1  1] and padding [1  1  1  1]           (HW Layer)
108   'inception_4e-pool_proj'         Convolution                   128 1×1×528 convolutions with stride [1  1] and padding [0  0  0  0]  (HW Layer)
109   'inception_4e-relu_pool_proj'    ReLU                          ReLU                                                                  (HW Layer)
110   'inception_4e-output'            Depth concatenation           Depth concatenation of 4 inputs                                       (HW Layer)
111   'pool4-3x3_s2'                   Max Pooling                   3×3 max pooling with stride [2  2] and padding [0  1  0  1]           (HW Layer)
112   'inception_5a-1x1'               Convolution                   256 1×1×832 convolutions with stride [1  1] and padding [0  0  0  0]  (HW Layer)
113   'inception_5a-relu_1x1'          ReLU                          ReLU                                                                  (HW Layer)
114   'inception_5a-3x3_reduce'        Convolution                   160 1×1×832 convolutions with stride [1  1] and padding [0  0  0  0]  (HW Layer)
115   'inception_5a-relu_3x3_reduce'   ReLU                          ReLU                                                                  (HW Layer)
116   'inception_5a-3x3'               Convolution                   320 3×3×160 convolutions with stride [1  1] and padding [1  1  1  1]  (HW Layer)
117   'inception_5a-relu_3x3'          ReLU                          ReLU                                                                  (HW Layer)
118   'inception_5a-5x5_reduce'        Convolution                   32 1×1×832 convolutions with stride [1  1] and padding [0  0  0  0]   (HW Layer)
119   'inception_5a-relu_5x5_reduce'   ReLU                          ReLU                                                                  (HW Layer)
120   'inception_5a-5x5'               Convolution                   128 5×5×32 convolutions with stride [1  1] and padding [2  2  2  2]   (HW Layer)
121   'inception_5a-relu_5x5'          ReLU                          ReLU                                                                  (HW Layer)
122   'inception_5a-pool'              Max Pooling                   3×3 max pooling with stride [1  1] and padding [1  1  1  1]           (HW Layer)
123   'inception_5a-pool_proj'         Convolution                   128 1×1×832 convolutions with stride [1  1] and padding [0  0  0  0]  (HW Layer)
124   'inception_5a-relu_pool_proj'    ReLU                          ReLU                                                                  (HW Layer)
125   'inception_5a-output'            Depth concatenation           Depth concatenation of 4 inputs                                       (HW Layer)
126   'inception_5b-1x1'               Convolution                   384 1×1×832 convolutions with stride [1  1] and padding [0  0  0  0]  (HW Layer)
127   'inception_5b-relu_1x1'          ReLU                          ReLU                                                                  (HW Layer)
128   'inception_5b-3x3_reduce'        Convolution                   192 1×1×832 convolutions with stride [1  1] and padding [0  0  0  0]  (HW Layer)
129   'inception_5b-relu_3x3_reduce'   ReLU                          ReLU                                                                  (HW Layer)
130   'inception_5b-3x3'               Convolution                   384 3×3×192 convolutions with stride [1  1] and padding [1  1  1  1]  (HW Layer)
131   'inception_5b-relu_3x3'          ReLU                          ReLU                                                                  (HW Layer)
132   'inception_5b-5x5_reduce'        Convolution                   48 1×1×832 convolutions with stride [1  1] and padding [0  0  0  0]   (HW Layer)
133   'inception_5b-relu_5x5_reduce'   ReLU                          ReLU                                                                  (HW Layer)
134   'inception_5b-5x5'               Convolution                   128 5×5×48 convolutions with stride [1  1] and padding [2  2  2  2]   (HW Layer)
135   'inception_5b-relu_5x5'          ReLU                          ReLU                                                                  (HW Layer)
136   'inception_5b-pool'              Max Pooling                   3×3 max pooling with stride [1  1] and padding [1  1  1  1]           (HW Layer)
137   'inception_5b-pool_proj'         Convolution                   128 1×1×832 convolutions with stride [1  1] and padding [0  0  0  0]  (HW Layer)
138   'inception_5b-relu_pool_proj'    ReLU                          ReLU                                                                  (HW Layer)
139   'inception_5b-output'            Depth concatenation           Depth concatenation of 4 inputs                                       (HW Layer)
140   'pool5-7x7_s1'                   2-D Global Average Pooling    2-D global average pooling                                            (HW Layer)
141   'pool5-drop_7x7_s1'              Dropout                       40% dropout                                                           (HW Layer)
142   'newFC'                          Fully Connected               5 fully connected layer                                               (HW Layer)
143   'newProb'                        Softmax                       softmax                                                               (HW Layer)
144   'newClassOutput'                 Classification Output         crossentropyex with 'MathWorks Cap' and 4 other classes               (SW Layer)

Notice: The layer 'data' with type 'nnet.cnn.layer.ImageInputLayer' is implemented in software.

Notice: The layer 'newClassOutput' with type 'nnet.cnn.layer.ClassificationOutputLayer' is implemented in software.

Compiling layer group: conv1-7x7_s2>>pool2-3x3_s2 ...

Compiling layer group: conv1-7x7_s2>>pool2-3x3_s2 ... complete.

Compiling layer group: inception_3a-1x1>>inception_3a-relu_1x1 ...

Compiling layer group: inception_3a-1x1>>inception_3a-relu_1x1 ... complete.

Compiling layer group: inception_3a-3x3_reduce>>inception_3a-relu_3x3 ...

Compiling layer group: inception_3a-3x3_reduce>>inception_3a-relu_3x3 ... complete.

Compiling layer group: inception_3a-5x5_reduce>>inception_3a-relu_5x5 ...

Compiling layer group: inception_3a-5x5_reduce>>inception_3a-relu_5x5 ... complete.

Compiling layer group: inception_3a-pool>>inception_3a-relu_pool_proj ...

Compiling layer group: inception_3a-pool>>inception_3a-relu_pool_proj ... complete.

Compiling layer group: inception_3b-1x1>>inception_3b-relu_1x1 ...

Compiling layer group: inception_3b-1x1>>inception_3b-relu_1x1 ... complete.

Compiling layer group: inception_3b-3x3_reduce>>inception_3b-relu_3x3 ...

Compiling layer group: inception_3b-3x3_reduce>>inception_3b-relu_3x3 ... complete.

Compiling layer group: inception_3b-5x5_reduce>>inception_3b-relu_5x5 ...

Compiling layer group: inception_3b-5x5_reduce>>inception_3b-relu_5x5 ... complete.

Compiling layer group: inception_3b-pool>>inception_3b-relu_pool_proj ...

Compiling layer group: inception_3b-pool>>inception_3b-relu_pool_proj ... complete.

Compiling layer group: pool3-3x3_s2 ...

Compiling layer group: pool3-3x3_s2 ... complete.

Compiling layer group: inception_4a-1x1>>inception_4a-relu_1x1 ...

Compiling layer group: inception_4a-1x1>>inception_4a-relu_1x1 ... complete.

Compiling layer group: inception_4a-3x3_reduce>>inception_4a-relu_3x3 ...

Compiling layer group: inception_4a-3x3_reduce>>inception_4a-relu_3x3 ... complete.

Compiling layer group: inception_4a-5x5_reduce>>inception_4a-relu_5x5 ...

Compiling layer group: inception_4a-5x5_reduce>>inception_4a-relu_5x5 ... complete.

Compiling layer group: inception_4a-pool>>inception_4a-relu_pool_proj ...

Compiling layer group: inception_4a-pool>>inception_4a-relu_pool_proj ... complete.

Compiling layer group: inception_4b-1x1>>inception_4b-relu_1x1 ...

Compiling layer group: inception_4b-1x1>>inception_4b-relu_1x1 ... complete.

Compiling layer group: inception_4b-3x3_reduce>>inception_4b-relu_3x3 ...

Compiling layer group: inception_4b-3x3_reduce>>inception_4b-relu_3x3 ... complete.

Compiling layer group: inception_4b-5x5_reduce>>inception_4b-relu_5x5 ...

Compiling layer group: inception_4b-5x5_reduce>>inception_4b-relu_5x5 ... complete.

Compiling layer group: inception_4b-pool>>inception_4b-relu_pool_proj ...

Compiling layer group: inception_4b-pool>>inception_4b-relu_pool_proj ... complete.

Compiling layer group: inception_4c-1x1>>inception_4c-relu_1x1 ...

Compiling layer group: inception_4c-1x1>>inception_4c-relu_1x1 ... complete.

Compiling layer group: inception_4c-3x3_reduce>>inception_4c-relu_3x3 ...

Compiling layer group: inception_4c-3x3_reduce>>inception_4c-relu_3x3 ... complete.

Compiling layer group: inception_4c-5x5_reduce>>inception_4c-relu_5x5 ...

Compiling layer group: inception_4c-5x5_reduce>>inception_4c-relu_5x5 ... complete.

Compiling layer group: inception_4c-pool>>inception_4c-relu_pool_proj ...

Compiling layer group: inception_4c-pool>>inception_4c-relu_pool_proj ... complete.

Compiling layer group: inception_4d-1x1>>inception_4d-relu_1x1 ...

Compiling layer group: inception_4d-1x1>>inception_4d-relu_1x1 ... complete.

Compiling layer group: inception_4d-3x3_reduce>>inception_4d-relu_3x3 ...

Compiling layer group: inception_4d-3x3_reduce>>inception_4d-relu_3x3 ... complete.

Compiling layer group: inception_4d-5x5_reduce>>inception_4d-relu_5x5 ...

Compiling layer group: inception_4d-5x5_reduce>>inception_4d-relu_5x5 ... complete.

Compiling layer group: inception_4d-pool>>inception_4d-relu_pool_proj ...

Compiling layer group: inception_4d-pool>>inception_4d-relu_pool_proj ... complete.

Compiling layer group: inception_4e-1x1>>inception_4e-relu_1x1 ...

Compiling layer group: inception_4e-1x1>>inception_4e-relu_1x1 ... complete.

Compiling layer group: inception_4e-3x3_reduce>>inception_4e-relu_3x3 ...

Compiling layer group: inception_4e-3x3_reduce>>inception_4e-relu_3x3 ... complete.

Compiling layer group: inception_4e-5x5_reduce>>inception_4e-relu_5x5 ...

Compiling layer group: inception_4e-5x5_reduce>>inception_4e-relu_5x5 ... complete.

Compiling layer group: inception_4e-pool>>inception_4e-relu_pool_proj ...

Compiling layer group: inception_4e-pool>>inception_4e-relu_pool_proj ... complete.

Compiling layer group: pool4-3x3_s2 ...

Compiling layer group: pool4-3x3_s2 ... complete.

Compiling layer group: inception_5a-1x1>>inception_5a-relu_1x1 ...

Compiling layer group: inception_5a-1x1>>inception_5a-relu_1x1 ... complete.

Compiling layer group: inception_5a-3x3_reduce>>inception_5a-relu_3x3 ...

Compiling layer group: inception_5a-3x3_reduce>>inception_5a-relu_3x3 ... complete.

Compiling layer group: inception_5a-5x5_reduce>>inception_5a-relu_5x5 ...

Compiling layer group: inception_5a-5x5_reduce>>inception_5a-relu_5x5 ... complete.

Compiling layer group: inception_5a-pool>>inception_5a-relu_pool_proj ...

Compiling layer group: inception_5a-pool>>inception_5a-relu_pool_proj ... complete.

Compiling layer group: inception_5b-1x1>>inception_5b-relu_1x1 ...

Compiling layer group: inception_5b-1x1>>inception_5b-relu_1x1 ... complete.

Compiling layer group: inception_5b-3x3_reduce>>inception_5b-relu_3x3 ...

Compiling layer group: inception_5b-3x3_reduce>>inception_5b-relu_3x3 ... complete.

Compiling layer group: inception_5b-5x5_reduce>>inception_5b-relu_5x5 ...

Compiling layer group: inception_5b-5x5_reduce>>inception_5b-relu_5x5 ... complete.

Compiling layer group: inception_5b-pool>>inception_5b-relu_pool_proj ...

Compiling layer group: inception_5b-pool>>inception_5b-relu_pool_proj ... complete.

Compiling layer group: pool5-7x7_s1 ...

Compiling layer group: pool5-7x7_s1 ... complete.

Compiling layer group: newFC ...

Compiling layer group: newFC ... complete.

Allocating external memory buffers:

      offset_name          offset_address    allocated_space 
_______________________    ______________    ________________

"InputDataOffset"           "0x00000000"     "12.0 MB"       
"OutputResultOffset"        "0x00c00000"     "4.0 MB"        
"SchedulerDataOffset"       "0x01000000"     "4.0 MB"        
"SystemBufferOffset"        "0x01400000"     "28.0 MB"       
"InstructionDataOffset"     "0x03000000"     "8.0 MB"        
"ConvWeightDataOffset"      "0x03800000"     "32.0 MB"       
"FCWeightDataOffset"        "0x05800000"     "4.0 MB"        
"EndOffset"                 "0x05c00000"     "Total: 92.0 MB"

Network compilation complete.

dn = struct with fields:
             weights: [1×1 struct]
        instructions: [1×1 struct]
           registers: [1×1 struct]
    syncInstructions: [1×1 struct]
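
The offsets in the memory allocation table follow from the buffer sizes: each buffer starts where the previous one ends. As a quick sanity check, the following sketch rebuilds the offsets from the sizes printed in the compiler output. The variable names are illustrative and are not part of the Deep Learning HDL Toolbox API.

```matlab
% Buffer names and sizes (in MB) copied from the compiler output.
names   = ["InputDataOffset" "OutputResultOffset" "SchedulerDataOffset" ...
           "SystemBufferOffset" "InstructionDataOffset" ...
           "ConvWeightDataOffset" "FCWeightDataOffset" "EndOffset"];
sizesMB = [12 4 4 28 8 32 4];

% Each offset is the running total of the preceding buffer sizes, in bytes.
offsetBytes = [0 cumsum(sizesMB)] * 2^20;
hexOffsets  = compose("0x%08x", offsetBytes(:));

% The last offset marks the end of the allocated region.
totalMB = offsetBytes(end) / 2^20;
disp([names(:) hexOffsets])
fprintf("Total: %.1f MB\n", totalMB);
```

The computed addresses should match the offset_address column, and the final offset corresponds to the 92.0 MB total that the compiler reports.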

Program Bitstream onto FPGA and Download Network Weights

To deploy the network on the Intel Arria10 SoC hardware, run the deploy function of the dlhdl.Workflow object. This function uses the output of the compile function to program the FPGA board by using the programming file, and then downloads the network weights and biases. While it runs, the deploy function displays progress messages and the time it takes to deploy the network.

hW.deploy

Programming FPGA Bitstream using JTAG...

Programming the FPGA bitstream has been completed successfully.

Loading weights to Conv Processor.

Conv Weights loaded. Current time is 11-Jun-2021 22:20:12

Loading weights to FC Processor.

FC Weights loaded. Current time is 11-Jun-2021 22:20:12

Load Example Image

I = imresize(readimage(imdsValidation,1),[224 224]);
figure
imshow(I)

Retrieve Image Prediction

Execute the predict function of the dlhdl.Workflow object and display the prediction results.

[prediction, speed] = hW.predict(single(I),'Profile','off');

Finished writing input activations.

Running single input activation.

[val, index] = max(prediction);
label = netTransfer.Layers(end).ClassNames{index}
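
To see more than the single best class, you can rank all of the scores. The following is a minimal sketch that assumes prediction is a 1-by-5 vector of class scores. The score values and every class name except 'MathWorks Cap' are placeholders, not results from the deployed network.

```matlab
% Placeholder scores standing in for the 1-by-5 output of hW.predict.
prediction = single([0.91 0.04 0.02 0.025 0.005]);

% Only 'MathWorks Cap' appears in the compiler output; the other names
% are placeholders for the remaining four classes.
classNames = {'MathWorks Cap','Class 2','Class 3','Class 4','Class 5'};

% Rank the classes from highest to lowest score.
[sortedScores, order] = sort(prediction, 'descend');

% Print the three highest-scoring classes.
for k = 1:3
    fprintf('%-15s %.3f\n', classNames{order(k)}, sortedScores(k));
end
topLabel = classNames{order(1)};
```

With real data, replace the placeholder vector with the output of hW.predict and take the class names from the classification layer of netTransfer.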

Retrieve Deployed Network Performance

View the performance of the deployed network by using the predict method with the Profile argument set to on.

[~, speed] = hW.predict(single(I),'Profile','on')

Finished writing input activations.

Running single input activation.

              Deep Learning Processor Profiler Performance Results

                      LastFrameLatency(cycles)   LastFrameLatency(seconds)       FramesNum      Total Latency     Frames/s
                            -------------              -------------             ---------        ---------       ---------
Network                         15836394                  0.10558                    1             15845325          9.5
    conv1-7x7_s2                 1139964                  0.00760
    pool1-3x3_s2                  268928                  0.00179
    pool1-norm1                   310985                  0.00207
    conv2-3x3_reduce              278740                  0.00186
    conv2-3x3                     823735                  0.00549
    conv2-norm2                   952105                  0.00635
    pool2-3x3_s2                  273479                  0.00182
    inception_3a-1x1              198078                  0.00132
    inception_3a-3x3_reduce       280845                  0.00187
    inception_3a-3x3              196410                  0.00131
    inception_3a-5x5_reduce        73846                  0.00049
    inception_3a-5x5               35295                  0.00024
    inception_3a-pool              94554                  0.00063
    inception_3a-pool_proj        115223                  0.00077
    inception_3b-1x1              619945                  0.00413
    inception_3b-3x3_reduce       620509                  0.00414
    inception_3b-3x3              367297                  0.00245
    inception_3b-5x5_reduce       207909                  0.00139
    inception_3b-5x5              178552                  0.00119
    inception_3b-pool             179959                  0.00120
    inception_3b-pool_proj        344959                  0.00230
    pool3-3x3_s2                  293640                  0.00196
    inception_4a-1x1              332992                  0.00222
    inception_4a-3x3_reduce       181829                  0.00121
    inception_4a-3x3               83777                  0.00056
    inception_4a-5x5_reduce        55639                  0.00037
    inception_4a-5x5               14500                  0.00010
    inception_4a-pool              77187                  0.00051
    inception_4a-pool_proj        130965                  0.00087
    inception_4b-1x1              300254                  0.00200
    inception_4b-3x3_reduce       220515                  0.00147
    inception_4b-3x3              101764                  0.00068
    inception_4b-5x5_reduce        73096                  0.00049
    inception_4b-5x5               25720                  0.00017
    inception_4b-pool              82277                  0.00055
    inception_4b-pool_proj        139530                  0.00093
    inception_4c-1x1              246715                  0.00164
    inception_4c-3x3_reduce       246987                  0.00165
    inception_4c-3x3              129291                  0.00086
    inception_4c-5x5_reduce        72855                  0.00049
    inception_4c-5x5               25444                  0.00017
    inception_4c-pool              82661                  0.00055
    inception_4c-pool_proj        139761                  0.00093
    inception_4d-1x1              220154                  0.00147
    inception_4d-3x3_reduce       273136                  0.00182
    inception_4d-3x3              159811                  0.00107
    inception_4d-5x5_reduce        86719                  0.00058
    inception_4d-5x5               32485                  0.00022
    inception_4d-pool              82309                  0.00055
    inception_4d-pool_proj        139464                  0.00093
    inception_4e-1x1              474515                  0.00316
    inception_4e-3x3_reduce       309661                  0.00206
    inception_4e-3x3              193442                  0.00129
    inception_4e-5x5_reduce        88661                  0.00059
    inception_4e-5x5               62881                  0.00042
    inception_4e-pool              85098                  0.00057
    inception_4e-pool_proj        254234                  0.00169
    pool4-3x3_s2                  164072                  0.00109
    inception_5a-1x1              385821                  0.00257
    inception_5a-3x3_reduce       250827                  0.00167
    inception_5a-3x3               99439                  0.00066
    inception_5a-5x5_reduce        69697                  0.00046
    inception_5a-5x5               32465                  0.00022
    inception_5a-pool              53624                  0.00036
    inception_5a-pool_proj        205084                  0.00137
    inception_5b-1x1              567107                  0.00378
    inception_5b-3x3_reduce       295819                  0.00197
    inception_5b-3x3              139308                  0.00093
    inception_5b-5x5_reduce        92415                  0.00062
    inception_5b-5x5               46311                  0.00031
    inception_5b-pool              53882                  0.00036
    inception_5b-pool_proj        205632                  0.00137
    pool5-7x7_s1                   69837                  0.00047
    newFC                          23215                  0.00015

The clock frequency of the DL processor is: 150MHz

speed=75×5 table
                            Latency(cycles)    Latency(seconds)       NumFrames    Total Latency(cycles)    Frame/s 
                            _______________    ________________       _________    _____________________    ________

Network                          1.5836e+07             0.10558          "1"            "15845325"          "9.4665"
____conv1-7x7_s2                   1.14e+06           0.0075998          ""             ""                  ""      
____pool1-3x3_s2                 2.6893e+05           0.0017929          ""             ""                  ""      
____pool1-norm1                  3.1098e+05           0.0020732          ""             ""                  ""      
____conv2-3x3_reduce             2.7874e+05           0.0018583          ""             ""                  ""      
____conv2-3x3                    8.2374e+05           0.0054916          ""             ""                  ""      
____conv2-norm2                   9.521e+05           0.0063474          ""             ""                  ""      
____pool2-3x3_s2                 2.7348e+05           0.0018232          ""             ""                  ""      
____inception_3a-1x1             1.9808e+05           0.0013205          ""             ""                  ""      
____inception_3a-3x3_reduce      2.8084e+05           0.0018723          ""             ""                  ""      
____inception_3a-3x3             1.9641e+05           0.0013094          ""             ""                  ""      
____inception_3a-5x5_reduce           73846          0.00049231          ""             ""                  ""      
____inception_3a-5x5                  35295           0.0002353          ""             ""                  ""      
____inception_3a-pool                 94554          0.00063036          ""             ""                  ""      
____inception_3a-pool_proj       1.1522e+05          0.00076815          ""             ""                  ""      
____inception_3b-1x1             6.1994e+05            0.004133          ""             ""                  ""      
  ⋮

The speed table contains the latency information for every layer, total network latency, and the overall network performance in frames per second (FPS). For more information, see Profile Inference Run.
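
The figures in the Network row are related through the 150 MHz clock frequency that the profiler reports. The following sketch reproduces that arithmetic from the printed values. It shows how the numbers fit together and makes no claim about the profiler's internal calculation. The variable names are illustrative.

```matlab
clockHz     = 150e6;      % DL processor clock reported by the profiler
frameCycles = 15836394;   % last-frame latency of the Network row, in cycles
totalCycles = 15845325;   % total latency of the Network row, in cycles
numFrames   = 1;          % number of frames in this run

% Cycles divided by the clock rate gives the latency in seconds.
latencySeconds = frameCycles / clockHz;

% Throughput is the number of frames divided by the total run time.
framesPerSecond = numFrames * clockHz / totalCycles;

% Share of the frame latency spent in the first convolution layer.
conv1Cycles = 1139964;
conv1Share  = 100 * conv1Cycles / frameCycles;

fprintf('Latency: %.5f s, throughput: %.4f frames/s, conv1 share: %.1f%%\n', ...
    latencySeconds, framesPerSecond, conv1Share);
```

The same ratio applied to any other row shows which layers dominate the inference time.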