dlquantizationOptions
Options for quantizing a trained deep neural network
Since R2020a
Creation
Syntax
Description
`quantOpts` = dlquantizationOptions
creates a dlquantizationOptions object with default property values.
`quantOpts` = dlquantizationOptions(`Name,Value`)
creates a dlquantizationOptions object and sets properties by using one or more name-value arguments.
Properties
MetricFcn
— Metric function to use for validation of quantized network
cell array of function handles
Metric function to use for validation of quantized network, specified as a cell array of one or more function handles.
Example: options = dlquantizationOptions('MetricFcn',{@(x)hComputeModelAccuracy(x,net,groundTruth)});
Data Types: cell
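A Top-1 accuracy computation, the kind of value a custom metric function might return, can be sketched in core MATLAB. This is a minimal sketch under stated assumptions: the scores form an N-by-K matrix with one row per observation and one column per class, and the ground truth is a vector of class indices. The input that dlquantizer actually passes to a MetricFcn may differ by release, so treat the variable names and data layout here as hypothetical.

```matlab
% Minimal Top-1 accuracy sketch (core MATLAB only).
% Assumption: scores is N-by-K (rows = observations, columns = classes)
% and trueIdx holds the true class index (1..K) of each observation.
scores = [0.90 0.05 0.05;
          0.20 0.70 0.10;
          0.30 0.30 0.40;
          0.60 0.30 0.10];
trueIdx = [1; 2; 2; 1];

% The predicted class is the column with the highest score in each row.
[~, predIdx] = max(scores, [], 2);

% Top-1 accuracy is the fraction of observations predicted correctly.
accuracy = mean(predIdx == trueIdx);
fprintf('Top-1 accuracy: %.2f\n', accuracy);
```

In practice you would wrap such a computation in a function and pass a handle to it inside a cell array, as in the MetricFcn example for this property.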
Execution Environment Options
Bitstream
— Name of FPGA bitstream
'zcu102_int8'
| 'zc706_int8'
| 'arria10soc_int8'
This property is valid only when the 'ExecutionEnvironment' property of the dlquantizer object is set to 'FPGA'.
Name of the FPGA bitstream, specified as one of these values or as the path to a custom bitstream:
Bitstream | Target Board |
---|---|
'zcu102_int8' | Xilinx® Zynq® UltraScale™ ZCU102 |
'zc706_int8' | Xilinx Zynq-7000 ZC706 |
'arria10soc_int8' | Intel® Arria® 10 SoC development kit |
You can specify a custom bitstream for any supported target board or for a custom target. To use a custom bitstream, first create a dlhdl.Target object, then specify the path to a valid bitstream file: a .sof file for Intel boards or a .bit file for Xilinx boards. For more information about generating custom bitstreams, see Generate Custom Bitstream (Deep Learning HDL Toolbox).
Example: quantOpts = dlquantizationOptions('Bitstream','zcu102_int8')
Example: hTarget = dlhdl.Target('Intel','Interface','JTAG'); quantOpts = dlquantizationOptions('Target',hTarget,'Bitstream','C:\yourFolder\customBitstream_int8.sof')
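Because the required file extension depends on the board vendor, a quick check before creating the options object can catch a mismatched custom bitstream. The sketch below is hypothetical helper logic, not part of Deep Learning HDL Toolbox; it assumes Intel boards use .sof files and Xilinx boards use .bit files, and the vendor name and path are placeholders.

```matlab
% Hypothetical pre-check for a custom bitstream path (not a toolbox API).
% Assumption: Intel boards use .sof bitstreams, Xilinx boards use .bit.
vendor        = 'Intel';                                   % placeholder vendor
bitstreamPath = 'C:\yourFolder\customBitstream_int8.sof';  % placeholder path

% fileparts returns the extension, including the leading dot.
[~, ~, ext] = fileparts(bitstreamPath);

if strcmpi(vendor, 'Intel')
    expectedExt = '.sof';
else
    expectedExt = '.bit';
end

isValidExt = strcmpi(ext, expectedExt);
fprintf('Extension %s matches %s board: %d\n', ext, vendor, isValidExt);
```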
Target
— Target for quantized network
'host' (default) | 'gpu' | dlhdl.Target object | raspi object
Target for quantized network, specified as one of the following:
Target | Execution Environment for Quantized Network | Example |
---|---|---|
Quantized network in MATLAB, specified as 'host' | Set the Target property to 'host' when the 'ExecutionEnvironment' property of the dlquantizer object is set to 'GPU', 'FPGA', or 'MATLAB'. | quantOpts = dlquantizationOptions('Target','host') |
Target GPU device, specified as 'gpu' | Set the Target property to 'gpu' only when the 'ExecutionEnvironment' property of the dlquantizer object is set to 'GPU'. | quantOpts = dlquantizationOptions('Target','gpu') |
Target CPU board, specified as a raspi object | Set the Target property to a raspi object only when the 'ExecutionEnvironment' property of the dlquantizer object is set to 'CPU'. | r = raspi('hostname','User Name','Password'); quantOpts = dlquantizationOptions('Target',r) |
Target FPGA board vendor name and interface, specified as a dlhdl.Target (Deep Learning HDL Toolbox) object | Set the Target property to a dlhdl.Target object only when the 'ExecutionEnvironment' property of the dlquantizer object is set to 'FPGA'. | hTarget = dlhdl.Target('Intel','Interface','JTAG'); quantOpts = dlquantizationOptions('Target',hTarget) |
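The pairing rules in this table can be encoded as a small lookup, for example to validate settings in a script before calling validate. The struct below is a hypothetical sketch that records only what the table states; the Target kinds are plain strings standing in for the actual values and objects, and no toolbox function is called.

```matlab
% Hypothetical lookup that encodes the Target table above.
% Field names are dlquantizer ExecutionEnvironment values; each field
% lists the Target kinds the table allows for that environment.
allowedTargets = struct( ...
    'GPU',    {{'host', 'gpu'}}, ...
    'FPGA',   {{'host', 'dlhdl.Target'}}, ...
    'MATLAB', {{'host'}}, ...
    'CPU',    {{'raspi'}});

% True when the Target kind is listed for the execution environment.
isAllowed = @(env, targetKind) any(strcmp(allowedTargets.(env), targetKind));

fprintf('gpu target allowed with FPGA environment: %d\n', isAllowed('FPGA', 'gpu'));
```

The CPU entry lists only raspi because the table does not state which other Target values, if any, apply to that environment.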
Examples
Quantize a Neural Network for GPU Target
This example shows how to quantize the learnable parameters in the convolution layers of a neural network for a GPU target and explore the behavior of the quantized network. You quantize the squeezenet neural network after retraining it to classify new images. Quantization reduces the memory required for the network by approximately 75% without affecting its accuracy on the validation data.
Load the pretrained network. net is the output network of the Train Deep Learning Network to Classify New Images example.
load squeezedlnetmerch net
net = dlnetwork with properties:
Layers: [67×1 nnet.cnn.layer.Layer]
Connections: [74×2 table]
Learnables: [52×3 table]
State: [0×3 table]
InputNames: {'data'}
OutputNames: {'prob'}
Initialized: 1
View summary with summary.
Define calibration and validation data to use for quantization.
The calibration data is used to collect the dynamic ranges of the weights and biases in the convolution and fully connected layers of the network and the dynamic ranges of the activations in all layers of the network. For the best quantization results, the calibration data must be representative of inputs to the network.
The validation data is used to test the network after quantization to understand the effects of the limited range and precision of the quantized convolution layers in the network.
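One way to picture how dynamic ranges are collected is to keep a running minimum and maximum while stepping through batches of calibration data. This is an illustrative sketch of min/max range tracking, not the toolbox implementation, and the batch values are made up. Because this scheme keeps only the extremes, a single outlier widens the range for the whole tensor, which is one reason the calibration data should be representative.

```matlab
% Illustrative min/max range tracking over calibration batches.
% The batch values are made up; each cell stands in for the values of
% one tensor (weights or activations) observed for one batch.
batches = {[0.1 -0.4 0.3], [0.9 -0.2 0.0], [-1.3 0.5 0.7]};

runningMin = Inf;
runningMax = -Inf;
for b = 1:numel(batches)
    x = batches{b};
    runningMin = min(runningMin, min(x(:)));   % widen range downward if needed
    runningMax = max(runningMax, max(x(:)));   % widen range upward if needed
end

fprintf('Observed range: [%g, %g]\n', runningMin, runningMax);
```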
In this example, use the images in the MerchData data set. Split the data into calibration and validation data sets, then define an augmentedImageDatastore object for each set to resize the images to the 227-by-227 input size of the network.
unzip('MerchData.zip');
imds = imageDatastore('MerchData', ...
    'IncludeSubfolders',true, ...
    'LabelSource','foldernames');
classes = categories(imds.Labels);
[calData, valData] = splitEachLabel(imds, 0.7, 'randomized');
aug_calData = augmentedImageDatastore([227 227], calData);
aug_valData = augmentedImageDatastore([227 227], valData);
Create a dlquantizer object and specify the network to quantize.
dlquantObj = dlquantizer(net);
Specify the GPU target and the metric function to use for validation.
quantOpts = dlquantizationOptions(Target='gpu');
quantOpts.MetricFcn = {@(x)hAccuracy(x,net,aug_valData,classes)}
quantOpts = dlquantizationOptions with properties:
 Validation Metric Info
    MetricFcn: {[@(x)hAccuracy(x,net,aug_valData,classes)]}
 Validation Environment Info
       Target: 'gpu'
    Bitstream: ''
Use the calibrate function to exercise the network with sample inputs and collect range information. The calibrate function collects the dynamic ranges of the weights and biases in the convolution and fully connected layers of the network and the dynamic ranges of the activations in all layers of the network. The function returns a table in which each row contains the range information for one learnable parameter or activation of the optimized network.
calResults = calibrate(dlquantObj, aug_calData)
calResults=120×5 table
    Optimized Layer Name            Network Layer Name      Learnables / Activations    MinValue     MaxValue
    ____________________________    ____________________    ________________________    _________    ________
{'conv1_Weights' } {'conv1' } "Weights" -0.91985 0.88489
{'conv1_Bias' } {'conv1' } "Bias" -0.07925 0.26343
{'fire2-squeeze1x1_Weights'} {'fire2-squeeze1x1'} "Weights" -1.38 1.2477
{'fire2-squeeze1x1_Bias' } {'fire2-squeeze1x1'} "Bias" -0.11641 0.24273
{'fire2-expand1x1_Weights' } {'fire2-expand1x1' } "Weights" -0.7406 0.90982
{'fire2-expand1x1_Bias' } {'fire2-expand1x1' } "Bias" -0.060056 0.14602
{'fire2-expand3x3_Weights' } {'fire2-expand3x3' } "Weights" -0.74397 0.66905
{'fire2-expand3x3_Bias' } {'fire2-expand3x3' } "Bias" -0.051778 0.074239
{'fire3-squeeze1x1_Weights'} {'fire3-squeeze1x1'} "Weights" -0.7712 0.68917
{'fire3-squeeze1x1_Bias' } {'fire3-squeeze1x1'} "Bias" -0.10138 0.32675
{'fire3-expand1x1_Weights' } {'fire3-expand1x1' } "Weights" -0.72035 0.9743
{'fire3-expand1x1_Bias' } {'fire3-expand1x1' } "Bias" -0.067029 0.30425
{'fire3-expand3x3_Weights' } {'fire3-expand3x3' } "Weights" -0.61443 0.7741
{'fire3-expand3x3_Bias' } {'fire3-expand3x3' } "Bias" -0.053613 0.10329
{'fire4-squeeze1x1_Weights'} {'fire4-squeeze1x1'} "Weights" -0.7422 1.0877
{'fire4-squeeze1x1_Bias' } {'fire4-squeeze1x1'} "Bias" -0.10885 0.13881
⋮
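To see what a scaled 8-bit integer type does with one of these ranges, the sketch below quantizes a few values in the conv1_Weights range from the table. It assumes symmetric power-of-two scaling into the int8 range; that scaling rule is an assumption made for illustration, so the scaling the library actually chooses may differ.

```matlab
% Range of conv1_Weights reported by calibrate in the table above.
minVal = -0.91985;
maxVal =  0.88489;

% Assumption: symmetric power-of-two scaling. Pick the largest fraction
% length whose scaled magnitude still fits in int8 (max 127).
maxAbs  = max(abs([minVal maxVal]));
fracLen = floor(log2(127 / maxAbs));
scale   = 2^(-fracLen);              % real-world value of one integer step

% Quantize some in-range values, then map them back to real values.
w      = [minVal -0.25 0 0.5 maxVal];
q      = int8(round(w / scale));     % stored 8-bit integers
wHat   = double(q) * scale;          % dequantized approximation
maxErr = max(abs(w - wHat));         % at most scale/2 for in-range values

% A value outside the calibrated range saturates at the int8 limit.
qOutlier = int8(round(1.5 / scale));

fprintf('fracLen = %d, scale = %g\n', fracLen, scale);
disp(q);
```

The outlier value of 1.5 lies outside the calibrated range and saturates at 127, which illustrates how values that the calibration data never exercised lose precision.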
Use the validate function to quantize the learnable parameters in the convolution layers of the network and exercise the network. The function uses the metric function defined in the dlquantizationOptions object to compare the results of the network before and after quantization.
valResults = validate(dlquantObj, aug_valData, quantOpts)
valResults = struct with fields:
       NumSamples: 20
    MetricResults: [1×1 struct]
       Statistics: [2×2 table]
Examine the validation output to see the performance of the quantized network.
valResults.MetricResults.Result
ans=2×2 table
    NetworkImplementation    MetricOutput
    _____________________    ____________
{'Floating-Point'} 1
{'Quantized' } 1
valResults.Statistics
ans=2×2 table
    NetworkImplementation    LearnableParameterMemory(bytes)
    _____________________    _______________________________
{'Floating-Point'} 2.9003e+06
{'Quantized' } 7.3393e+05
Quantization reduces the learnable parameter memory of the network from about 2.9 MB to about 0.73 MB, a reduction of approximately 75%, while the metric output on the validation data is the same for the floating-point and quantized networks.
The weights, biases, and activations of the convolution layers of the network specified in the dlquantizer object now use scaled 8-bit integer data types.
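The approximately 75% figure can be checked against the LearnableParameterMemory(bytes) values in the Statistics table. The comparison with 1 - 8/32 assumes that the floating-point learnables are stored as 32-bit single values; that storage type is an assumption, not something this example reports.

```matlab
% LearnableParameterMemory(bytes) values from the Statistics table above.
fpBytes = 2.9003e+06;   % floating-point network
qBytes  = 7.3393e+05;   % quantized network

reduction = 1 - qBytes / fpBytes;   % measured fraction of memory saved
ideal     = 1 - 8/32;               % assumed 32-bit single -> 8-bit int8

fprintf('Measured reduction: %.1f%%\n', 100 * reduction);
fprintf('Ideal 32-to-8-bit reduction: %.0f%%\n', 100 * ideal);
```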
Quantize Network for FPGA Deployment
Reduce the memory footprint of a deep neural network by quantizing the weights, biases, and activations of convolution layers to 8-bit scaled integer data types. This example shows how to use Deep Learning Toolbox Model Quantization Library and Deep Learning HDL Toolbox to deploy the int8 network to a target FPGA board.
For this example, you need:
- Deep Learning Toolbox™
- Deep Learning HDL Toolbox™
- Deep Learning Toolbox Model Quantization Library
- Deep Learning HDL Toolbox Support Package for Xilinx® FPGA and SoC Devices
- MATLAB® Coder™ Interface for Deep Learning
Load Pretrained Network
Load the pretrained LogoNet network and analyze the network architecture.
snet = getLogoNetwork; deepNetworkDesigner(snet);
Set random number generator for reproducibility.
Load Data
This example uses the logos_dataset data set. The data set consists of 320 images. Each image is 227-by-227 in size and has three color channels (RGB). Create an imageDatastore object and split it into calibration and validation data sets, using half of the images of each label for calibration.
curDir = pwd;
unzip("logos_dataset.zip");
imageData = imageDatastore(fullfile(curDir,'logos_dataset'),...
    'IncludeSubfolders',true,'FileExtensions','.JPG','LabelSource','foldernames');
[calibrationData, validationData] = splitEachLabel(imageData, 0.5,'randomized');
Generate Calibration Result File for the Network
Create a dlquantizer (Deep Learning HDL Toolbox) object and specify the network to quantize. Specify the execution environment as FPGA.
dlQuantObj = dlquantizer(snet,'ExecutionEnvironment',"FPGA");
Use the calibrate (Deep Learning HDL Toolbox) function to exercise the network with sample inputs and collect the range information. The calibrate function collects the dynamic ranges of the weights, biases, and activations and returns a table. Each row of the table contains the range information for one learnable parameter or activation of the quantized network.
calibrate(dlQuantObj,calibrationData)
ans=35×5 table
    Optimized Layer Name            Network Layer Name    Learnables / Activations    MinValue       MaxValue
    ____________________________    __________________    ________________________    ___________    __________
{'conv_1_Weights' } {'conv_1' } "Weights" -0.048978 0.039352
{'conv_1_Bias' } {'conv_1' } "Bias" 0.99996 1.0028
{'conv_2_Weights' } {'conv_2' } "Weights" -0.055518 0.061901
{'conv_2_Bias' } {'conv_2' } "Bias" -0.00061171 0.00227
{'conv_3_Weights' } {'conv_3' } "Weights" -0.045942 0.046927
{'conv_3_Bias' } {'conv_3' } "Bias" -0.0013998 0.0015218
{'conv_4_Weights' } {'conv_4' } "Weights" -0.045967 0.051
{'conv_4_Bias' } {'conv_4' } "Bias" -0.00164 0.0037892
{'fc_1_Weights' } {'fc_1' } "Weights" -0.051394 0.054344
{'fc_1_Bias' } {'fc_1' } "Bias" -0.00052319 0.00084454
{'fc_2_Weights' } {'fc_2' } "Weights" -0.05016 0.051557
{'fc_2_Bias' } {'fc_2' } "Bias" -0.0017564 0.0018502
{'fc_3_Weights' } {'fc_3' } "Weights" -0.050706 0.04678
{'fc_3_Bias' } {'fc_3' } "Bias" -0.02951 0.024855
{'imageinput' } {'imageinput'} "Activations" 0 255
{'imageinput_normalization'} {'imageinput'} "Activations" -139.34 198.72
⋮
Create Target Object
Create a target object with a custom name for your target device and an interface to connect your target device to the host computer. Interface options are JTAG and Ethernet. To use JTAG, install Xilinx Vivado® Design Suite 2022.1. To set the Xilinx Vivado toolpath, enter:
hdlsetuptoolpath('ToolName', 'Xilinx Vivado', 'ToolPath', 'C:\Xilinx\Vivado\2022.1\bin\vivado.bat');
To create the target object, enter:
hTarget = dlhdl.Target('Xilinx','Interface','Ethernet','IPAddress','10.10.10.15');
Alternatively, use the JTAG interface.
% hTarget = dlhdl.Target('Xilinx', 'Interface', 'JTAG');
Create dlquantizationOptions Object
Create two dlquantizationOptions objects: one for validation on the FPGA and one for emulation on the host. For the FPGA, specify the target bitstream and the target object. The default metric function is a Top-1 accuracy metric function.
options_FPGA = dlquantizationOptions('Bitstream','zcu102_int8','Target',hTarget);
options_emulation = dlquantizationOptions('Target','host');
To use a custom metric function, specify the metric function in the dlquantizationOptions object.
options_FPGA = dlquantizationOptions('MetricFcn',{@(x)hComputeAccuracy(x,snet,validationData)}, ...
    'Bitstream','zcu102_int8','Target',hTarget);
options_emulation = dlquantizationOptions('MetricFcn',{@(x)hComputeAccuracy(x,snet,validationData)})
Validate Quantized Network
Use the validate function to quantize the learnable parameters in the convolution layers of the network. When the target is 'host', as in options_emulation, the validate function simulates the quantized network in MATLAB. The validate function uses the metric function defined in the dlquantizationOptions object to compare the results of the single-precision floating-point network to the results of the quantized network.
prediction_emulation = dlQuantObj.validate(validationData,options_emulation)
prediction_emulation = struct with fields:
       NumSamples: 160
    MetricResults: [1×1 struct]
       Statistics: []
For validation on an FPGA, the validate function:
- Programs the FPGA board by using the output of the compile method and the programming file
- Downloads the network weights and biases
- Compares the performance of the network before and after quantization
prediction_FPGA = dlQuantObj.validate(validationData,options_FPGA)
Compiling network for Deep Learning FPGA prototyping ...
Targeting FPGA bitstream zcu102_int8.
The network includes the following layers:
1 'imageinput' Image Input 227×227×3 images with 'zerocenter' normalization and 'randfliplr' augmentations (SW Layer)
2 'conv_1' 2-D Convolution 96 5×5×3 convolutions with stride [1 1] and padding [0 0 0 0] (HW Layer)
3 'relu_1' ReLU ReLU (HW Layer)
4 'maxpool_1' 2-D Max Pooling 3×3 max pooling with stride [2 2] and padding [0 0 0 0] (HW Layer)
5 'conv_2' 2-D Convolution 128 3×3×96 convolutions with stride [1 1] and padding [0 0 0 0] (HW Layer)
6 'relu_2' ReLU ReLU (HW Layer)
7 'maxpool_2' 2-D Max Pooling 3×3 max pooling with stride [2 2] and padding [0 0 0 0] (HW Layer)
8 'conv_3' 2-D Convolution 384 3×3×128 convolutions with stride [1 1] and padding [0 0 0 0] (HW Layer)
9 'relu_3' ReLU ReLU (HW Layer)
10 'maxpool_3' 2-D Max Pooling 3×3 max pooling with stride [2 2] and padding [0 0 0 0] (HW Layer)
11 'conv_4' 2-D Convolution 128 3×3×384 convolutions with stride [2 2] and padding [0 0 0 0] (HW Layer)
12 'relu_4' ReLU ReLU (HW Layer)
13 'maxpool_4' 2-D Max Pooling 3×3 max pooling with stride [2 2] and padding [0 0 0 0] (HW Layer)
14 'fc_1' Fully Connected 2048 fully connected layer (HW Layer)
15 'relu_5' ReLU ReLU (HW Layer)
16 'fc_2' Fully Connected 2048 fully connected layer (HW Layer)
17 'relu_6' ReLU ReLU (HW Layer)
18 'fc_3' Fully Connected 32 fully connected layer (HW Layer)
19 'softmax' Softmax softmax (SW Layer)
20 'classoutput' Classification Output crossentropyex with 'adidas' and 31 other classes (SW Layer)
Notice: The layer 'imageinput' with type 'nnet.cnn.layer.ImageInputLayer' is implemented in software.
Notice: The layer 'softmax' with type 'nnet.cnn.layer.SoftmaxLayer' is implemented in software.
Notice: The layer 'classoutput' with type 'nnet.cnn.layer.ClassificationOutputLayer' is implemented in software.
Compiling layer group: conv_1>>relu_4 ...
Compiling layer group: conv_1>>relu_4 ... complete.
Compiling layer group: maxpool_4 ...
Compiling layer group: maxpool_4 ... complete.
Compiling layer group: fc_1>>fc_3 ...
Compiling layer group: fc_1>>fc_3 ... complete.
Allocating external memory buffers:
offset_name offset_address allocated_space
_______________________ ______________ ________________
"InputDataOffset" "0x00000000" "11.9 MB"
"OutputResultOffset" "0x00be0000" "128.0 kB"
"SchedulerDataOffset" "0x00c00000" "128.0 kB"
"SystemBufferOffset" "0x00c20000" "9.9 MB"
"InstructionDataOffset" "0x01600000" "4.6 MB"
"ConvWeightDataOffset" "0x01aa0000" "8.2 MB"
"FCWeightDataOffset" "0x022e0000" "10.4 MB"
"EndOffset" "0x02d40000" "Total: 45.2 MB"
Network compilation complete.
FPGA bitstream programming has been skipped as the same bitstream is already loaded on the target FPGA.
Deep learning network programming has been skipped as the same network is already loaded on the target FPGA.
Finished writing input activations.
Running single input activation.
⋮
Notice: The layer 'imageinput' of type 'ImageInputLayer' is split into an image input layer 'imageinput' and an addition layer 'imageinput_norm' for normalization on hardware.
The network includes the following layers:
1 'imageinput' Image Input 227×227×3 images with 'zerocenter' normalization and 'randfliplr' augmentations (SW Layer)
2 'conv_1' 2-D Convolution 96 5×5×3 convolutions with stride [1 1] and padding [0 0 0 0] (HW Layer)
3 'relu_1' ReLU ReLU (HW Layer)
4 'maxpool_1' 2-D Max Pooling 3×3 max pooling with stride [2 2] and padding [0 0 0 0] (HW Layer)
5 'conv_2' 2-D Convolution 128 3×3×96 convolutions with stride [1 1] and padding [0 0 0 0] (HW Layer)
6 'relu_2' ReLU ReLU (HW Layer)
7 'maxpool_2' 2-D Max Pooling 3×3 max pooling with stride [2 2] and padding [0 0 0 0] (HW Layer)
8 'conv_3' 2-D Convolution 384 3×3×128 convolutions with stride [1 1] and padding [0 0 0 0] (HW Layer)
9 'relu_3' ReLU ReLU (HW Layer)
10 'maxpool_3' 2-D Max Pooling 3×3 max pooling with stride [2 2] and padding [0 0 0 0] (HW Layer)
11 'conv_4' 2-D Convolution 128 3×3×384 convolutions with stride [2 2] and padding [0 0 0 0] (HW Layer)
12 'relu_4' ReLU ReLU (HW Layer)
13 'maxpool_4' 2-D Max Pooling 3×3 max pooling with stride [2 2] and padding [0 0 0 0] (HW Layer)
14 'fc_1' Fully Connected 2048 fully connected layer (HW Layer)
15 'relu_5' ReLU ReLU (HW Layer)
16 'fc_2' Fully Connected 2048 fully connected layer (HW Layer)
17 'relu_6' ReLU ReLU (HW Layer)
18 'fc_3' Fully Connected 32 fully connected layer (HW Layer)
19 'softmax' Softmax softmax (SW Layer)
20 'classoutput' Classification Output crossentropyex with 'adidas' and 31 other classes (SW Layer)
Notice: The layer 'softmax' with type 'nnet.cnn.layer.SoftmaxLayer' is implemented in software.
Notice: The layer 'classoutput' with type 'nnet.cnn.layer.ClassificationOutputLayer' is implemented in software.
Deep Learning Processor Estimator Performance Results
                          LastFrameLatency(cycles)   LastFrameLatency(seconds)   FramesNum   Total Latency   Frames/s
                          -------------              -------------               ---------   ---------       ---------
Network                   39136574                   0.17789                     1           39136574        5.6
    imageinput_norm       216472                     0.00098
    conv_1                6832680                    0.03106
    maxpool_1             3705912                    0.01685
    conv_2                10454501                   0.04752
    maxpool_2             1173810                    0.00534
    conv_3                9364533                    0.04257
    maxpool_3             1229970                    0.00559
    conv_4                1759348                    0.00800
    maxpool_4             24450                      0.00011
    fc_1                  2651288                    0.01205
    fc_2                  1696632                    0.00771
    fc_3                  26978                      0.00012
- The clock frequency of the DL processor is: 220MHz
Finished writing input activations.
Running single input activation.
prediction_FPGA = struct with fields:
       NumSamples: 160
    MetricResults: [1×1 struct]
       Statistics: [2×7 table]
View Performance of Quantized Neural Network
Display the accuracy of the quantized network.
prediction_emulation.MetricResults.Result
ans=2×2 table
    NetworkImplementation    MetricOutput
    _____________________    ____________
{'Floating-Point'} 0.9875
{'Quantized' } 0.9875
prediction_FPGA.MetricResults.Result
ans=2×2 table
    NetworkImplementation    MetricOutput
    _____________________    ____________
{'Floating-Point'} 0.9875
{'Quantized' } 0.9875
Display the performance of the quantized network in frames per second.
prediction_FPGA.Statistics
ans=2×7 table
    NetworkImplementation    FramesPerSecond    Number of Threads (Convolution)    Number of Threads (Fully Connected)    LUT Utilization (%)    BlockRAM Utilization (%)    DSP Utilization (%)
    _____________________    _______________    _______________________________    ___________________________________    ___________________    ________________________    ___________________
{'Floating-Point'} 5.6213 16 4 93.198 63.925 15.595
{'Quantized' } 19.433 64 16 62.31 50.11 32.103
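The estimator output and the Statistics table can be cross-checked with a few lines of arithmetic. This sketch only reuses numbers printed above: it converts the Network-row latency in cycles to seconds at the 220 MHz clock, inverts it to get frames per second (about 5.62, in line with the Floating-Point FramesPerSecond value), confirms that the per-layer latencies add up to the network total, and computes the quantized-to-floating-point throughput ratio. It is a consistency check, not a performance model.

```matlab
% Values copied from the estimator output and the Statistics table above.
clockHz       = 220e6;       % DL processor clock: 220 MHz
latencyCycles = 39136574;    % LastFrameLatency(cycles), Network row
layerCycles   = [216472 6832680 3705912 10454501 1173810 9364533 ...
                 1229970 1759348 24450 2651288 1696632 26978];

latencySec  = latencyCycles / clockHz;           % seconds per frame
fps         = 1 / latencySec;                    % frames per second
cyclesMatch = sum(layerCycles) == latencyCycles; % per-layer sum equals total

speedup = 19.433 / 5.6213;   % Quantized vs Floating-Point FramesPerSecond

fprintf('Latency %.5f s, %.4f frames/s, speedup %.2fx\n', latencySec, fps, speedup);
```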
Quantize a Neural Network for CPU Target
This example shows how to quantize and validate a neural network for a CPU target. The workflow is similar to the workflow for other execution environments, but before validating, you must establish a raspi connection and specify it as the target by using dlquantizationOptions.
First, load your network. This example uses the pretrained squeezenet network that was retrained to classify the MerchData images.
load squeezedlnetmerch net
net = dlnetwork with properties:
Layers: [67×1 nnet.cnn.layer.Layer]
Connections: [74×2 table]
Learnables: [52×3 table]
State: [0×3 table]
InputNames: {'data'}
OutputNames: {'prob'}
Initialized: 1
View summary with summary.
Then define your calibration and validation data, aug_calData and aug_valData, respectively.
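unzip('MerchData.zip');
% Create an image datastore labeled by folder name
imds = imageDatastore('MerchData', ...
    'IncludeSubfolders',true, ...
    'LabelSource','foldernames');
classes = categories(imds.Labels);
% Use 70% of each class for calibration and the rest for validation
[calData, valData] = splitEachLabel(imds, 0.7, 'randomized');
% Resize the images to the 227-by-227 input size of the network
aug_calData = augmentedImageDatastore([227 227],calData);
aug_valData = augmentedImageDatastore([227 227],valData);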
Create the dlquantizer object and specify a CPU execution environment.
dq = dlquantizer(net,'ExecutionEnvironment','CPU')
dq = dlquantizer with properties:
NetworkObject: [1×1 dlnetwork]
ExecutionEnvironment: 'CPU'
Calibrate the network.
calResults = calibrate(dq,aug_calData,'UseGPU','off')
calResults=120×5 table
       Optimized Layer Name        Network Layer Name    Learnables / Activations    MinValue    MaxValue
    __________________________    ____________________    ________________________    _________    ________
"conv1_Weights" {'conv1' } "Weights" -0.91985 0.88489
"conv1_Bias" {'conv1' } "Bias" -0.07925 0.26343
"fire2-squeeze1x1_Weights" {'fire2-squeeze1x1'} "Weights" -1.38 1.2477
"fire2-squeeze1x1_Bias" {'fire2-squeeze1x1'} "Bias" -0.11641 0.24273
"fire2-expand1x1_Weights" {'fire2-expand1x1' } "Weights" -0.7406 0.90982
"fire2-expand1x1_Bias" {'fire2-expand1x1' } "Bias" -0.060056 0.14602
"fire2-expand3x3_Weights" {'fire2-expand3x3' } "Weights" -0.74397 0.66905
"fire2-expand3x3_Bias" {'fire2-expand3x3' } "Bias" -0.051778 0.074239
"fire3-squeeze1x1_Weights" {'fire3-squeeze1x1'} "Weights" -0.7712 0.68917
"fire3-squeeze1x1_Bias" {'fire3-squeeze1x1'} "Bias" -0.10138 0.32675
"fire3-expand1x1_Weights" {'fire3-expand1x1' } "Weights" -0.72035 0.9743
"fire3-expand1x1_Bias" {'fire3-expand1x1' } "Bias" -0.067029 0.30425
"fire3-expand3x3_Weights" {'fire3-expand3x3' } "Weights" -0.61443 0.7741
"fire3-expand3x3_Bias" {'fire3-expand3x3' } "Bias" -0.053613 0.10329
"fire4-squeeze1x1_Weights" {'fire4-squeeze1x1'} "Weights" -0.7422 1.0877
"fire4-squeeze1x1_Bias" {'fire4-squeeze1x1'} "Bias" -0.10885 0.13881
⋮
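The MinValue and MaxValue columns record the dynamic range that calibration observed for each learnable parameter and activation. To illustrate how such a range can be mapped to 8-bit integers, the sketch below picks a scale for a few rows of calResults, assuming symmetric int8 quantization with power-of-two scaling. This is a simplified model for intuition, not a description of the exact algorithm that dlquantizer uses internally.

```matlab
% Assumed model: symmetric int8 quantization with a power-of-two scale.
% Ranges copied from the first rows of calResults above.
names  = ["conv1_Weights"; "conv1_Bias"; "fire2-squeeze1x1_Weights"];
minVal = [-0.91985; -0.07925; -1.38];
maxVal = [ 0.88489;  0.26343;  1.2477];

maxAbs   = max(abs(minVal), abs(maxVal));   % largest magnitude to represent
exponent = ceil(log2(maxAbs / 127));        % smallest e with 127*2^e >= maxAbs
scale    = 2 .^ exponent;                   % real value of one int8 step

% Quantize the maximum value, then map it back to see the rounding error
qMax = int8(round(maxVal ./ scale));
recovered = double(qMax) .* scale;

disp(table(names, maxAbs, exponent, scale, qMax, recovered))
```

For conv1_Weights, for example, the largest magnitude is 0.91985, so the scale is 2^-7 = 0.0078125 and the maximum weight 0.88489 maps to the integer 113. Choosing the smallest exponent that still covers maxAbs keeps the step size as small as possible, which limits rounding error.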
Use the MATLAB Support Package for Raspberry Pi Hardware function, raspi, to create a connection to the Raspberry Pi. In the following code, replace:
- raspiname with the name or address of your Raspberry Pi
- username with your user name
- password with your password
% r = raspi('raspiname','username','password')
For example,
r = raspi('gpucoder-raspberrypi-8','pi','matlab')
r = raspi with properties:
DeviceAddress: 'gpucoder-raspberrypi-8'
Port: 18734
BoardName: 'Raspberry Pi 3 Model B+'
AvailableLEDs: {'led0'}
AvailableDigitalPins: [2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27]
AvailableSPIChannels: {}
AvailableI2CBuses: {}
AvailableWebcams: {}
I2CBusSpeed:
AvailableCANInterfaces: {}
Supported peripherals
Specify the raspi object as the target for the quantized network, and set the MetricFcn property to the metric function to use for validation.
opts = dlquantizationOptions('Target',r);
opts.MetricFcn = {@(x)hAccuracy(x,net,aug_valData,classes)}
opts = dlquantizationOptions with properties:

 Validation Metric Info
    MetricFcn: {[@(x)hAccuracy(x,net,aug_valData,classes)]}

 Validation Environment Info
    Target: [1×1 raspi]
    Bitstream: ''
Validate the quantized network with the validate function.
valResults = validate(dq,aug_valData,opts)
Starting application: 'codegen/lib/validate_predict_int8/pil/validate_predict_int8.elf'
To terminate execution: clear validate_predict_int8_pil
Launching application validate_predict_int8.elf...
Host application produced the following standard output (stdout) and standard error (stderr) messages:
valResults = struct with fields:
       NumSamples: 20
    MetricResults: [1×1 struct]
       Statistics: []
Examine the validation output to see the performance of the quantized network.
valResults.MetricResults.Result
ans=2×2 table
    NetworkImplementation    MetricOutput
    _____________________    ____________
    {'Floating-Point'}             1
    {'Quantized'     }             1
Version History
Introduced in R2020a
R2023a: Specify Raspberry Pi as quantization target
You can now specify a raspi object as the target for quantization by using the Target property when the ExecutionEnvironment property of the dlquantizer object is set to 'CPU'.