Quantization, Projection, and Pruning
Compress a deep neural network by performing quantization, projection, or pruning
Use Deep Learning Toolbox™ together with the Deep Learning Toolbox Model Quantization Library support package to reduce the memory footprint and computational requirements of a deep neural network by:
- Pruning filters from convolution layers by using first-order Taylor approximation. You can then generate C/C++ or CUDA® code from this pruned network.
- Projecting layers by performing principal component analysis (PCA) on the layer activations, using a data set representative of the training data, and applying linear projections to the layer learnable parameters. Forward passes of a projected deep neural network are typically faster when you deploy the network to embedded hardware using library-free C/C++ code generation.
- Quantizing the weights, biases, and activations of layers to reduced precision scaled integer data types. You can then generate C/C++, CUDA, or HDL code from this quantized network for GPU, FPGA, or CPU deployment.
For a detailed overview of the compression techniques available in Deep Learning Toolbox Model Quantization Library, see Reduce Memory Footprint of Deep Neural Networks.
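As a quick sanity check at any point in these workflows, you can compare estimated per-layer metrics before and after compression. A minimal sketch, assuming `net` is a trained dlnetwork and `netCompressed` is a pruned, projected, or quantized version of it (both variable names are illustrative):

```matlab
% Compare estimated learnables and memory before and after compression.
% Assumes `net` (original) and `netCompressed` (compressed) are dlnetwork
% objects; the variable names are illustrative.
metricsOriginal   = estimateNetworkMetrics(net);
metricsCompressed = estimateNetworkMetrics(netCompressed);
disp(metricsOriginal)      % per-layer learnables and estimated memory
disp(metricsCompressed)
```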
Functions
Pruning
Function | Description |
---|---|
taylorPrunableNetwork | Neural network suitable for compression using Taylor pruning (Since R2022a) |
forward | Compute deep learning network output for training |
predict | Compute deep learning network output for inference |
updatePrunables | Remove filters from prunable layers based on importance scores (Since R2022a) |
updateScore | Compute and accumulate Taylor-based importance scores for pruning (Since R2022a) |
dlnetwork | Deep learning neural network |
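Taken together, these functions form a custom pruning loop: convert the network, accumulate Taylor importance scores over mini-batches, remove the lowest-scoring filters, and convert back. The sketch below shows one plausible arrangement; it assumes `net` is a trained dlnetwork and `mbq` is a minibatchqueue that yields mini-batches `X` with targets `T`, and the loss function and iteration counts are illustrative, not prescriptive.

```matlab
% Minimal sketch of a Taylor pruning loop (illustrative settings).
% Assumes: `net` is a trained dlnetwork and `mbq` is a minibatchqueue
% yielding mini-batches X with one-hot targets T.
prunableNet = taylorPrunableNetwork(net);

for pruningIteration = 1:10            % number of pruning rounds (illustrative)
    shuffle(mbq);
    while hasdata(mbq)
        [X, T] = next(mbq);
        [~, pruningGrad, ~, pruningAct, state] = ...
            dlfeval(@modelLossPruning, prunableNet, X, T);
        prunableNet.State = state;
        % Accumulate first-order Taylor importance scores
        prunableNet = updateScore(prunableNet, pruningAct, pruningGrad);
    end
    % Remove the lowest-scoring filters, at most 8 per round (illustrative);
    % in practice, interleave fine-tuning steps between rounds.
    prunableNet = updatePrunables(prunableNet, MaxToPrune=8);
end

prunedNet = dlnetwork(prunableNet);    % back to a dlnetwork for inference or codegen

function [loss, pruningGrad, netGrad, pruningAct, state] = modelLossPruning(net, X, T)
    [Y, state, pruningAct] = forward(net, X);   % third output: pruning activations
    loss = crossentropy(Y, T);
    [pruningGrad, netGrad] = dlgradient(loss, pruningAct, net.Learnables);
end
```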
Projection
Function | Description |
---|---|
compressNetworkUsingProjection | Compress neural network using projection (Since R2022b) |
neuronPCA | Principal component analysis of neuron activations (Since R2022b) |
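A minimal sketch of the projection workflow, assuming `net` is a trained dlnetwork and `mbq` is a minibatchqueue of data representative of the training set; the reduction goal is illustrative.

```matlab
% Minimal sketch of the projection workflow (illustrative settings).
% Assumes: `net` is a trained dlnetwork and `mbq` is a minibatchqueue of
% data representative of the training set.
npca = neuronPCA(net, mbq);                      % PCA of layer activations
netProjected = compressNetworkUsingProjection(net, npca, ...
    LearnablesReductionGoal=0.5);                % aim for ~50% fewer learnables (illustrative)
```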
Quantization
Function | Description |
---|---|
dlquantizer | Quantize a deep neural network to 8-bit scaled integer data types (Since R2020a) |
dlquantizationOptions | Options for quantizing a trained deep neural network (Since R2020a) |
prepareNetwork | Prepare deep neural network for quantization (Since R2024b) |
calibrate | Simulate and collect ranges of a deep neural network (Since R2020a) |
quantize | Quantize deep neural network (Since R2022a) |
validate | Quantize and validate a deep neural network (Since R2020a) |
quantizationDetails | Display quantization details for a neural network (Since R2022a) |
estimateNetworkMetrics | Estimate network metrics for specific layers of a neural network (Since R2022a) |
equalizeLayers | Equalize layer parameters of deep neural network (Since R2022b) |
exportNetworkToSimulink | Generate Simulink model that contains deep learning layer blocks that correspond to deep learning layer objects (Since R2024b) |
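A minimal sketch of the core quantization workflow, assuming a trained dlnetwork `net` and a calibration datastore `calDS` representative of the training data; the execution environment and variable names are illustrative.

```matlab
% Minimal sketch of the quantization workflow (illustrative settings).
% Assumes: `net` is a trained dlnetwork and `calDS` is a datastore of
% calibration data representative of the training set.
quantObj   = dlquantizer(net, ExecutionEnvironment="GPU");  % or "FPGA", "CPU"
calResults = calibrate(quantObj, calDS);    % simulate and collect dynamic ranges
qNet       = quantize(quantObj);            % 8-bit scaled integer network
qDetails   = quantizationDetails(qNet)      % inspect which layers are quantized
```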
Apps
App | Description |
---|---|
Deep Network Quantizer | Quantize deep neural network to 8-bit scaled integer data types (Since R2020a) |
Topics
Overview
- Reduce Memory Footprint of Deep Neural Networks
Learn about neural network compression techniques, including pruning, projection, and quantization.
Pruning
- Analyze and Compress 1-D Convolutional Neural Network
Analyze a 1-D convolutional network for compression and compress it using Taylor pruning and projection. (Since R2024b)
- Parameter Pruning and Quantization of Image Classification Network
Use parameter pruning and quantization to reduce network size.
- Prune Image Classification Network Using Taylor Scores
This example shows how to reduce the size of a deep neural network using Taylor pruning.
- Prune Filters in a Detection Network Using Taylor Scores
This example shows how to reduce network size and increase inference speed by pruning convolutional filters in a you only look once (YOLO) v3 object detection network.
- Prune and Quantize Convolutional Neural Network for Speech Recognition
Compress a convolutional neural network (CNN) to prepare it for deployment on an embedded system.
Projection and Knowledge Distillation
- Compress Neural Network Using Projection
This example shows how to compress a neural network using projection and principal component analysis.
- Evaluate Code Generation Inference Time of Compressed Deep Neural Network
This example shows how to compare the inference time of a compressed deep neural network for battery state of charge estimation. (Since R2023b)
- Train Smaller Neural Network Using Knowledge Distillation
This example shows how to reduce the memory footprint of a deep learning network by using knowledge distillation. (Since R2023b)
Quantization
- Quantization of Deep Neural Networks
Understand the effects of quantization and how to visualize the dynamic ranges of network convolution layers.
- Quantization Workflow Prerequisites
Products required for the quantization of deep learning networks.
- Prepare Data for Quantizing Networks
Supported datastores for quantization workflows.
- Quantize Multiple-Input Network Using Image and Feature Data
This example shows how to quantize a network that has multiple inputs by using image and feature data.
- Export Quantized Networks to Simulink and Generate Code
Export a quantized neural network to Simulink and generate code from the exported model.
Quantization for GPU Target
- Generate INT8 Code for Deep Learning Networks (GPU Coder)
Quantize and generate code for a pretrained convolutional neural network.
- Quantize Residual Network Trained for Image Classification and Generate CUDA Code
This example shows how to quantize the learnable parameters in the convolution layers of a deep learning neural network that has residual connections and has been trained for image classification with CIFAR-10 data.
- Quantize Layers in Object Detectors and Generate CUDA Code
This example shows how to generate CUDA code for an SSD vehicle detector and a YOLO v2 vehicle detector that performs inference computations in 8-bit integers for the convolutional layers.
- Quantize Semantic Segmentation Network and Generate CUDA Code
This example shows how to quantize a convolutional neural network trained for semantic segmentation and generate CUDA code.
Quantization for FPGA Target
- Quantize Network for FPGA Deployment (Deep Learning HDL Toolbox)
Reduce the memory footprint of a deep neural network by quantizing the weights, biases, and activations of convolution layers to 8-bit scaled integer data types.
- Classify Images on FPGA Using Quantized Neural Network (Deep Learning HDL Toolbox)
This example shows how to use Deep Learning HDL Toolbox™ to deploy a quantized deep convolutional neural network (CNN) to an FPGA.
- Classify Images on FPGA by Using Quantized GoogLeNet Network (Deep Learning HDL Toolbox)
This example shows how to use Deep Learning HDL Toolbox to deploy a quantized GoogLeNet network to classify an image.
Quantization for CPU Target
- Generate int8 Code for Deep Learning Networks (MATLAB Coder)
Quantize and generate code for a pretrained convolutional neural network.
- Generate INT8 Code for Deep Learning Network on Raspberry Pi (MATLAB Coder)
Generate code for a deep learning network that performs inference computations in 8-bit integers.