# Model Pruning and Sparsity in YOLOv5
📚 This guide explains how to apply pruning to YOLOv5 🚀 models to create more efficient networks while maintaining performance.
## What is Model Pruning?
Model pruning is a technique used to reduce the size and complexity of neural networks by removing less important parameters (weights and connections). This process creates a more efficient model with several benefits:
- Reduced model size for easier deployment on resource-constrained devices
- Faster inference speeds with minimal impact on accuracy
- Lower memory usage and energy consumption
- Improved overall efficiency for real-time applications
Pruning works by identifying and removing parameters that contribute minimally to the model's performance, resulting in a more lightweight model with similar accuracy.
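To make the idea concrete, here is a toy sketch of magnitude-based pruning, the criterion applied later in this guide; the tensor values are invented for illustration:

```python
import torch

# Toy magnitude pruning: zero out the 40% of weights with the smallest |w|.
w = torch.tensor([0.8, -0.05, 0.3, 0.01, -0.6])
k = int(0.4 * w.numel())                     # number of weights to prune (2 of 5)
threshold = w.abs().sort().values[k - 1]     # k-th smallest magnitude
w_pruned = torch.where(w.abs() > threshold, w, torch.zeros_like(w))
print(w_pruned)  # tensor([ 0.8000,  0.0000,  0.3000,  0.0000, -0.6000])
```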
## Before You Start
Clone the repo and install `requirements.txt` in a Python>=3.8.0 environment, including PyTorch>=1.8. Models and datasets download automatically from the latest YOLOv5 release.
```bash
git clone https://github.com/ultralytics/yolov5  # clone
cd yolov5
pip install -r requirements.txt  # install
```
## Test Baseline Performance
Before pruning, establish a baseline performance to compare against. This command tests YOLOv5x on COCO val2017 at image size 640 pixels. `yolov5x.pt` is the largest and most accurate model available. Other options are `yolov5s.pt`, `yolov5m.pt` and `yolov5l.pt`, or your own checkpoint from training a custom dataset (`./weights/best.pt`). For details on all available models, see the README table.
```bash
python val.py --weights yolov5x.pt --data coco.yaml --img 640 --half
```
Output:
```shell
val: data=/content/yolov5/data/coco.yaml, weights=['yolov5x.pt'], batch_size=32, imgsz=640, conf_thres=0.001, iou_thres=0.65, task=val, device=, workers=8, single_cls=False, augment=False, verbose=False, save_txt=False, save_conf=False, save_json=True, project=runs/val, name=exp, exist_ok=False, half=True, dnn=False
YOLOv5 🚀 v6.0-224-g4c40933 torch 1.10.0+cu111 CUDA:0 (Tesla V100-SXM2-16GB, 16160MiB)

Fusing layers...
Model Summary: 444 layers, 86705005 parameters, 0 gradients
val: Scanning '/content/datasets/coco/val2017.cache' images and labels... 4952 found, 48 missing, 0 empty, 0 corrupt: 100% 5000/5000 [00:00<?, ?it/s]
               Class     Images     Labels          P          R     mAP@.5 mAP@.5:.95: 100% 157/157 [01:12<00:00,  2.16it/s]
                 all       5000      36335      0.732      0.628      0.683      0.496
Speed: 0.1ms pre-process, 5.2ms inference, 1.7ms NMS per image at shape (32, 3, 640, 640)  # <--- base speed

Evaluating pycocotools mAP... saving runs/val/exp2/yolov5x_predictions.json...
...
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.507  # <--- base mAP
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.689
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.552
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.345
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.559
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.652
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.381
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.630
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.682
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.526
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.731
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.829
Results saved to runs/val/exp
```
## Apply Pruning to YOLOv5x (30% Sparsity)
We can apply pruning to the model using the `torch_utils.prune()` command. To test a pruned model, we update `val.py` to prune YOLOv5x to 0.3 sparsity (30% of weights set to zero):
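A minimal sketch of this edit, assuming the `model` variable inside `val.py` after the checkpoint has been loaded (the exact insertion point varies between YOLOv5 versions):

```python
# Inside val.py, after the model is loaded and before inference runs
# (sketch; the exact location varies between YOLOv5 versions):
from utils.torch_utils import prune

prune(model, 0.3)  # prune model to 0.3 global sparsity
```

Under the hood, `prune()` applies PyTorch's L1-unstructured pruning to each `nn.Conv2d` module and then makes the zeros permanent with `prune.remove()`. Re-running the same `val.py` command then produces the results below.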
30% pruned output:
```shell
val: data=/content/yolov5/data/coco.yaml, weights=['yolov5x.pt'], batch_size=32, imgsz=640, conf_thres=0.001, iou_thres=0.65, task=val, device=, workers=8, single_cls=False, augment=False, verbose=False, save_txt=False, save_conf=False, save_json=True, project=runs/val, name=exp, exist_ok=False, half=True, dnn=False
YOLOv5 🚀 v6.0-224-g4c40933 torch 1.10.0+cu111 CUDA:0 (Tesla V100-SXM2-16GB, 16160MiB)

Fusing layers...
Model Summary: 444 layers, 86705005 parameters, 0 gradients
Pruning model...  0.3 global sparsity
val: Scanning '/content/datasets/coco/val2017.cache' images and labels... 4952 found, 48 missing, 0 empty, 0 corrupt: 100% 5000/5000 [00:00<?, ?it/s]
               Class     Images     Labels          P          R     mAP@.5 mAP@.5:.95: 100% 157/157 [01:11<00:00,  2.19it/s]
                 all       5000      36335      0.724      0.614      0.671      0.478
Speed: 0.1ms pre-process, 5.2ms inference, 1.7ms NMS per image at shape (32, 3, 640, 640)  # <--- prune speed

Evaluating pycocotools mAP... saving runs/val/exp3/yolov5x_predictions.json...
...
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.489  # <--- prune mAP
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.677
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.537
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.334
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.542
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.635
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.370
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.612
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.664
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.496
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.722
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.803
Results saved to runs/val/exp3
```
## Results Analysis
From the results, we can observe:
- **30% sparsity achieved**: 30% of the model's weight parameters in `nn.Conv2d` layers are now zero (see the verification sketch after this list)
- **Inference time remains unchanged**: standard dense GPU kernels do not exploit unstructured sparsity, so processing speed is essentially the same as the baseline
- **Minimal performance impact**: mAP dropped slightly from 0.507 to 0.489, a relative drop of only about 3.6%
- **Model size reduction**: the zeroed weights compress well, so the checkpoint occupies less space when stored in a compressed or sparse format
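To verify the sparsity figure yourself, a quick check along these lines counts the zeroed weights (a sketch; `model` is the loaded YOLOv5 module):

```python
import torch.nn as nn

def conv_sparsity(model: nn.Module) -> float:
    """Fraction of zero-valued weights across all nn.Conv2d layers."""
    zeros, total = 0, 0
    for m in model.modules():
        if isinstance(m, nn.Conv2d):
            zeros += (m.weight == 0).sum().item()
            total += m.weight.numel()
    return zeros / total

# print(f"Global Conv2d sparsity: {conv_sparsity(model):.1%}")  # expect ~30.0% after pruning
```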
This demonstrates that pruning can significantly reduce model complexity with only a minor impact on performance, making it an effective optimization technique for deployment in resource-constrained environments.
## Fine-tuning Pruned Models
For best results, fine-tune pruned models to recover lost accuracy. This can be done by:
- Applying pruning with a desired sparsity level
- Training the pruned model for a few epochs with a lower learning rate
- Evaluating the fine-tuned pruned model against the baseline
This process helps the remaining parameters adapt to compensate for the removed connections, often recovering most or all of the original accuracy.
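Below is a hedged sketch of this workflow. The file names, the 30% sparsity level, and the checkpoint handling are illustrative rather than prescribed by this guide, and it assumes you run inside the yolov5 repo so the pickled model class resolves. Keeping PyTorch's pruning mask active during fine-tuning (deferring `prune.remove()`) guarantees that gradient updates cannot revive the zeroed weights:

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as tp

# 1. Load a checkpoint and prune, keeping the mask active so zeros stay fixed.
ckpt = torch.load("yolov5x.pt", map_location="cpu")  # illustrative path
model = ckpt["model"].float()  # YOLOv5 checkpoints store the module under 'model'
for m in model.modules():
    if isinstance(m, nn.Conv2d):
        tp.l1_unstructured(m, name="weight", amount=0.3)  # mask stays attached

# 2. Fine-tune for a few epochs at a reduced learning rate
#    (training loop / train.py integration omitted for brevity).

# 3. Make pruning permanent, then save the fine-tuned checkpoint.
for m in model.modules():
    if isinstance(m, nn.Conv2d):
        tp.remove(m, "weight")  # bake the zeros into the weight tensor
ckpt["model"] = model.half()
torch.save(ckpt, "yolov5x_pruned_finetuned.pt")  # illustrative path
```

The saved checkpoint can then be evaluated against the baseline with the same `val.py` command used earlier.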
## Supported Environments
Ultralytics provides a range of ready-to-use environments, each pre-installed with essential dependencies such as CUDA, cuDNN, Python, and PyTorch, to kickstart your projects.
## Project Status
The YOLOv5 GitHub Actions Continuous Integration (CI) badge indicates that all CI tests are passing. These tests rigorously check the functionality and performance of YOLOv5 across key workflows: training, validation, inference, export, and benchmarks. They ensure consistent and reliable operation on macOS, Windows, and Ubuntu, running every 24 hours and on each new commit.