Add SWAG model weight that only the linear head is finetuned to ImageNet1K by YosuaMichael · Pull Request #5793 · pytorch/vision

Subtask of #5708

Model Description

These models use trunk weights from the weakly supervised pre-training described in https://arxiv.org/pdf/2201.08371.pdf. Only the linear head is fine-tuned on the IMAGENET1K dataset; the pre-trained trunk weights stay frozen.

These weights are suitable for users who want to fine-tune the pre-trained trunk on other downstream datasets.
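
The frozen-trunk setup described above can be sketched as follows. This is a minimal illustration, not the training code used for these weights: `trunk` and `head` are hypothetical stand-in modules (tiny layers so the snippet runs without downloading anything), the input shape and class count are made up, and the optimizer values are borrowed from the RegNet settings listed in this description.

```python
import torch
from torch import nn

# Hypothetical stand-ins: a tiny "trunk" (feature extractor) and a linear "head".
trunk = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 16), nn.ReLU())
head = nn.Linear(16, 10)
model = nn.Sequential(trunk, head)

# Freeze the trunk so that only the head receives gradient updates.
for param in trunk.parameters():
    param.requires_grad = False
trunk.eval()

# Only the head's parameters are handed to the optimizer.
optimizer = torch.optim.SGD(
    head.parameters(),
    lr=0.001,
    momentum=0.9,
    nesterov=True,
    weight_decay=0.001,
)

# One training step on random data, just to show the gradient flow.
images = torch.randn(4, 3, 8, 8)
targets = torch.randint(0, 10, (4,))
loss = nn.functional.cross_entropy(model(images), targets)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

After `loss.backward()`, the trunk parameters have no gradients, while the head's weight and bias do. To fine-tune the trunk on a downstream dataset instead, one would leave `requires_grad` enabled and pass all parameters to the optimizer.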

Linear head fine-tuning parameters on IMAGENET1K:

RegNet models (all sizes: 16gf, 32gf, 128gf):

Num epochs: 28
Trained on 1 node with 8 Volta GPUs (32 GB)
Batch size per GPU: 32
Image size: 224
SGD optimizer with params:
- weight decay: 0.001
- momentum: 0.9
- use Nesterov: True
Learning Rate param:
- scheduler: CosineAnnealingLR
- Start value: 0.001
Image augmentation transforms:
- RandomResizedCrop of size 224 with interpolation 3
- RandomHorizontalFlip
- Normalize
Note: Trained with PyTorch mixed precision
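
For reference, the CosineAnnealingLR schedule named above follows a closed-form cosine decay. The sketch below evaluates it for the RegNet start value of 0.001 over 28 epochs. It assumes a minimum learning rate of 0 and one scheduler step per epoch. Neither is stated in this description, and `cosine_annealing_lr` is a helper written for illustration only.

```python
import math


def cosine_annealing_lr(step: int, total_steps: int, base_lr: float,
                        eta_min: float = 0.0) -> float:
    """Closed-form cosine annealing from base_lr down to eta_min."""
    cosine = math.cos(math.pi * step / total_steps)
    return eta_min + 0.5 * (base_lr - eta_min) * (1.0 + cosine)


EPOCHS = 28      # from the settings above
BASE_LR = 0.001  # RegNet start value from the settings above

# Learning rate at the start of each epoch, plus the final value.
schedule = [cosine_annealing_lr(e, EPOCHS, BASE_LR) for e in range(EPOCHS + 1)]

for epoch in (0, 7, 14, 21, 28):
    print(f"epoch {epoch:2d}: lr = {schedule[epoch]:.6f}")
```

The rate starts at the configured value, reaches half of it at the midpoint (epoch 14), and decays to the assumed minimum at epoch 28.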

Vision Transformer models (all sizes: b/16, l/16, h/14):

Num epochs: 28
Trained on 4 nodes, each with 8 Volta GPUs (32 GB)
Batch size per GPU: 32
Image size: 224
SGD optimizer with params:
- weight decay: 1e-9
- momentum: 0.9
- use Nesterov: True
Learning Rate param:
- scheduler: CosineAnnealingLR
- Start value: 0.04
Image augmentation transforms:
- RandomResizedCrop of size 224 with interpolation 3
- RandomHorizontalFlip
- Normalize
Note: Trained with PyTorch mixed precision
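
The two setups differ in node count, so their global batch sizes differ even though the per-GPU batch size is the same. The arithmetic is sketched below, assuming plain data parallelism without gradient accumulation (this description does not say whether accumulation was used). `RunConfig` is a hypothetical helper for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunConfig:
    """Hardware layout of one fine-tuning run (illustrative helper)."""
    name: str
    nodes: int
    gpus_per_node: int
    batch_per_gpu: int

    @property
    def global_batch_size(self) -> int:
        # Samples processed per optimizer step across all GPUs.
        return self.nodes * self.gpus_per_node * self.batch_per_gpu


# Values taken from the settings listed above.
regnet = RunConfig("RegNet", nodes=1, gpus_per_node=8, batch_per_gpu=32)
vit = RunConfig("ViT", nodes=4, gpus_per_node=8, batch_per_gpu=32)

for cfg in (regnet, vit):
    print(f"{cfg.name}: global batch size = {cfg.global_batch_size}")
```

Under that assumption the RegNet runs use a global batch of 256 and the ViT runs a global batch of 1024.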

Validation script and results

## RegNet_Y_16GF
python -u ~/script/run_with_submitit.py --timeout 3000 --ngpus 1 --nodes 1 --partition train --model regnet_y_16gf --data-path="/datasets01_ontap/imagenet_full_size/061417" --test-only --batch-size=1 --weights="RegNet_Y_16GF_Weights.IMAGENET1K_SWAG_LINEAR_V1"
# Acc@1 83.976 Acc@5 97.244

## RegNet_Y_32GF
python -u ~/script/run_with_submitit.py --timeout 3000 --ngpus 1 --nodes 1 --partition train --model regnet_y_32gf --data-path="/datasets01_ontap/imagenet_full_size/061417" --test-only --batch-size=1 --weights="RegNet_Y_32GF_Weights.IMAGENET1K_SWAG_LINEAR_V1"
# Acc@1 84.622 Acc@5 97.480

## RegNet_Y_128GF
python -u ~/script/run_with_submitit.py --timeout 3000 --ngpus 1 --nodes 1 --partition train --model regnet_y_128gf --data-path="/datasets01_ontap/imagenet_full_size/061417" --test-only --batch-size=1 --weights="RegNet_Y_128GF_Weights.IMAGENET1K_SWAG_LINEAR_V1"
# Acc@1 86.068 Acc@5 97.844

## ViT_B_16
python -u ~/script/run_with_submitit.py --timeout 3000 --ngpus 1 --nodes 1 --partition train --model vit_b_16 --data-path="/datasets01_ontap/imagenet_full_size/061417" --test-only --batch-size=1 --weights="ViT_B_16_Weights.IMAGENET1K_SWAG_LINEAR_V1"
# Acc@1 81.886 Acc@5 96.180

## ViT_L_16
python -u ~/script/run_with_submitit.py --timeout 3000 --ngpus 1 --nodes 1 --partition train --model vit_l_16 --data-path="/datasets01_ontap/imagenet_full_size/061417" --test-only --batch-size=1 --weights="ViT_L_16_Weights.IMAGENET1K_SWAG_LINEAR_V1"
# Acc@1 85.146 Acc@5 97.422

## ViT_H_14
python -u ~/script/run_with_submitit.py --timeout 3000 --ngpus 1 --nodes 1 --partition train --model vit_h_14 --data-path="/datasets01_ontap/imagenet_full_size/061417" --test-only --batch-size=1 --weights="ViT_H_14_Weights.IMAGENET1K_SWAG_LINEAR_V1"
# Acc@1 85.708 Acc@5 97.730
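
For readers unfamiliar with the metrics above: Acc@1 is the percentage of samples whose highest-scoring class is the true label, and Acc@5 is the percentage whose true label is among the five highest-scoring classes. The sketch below computes this on a made-up three-class example. It is not the code of the validation script, and `topk_accuracy` is a hypothetical helper.

```python
def topk_accuracy(scores, labels, k):
    """Percentage of samples whose true label is among the k top-scoring classes."""
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices sorted by descending score, truncated to the top k.
        ranked = sorted(range(len(row)), key=lambda i: row[i], reverse=True)
        hits += label in ranked[:k]
    return 100.0 * hits / len(labels)


# Made-up scores for 4 samples over 3 classes.
scores = [
    [0.1, 0.7, 0.2],  # label 1: ranked first
    [0.5, 0.3, 0.2],  # label 1: ranked second
    [0.2, 0.3, 0.5],  # label 0: ranked third
    [0.6, 0.1, 0.3],  # label 0: ranked first
]
labels = [1, 1, 0, 0]

acc1 = topk_accuracy(scores, labels, k=1)
acc2 = topk_accuracy(scores, labels, k=2)
print(f"Acc@1 {acc1:.3f} Acc@2 {acc2:.3f}")
```

With only three classes the example uses k=2 in place of k=5, but the definition is the same.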

Sample script to load model

from torchvision.models.vision_transformer import vit_b_16, ViT_B_16_Weights

m = vit_b_16(weights=ViT_B_16_Weights.IMAGENET1K_SWAG_LINEAR_V1)