
[NeurIPS 2022] AdaptFormer: Adapting Vision Transformers for Scalable Visual Recognition

Project Page | arXiv

(Teaser figure)

This is a PyTorch implementation of the paper AdaptFormer: Adapting Vision Transformers for Scalable Visual Recognition.

Shoufa Chen¹*, Chongjian Ge¹*, Zhan Tong², Jiangliu Wang²,³, Yibing Song², Jue Wang², Ping Luo¹
¹The University of Hong Kong, ²Tencent AI Lab, ³The Chinese University of Hong Kong
*denotes equal contribution
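
AdaptFormer freezes the pre-trained ViT backbone and fine-tunes only lightweight bottleneck modules (AdaptMLP) inserted in parallel with each block's MLP; the `--ffn_adapt` flag in the training commands below enables them. As a reference, here is a minimal PyTorch sketch of that bottleneck design, not the repository's exact code: the argument defaults (bottleneck width 64, scaling 0.1) follow values reported in the paper's ablations, and the zero initialization of the up-projection is a standard adapter choice.

```python
import torch
import torch.nn as nn

class AdaptMLP(nn.Module):
    """Bottleneck adapter run in parallel with a frozen MLP block (sketch)."""

    def __init__(self, dim: int, bottleneck: int = 64, scale: float = 0.1):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)  # down-projection
        self.act = nn.ReLU()
        self.up = nn.Linear(bottleneck, dim)    # up-projection
        self.scale = scale                      # fixed scaling factor s
        # Zero-init the up-projection so training starts from the frozen model.
        nn.init.zeros_(self.up.weight)
        nn.init.zeros_(self.up.bias)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.up(self.act(self.down(x))) * self.scale

# Schematic placement inside one transformer block:
#   y   = x + attn(ln1(x))                   # frozen
#   out = y + mlp(ln2(y)) + adapter(ln2(y))  # frozen MLP + trainable adapter
```

Only the newly added parameters are updated during fine-tuning; the rest of the backbone stays frozen, which is what makes the approach scalable across tasks.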

Catalog

Usage

Install

Data Preparation

See DATASET.md.

Training

Start

Video

```bash
OMP_NUM_THREADS=1 python3 -m torch.distributed.launch \
    --nproc_per_node=8 --nnodes=8 \
    --node_rank=$1 --master_addr=$2 --master_port=22234 \
    --use_env main_video.py \
    --finetune /path/to/pre_trained/checkpoints \
    --output_dir /path/to/output \
    --batch_size 16 --epochs 90 --blr 0.1 --weight_decay 0.0 --dist_eval \
    --data_path /path/to/SSV2 --data_set SSV2 \
    --ffn_adapt
```

Run this command on each of the 8 nodes. `--master_addr` is set to the IP of node 0, and `--node_rank` is 0, 1, ..., 7 for the eight nodes respectively.
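
For example, if the command above is saved as a script, say `train_video.sh` (an illustrative name) that takes the node rank and the master address as its two positional arguments, the launches would look like:

```bash
# on node 0 (the master; 192.168.1.100 is a placeholder IP)
bash train_video.sh 0 192.168.1.100

# on node 1; likewise with ranks 2-7 on the remaining nodes
bash train_video.sh 1 192.168.1.100
```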

Image

```bash
python3 -m torch.distributed.launch --nproc_per_node=8 --use_env main_image.py \
    --batch_size 128 --cls_token \
    --finetune /path/to/pre_trained/mae_pretrain_vit_b.pth \
    --dist_eval --data_path /path/to/data \
    --output_dir /path/to/output \
    --drop_path 0.0 --blr 0.1 \
    --dataset cifar100 --ffn_adapt
```

To obtain the pre-trained checkpoint, see PRETRAIN.md.
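
For the image experiments, the checkpoint is the ImageNet-pretrained MAE ViT-Base model. Assuming you want the file released by the MAE authors, it can be fetched with the download link published in the MAE repository (note the released filename may differ from the path used in the command above; PRETRAIN.md remains the authoritative reference):

```bash
# ImageNet-1K MAE pre-trained ViT-Base checkpoint (link from the MAE repository)
wget https://dl.fbaipublicfiles.com/mae/pretrain/mae_pretrain_vit_base.pth
```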

Acknowledgement

The project is based on MAE, VideoMAE, timm, and MAM. Thanks for their awesome work.

Citation

```bibtex
@article{chen2022adaptformer,
  title={AdaptFormer: Adapting Vision Transformers for Scalable Visual Recognition},
  author={Chen, Shoufa and Ge, Chongjian and Tong, Zhan and Wang, Jiangliu and Song, Yibing and Wang, Jue and Luo, Ping},
  journal={arXiv preprint arXiv:2205.13535},
  year={2022}
}
```

License

This project is under the MIT license. See LICENSE for details.