GitHub - wyhsirius/g3an-project: [CVPR 2020] G3AN: Disentangling Appearance and Motion for Video Generation

G3AN: Disentangling Appearance and Motion for Video Generation

Project Page | Paper

This is the official PyTorch implementation of the CVPR 2020 paper "G3AN: Disentangling Appearance and Motion for Video Generation".

Requirements

Dataset

You can download the original UvA-NEMO dataset from https://www.uva-nemo.org/ and use https://github.com/1adrianb/face-alignment to crop face regions. We also provide our preprocessed version here.

Pretrained model

Download the G3AN pretrained model on UvA-NEMO from here.

Inference

  1. For sampling NUM videos and saving them under ./demos/EXP_NAME

python demo_random.py --model_path $MODEL_PATH --n $NUM --demo_name $EXP_NAME

  2. For sampling N appearances with M motions and saving them under ./demos/EXP_NAME

python demo_nxm.py --model_path $MODEL_PATH --n_za_test $N --n_zm_test $M --demo_name $EXP_NAME

  3. For sampling N appearances with different video lengths (9 different video lengths) and saving them under ./demos/EXP_NAME

python demo_multilength.py --model_path $MODEL_PATH --n_za_test $N --demo_name $EXP_NAME
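
If you prefer to launch the three demos from a script, the following is a minimal sketch that only assembles the command lines. The script names and flags are the ones listed above; the helper `demo_command`, the checkpoint path, the experiment name, and the sample counts are hypothetical placeholders.

```python
import shlex


def demo_command(script, model_path, demo_name, **flags):
    """Build the argument list for one of the demo scripts.

    Extra keyword arguments become "--name value" pairs, in the order given.
    """
    cmd = ["python", script, "--model_path", model_path]
    for name, value in flags.items():
        cmd += [f"--{name}", str(value)]
    cmd += ["--demo_name", demo_name]
    return cmd


# Placeholder values (assumptions): replace with your own checkpoint and name.
MODEL_PATH = "path/to/g3an.pth"
EXP_NAME = "my_demo"

commands = [
    demo_command("demo_random.py", MODEL_PATH, EXP_NAME, n=8),
    demo_command("demo_nxm.py", MODEL_PATH, EXP_NAME, n_za_test=4, n_zm_test=3),
    demo_command("demo_multilength.py", MODEL_PATH, EXP_NAME, n_za_test=4),
]

for cmd in commands:
    # Print the shell-quoted command instead of running it.
    print(shlex.join(cmd))
```

To actually run a command, pass the list to `subprocess.run(cmd, check=True)` from the repository root.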

Training

python train.py --data_path $DATASET --exp_name $EXP_NAME

Evaluation

  1. Generate 5000 videos for evaluation, save them in $GEN_PATH

python generate_videos.py --gen_path $GEN_PATH

  2. Move into the evaluation folder

Download the feature extractor resnext-101-kinetics.pth from here to the current folder. Pre-computed UvA-NEMO dataset stats can be found in stats/uva.npz. If you would like to compute them yourself, save all the training videos in $UVA_PATH and run the command below (to obtain 64x64 videos, you need to specify the output size when using ffmpeg):

python precalc_stats.py --data_path $UVA_PATH
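
For the ffmpeg resizing step mentioned above, here is a minimal sketch that builds an ffmpeg command forcing a square output size. The helper name `ffmpeg_resize_command` and the file names are hypothetical, and the sketch assumes the input frames are already square face crops (scaling a non-square video this way would distort it). The video format expected by precalc_stats.py is not stated here, so check the script before batch-converting.

```python
def ffmpeg_resize_command(src, dst, size=64):
    """Return an ffmpeg argument list that rescales `src` to size x size."""
    return [
        "ffmpeg",
        "-y",                            # overwrite dst if it already exists
        "-i", str(src),                  # input video
        "-vf", f"scale={size}:{size}",   # force the output resolution
        str(dst),                        # output video
    ]


# Hypothetical file names, for illustration only.
cmd = ffmpeg_resize_command("video_0001.mp4", "video_0001_64.mp4", size=64)
print(" ".join(cmd))
```

Passing `cmd` to `subprocess.run(cmd, check=True)` would perform the conversion, provided ffmpeg is installed and on the PATH.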

To compute FID

python fid.py $GEN_PATH stats/uva_64.npz
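
As background on what the score measures: FID is conventionally the Fréchet distance between two Gaussians fitted to feature statistics of real and generated samples, ||mu1 - mu2||^2 + Tr(S1 + S2 - 2 (S1 S2)^(1/2)). The sketch below is a simplified illustration, not the code in fid.py: it assumes diagonal covariances, in which case the trace term reduces to a sum over per-dimension standard deviations. The function name and the toy inputs are hypothetical.

```python
import math


def frechet_distance_diag(mu1, var1, mu2, var2):
    """Frechet distance between N(mu1, diag(var1)) and N(mu2, diag(var2)).

    Simplifying assumption: both covariances are diagonal, so the matrix
    square root in the general formula becomes an element-wise sqrt.
    """
    if not (len(mu1) == len(var1) == len(mu2) == len(var2)):
        raise ValueError("all inputs must have the same length")
    # Squared distance between the means.
    mean_term = sum((a - b) ** 2 for a, b in zip(mu1, mu2))
    # Tr(S1 + S2 - 2*sqrt(S1*S2)) for diagonal S1, S2.
    cov_term = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(var1, var2))
    return mean_term + cov_term


# Toy statistics: identical distributions give a distance of zero.
same = frechet_distance_diag([0.0, 0.0], [1.0, 1.0], [0.0, 0.0], [1.0, 1.0])
# Means differ by 3 and standard deviations by 1 (1 vs 2): 9 + 1.
shifted = frechet_distance_diag([0.0], [1.0], [3.0], [4.0])
print(same, shifted)
```

With full covariance matrices the trace term needs a matrix square root, which is why implementations typically rely on a linear algebra library.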

I have provided npz files for both 64 and 128 resolutions. You can obtain an FID of around 60 (64x64) and 130 (128x128) by evaluating the provided model. Here I improved the original video discriminator by using (2+1)D ConvNets instead of 3D ConvNets.
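
To give a rough sense of the (2+1)D idea, the sketch below compares weight counts under the common factorization of a k x k x k 3D convolution into a 1 x k x k spatial convolution followed by a k x 1 x 1 temporal convolution. The channel sizes, the kernel size, and the choice of intermediate width are assumptions for illustration only; the actual discriminator layers in this repository may differ.

```python
def conv3d_params(c_in, c_out, k=3):
    """Weights in a full k x k x k 3D convolution (bias ignored)."""
    return k * k * k * c_in * c_out


def conv2plus1d_params(c_in, c_out, k=3, c_mid=None):
    """Weights in a (2+1)D block: spatial 1xkxk conv, then temporal kx1x1 conv.

    c_mid is the number of channels between the two convolutions; it
    defaults to c_out here, which is an assumption of this sketch.
    """
    c_mid = c_out if c_mid is None else c_mid
    spatial = 1 * k * k * c_in * c_mid
    temporal = k * 1 * 1 * c_mid * c_out
    return spatial + temporal


full = conv3d_params(64, 64)
factored = conv2plus1d_params(64, 64)
print(full, factored)  # 110592 49152
```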

TODOs

Citation

If you find this code useful for your research, please consider citing our paper:

@InProceedings{Wang_2020_CVPR,
    author = {Wang, Yaohui and Bilinski, Piotr and Bremond, Francois and Dantcheva, Antitza},
    title = {{G3AN}: Disentangling Appearance and Motion for Video Generation},
    booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month = {June},
    year = {2020}
}

Acknowledgement

Part of the evaluation code is adapted from evan. I have moved most of the operations from the CPU to the GPU to accelerate the computation. We thank the authors for their contribution to the community.