# OpticalFlowFromDepth
This repository contains the source code for our paper:
Skin the sheep not only once: Reusing Various Depth Datasets to Drive the Learning of Optical Flow
Sheng-Chi Huang, Wei-Chen Chiu
## Introduction
Optical flow estimation is crucial for various applications in vision and robotics. Given the difficulty of collecting ground-truth optical flow in real-world scenarios, most existing methods for learning optical flow either adopt synthetic datasets for supervised training or exploit photometric consistency across temporally adjacent video frames to drive unsupervised learning, where the former typically has issues with generalizability while the latter usually performs worse than supervised approaches. To tackle these challenges, we propose to leverage the geometric connection between optical flow estimation and stereo matching (both rest on finding pixel correspondences across images) to unify various real-world depth estimation datasets for generating supervised training data for optical flow. Specifically, we turn monocular depth datasets into stereo ones by synthesizing virtual disparity, thus producing flows along the horizontal direction; moreover, we introduce virtual camera motion into stereo data to produce additional flows along the vertical direction. Furthermore, we propose applying geometric augmentations to one image of an optical flow pair, encouraging the optical flow estimator to learn from more challenging cases. Lastly, since the optical flow maps under different geometric augmentations exhibit distinct characteristics, an auxiliary classifier, trained to identify the type of augmentation from the appearance of the flow map, is utilized to further enhance the learning of the optical flow estimator. Our proposed method is general and not tied to any particular flow estimator; extensive experiments based on various datasets and optical flow estimation models verify its efficacy and superiority.
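To make the geometry concrete, below is a minimal NumPy sketch of the core idea (illustrative only, not the repository's preprocessing code): a virtual disparity derived from depth yields a purely horizontal flow, and a virtual vertical camera shift adds a vertical component. The focal length, baseline, shift values, and sign conventions are assumptions made for this example.

```python
import numpy as np

def flows_from_depth(depth, focal=720.0, baseline=0.2, vertical_shift=0.05):
    """Illustrative sketch: turn a depth map into a dense flow field.

    depth: (H, W) array of positive depths. The camera parameters are
    made-up example values, not the ones used by this repository.
    """
    h, w = depth.shape
    flow = np.zeros((h, w, 2), dtype=np.float32)

    # Stereo geometry: disparity = focal * baseline / depth, which becomes
    # a purely horizontal flow between the two virtual views.
    flow[..., 0] = -focal * baseline / np.maximum(depth, 1e-6)

    # A virtual vertical camera translation produces vertical parallax,
    # likewise inversely proportional to depth.
    flow[..., 1] = -focal * vertical_shift / np.maximum(depth, 1e-6)
    return flow

# Toy example: a constant 10 m depth map gives a constant flow field.
toy_depth = np.full((4, 4), 10.0)
print(flows_from_depth(toy_depth)[0, 0])  # [-14.4 -3.6]
```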
## Installation
Create a virtual environment for this project.
```shell
conda create --name OpticalFlowFromDepth python=3.9
conda activate OpticalFlowFromDepth
```
Clone this repo and install the required packages. The code was developed with PyTorch 1.12.1 and CUDA 11.3.
```shell
git clone https://github.com/AegeanKI/experiment
cd experiment
pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 torchaudio==0.12.1 --extra-index-url https://download.pytorch.org/whl/cu113
pip install -r requirement.txt
```
Compile the `alt_cuda` module, which is written in C, to handle the warping operation.
```shell
cd alt_cuda
python setup.py install
```
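For intuition, here is a rough NumPy sketch of the kind of forward warping such a module accelerates; the actual `alt_cuda` implementation, and in particular its handling of pixel collisions and holes, may differ.

```python
import numpy as np

def forward_warp(image, flow):
    """Naive forward warping: splat each source pixel to its flow target.

    image: (H, W, C) array; flow: (H, W, 2) array of (dx, dy) offsets.
    Collisions are resolved last-write-wins here; a CUDA implementation
    would typically use a more careful (e.g. depth-aware) scheme.
    """
    h, w = image.shape[:2]
    warped = np.zeros_like(image)
    ys, xs = np.mgrid[0:h, 0:w]
    xt = np.round(xs + flow[..., 0]).astype(int)
    yt = np.round(ys + flow[..., 1]).astype(int)
    valid = (xt >= 0) & (xt < w) & (yt >= 0) & (yt < h)
    warped[yt[valid], xt[valid]] = image[ys[valid], xs[valid]]
    return warped
```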
Link the dataloader so that the RAFT and GMFlow models can use our dataloader and our classifier.
```shell
cd adjusted_RAFT
ln -s ../auxiliary_classifier auxiliary_classifier
cd core
ln -s ../../dataloader.py my_dataloader.py
cd ../../adjusted_gmflow
ln -s ../auxiliary_classifier auxiliary_classifier
cd data
ln -s ../../dataloader.py my_dataloader.py
```
## Preprocessing
We preprocess the DIML dataset as an example; you can also use ReDWeb.
```shell
python preprocess.py --dataset DIML --gpu 0 --split 1 --split_id 0
```
These parameters are:
- `dataset`: preprocess the specified dataset
- `gpu`: run preprocessing on the specified GPU
- `split` and `split_id`: only preprocess part of the dataset (see the sketch after this list)
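As a rough illustration of the last two flags, `--split N --split_id i` processes the i-th of N chunks of the dataset, which lets preprocessing be parallelized across machines. The sketch below shows one plausible chunking scheme; the actual logic lives in `preprocess.py` and may differ.

```python
# Hypothetical sketch of how --split / --split_id could partition the
# file list; the real preprocess.py may chunk differently.
def select_chunk(files, split, split_id):
    chunk = (len(files) + split - 1) // split  # ceiling division
    return files[split_id * chunk : (split_id + 1) * chunk]

files = [f"img_{i:04d}.png" for i in range(10)]
print(select_chunk(files, split=3, split_id=0))  # first 4 of the 10 files
```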
## Datasets
To evaluate/train the RAFT/GMFlow models, you will need to download the required datasets.
By default, `dataset.py` in the RAFT and GMFlow models and `dataloader.py` will search for datasets in these locations. You can create symbolic links in the `datasets` folder pointing to wherever the datasets were downloaded.
```
├── datasets
    ├── Sintel
        ├── test
        ├── training
    ├── KITTI
        ├── testing
        ├── training
        ├── devkit
    ├── FlyingChairs_release
        ├── data
    ├── FlyingThings3D
        ├── frames_cleanpass
        ├── frames_finalpass
        ├── optical_flow
    ├── ReDWeb_V1
        ├── Imgs
        ├── RDs
    ├── DIML
        ├── test
        ├── train
    ├── AugmentedDatasets
        ├── ReDWeb
        ├── DIML
```
You can use the following command to create a soft link `datasets/datasetA` pointing to `/real/path/to/datasetA`.

```shell
ln -s /real/path/to/datasetA datasets/datasetA
```
## Training
We use the mixed dataset (ReDWeb + DIML) as an example.
- Train the RAFT model.

```shell
python -u train.py --name adjusted_raft --stage mixed --validation kitti --gpus 0 \
    --num_steps 120000 --batch_size 8 --lr 0.0025 --val_freq 10000 \
    --mixed_precision --is_first_stage \
    --add_classifier \
    --classifier_args auxiliary_classifier/args.txt \
    --classifier_checkpoint auxiliary_classifier/checkpoint.pth
```
- Train the GMFlow model.

```shell
CHECKPOINT_DIR=checkpoints/adjusted_gmflow && \
mkdir -p ${CHECKPOINT_DIR} && \
CUDA_VISIBLE_DEVICES=2,3 \
python -m torch.distributed.launch --nproc_per_node=2 --master_port=9988 main.py \
    --launcher pytorch --checkpoint_dir ${CHECKPOINT_DIR} --stage mixed \
    --batch_size 16 --val_dataset sintel kitti --lr 4e-4 --image_size 368 560 \
    --padding_factor 16 --upsample_factor 8 --with_speed_metric --val_freq 1000 \
    --save_ckpt_freq 10000 --num_steps 100000 \
    --add_classifier \
    --classifier_args auxiliary_classifier/auxiliary_classifier_args.txt \
    --classifier_checkpoint auxiliary_classifier/auxiliary_classifier_checkpoint.pth \
    2>&1 | tee -a ${CHECKPOINT_DIR}/train.log
```
These parameters are:
- `add_classifier`: enable the classifier while training the optical flow estimator
- `classifier_args`: if the `add_classifier` flag is enabled, use this flag to specify the classifier args
- `classifier_checkpoint`: if the `add_classifier` flag is enabled, use this flag to specify the classifier checkpoint
- `classify_loss_weight_init` and `classify_loss_weight_increase`: adjust the impact of the classifier (linearly)
- `max_classify_loss_weight` and `min_classify_loss_weight`: set the upper and the lower bound of the impact of the classifier (see the sketch below)
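Taken together, the last four flags suggest a clipped linear schedule for the classifier loss weight, roughly as sketched below. This is illustrative only; the parameter names mirror the flags, but the exact formula is defined in the training code.

```python
def classify_loss_weight(step,
                         classify_loss_weight_init=0.0,
                         classify_loss_weight_increase=1e-5,
                         min_classify_loss_weight=0.0,
                         max_classify_loss_weight=0.1):
    """Hypothetical linear ramp of the auxiliary classifier's loss weight."""
    w = classify_loss_weight_init + classify_loss_weight_increase * step
    return max(min_classify_loss_weight, min(w, max_classify_loss_weight))

# total_loss = flow_loss + classify_loss_weight(step) * classifier_loss
```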
## Testing
We use the validation set of KITTI-15 as an example. The optical flow ground truth includes occluded areas.
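Accordingly, the end-point error is averaged over all valid pixels, occluded ones included. A minimal sketch, assuming a boolean `valid` mask (not the repository's exact evaluation code):

```python
import numpy as np

def epe_all(flow_pred, flow_gt, valid):
    """End-point error over all valid pixels, occluded ones included.

    flow_pred, flow_gt: (H, W, 2) arrays; valid: (H, W) boolean mask.
    """
    err = np.linalg.norm(flow_pred - flow_gt, axis=-1)
    return float(err[valid].mean())
```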
- You can download our pretrained models from here
- Test the RAFT model.
  - TABLE I: R+D, C->T->R+D
  - TABLE III: full, no classifier, +virtual disparity, empty

```shell
python evaluate.py --model=models/raft-mixed.pth --dataset=kitti --mixed_precision
```
- Test the GMFlow model.
  - TABLE II: R+D, C->T->R+D
  - TABLE III: full, no classifier, +virtual disparity, empty

```shell
CUDA_VISIBLE_DEVICES=0 python main.py --eval --val_dataset kitti kitti12 sintel --resume pretrained/gmflow-CT-mixed.pth
```
## Acknowledgement
- The preprocessing code for virtual ego-motion is borrowed from depthstillation
- The training and testing code for the RAFT model is borrowed from RAFT
- The training and testing code for the GMFlow model is borrowed from GMFlow
## Citation
Please cite our paper and star this repository if it's helpful to your work!
```bibtex
@inproceedings{huang2024skin,
  title={Skin the sheep not only once: Reusing Various Depth Datasets to Drive the Learning of Optical Flow},
  author={Huang, Sheng-Chi and Chiu, Wei-Chen},
  booktitle={2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
  year={2024},
  url={https://arxiv.org/abs/2310.01833}
}
```