Direct Voxel Grid Optimization
Super-fast Convergence for Radiance Fields Reconstruction
CVPR 2022 (Oral)
Results on custom casual capturing
A short guide to support custom forward-facing capturing and fly-through video rendering.
Results on real-world captured data
Features
- Speed up NeRF by replacing the MLP with a voxel grid.
- Simple scene representation (see the sketch after this list):
  - Volume densities: dense voxel grid (3D).
  - View-dependent colors: dense feature grid (4D) + shallow MLP.
- PyTorch implementation.
- †PyTorch CUDA extension built just-in-time for another 2--3x speedup.
- †O(N) realization for the distortion loss proposed by mip-nerf 360 (see the sketch after this list):
  - The loss improves our training time and quality.
  - We have released a self-contained PyTorch package: torch_efficient_distloss.
  - Consider a batch of 8192 rays X 256 points:
    - GPU memory consumption: 6192MB => 96MB.
    - Run times for 100 iters: 20 sec => 0.2 sec.
- Supported datasets:
  - Bounded inward-facing:
  - †Unbounded inward-facing:
  - †Forward-facing:
† marks new stuff added after publication.
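A minimal sketch of the grid-based representation in the list above: a dense density grid and a dense feature grid queried by trilinear interpolation, with a shallow MLP decoding view-dependent colors. The class and parameter names are illustrative, not the repository's actual API, and the real implementation adds further details (e.g., positional encodings and progressive grid scaling).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VoxelGridScene(nn.Module):
    """Sketch of the scene representation: density grid + feature grid + shallow MLP."""

    def __init__(self, grid_size=(160, 160, 160), feat_dim=12, hidden=128):
        super().__init__()
        D, H, W = grid_size
        # Volume densities: dense voxel grid (3D), one raw density per voxel.
        self.density = nn.Parameter(torch.zeros(1, 1, D, H, W))
        # View-dependent colors: dense feature grid (4D) decoded by a shallow MLP.
        self.feature = nn.Parameter(torch.zeros(1, feat_dim, D, H, W))
        self.rgb_mlp = nn.Sequential(
            nn.Linear(feat_dim + 3, hidden), nn.ReLU(inplace=True),
            nn.Linear(hidden, 3),
        )

    @staticmethod
    def _interp(grid, xyz):
        # Trilinear interpolation; xyz is (N, 3) normalized to [-1, 1].
        out = F.grid_sample(grid, xyz.view(1, -1, 1, 1, 3),
                            mode='bilinear', align_corners=True)
        return out.view(grid.shape[1], -1).T  # (N, C)

    def forward(self, xyz, viewdir):
        raw_density = self._interp(self.density, xyz)[:, 0]   # (N,) raw, pre-activation
        feat = self._interp(self.feature, xyz)                 # (N, feat_dim)
        rgb = torch.sigmoid(self.rgb_mlp(torch.cat([feat, viewdir], dim=-1)))
        return raw_density, rgb
```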
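The O(N) distortion-loss realization can be sketched with per-ray prefix sums. This is a hand-written illustration of the idea, not the torch_efficient_distloss API (whose function names and signatures may differ); `w` are volume-rendering weights and `m` the sorted interval midpoints, following the mip-nerf 360 definition.

```python
import torch

def distortion_loss_prefix_sum(w, m, interval):
    """O(N) distortion loss per ray via prefix sums.

    w:        (n_rays, n_pts) volume-rendering weights of the sampled points.
    m:        (n_rays, n_pts) interval midpoints, ascending along each ray.
    interval: scalar (or broadcastable tensor) length of each sampled interval.
    """
    wm = w * m
    w_before = torch.cumsum(w, dim=-1) - w     # sum_{j<i} w_j
    wm_before = torch.cumsum(wm, dim=-1) - wm  # sum_{j<i} w_j * m_j
    # Pairwise term  sum_{i,j} w_i w_j |m_i - m_j|  without the O(N^2) matrix.
    loss_pair = 2 * (wm * w_before - w * wm_before).sum(-1)
    # Self term  (1/3) * sum_i w_i^2 * interval_i.
    loss_self = (w.pow(2) * interval).sum(-1) / 3
    return (loss_pair + loss_self).mean()
```

Rewriting the pairwise term with cumulative sums avoids materializing the O(N²) matrix of |m_i - m_j| values, which is where the memory and run-time savings quoted above come from.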
Post-activation
Observation. To produce sharp surfaces, we have to activate the density into alpha after interpolation.
Proof. Post-activation can approximate a surface beyond linear arbitrarily closely. Details in the paper.
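A small sketch contrasting the two orderings, assuming the softplus-then-alpha mapping from the paper (the constant density bias is omitted here, and the function and variable names are illustrative only):

```python
import torch
import torch.nn.functional as F

def alpha_pre_vs_post(density_grid, xyz, interval):
    """Contrast the two orderings for a (1, 1, D, H, W) raw density grid.

    xyz:      (N, 3) query points normalized to [-1, 1].
    interval: ray-marching step size.
    """
    grid_xyz = xyz.view(1, -1, 1, 1, 3)

    # Pre-activation (blurry): activate each voxel into alpha, then interpolate.
    alpha_voxels = 1 - torch.exp(-F.softplus(density_grid) * interval)
    alpha_pre = F.grid_sample(alpha_voxels, grid_xyz, mode='bilinear',
                              align_corners=True).view(-1)

    # Post-activation (sharp): interpolate the raw densities, then activate into alpha.
    raw = F.grid_sample(density_grid, grid_xyz, mode='bilinear',
                        align_corners=True).view(-1)
    alpha_post = 1 - torch.exp(-F.softplus(raw) * interval)
    return alpha_pre, alpha_post
```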
Toy example 1. Fitting a surface with a single 2D grid cell.
Toy example 2. Fitting a binary (occupancy) image with a 2D grid.
Ablation study. Up to 2.88 PSNR difference for novel-view synthesis.
Low-density initialization
Observation. The initial alpha values (activated from the volume densities) should be close to 0. We introduce a hyperparameter alpha-init to control it.
Ablation study. The alpha-init should be small enough to achieve good quality and avoid floaters.
Caveat. We empirically find that quality and training time are sensitive to alpha-init. We set alpha-init to 3 different values for the bounded, unbounded inward-facing, and forward-facing datasets respectively. You may want to try a few different values for new datasets.
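As an illustration (the repository's actual config keys and step-size normalization may differ), the target alpha-init can be converted into a constant density bias under an alpha = 1 - exp(-softplus(density + bias) * stepsize) activation:

```python
import numpy as np

def density_bias_from_alpha_init(alpha_init, stepsize=1.0):
    """Solve  1 - exp(-softplus(bias) * stepsize) = alpha_init  for the bias.

    With the density grid initialized to zero, adding this constant bias before
    the softplus makes every initial (activated) alpha equal to alpha_init.
    """
    return float(np.log(np.exp(-np.log(1 - alpha_init) / stepsize) - 1))

# e.g. alpha-init = 1e-6 gives a strongly negative bias => near-zero initial alpha
print(density_bias_from_alpha_init(1e-6))
```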
🤔 It seems that the explicit (grid-based) representation needs careful regularization, while the implicit (MLP-based) one doesn't. We still don't know the root cause of this empirical finding.