Mip-NeRF 360: Unbounded Anti-Aliased Neural Radiance Fields

Multi-Plane Neural Radiance Fields for Novel View Synthesis

arXiv (Cornell University), 2023

Novel view synthesis is a long-standing problem that revolves around rendering frames of scenes from novel camera viewpoints. Volumetric approaches provide a solution for modeling occlusions through the explicit 3D representation of the camera frustum. Multi-plane Images (MPI) are volumetric methods that represent the scene using front-parallel planes at distinct depths, but they suffer from depth discretization, leading to a 2.5D scene representation. Another line of work relies on implicit 3D scene representations. Neural Radiance Fields (NeRF) use neural networks to encapsulate the continuous 3D scene structure within the network weights, achieving photorealistic synthesis results. However, these methods are constrained to per-scene optimization, which is inefficient in practice. Multi-plane Neural Radiance Fields (MINE) open the door to combining implicit and explicit scene representations. MINE enables continuous 3D scene representations, especially in the depth dimension, while using the input image features to avoid per-scene optimization. The main drawback of current work in this domain is that it is constrained to single-view input, which limits synthesis to narrow viewpoint ranges. In this work, we thoroughly examine the performance, generalization, and efficiency of single-view multi-plane neural radiance fields. In addition, we propose a new multi-plane NeRF architecture that accepts multiple views to improve the synthesis results and expand the viewing range. Features from the input source frames are fused through a proposed attention-aware fusion module to highlight important information from different viewpoints. Experiments show the effectiveness of attention-based fusion and the promising outcomes of our proposed method when compared to multi-view NeRF and MPI techniques.
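The front-parallel planes of an MPI are usually turned into an image by alpha compositing along each pixel's ray. The sketch below shows that step for a single pixel, assuming each plane stores an RGB color and an alpha value; the function name and the sample values are hypothetical and are not taken from the paper.

```python
def composite_mpi(planes):
    """Composite one pixel's MPI samples with the 'over' operator.

    planes: list of (rgb, alpha) ordered from nearest to farthest,
            where rgb is a 3-tuple in [0, 1] and alpha is in [0, 1].
    Returns the composited color and the remaining transmittance.
    """
    color = [0.0, 0.0, 0.0]
    transmittance = 1.0  # fraction of light not yet blocked by nearer planes
    for rgb, alpha in planes:
        weight = transmittance * alpha
        color = [c + weight * p for c, p in zip(color, rgb)]
        transmittance *= 1.0 - alpha
    return tuple(color), transmittance


# Hypothetical pixel: a half-transparent red plane in front of an opaque blue one.
pixel, leftover = composite_mpi([
    ((1.0, 0.0, 0.0), 0.5),
    ((0.0, 0.0, 1.0), 1.0),
])
```

Because every sample sits on one of a fixed set of depths, the result can only express geometry at those depths, which is the discretization the abstract refers to.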

Ray Priors through Reprojection: Improving Neural Radiance Fields for Novel View Extrapolation

arXiv (Cornell University), 2022

Neural Radiance Fields (NeRF) [22] have emerged as a potent paradigm for representing scenes and synthesizing photo-realistic images. A main limitation of conventional NeRFs is that they often fail to produce high-quality renderings under novel viewpoints that are significantly different from the training viewpoints. In this paper, instead of exploiting few-shot image synthesis, we study the novel view extrapolation setting in which (1) the training images can well describe an object, and (2) there is a notable discrepancy between the distributions of the training and test viewpoints. We present RapNeRF (RAy Priors) as a solution. Our insight is that the inherent appearances of a 3D surface's arbitrary visible projections should be consistent. We thus propose a random ray casting policy that allows unseen views to be trained using seen views. Furthermore, we show that a ray atlas pre-computed from the observed rays' viewing directions can further enhance the rendering quality for extrapolated views. A main limitation is that RapNeRF removes strong view-dependent effects because it leverages the multi-view consistency property.

FastNeRF: High-Fidelity Neural Rendering at 200FPS

ArXiv, 2021

Recent work on Neural Radiance Fields (NeRF) showed how neural networks can be used to encode complex 3D environments that can be rendered photorealistically from novel viewpoints. Rendering these images is very computationally demanding and recent improvements are still a long way from enabling interactive rates, even on high-end hardware. Motivated by scenarios on mobile and mixed reality devices, we propose FastNeRF, the first NeRF-based system capable of rendering high fidelity photorealistic images at 200Hz on a high-end consumer GPU. The core of our method is a graphics-inspired factorization that allows for (i) compactly caching a deep radiance map at each position in space, (ii) efficiently querying that map using ray directions to estimate the pixel values in the rendered image. Extensive experiments show that the proposed method is 3000 times faster than the original NeRF algorithm and at least an order of magnitude faster than existing work on accelerating NeRF, while mai...
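The abstract names the factorization without spelling it out. A minimal sketch follows, assuming that the value cached per position is a density plus D RGB components, that the value cached per direction is D weights, and that the two are combined by an inner product. The cache layout, the quantized keys, and the numbers are hypothetical.

```python
def combine(components, weights):
    """Inner product of D cached RGB components with D direction weights."""
    return tuple(
        sum(w * comp[ch] for comp, w in zip(components, weights))
        for ch in range(3)
    )


# Hypothetical caches keyed by quantized position and direction (D = 2).
position_cache = {
    (0, 0, 0): {"sigma": 1.5, "components": [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]},
}
direction_cache = {
    (0, 1): [0.25, 0.75],
}


def query(pos_key, dir_key):
    """Look up both caches and return (rgb, density) without running a network."""
    entry = position_cache[pos_key]
    rgb = combine(entry["components"], direction_cache[dir_key])
    return rgb, entry["sigma"]


rgb, sigma = query((0, 0, 0), (0, 1))
```

Splitting the lookup this way keeps the position cache three-dimensional and the direction cache two-dimensional, instead of one cache over all five inputs.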

ViewFormer: NeRF-free Neural Rendering from Few Images Using Transformers

ArXiv, 2022

Novel view synthesis is a long-standing problem. In this work, we consider a variant of the problem where we are given only a few context views sparsely covering a scene or an object. The goal is to predict novel viewpoints in the scene, which requires learning priors. The current state of the art is based on Neural Radiance Fields (NeRFs), and while achieving impressive results, the methods suffer from long training times as they require evaluating thousands of 3D point samples via a deep neural network for each image. We propose a 2D-only method that maps multiple context views and a query pose to a new image in a single pass of a neural network. Our model uses a two-stage architecture consisting of a codebook and a transformer model. The codebook is used to embed individual images into a smaller latent space, and the transformer solves the view synthesis task in this more compact space. To train our model efficiently, we introduce a novel branching attention mechanism that allows us to use the same model not only for neural rendering but also for camera pose estimation. Experimental results on real-world scenes show that our approach is competitive compared to NeRF-based methods while not reasoning in 3D, and it is faster to train.
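The abstract does not say how the codebook embeds an image. One common choice, assumed here, is nearest-neighbour vector quantization: each feature vector is replaced by the index of its closest codebook entry, and a transformer then operates on those indices. The codebook and feature values below are made up for illustration.

```python
def quantize(vector, codebook):
    """Return (index, entry) of the codebook entry nearest to vector (squared L2)."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    index = min(range(len(codebook)), key=lambda i: dist2(vector, codebook[i]))
    return index, codebook[index]


# Hypothetical 2-D codebook and per-patch features.
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
features = [(0.9, 0.1), (0.1, 0.8), (0.2, 0.1)]

# Each image patch becomes one discrete token.
tokens = [quantize(f, codebook)[0] for f in features]
```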

Ref-NeRF: Structured View-Dependent Appearance for Neural Radiance Fields

ArXiv, 2021

Neural Radiance Fields (NeRF) is a popular view synthesis technique that represents a scene as a continuous volumetric function, parameterized by multilayer perceptrons that provide the volume density and view-dependent emitted radiance at each location. While NeRF-based techniques excel at representing fine geometric structures with smoothly varying view-dependent appearance, they often fail to accurately capture and reproduce the appearance of glossy surfaces. We address this limitation by introducing Ref-NeRF, which replaces NeRF’s parameterization of view-dependent outgoing radiance with a representation of reflected radiance and structures this function using a collection of spatially-varying scene properties. We show that together with a regularizer on normal vectors, our model significantly improves the realism and accuracy of specular reflections. Furthermore, we show that our model’s internal representation of outgoing radiance is interpretable and useful for scene editing.
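The reflected-radiance parameterization depends on the direction obtained by mirroring the view direction about the surface normal. The abstract gives no formula, so the sketch below assumes the standard mirror reflection, w_r = 2 (w_o . n) n - w_o, with w_o pointing from the surface toward the camera. The helper names and the example vectors are hypothetical.

```python
import math


def normalize(v):
    """Scale a 3-vector to unit length."""
    length = math.sqrt(sum(x * x for x in v))
    return tuple(x / length for x in v)


def reflect(view_dir, normal):
    """Mirror view_dir (surface-to-camera, unit length) about the surface normal."""
    n = normalize(normal)
    cos_theta = sum(a * b for a, b in zip(view_dir, n))
    return tuple(2.0 * cos_theta * nc - vc for nc, vc in zip(n, view_dir))


# Camera seen at 45 degrees from a surface whose normal points along +z.
w_o = normalize((1.0, 0.0, 1.0))
w_r = reflect(w_o, (0.0, 0.0, 1.0))
```

Querying appearance with w_r instead of w_o means a highlight that moves across a curved surface is described by a smoother function, which is the motivation the abstract gives for the change.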

PANeRF: Pseudo-view Augmentation for Improved Neural Radiance Fields Based on Few-shot Inputs

Cornell University - arXiv, 2022

Figure 1. View synthesis on the DTU dataset in a 3-view setting. We show a qualitative comparison of pseudo-view augmentation neural radiance fields (PANeRF) with other few-shot methods on the Scan21, Scan103, and Scan110 scenes of the DTU dataset. Although other methods produce inaccurate geometry and appearance, our approach yields high-quality rendering results with minimal artifacts.

Light Field Neural Rendering

ArXiv, 2021

Classical light field rendering for novel view synthesis can accurately reproduce view-dependent effects such as reflection, refraction, and translucency, but requires a dense view sampling of the scene. Methods based on geometric reconstruction need only sparse views, but cannot accurately model non-Lambertian effects. We introduce a model that combines the strengths and mitigates the limitations of these two directions. By operating on a four-dimensional representation of the light field, our model learns to represent view-dependent effects accurately. By enforcing geometric constraints during training and inference, the scene geometry is implicitly learned from a sparse set of views. Concretely, we introduce a two-stage transformer-based model that first aggregates features along epipolar lines, then aggregates features along reference views to produce the color of a target ray. Our model outperforms the state-of-the-art on multiple forward-facing and 360° datasets, with larger ma...

DoubleField: Bridging the Neural Surface and Radiance Fields for High-fidelity Human Rendering


We introduce DoubleField, a novel representation combining the merits of both surface field and radiance field for high-fidelity human rendering. Within DoubleField, the surface field and radiance field are associated together by a shared feature embedding and a surface-guided sampling strategy. In this way, DoubleField has a continuous but disentangled learning space for geometry and appearance modeling, which supports fast training, inference, and finetuning. To achieve high-fidelity free-viewpoint rendering, DoubleField is further augmented to leverage ultra-high-resolution inputs, where a view-to-view transformer and a transfer learning scheme are introduced for more efficient learning and finetuning from sparse-view inputs at original resolutions. The efficacy of DoubleField is validated by the quantitative evaluations on several datasets and the qualitative results in a real-world sparse multi-view system, showing its superior capability for photo-realistic free-viewpoint human ...

SPIn-NeRF: Multiview Segmentation and Perceptual Inpainting with Neural Radiance Fields

arXiv (Cornell University), 2022

Figure 1. An example of the inputs and outputs of our 3D inpainting framework. In addition to the images captured from the scene and their corresponding camera parameters, users are tasked with providing a few points in a single image to indicate which object they wish to remove from the scene (upper-left inset). These sparse annotations are then automatically transferred to all other views, and utilized for multiview mask construction (upper-right inset). The resulting 3D-consistent mask is used in a perceptual optimization problem that results in 3D scene inpainting (lower row), with rendered depth from the optimized NeRF shown for each image as an inset.

A Unified Deep Learning Approach for Foveated Rendering & Novel View Synthesis from Sparse RGB-D Light Fields

Near-eye light field displays provide a solution to visual discomfort when using head-mounted displays (HMDs) by presenting accurate depth and focal cues. However, light field HMDs require rendering the scene from a large number of viewpoints. This paper tackles the computational challenge of rendering sharp imagery in the foveal region while reproducing the retinal defocus blur that correctly drives accommodation. We designed a novel end-to-end convolutional neural network that leverages human vision to perform both foveated reconstruction and view synthesis using only 1.2% of the total light field data. The proposed architecture comprises a log-polar sampling scheme followed by an interpolation stage and a convolutional neural network. To the best of our knowledge, this is the first attempt to synthesize the entire light field from sparse RGB-D inputs while simultaneously addressing foveated rendering for computational displays. Our algorithm achieves fidelity in the fovea without any perceptible artifacts in the peripheral regions. The performance in the fovea is comparable to state-of-the-art view synthesis methods, despite using around 10× less light field data.
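A log-polar sampling scheme places samples densely near the gaze point and sparsely in the periphery. The sketch below is one simple way to generate such a pattern, assuming geometrically spaced ring radii and uniformly spaced angles. The function name, its parameters, and the 128-pixel example are hypothetical and do not reproduce the sampling used in the paper.

```python
import math


def log_polar_samples(center, max_radius, n_rings, n_angles, min_radius=1.0):
    """Return (x, y) sample positions on rings whose radii grow geometrically."""
    cx, cy = center
    samples = []
    for i in range(n_rings):
        # Interpolate the radius in log space between min_radius and max_radius.
        t = i / (n_rings - 1) if n_rings > 1 else 0.0
        radius = min_radius * (max_radius / min_radius) ** t
        for j in range(n_angles):
            theta = 2.0 * math.pi * j / n_angles
            samples.append((cx + radius * math.cos(theta),
                            cy + radius * math.sin(theta)))
    return samples


# Hypothetical gaze point at the centre of a 128 x 128 image.
points = log_polar_samples(center=(64.0, 64.0), max_radius=64.0,
                           n_rings=4, n_angles=8)
```

With these settings the ring radii are approximately 1, 4, 16, and 64 pixels, so half of the samples fall within 4 pixels of the gaze point.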