Dataset and Pipeline for Multi-view Light-Field Video

Immersive light field video with a layered mesh representation

ACM Transactions on Graphics, 2020

We present a system for capturing, reconstructing, compressing, and rendering high quality immersive light field video. We accomplish this by leveraging the recently introduced DeepView view interpolation algorithm, replacing its underlying multi-plane image (MPI) scene representation with a collection of spherical shells that are better suited for representing panoramic light field content. We further process this data to reduce the large number of shell layers to a small, fixed number of RGBA+depth layers without significant loss in visual quality. The resulting RGB, alpha, and depth channels in these layers are then compressed using conventional texture atlasing and video compression techniques. The final compressed representation is lightweight and can be rendered on mobile VR/AR platforms or in a web browser. We demonstrate light field video results using data from the 16-camera rig of [Pozo et al. 2019] as well as a new low-cost hemispherical array made from 46 synchronized ac...
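The layered RGBA+depth representation described above is rendered by straightforward back-to-front alpha compositing. Below is a minimal, illustrative sketch of that compositing step in Python/NumPy, assuming the layers are already decoded and sorted far-to-near; it shows only the underlying "over" operator, not the paper's actual shell-based GPU renderer.

```python
import numpy as np

def composite_layers(layers):
    """Composite a stack of RGBA layers back-to-front with the 'over' operator.

    layers: float array of shape (L, H, W, 4) with values in [0, 1],
            ordered from the farthest layer (index 0) to the nearest.
    Returns an (H, W, 3) RGB image.
    """
    out = np.zeros(layers.shape[1:3] + (3,), dtype=np.float64)
    for layer in layers:                      # far to near
        rgb, alpha = layer[..., :3], layer[..., 3:4]
        out = rgb * alpha + out * (1.0 - alpha)
    return out

# Toy example: 8 layers of a 4x4 view (random data, for shape-checking only).
rng = np.random.default_rng(0)
stack = rng.random((8, 4, 4, 4))
image = composite_layers(stack)
print(image.shape)  # (4, 4, 3)
```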

Towards Motion Aware Light Field Video for Dynamic Scenes

2013 IEEE International Conference on Computer Vision, 2013

Current Light Field (LF) cameras offer fixed resolution in space, time and angle, which is decided a priori and is independent of the scene. These cameras either trade off spatial resolution to capture single-shot LF [20, 27, 12] or trade off temporal resolution by assuming a static scene to capture high spatial resolution LF [18, 3]. Thus, capturing high spatial resolution LF video for dynamic scenes remains an open and challenging problem. We present the concept, design and implementation of an LF video camera that allows capturing high resolution LF video. The spatial, angular and temporal resolution are not fixed a priori, and we exploit the scene-specific redundancy in space, time and angle. Our reconstruction is motion-aware and offers a continuum of resolution trade-offs with increasing motion in the scene. The key idea is (a) to design efficient multiplexing matrices that allow resolution trade-offs, (b) to use dictionary learning and sparse representations for robust reconstruction, and (c) to perform local motion-aware adaptive reconstruction. We perform extensive analysis and characterize the performance of our motion-aware reconstruction algorithm. We show realistic simulations using a graphics simulator as well as real results using an LCoS-based programmable camera. We demonstrate novel results such as high resolution digital refocusing for moving objects.
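To make the multiplexing-plus-sparse-coding idea concrete, here is a minimal sketch under stated assumptions: a random matrix `Phi` stands in for the paper's designed multiplexing matrices, a random unit-norm matrix `D` stands in for the learned dictionary, and orthogonal matching pursuit (OMP) serves as a generic sparse solver. All names are hypothetical; the paper's actual matrices, dictionary and solver differ.

```python
import numpy as np

def omp(A, y, k):
    """Orthogonal matching pursuit: find a k-sparse c with A @ c ~= y."""
    residual, support = y.copy(), []
    coef = np.zeros(0)
    for _ in range(k):
        j = int(np.argmax(np.abs(A.T @ residual)))   # best-matching atom
        if j not in support:
            support.append(j)
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    c = np.zeros(A.shape[1])
    c[support] = coef
    return c

rng = np.random.default_rng(1)
n, m, atoms, k = 64, 16, 128, 4          # signal dim, measurements, dictionary size, sparsity
Phi = rng.standard_normal((m, n))        # hypothetical multiplexing matrix
D = rng.standard_normal((n, atoms))
D /= np.linalg.norm(D, axis=0)           # unit-norm dictionary atoms
c_true = np.zeros(atoms)
c_true[rng.choice(atoms, k, replace=False)] = rng.standard_normal(k)
x = D @ c_true                           # LF patch, sparse in D
y = Phi @ x                              # multiplexed measurement
c_hat = omp(Phi @ D, y, k)               # reconstruct sparse code from y
print(np.linalg.norm(D @ c_hat - x) / np.linalg.norm(x))  # relative error
```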

Robust and dense depth estimation for light field images

IEEE Transactions on Image Processing, 2017

We propose a depth estimation method for light field images. A light field image can be considered as a collection of 2D images taken from viewpoints arranged in a regular grid. We exploit this configuration and compute disparity maps between specific pairs of views. This computation is carried out by a state-of-the-art two-view stereo method that provides a non-dense disparity estimate. We propose a disparity interpolation method that increases the density and improves the accuracy of this initial estimate. Disparities obtained from several pairs of views are then fused to obtain a single, robust estimate. Finally, experiments on synthetic and real images show how the proposed method outperforms the state of the art.
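The fusion step lends itself to a short sketch. Assuming disparity scales linearly with baseline (true for a regular grid of views), maps from different pairs can be rescaled to a common reference baseline and fused with a per-pixel median; the median is one robust fusion choice, not necessarily the paper's exact operator.

```python
import numpy as np

def fuse_disparities(disp_maps, baselines, ref_baseline=1.0):
    """Fuse per-pair disparity maps into one robust estimate.

    disp_maps: list of (H, W) disparity maps, one per view pair.
    baselines: baseline of each pair; disparity scales linearly with
               baseline, so each map is rescaled to a common reference
               baseline before fusion.
    Returns the per-pixel median, which is robust to outlier pairs.
    """
    scaled = [d * (ref_baseline / b) for d, b in zip(disp_maps, baselines)]
    return np.median(np.stack(scaled), axis=0)

# Toy example: three pairs observing the same fronto-parallel plane
# (disparity 2.0 at unit baseline) with small per-pair noise.
truth = np.full((4, 4), 2.0)
maps = [truth * b + 0.05 * np.random.default_rng(i).standard_normal((4, 4))
        for i, b in enumerate((1.0, 2.0, 4.0))]
print(fuse_disparities(maps, [1.0, 2.0, 4.0]))  # ~2.0 everywhere
```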

4D Temporally Coherent Light-field Video

3D Vision (3DV), 2017

Light-field video has recently been used in virtual and augmented reality applications to increase realism and immersion. However, existing light-field methods are generally limited to static scenes due to the requirement to acquire a dense scene representation. The large amount of data and the absence of methods to infer temporal coherence pose major challenges in storage, compression and editing compared to conventional video. In this paper, we propose the first method to extract a spatio-temporally coherent light-field video representation. A novel method is proposed to obtain Epipolar Plane Images (EPIs) from a sparse light-field camera array. EPIs are used to constrain scene-flow estimation and obtain 4D temporally coherent representations of dynamic light-fields. Temporal coherence is achieved on a variety of light-field datasets. Evaluation of the proposed light-field scene flow against existing multi-view dense correspondence approaches demonstrates a significant improvement in the accuracy of temporal coherence.
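A hedged sketch of the EPI construction the abstract relies on: for views spaced along a horizontal baseline, fixing an image row and stacking it across views yields an image in which each scene point traces a straight line whose slope is proportional to its disparity. This toy NumPy version illustrates only the slicing, not the paper's scene-flow estimation.

```python
import numpy as np

def epi_from_views(views, row):
    """Extract a horizontal Epipolar Plane Image (EPI).

    views: array of shape (N, H, W) from cameras equally spaced along
           a horizontal baseline (grayscale for simplicity).
    row:   the image row to slice.
    Returns an (N, W) EPI in which each scene point traces a straight
    line; the line's slope encodes the point's disparity, which is what
    constrains the scene-flow estimation.
    """
    return views[:, row, :]

# Toy example: a point with disparity 2 px/view shifts across 5 views.
N, H, W, disp = 5, 8, 32, 2
views = np.zeros((N, H, W))
for i in range(N):
    views[i, 4, 10 + disp * i] = 1.0
print(np.argmax(epi_from_views(views, 4), axis=1))  # [10 12 14 16 18]
```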

5D Light Field Synthesis from a Monocular Video

2021

Commercially available light field cameras have difficulty capturing 5D (4D + time) light field videos: they either capture only still light field images or are prohibitively expensive for ordinary users. To tackle this problem, we propose a deep learning-based method for synthesizing a light field video from a monocular video. Because no light field video dataset is available, we propose a new synthetic light field video dataset that renders photorealistic scenes using Unreal Engine. The proposed deep learning framework synthesizes a light field video with a full set (9 × 9) of sub-aperture images from a normal monocular video. The proposed network consists of three sub-networks, namely feature extraction, 5D light field video synthesis, and temporal consistency refinement. Experimental results show that our model can successfully synthesize the light field video for synthetic and real scenes and outperforms the previous frame-by-frame method quantita...
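For readers unfamiliar with the 5D layout, the sketch below shows one natural way to index such data as a dense array over (time, angular v/u, spatial y/x); the 9 × 9 angular grid matches the paper, but the array layout itself is an assumption for illustration.

```python
import numpy as np

# A 5D light-field video as a dense array: time, angular (v, u), spatial (y, x).
# Spatial/temporal sizes here are tiny placeholders; the 9x9 angular grid
# matches the full set of sub-aperture images described in the paper.
T, V, U, H, W = 4, 9, 9, 16, 16
lf_video = np.zeros((T, V, U, H, W, 3), dtype=np.float32)

# The central sub-aperture video is what a monocular camera would see;
# the synthesis task is to fill in the remaining 9*9 - 1 views per frame.
center_view_video = lf_video[:, V // 2, U // 2]       # (T, H, W, 3)

# One full 4D light field (a single time instant):
lf_frame = lf_video[0]                                # (V, U, H, W, 3)
print(center_view_video.shape, lf_frame.shape)
```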

CLASSROOM: synthetic high dynamic range light field dataset

Applications of Digital Image Processing XLV

Light field images provide tremendous amounts of visual information about the represented scenes, as they describe the light traversing in all directions for all points of 3D space. Due to the recent technological advancements of light field visualization and its increasing relevance in research, the need for light field image datasets has risen significantly. Among the applications for which light field datasets are considered, high dynamic range (HDR) light field image reconstruction has gained notable attention in the past years. When capturing a scene, either a single camera with a 2D microlens array or a 2D array of cameras is used, producing narrow- and wide-baseline light field images, respectively. Additionally, the turntable methodology may be used for narrow-baseline light fields. While the majority of these methods enable the creation of plausible and reliable light field image datasets, such baseline-specific setups can be extremely expensive and may require immense computing resources for proper calibration. Furthermore, the resulting light field is commonly limited in angular resolution. A suitable alternative is to produce a light field dataset synthetically by rendering light field images, which easily overcomes the aforementioned issues. In this paper, we discuss our work on creating the "CLASSROOM" light field image dataset, depicting a classroom scene. The content is rendered in both horizontal-only parallax and full parallax. The scene contains a wide variety of light distributions, particularly underexposed and overexposed regions, which are essential for HDR image applications.

Synthesizing Light Field Video from Monocular Video

2022

The hardware challenges associated with light-field (LF) imaging have made it difficult for consumers to access its benefits, such as post-capture focus and aperture control. Learning-based techniques that solve the ill-posed problem of LF reconstruction from sparse (1, 2 or 4) views have significantly reduced the need for complex hardware. LF video reconstruction from sparse views poses a special challenge, as acquiring ground truth for training these models is hard. Hence, we propose a self-supervised learning-based algorithm for LF video reconstruction from monocular videos. We use self-supervised geometric, photometric and temporal consistency constraints inspired by a recent learning-based technique for LF video reconstruction from stereo video. Additionally, we propose three key techniques that are relevant to our monocular video input. First, we propose an explicit disocclusion handling technique that encourages the network to use information from adjacent input temporal frames for inpainting disoccluded regions in an LF frame. This is crucial for a self-supervised technique, as a single input frame does not contain any information about the disoccluded regions. Second, we propose an adaptive low-rank representation that provides a significant boost in performance by tailoring the representation to each input scene. Finally, we propose a novel refinement block that exploits the available LF image data using supervised learning to further refine the reconstruction quality. Our qualitative and quantitative analysis demonstrates the significance of each of the proposed building blocks, as well as superior results compared to previous state-of-the-art monocular LF reconstruction techniques. We further validate our algorithm by reconstructing LF videos from monocular videos acquired using a commercial GoPro camera. An open-source implementation is also made available.
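A minimal sketch of the photometric consistency idea the abstract mentions, under simplifying assumptions: a Lambertian scene, nearest-neighbor warping, and a sign convention in which a sub-aperture view at angular offset (du, dv) is displaced by disparity × (du, dv) pixels. The function names are hypothetical; this is not the paper's loss implementation.

```python
import numpy as np

def warp_view_to_center(view, disparity, du, dv):
    """Warp a sub-aperture view to the center view using per-pixel disparity.

    Under the assumed sign convention, a scene point with disparity d
    appears shifted by (d*du, d*dv) pixels in the view at angular offset
    (du, dv); sampling the view at those offsets reproduces the center
    view wherever the scene is unoccluded. Nearest-neighbor sampling
    keeps the sketch dependency-free.
    """
    H, W = view.shape[:2]
    ys, xs = np.mgrid[0:H, 0:W]
    src_y = np.clip(np.round(ys + disparity * dv).astype(int), 0, H - 1)
    src_x = np.clip(np.round(xs + disparity * du).astype(int), 0, W - 1)
    return view[src_y, src_x]

def photometric_loss(synth_view, center, disparity, du, dv):
    """L1 photometric consistency between the warped synthesis and the input."""
    warped = warp_view_to_center(synth_view, disparity, du, dv)
    return float(np.mean(np.abs(warped - center)))

# Toy check: a view consistent with constant disparity 2 at offset du=1.
H, W = 8, 16
rng = np.random.default_rng(2)
center = rng.random((H, W))
view = np.roll(center, 2, axis=1)
disparity = np.full((H, W), 2.0)
print(photometric_loss(view, center, disparity, du=1, dv=0))  # ~0 away from the border
```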

Depth Estimation using Light-Field Cameras

2014

Plenoptic cameras, or light field cameras, are a recent type of imaging device that is starting to regain some popularity. These cameras are able to acquire the plenoptic function (4D light field) and, consequently, to output the depth of a scene by making use of the redundancy created by multi-view geometry, in which a single 3D point is imaged several times. Despite the attention given in the literature to standard plenoptic cameras, like Lytro, due to their simplicity and lower price, we base our work on results obtained from a multi-focus plenoptic camera (a Raytrix, in our case), due to its higher quality and image resolution. In this master thesis, we present an automatic method to estimate the virtual depth of a scene. Since the capture is done using a multi-focus plenoptic camera, we are working with multi-view geometry and lenses with different focal lengths, and we can use that to back-trace the rays in order to obtain the depth. We start by finding salient poin...
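The ray back-tracing ultimately rests on the classic two-view triangulation relation Z = f·B/d. A tiny sketch with hypothetical numbers (Raytrix-style "virtual depth" is expressed in its own units, but the proportionality is the same):

```python
def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Classic two-view triangulation: Z = f * B / d.

    disparity_px: pixel disparity of a 3D point between two (micro)lens views.
    focal_px:     focal length in pixels.
    baseline_m:   distance between the two viewpoints in metres.
    Returns metric depth in metres.
    """
    return focal_px * baseline_m / disparity_px

# Hypothetical numbers: f = 800 px, 1 mm baseline between viewpoints, d = 2 px.
print(depth_from_disparity(2.0, 800.0, 0.001))  # 0.4 (metres)
```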

A qualitative comparison of MPEG view synthesis and light field rendering

2014 3DTV-Conference: The True Vision - Capture, Transmission and Display of 3D Video (3DTV-CON), 2014

Free Viewpoint Television (FTV) is a new modality in next-generation television, which provides the viewer free navigation through the scene, using image-based view synthesis from a couple of camera view inputs. The recently developed MPEG reference software technology is, however, restricted to narrow baselines and linear camera arrangements. Its reference software currently implements stereo matching and interpolation techniques designed mainly to support three camera inputs (middle-left and middle-right stereo). Especially in view of future use-case scenarios in multi-scopic 3D displays, where hundreds of output views are generated from a limited number (tens) of wide-baseline input views, it becomes mandatory to exploit all input camera information to its maximal potential. We therefore revisit existing view interpolation techniques to support dozens of camera inputs for better view synthesis performance. In particular, we show that Light Fields yield average PSNR gains of approximately 5 dB over MPEG's existing depth-based multiview video technology, even in the presence of large baselines.
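Since the headline result is a PSNR gain, a short reference implementation of PSNR helps calibrate what 5 dB means: it corresponds to reducing mean squared error to roughly a third (10^(-5/10) ≈ 0.316) of its previous value. This is the standard definition, not tied to the MPEG reference software:

```python
import numpy as np

def psnr(reference, rendered, peak=255.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(peak^2 / MSE)."""
    diff = reference.astype(np.float64) - rendered.astype(np.float64)
    mse = np.mean(diff ** 2)
    return float(10.0 * np.log10(peak ** 2 / mse))

# Toy example: additive Gaussian noise with sigma = 5 gives ~34 dB.
rng = np.random.default_rng(3)
ref = rng.integers(0, 256, (64, 64, 3))
noisy = np.clip(ref + rng.normal(0.0, 5.0, ref.shape), 0, 255)
print(round(psnr(ref, noisy), 1))
```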

Camera Animation for Immersive Light Field Imaging

Electronics

Among novel capture and visualization technologies, light field has made significant progress in the current decade, bringing closer its emergence in everyday use cases. Unlike many other forms of 3D displays and devices, light field visualization does not depend on any viewing equipment. Regarding its potential use cases, light field is applicable to both cinematic and interactive contents. Such contents often rely on camera animation, which is a frequent tool for the creation and presentation of 2D contents. However, while common 3D camera animation is often rather straightforward, light field visualization has certain constraints that must be considered before implementing any variation of such techniques. In this paper, we introduce our work on camera animation for light field visualization. Different types of conventional camera animation were applied to light field contents, which produced an interactive simulation. The simulation was visualized and assessed on a real light fi...
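As a hedged illustration of "conventional camera animation" in this context, the sketch below generates keyframes for a horizontal orbit; the parameters and the valid-viewing-zone remark are assumptions for illustration, not the paper's simulation setup.

```python
import numpy as np

def orbit_keyframes(radius, height, num_frames, look_at=(0.0, 0.0, 0.0)):
    """Generate camera positions for a horizontal orbit around a target.

    Light-field displays have a finite valid viewing zone, so in practice
    the orbit radius and angular range must keep the virtual camera inside
    the region the display can reproduce.
    """
    angles = np.linspace(0.0, 2.0 * np.pi, num_frames, endpoint=False)
    positions = np.stack([radius * np.cos(angles),
                          np.full(num_frames, height),
                          radius * np.sin(angles)], axis=1)
    targets = np.tile(np.asarray(look_at), (num_frames, 1))
    return positions, targets

pos, tgt = orbit_keyframes(radius=3.0, height=1.5, num_frames=8)
print(pos.shape, tgt.shape)  # (8, 3) (8, 3)
```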