View Synthesis Research Papers - Academia.edu
2025, IEEE
Multi-Step Instruction-Guided Editing is a novel approach for 3D scene editing that allows users to edit a Neural Radiance Field (NeRF) iteratively by providing sequential instructions. Current methods, like single-step instruction models, struggle with maintaining 3D consistency when performing complex, multi-layered edits, often leading to inconsistencies across viewpoints. These existing approaches do not provide the capability to modify preceding edits, which makes it difficult to make intricate changes to a scene. In this paper, we introduce Multi-StepNeRF, a sequential pipeline that allows users to issue multiple text commands, each building upon the last while ensuring 3D consistency throughout. The pipeline allows a user, for example, to first add an object, then change its type, and later edit its properties. Intermediate NeRF models are saved at various steps along the pipeline to offer more flexibility and to allow undoing unwanted changes. We show that this method enables complicated transformations of large scenes with high precision and without breaking consistency, providing the user with an interface for realistic 3D environment editing.
2025, IEEE Transactions on Circuits and Systems for Video Technology
Advanced multiview video systems are able to generate intermediate viewpoints of a 3D scene. To enable low complexity free view generation, texture and its associated depth are used as input data for each viewpoint. To improve the coding efficiency of such content, view synthesis prediction (VSP) is proposed to further reduce inter-view redundancy in addition to traditional disparity compensated prediction (DCP). This paper describes and analyzes rate-distortion optimized VSP designs, which were adopted in the 3D extensions of both AVC and HEVC. In particular, we propose a novel backward-VSP scheme using a derived disparity vector, as well as efficient signaling methods in the context of AVC and HEVC. Additionally, we put forward a novel depth-assisted motion vector prediction method to optimize the coding efficiency. A thorough analysis of coding performance is provided using different VSP schemes and configurations. Experimental results demonstrate average bit rate reductions of 2.5% and 1.2% in AVC and HEVC coding frameworks, respectively, with up to 23.1% bit rate reduction for dependent views.
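To make the backward-VSP idea concrete, here is a minimal Python sketch that derives a per-pixel disparity from depth and fetches the matching base-view sample. It assumes rectified horizontal cameras; the function name, shapes, and sign convention are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def backward_vsp_block(base_view, depth_block, block_xy, focal, baseline):
    """Backward view synthesis prediction for one block (illustrative):
    each dependent-view pixel derives a disparity from its depth and
    fetches the corresponding base-view sample."""
    x0, y0 = block_xy
    h, w = depth_block.shape
    pred = np.empty((h, w), dtype=base_view.dtype)
    for dy in range(h):
        for dx in range(w):
            # disparity = focal * baseline / depth (rectified setup assumed)
            disp = int(round(focal * baseline / depth_block[dy, dx]))
            src_x = np.clip(x0 + dx - disp, 0, base_view.shape[1] - 1)
            pred[dy, dx] = base_view[y0 + dy, src_x]
    return pred
```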
2025, arXiv (Cornell University)
Light field (LF) images acquired by hand-held devices usually suffer from low spatial resolution as the limited sampling resources have to be shared with the angular dimension. LF spatial super-resolution (SR) thus becomes an indispensable part of the LF camera processing pipeline. The high-dimensionality characteristic and complex geometrical structure of LF images make the problem more challenging than traditional single-image SR. The performance of existing methods is still limited as they fail to thoroughly explore the coherence among LF views and are insufficient in accurately preserving the parallax structure of the scene. In this paper, we propose a novel learning-based LF spatial SR framework, in which each view of an LF image is first individually super-resolved by exploring the complementary information among views with combinatorial geometry embedding. For accurate preservation of the parallax structure among the reconstructed views, a regularization network trained with a structure-aware loss function is subsequently appended to enforce correct parallax relationships in the intermediate estimation. Our proposed approach is evaluated on datasets with a large number of test images, including both synthetic and real-world scenes. Experimental results demonstrate the advantage of our approach over state-of-the-art methods, i.e., our method not only improves the average PSNR by more than 1.0 dB but also preserves more accurate parallax details, at a lower computational cost.
2025, arXiv (Cornell University)
In this paper, we tackle the problem of dense light field (LF) reconstruction from sparsely-sampled ones with wide baselines and propose a learnable model, namely dynamic interpolation, to replace the commonly-used geometry warping operation. Specifically, with the estimated geometric relation between input views, we first construct a lightweight neural network to dynamically learn weights for interpolating neighbouring pixels from input views to synthesize each pixel of novel views independently. In contrast to the fixed and content-independent weights employed in the geometry warping operation, the learned interpolation weights implicitly incorporate the correspondences between the source and novel views and adapt to different image content information. Then, we recover the spatial correlation between the independently synthesized pixels of each novel view by referring to that of input views using a geometry-based spatial refinement module. We also constrain the angular correlation between the novel views through a disparity-oriented LF structure loss. Experimental results on LF datasets with wide baselines show that the reconstructed LFs achieve much higher PSNR/SSIM and preserve the LF parallax structure better than state-of-the-art methods. The source code is publicly available at .
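A minimal sketch of the core idea, replacing fixed warping weights with learned, content-adaptive ones, might look as follows; the softmax normalization and array shapes are our assumptions, not the paper's exact design.

```python
import numpy as np

def dynamic_interpolate(candidates, weight_logits):
    """Blend candidate pixels gathered from the input views with
    content-adaptive weights: a softmax over logits that a lightweight
    network would predict per novel-view pixel."""
    # candidates, weight_logits: (H, W, K) for K neighbouring samples
    w = np.exp(weight_logits - weight_logits.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return (w * candidates).sum(axis=-1)   # (H, W) novel-view estimate
```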
2025, IEEE Transactions on Pattern Analysis and Machine Intelligence
A densely-sampled light field (LF) is highly desirable in various applications, such as 3-D reconstruction, post-capture refocusing and virtual reality. However, it is costly to acquire such data. Although many computational methods have been proposed to reconstruct a densely-sampled LF from a sparsely-sampled one, they still suffer from either low reconstruction quality, low computational efficiency, or the restriction on the regularity of the sampling pattern. To this end, we propose a novel learning-based method, which accepts sparsely-sampled LFs with irregular structures, and produces densely-sampled LFs with arbitrary angular resolution accurately and efficiently. We also propose a simple yet effective method for optimizing the sampling pattern. Our proposed method, an end-to-end trainable network, reconstructs a densely-sampled LF in a coarse-to-fine manner. Specifically, the coarse sub-aperture image (SAI) synthesis module first explores the scene geometry from an unstructured sparsely-sampled LF and leverages it to independently synthesize novel SAIs, in which a confidence-based blending strategy is proposed to fuse the information from different input SAIs, giving an intermediate densely-sampled LF. Then, the efficient LF refinement module learns the angular relationship within the intermediate result to recover the LF parallax structure. Comprehensive experimental evaluations demonstrate the superiority of our method on both real-world and synthetic LF images when compared with state-of-the-art methods. In addition, we illustrate the benefits and advantages of the proposed approach when applied in various LF-based applications, including image-based rendering and depth estimation enhancement. The code is available at .
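The confidence-based blending strategy can be pictured as a per-pixel weighted average; this sketch (names and shapes are assumed) fuses warped SAI candidates so that low-confidence pixels, e.g. occluded ones, contribute little.

```python
import numpy as np

def blend_sais(warped_sais, confidences, eps=1e-8):
    """Confidence-weighted fusion of novel-view candidates warped from
    each input sub-aperture image (SAI)."""
    warped = np.stack(warped_sais)    # (N, H, W) candidates
    conf = np.stack(confidences)      # (N, H, W) non-negative weights
    return (conf * warped).sum(axis=0) / (conf.sum(axis=0) + eps)
```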
2025, 2021 IEEE/CVF International Conference on Computer Vision (ICCV)
In this paper, we tackle the problem of dense light field (LF) reconstruction from sparsely-sampled ones with wide baselines and propose a learnable model, namely dynamic interpolation, to replace the commonly-used geometry warping operation. Specifically, with the estimated geometric relation between input views, we first construct a lightweight neural network to dynamically learn weights for interpolating neighbouring pixels from input views to synthesize each pixel of novel views independently. In contrast to the fixed and content-independent weights employed in the geometry warping operation, the learned interpolation weights implicitly incorporate the correspondences between the source and novel views and adapt to different image content information. Then, we recover the spatial correlation between the independently synthesized pixels of each novel view by referring to that of input views using a geometry-based spatial refinement module. We also constrain the angular correlation between the novel views through a disparity-oriented LF structure loss. Experimental results on LF datasets with wide baselines show that the reconstructed LFs achieve much higher PSNR/SSIM and preserve the LF parallax structure better than state-of-the-art methods. The source code is publicly available at .
2025
In this paper we propose a robust hierarchical approach for the estimation of the trifocal tensor. It makes use of pyramids, the sub-pixel Förstner point operator, least squares matching, RANSAC, and the Carlsson-Weinshall duality. We also show how the trifocal tensor can be utilized for efficient view synthesis, which we have optimized by parameterizing it according to the epipolar lines.
2025, IEEE Transactions on Circuits and Systems for Video Technology
This paper presents a new hybrid Kinect-variety-based synthesis scheme that renders artifact-free multiple views for autostereoscopic/auto-multiscopic displays. The proposed approach does not explicitly require dense scene depth information for synthesizing novel views from arbitrary viewpoints. Instead, the integrated framework first constructs a consistent minimal image-space parameterization of the underlying 3D scene. The compact representation of scene structure is formed using only the implicit sparse depth information of a few reference scene points extracted from raw RGB-D data. Views from arbitrary positions can be inferred by moving the novel camera in the parameterized space, enforcing Euclidean constraints on the reference scene images under a full-perspective projection model. Unlike state-of-the-art DIBR methods, where input depth map accuracy is crucial for high-quality output, our proposed algorithm does not depend on precise per-pixel geometry information. Therefore, it simply sidesteps recovering and refining incomplete or noisy depth estimates with advanced filling or upscaling techniques. Our approach performs well in unconstrained indoor/outdoor environments, where the performance of range sensors or dense depth-based algorithms could be seriously affected by complex geometric conditions in the scene. We demonstrate that the proposed hybrid scheme provides guarantees on the completeness and optimality of the algorithm with respect to inter-view consistency. In the experimental validation, we performed a quantitative evaluation as well as a subjective assessment on scenes with complex geometric or surface properties. A comparison with the latest representative DIBR methods is additionally performed to demonstrate the superior performance of the proposed scheme.
2025, IEEE Access
In this article, we propose a depth map refinement method that increases the quality of immersive video. The proposal significantly enhances the inter-view consistency of depth maps (estimated or acquired by any method), which is crucial for achieving the required fidelity of the virtual view synthesis process. In the described method, only information from depth maps is used, as the use of texture can introduce errors into the refinement, mostly due to inter-view color inconsistencies and noise. In order to evaluate the performance of the proposal and compare it with the state of the art, three experiments were conducted. To test the influence of the refinement on the encoding of immersive video, four sets of depth maps (original, refined with synthesis-based refinement, with a bilateral filter, and with the proposal) were encoded with the MPEG Immersive Video (MIV) encoder. In the second experiment, to provide a direct evaluation of depth map accuracy, a comparison on the Middlebury database was performed. In the third experiment, the temporal consistency of the depth maps was assessed by measuring the efficiency of encoding the virtual views. The experiments showed both a large increase in virtual view synthesis quality in immersive video applications and a higher similarity to ground truth after the refinement of estimated depth maps. The usefulness of the proposal was confirmed by the experts of the ISO/IEC MPEG group for immersive video, and the method became the MPEG Reference Software for depth refinement. The implementation of the method is publicly available to other researchers.
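One way to picture an inter-view depth consistency check of the kind such a refinement relies on is to reproject each depth sample into a neighbouring view and compare; this simplified sketch assumes rectified horizontal cameras and a relative tolerance, neither taken from the paper.

```python
import numpy as np

def depth_consistency_mask(depth_a, depth_b, focal, baseline, tol=0.05):
    """Flag pixels of view A whose depth agrees with view B after
    disparity-based reprojection (rectified, horizontal setup)."""
    h, w = depth_a.shape
    ok = np.zeros((h, w), dtype=bool)
    for y in range(h):
        for x in range(w):
            disp = focal * baseline / depth_a[y, x]   # disparity in pixels
            xb = int(round(x - disp))
            if 0 <= xb < w:
                ok[y, x] = abs(depth_a[y, x] - depth_b[y, xb]) <= tol * depth_a[y, x]
    return ok
```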
2025
This paper describes a technique for inter-view depth map consistency improvement for automatically and semi-automatically estimated depth maps. The goal is to improve 3D scene representation consistency by exchanging spatial information between all depth maps in a multiview sequence. The presented technique is based on iterative inter-view information exchange followed by a depth quality assessment stage that prevents depth quality loss. The depth-map consistency improvement yields a better multi-view compression ratio and virtual view quality.
2025
During the last two decades, a new generation of video compression technology has been introduced about every 9 years. Each new compression-technology generation halves the necessary bitrates compared to the previous generation. This increasing single-view compression performance carries over to the compression performance of multiview video coding. For multiview video with associated depth maps, an additional significant bitrate reduction may be achieved. The paper reports on the original compression technology that was designed and developed at Poznań University of Technology in response to the MPEG Call for Proposals on 3D Video Coding Technology. The main idea of this technique is to predict the side views and the depth maps very efficiently from the base view.
2025
In the paper we present a method for increasing the quality of views synthesized with typical Depth-Image-Based Rendering (DIBR) view synthesis algorithms. In the proposed approach, the resolution of the input real views and the corresponding depth maps is doubled before the view synthesis. After the synthesis, the synthesized view is downsampled back to the original resolution. This approach is transparent to the view synthesis algorithm, so it can be used with any DIBR method, as the sketch below illustrates. In the paper, tests for two synthesis algorithms (the state-of-the-art MPEG reference software and our own view synthesis method) are presented. For both algorithms, the proposed upsampling improves the objective and subjective quality of synthesized views.
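Because the method is transparent to the synthesizer, it can be written as a thin wrapper. This is a sketch under our own assumptions (OpenCV resizing, a `synthesize(view, depth, cam)` callable standing in for any DIBR algorithm):

```python
import cv2

def synthesize_at_double_res(view, depth, synthesize, target_cam):
    """Double the resolution of the input view and depth map, run any
    DIBR synthesizer, then downsample the result back."""
    view_up = cv2.resize(view, None, fx=2, fy=2, interpolation=cv2.INTER_CUBIC)
    # nearest-neighbour for depth avoids mixing depths across object edges
    depth_up = cv2.resize(depth, None, fx=2, fy=2, interpolation=cv2.INTER_NEAREST)
    synth = synthesize(view_up, depth_up, target_cam)
    return cv2.resize(synth, (view.shape[1], view.shape[0]),
                      interpolation=cv2.INTER_AREA)
```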
2025
In the paper, we describe extensions of the 3D-HEVC compression technology aimed at improved compression efficiency for multi-view sequences acquired from arbitrarily located cameras. Our proposal refines the inter-view prediction by replacing the horizontal shifts with the true mapping in 3D space. This implies changes in several coding tools, which we describe in detail. The paper also reports experimental results comparing the proposed solution to the standard 3D-HEVC codec. We also discuss the influence of the number of views and the view coding order on the compression efficiency.
2025, International Journal of Electronics and Telecommunications
In the paper, two preprocessing methods for virtual view synthesis are presented. In the first approach, both horizontal and vertical resolutions of the real views and the corresponding depth maps are doubled in order to perform view synthesis on images with densely arranged points. In the second method, real views are filtered in order to eliminate blurred or improperly shifted edges of the objects. Both methods are performed prior to synthesis, thus they may be applied to different Depth-Image-Based Rendering algorithms. In the paper, for both proposed methods, the achieved quality gains are presented.
2025, IGI Global Scientific Publishing
This chapter explores the transformative role of Artificial Intelligence (AI) and Data Analytics in modern Project Management Information Systems (PMIS). It delves into how these technologies redefine project planning, execution, and monitoring, enhancing efficiency and decision-making capabilities. AI empowers project managers with predictive analytics, automation, and real-time insights, enabling proactive responses to risks and bottlenecks. Data analytics complements this by uncovering patterns, diagnosing issues, and prescribing optimal strategies for improved outcomes. Real-world applications illustrate the strategic alignment of projects with broader organizational goals. Challenges such as data governance, cybersecurity, and talent requirements are addressed, along with future directions like digital twins and advanced algorithms. This discussion positions AI and Data Analytics as indispensable tools for achieving sustainable, data-driven project success in an increasingly complex landscape.
2025, CSRN
Depth-Image-Based Rendering (DIBR) can synthesize a virtual view image from a set of multiview images and corresponding depth maps. However, this requires an accurate depth map estimation that incurs a high computational cost of several minutes per frame in DERS (MPEG-I's Depth Estimation Reference Software), even using a high-class computer. LiDAR cameras can thus be an alternative solution to DERS in real-time DIBR applications. We compare the quality of a low-cost LiDAR camera, the Intel RealSense LiDAR L515, calibrated and configured adequately, with DERS using MPEG-I's Reference View Synthesizer (RVS). In IV-PSNR, the LiDAR camera reaches 32.2 dB view synthesis quality with a 15 cm camera baseline and 40.3 dB with a 2 cm baseline. Though DERS outperforms the LiDAR camera by 4.2 dB, the latter provides a better quality-performance trade-off. However, visual inspection demonstrates that LiDAR's virtual views have even slightly higher quality than with DERS in most tested low-texture ...
2025
In free navigation applications, any viewpoint of a three-dimensional scene can be synthesized through Depth Image-Based Rendering (DIBR). In this paper we show that XSlit cameras achieve up to 3 dB PSNR gain over conventional pinhole camera arrays when synthesizing virtual views of the scene with DIBR. XSlit cameras are a type of general linear camera in which the light rays pass through two non-intersecting slits instead of a single point (the optical center of conventional cameras), resulting in a different epipolar geometry and different projection equations. Instead of synthesizing virtual views with DIBR from a set of conventional pinhole cameras, e.g. a stereo camera pair, a single XSlit camera exploits the distance between its slits and their relative rotation to obtain disparity, from which DIBR virtual views can be synthesized. We first present a theoretical study to establish the equivalence between XSlit and pinhole cameras, while making sure that the XSlit camera is physically implementable. We then validate the study with DIBR on synthetic content, using perfect depth maps obtained from an in-house modified version of Blender's Cycles engine that simulates XSlit cameras. The virtual view synthesis uses an adapted version of the Reference View Synthesis software used in MPEG, the worldwide standardization committee for media compression, which recently also covers immersive video with DIBR. Finally, we conduct comparative studies between XSlit and stereo cameras using natural content with estimated (imperfect) depth maps. Our experiments show that, for the same overall space covered by the studied camera architectures, XSlit cameras often obtain better DIBR view synthesis results, with up to 3 dB PSNR gain.
2025, IEEE Transactions on Multimedia
In many advanced multimedia systems, multiview content can offer more immersion compared to classical stereoscopy. The feeling of immersiveness is increased substantially by offering motion parallax as well as stereopsis. This drives both the so-called free-navigation and super-multiview technologies. However, it is currently still challenging to acquire, store, process and transmit this type of content. This paper presents a novel multiview-interpolation framework for wide-baseline camera arrays. The proposed method comprises several novel components, including point cloud-based filtering, improved deghosting, multi-reference color blending, and depth-aware MRF-based disocclusion inpainting. The method offers robustness against depth errors caused by quantization and smoothing across object boundaries. Furthermore, the available input color and depth are maximally exploited while preventing propagation of unreliable information to virtual viewpoints. The experimental results show that the proposed method outperforms the state-of-the-art View Synthesis Reference Software (VSRS 4.1) both in objective terms and subjectively, based on a visual assessment on a high-end light-field 3D display.
2025, Imaging and Applied Optics Congress
2025, Computer Game Development
This chapter addresses the view synthesis of natural scenes in virtual reality (VR) using depth image-based rendering (DIBR). This method achieves photorealistic results as it directly warps photos to obtain the output, avoiding the need to photograph every possible viewpoint or to make a 3D reconstruction of a scene followed by ray-traced rendering. An overview of the DIBR approach and frequently encountered challenges (disocclusion and ghosting artifacts, multi-view blending, handling of non-Lambertian objects) is given. Such technology finds applications in VR immersive displays and holography. Finally, a comprehensive manual of the Reference View Synthesis software (RVS), an open-source tool tested on open datasets and recognized by the MPEG-I standardization activities (where "I" refers to "immersive"), is provided for hands-on practice.
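The warping step at the heart of DIBR can be sketched in a few lines: unproject each source pixel with its depth, reproject into the target camera, and resolve collisions with a z-buffer. This is a generic illustration (grayscale image, pinhole intrinsics `K`, pose `R`, `t` all assumed), not RVS code.

```python
import numpy as np

def dibr_warp(src, depth, K_src, K_dst, R, t):
    """Forward-warp a grayscale view to a new camera via its depth map."""
    h, w = depth.shape
    ys, xs = np.mgrid[0:h, 0:w]
    pix = np.stack([xs, ys, np.ones_like(xs)]).reshape(3, -1).astype(float)
    pts = np.linalg.inv(K_src) @ pix * depth.reshape(-1)  # 3D points, src frame
    proj = K_dst @ (R @ pts + t[:, None])                 # into target camera
    u = np.round(proj[0] / proj[2]).astype(int)
    v = np.round(proj[1] / proj[2]).astype(int)
    out = np.zeros_like(src)
    zbuf = np.full((h, w), np.inf)
    src_flat = src.reshape(-1)
    valid = (u >= 0) & (u < w) & (v >= 0) & (v < h) & (proj[2] > 0)
    for i in np.flatnonzero(valid):                       # z-buffered splat
        if proj[2, i] < zbuf[v[i], u[i]]:
            zbuf[v[i], u[i]] = proj[2, i]
            out[v[i], u[i]] = src_flat[i]
    return out
```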
2025, Imaging and Applied Optics Congress
Depth-based view synthesis with a dozen sparsely spaced cameras is a viable solution for single-shot digital holography of natural scenery. We outperform shearlet approaches, which require around fifty input views to reach high-quality holographic stereograms.
2025, 2018 - 3DTV-Conference: The True Vision - Capture, Transmission and Display of 3D Video (3DTV-CON)
2025, IEEE Transactions on Multimedia
2025, 2014 3DTV-Conference: The True Vision - Capture, Transmission and Display of 3D Video (3DTV-CON)
2025, Signal Processing: Image Communication
2025, SIGraDi 2023 Accelerated Landscapes. XXVII International Conference of the Ibero-American Society of Digital Graphics
The primary objective of this research was to explore the effectiveness of Neural Radiance Fields (NeRF) in acquiring architectural forms and compare them with traditional photogrammetry results. The study began with a comprehensive literature review on AI in architecture and NeRF. Afterwards, a single case study applicable to both NeRF and photogrammetry was selected for comparison. The NeRF model showed the ability to accurately represent details and light effects, adapting reflections and transparencies to real-world conditions, as well as handling occlusions and inferring three-dimensional information. In similar situations, photogrammetry generated less coherent volumetrics or failed to interpret objects. Additionally, tests with a reduced number of images showed that the NeRF model maintained its characteristics, while photogrammetry suffered a decrease in quality and completeness. However, NeRF's performance was influenced by data collection quality. Insufficient data led to lower-quality volumetrics with imperfections, highlighting the importance of careful data collection, even with technologies like NeRF.
2025
Moving object segmentation is an essential technique for various video surveillance applications. The result of moving object segmentation often contains shadow regions caused by the color difference of shadow pixels. Hence, moving object segmentation is usually followed by a shadow elimination process to remove false detections. The common assumption adopted in previous works is that, under illumination variation, the values of the chromaticity components are preserved while the value of the intensity component changes. Hence, color transforms that separate the luminance and chromaticity components are usually utilized to remove shadow pixels. In this paper, various color spaces (YCbCr, HSI, normalized rgb, Yxy, Lab, c1c2c3) are examined to find the most appropriate color space for shadow elimination. There have been some previous research efforts to compare the influence of various color spaces on shadow elimination. However, these efforts are somewhat insufficient for comparing the color distortions under illumination change across diverse color spaces, since they used a specific shadow elimination scheme or different thresholds for different color spaces. In this paper, to relieve the limitations of previous works, (1) the gradient magnitudes at shadow boundaries cast on uniformly colored regions are examined only for the chromaticity components, to compare the color distortion under illumination change, and (2) the accuracy of background subtraction is analyzed via ROC curves, to compare different color spaces without the problem of threshold level selection. Through experiments on real video sequences, the YCbCr and normalized rgb color spaces showed the best shadow elimination results among the color spaces used in the experiments.
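The chromaticity-preservation assumption translates directly into a test. Here is a minimal YCbCr variant (note OpenCV uses the YCrCb channel order; the thresholds and the static background model are our assumptions):

```python
import cv2
import numpy as np

def shadow_candidates(frame, background, y_ratio=(0.2, 0.9), c_tol=10):
    """Mark pixels as shadow when luminance drops relative to the
    background while chromaticity stays nearly unchanged."""
    f = cv2.cvtColor(frame, cv2.COLOR_BGR2YCrCb).astype(np.float32)
    b = cv2.cvtColor(background, cv2.COLOR_BGR2YCrCb).astype(np.float32)
    ratio = f[..., 0] / (b[..., 0] + 1e-6)                # Y attenuation
    lum_ok = (ratio > y_ratio[0]) & (ratio < y_ratio[1])
    chroma_ok = (np.abs(f[..., 1] - b[..., 1]) < c_tol) & \
                (np.abs(f[..., 2] - b[..., 2]) < c_tol)   # Cr, Cb stable
    return lum_ok & chroma_ok
```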
2025, Pattern Recognition Letters
We propose a scheme for view synthesis of scenes containing man-made objects from images taken by arbitrary, uncalibrated cameras. Under the assumption of the availability of the correspondence of three vanishing points, in general position, our scheme computes z-buffer values that can be used for handling occlusions in the synthesized view. This requires the computation of the infinite homography. We also present an alternate formulation of the technique which works under the same assumptions but does not require the infinite homography computation. We present experimental results to establish the validity of both formulations.
2025
This paper presents a method for view synthesis from multiple views and their depth maps for free navigation in Virtual Reality with six degrees of freedom (6DoF) and 360 video (3DoF+), including synthesizing views corresponding to stepping in or out of the scene. Such scenarios should support large-baseline view synthesis, typically going beyond the view synthesis involved in light field displays. Our method accepts an unlimited number of reference views, instead of the usual left and right reference views. Increasing the number of reference views overcomes problems such as occlusions, surfaces tangential to the camera axis, and artifacts in low-quality depth maps. We outperform MPEG's reference software, VSRS [2], with a gain of up to 2.5 dB in PSNR when using four reference views.
2025
After decades of developing leading-edge 2D video compression technologies, MPEG is currently working on the new era of coding for immersive applications, referred to as MPEG-I. It ranges from 360-degree video with head-mounted displays to free navigation in 3D space, with head-mounted and 3D light field displays. Two families of coding approaches, covering typical industrial workflows, are currently considered for standardisation – Multiview + Depth Video Coding and Point Cloud Coding – both supporting high-quality rendering at bitrates of up to a couple of hundred Mbps. This paper provides a technical and historical overview of the acquisition, coding and rendering technologies considered in the MPEG-I standardization activities.
2025, 2018 - 3DTV-Conference: The True Vision - Capture, Transmission and Display of 3D Video (3DTV-CON)
This paper presents a method for view synthesis from multiple views and their depth maps for free navigation in Virtual Reality with six degrees of freedom (6DoF) and 360 video (3DoF+), including synthesizing views corresponding to stepping in or out of the scene. Such scenarios should support large-baseline view synthesis, typically going beyond the view synthesis involved in light field displays. Our method accepts an unlimited number of reference views, instead of the usual left and right reference views. Increasing the number of reference views overcomes problems such as occlusions, surfaces tangential to the camera axis, and artifacts in low-quality depth maps. We outperform MPEG's reference software, VSRS [2], with a gain of up to 2.5 dB in PSNR when using four reference views.
2025
Video representations that support view synthesis based on depth maps, such as multiview plus depth, have widely emerged, raising interest in efficient depth map coding tools. In this paper, we propose an innovative sparse decomposition over a wavelet-based dictionary specially designed for the piecewise-planar nature of the depth signal. We also evaluate the performance of the proposed dictionary for depth map coding, paying special attention to the impact of depth coding errors on the resulting synthesized images. The obtained results prove the relevance of the proposed scheme, which considerably improves the perceived quality of synthesized images.
2025, ELCVIA Electronic Letters on Computer Vision and Image Analysis
In recent years, Multi-view Video plus Depth (MVD) compression has received much attention thanks to its relevance to the needs of free viewpoint applications. Efficient compression that causes the least distortion without an excessive increase in rate and complexity becomes a must, particularly for depth maps. The latter can be compressed efficiently by the 3D extension of High Efficiency Video Coding (3D-HEVC), which exploits wedgelets. Such functions offer favorable rate-distortion trade-offs. However, they incur very high computational complexity because of the exhaustive search used to estimate the wedgelet subdivision line. In this paper, we propose rapidly localizing this line using an edge-detection approach. The experimental results show that the proposed approach yields a substantial reduction in encoding delay, while providing competitive depth map and synthesized view quality compared to the exhaustive search approach.
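As a toy illustration of seeding the wedgelet line from an edge detector rather than an exhaustive search (the Sobel operator and the single-strongest-gradient heuristic are our simplifications, not the paper's algorithm):

```python
import numpy as np
import cv2

def wedgelet_line_from_edges(depth_block):
    """Estimate a wedgelet partition line for a depth block from its
    strongest gradient instead of testing every candidate line."""
    block = depth_block.astype(np.float32)
    gx = cv2.Sobel(block, cv2.CV_32F, 1, 0)
    gy = cv2.Sobel(block, cv2.CV_32F, 0, 1)
    y0, x0 = np.unravel_index(np.argmax(np.hypot(gx, gy)), block.shape)
    normal = np.arctan2(gy[y0, x0], gx[y0, x0])
    # the partition line passes through (x0, y0), perpendicular to the normal
    return (x0, y0), normal + np.pi / 2
```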
2025, 3D Research
The multi-view video plus depth (MVD) video format consists of two components: texture and depth map, where a combination of these components enables a receiver to generate arbitrary virtual views. However, MVD is a very voluminous video format that requires compression for storage and especially for transmission. Conventional codecs are very efficient at compressing texture images but are not suited to the intrinsic properties of depth maps. Depth images are indeed characterized by areas of smoothly varying grey levels separated by sharp discontinuities at the position of object boundaries. Preserving these characteristics is important to enable high-quality view synthesis at the receiver side. In this paper, sparse representation of depth maps is discussed. It is shown that a significant gain in sparsity is achieved when particular mixed dictionaries are used for approximating these types of images with greedy selection strategies. Experiments confirm the effectiveness of the approach at producing sparse representations and its competitiveness with respect to candidate state-of-the-art dictionaries. Finally, the resulting method is shown to be effective for depth map compression and represents an advantage over the emerging 3D High Efficiency Video Coding compression standard, particularly at medium and high bitrates.
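Greedy selection over a dictionary is classically matching pursuit; a minimal version for a flattened depth block (unit-norm dictionary columns assumed) looks like this:

```python
import numpy as np

def matching_pursuit(block, dictionary, n_atoms):
    """Greedy sparse approximation: repeatedly pick the dictionary atom
    most correlated with the residual and subtract its contribution."""
    residual = block.astype(np.float64).copy()
    coeffs = np.zeros(dictionary.shape[1])
    for _ in range(n_atoms):
        corr = dictionary.T @ residual        # correlation with every atom
        k = int(np.argmax(np.abs(corr)))
        coeffs[k] += corr[k]
        residual -= corr[k] * dictionary[:, k]
    return coeffs, residual
```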
2025, Radioengineering
This article presents a new concept of using the auto-focus function of a monoscopic camera sensor to estimate depth map information, which avoids not only auxiliary equipment and human interaction, but also the computational complexity introduced by SfM or depth analysis. The system architecture that supports both stereo image and video data capturing, processing and display is discussed. A novel stereo image pair generation algorithm using Z-buffer-based 3D surface recovery is proposed. Based on the depth map, we are able to calculate the disparity map (the distance in pixels between the image points in the two views) for the image. The presented algorithm uses a single image with depth information (e.g. a z-buffer) as input and produces two images, for the left and right eye.
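The depth-to-stereo step reduces to shifting each pixel by half its disparity in opposite directions with z-buffering. A sketch under our own conventions (function name, half-disparity split) follows; it is illustrative, not the paper's exact algorithm.

```python
import numpy as np

def render_stereo_pair(image, depth, focal, baseline):
    """Generate left/right views from one image plus its depth map by
    splatting each pixel at +/- half the disparity (z-buffered)."""
    h, w = depth.shape
    left, right = np.zeros_like(image), np.zeros_like(image)
    zl, zr = np.full((h, w), np.inf), np.full((h, w), np.inf)
    for y in range(h):
        for x in range(w):
            z = depth[y, x]
            half_d = focal * baseline / z / 2.0   # half-disparity per eye
            for out, zbuf, xn in ((left, zl, x + half_d), (right, zr, x - half_d)):
                xi = int(round(xn))
                if 0 <= xi < w and z < zbuf[y, xi]:
                    zbuf[y, xi] = z
                    out[y, xi] = image[y, x]
    return left, right
```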
2025, Proceedings of the 1st international workshop on 3D video processing
Novel view synthesis methods use several images or video sequences of the same scene to create new images of that scene, as if they were taken by a camera placed at a different viewpoint. They can be used in stereoscopic cinema to change the camera parameters (baseline, vergence, focal length...) a posteriori, or to adapt a stereoscopic broadcast that was shot for given viewing conditions (such as a movie theater) to a different screen size and distance (such as a 3DTV in a living room). View synthesis from stereoscopic movies usually proceeds in two phases [11]: first, disparity maps and other viewpoint-independent data (such as scene layers and matting information) are extracted from the original sequences, and second, this data and the original images are used to synthesize the new sequence, given geometric information about the synthesized viewpoints. Unfortunately, since no known stereo method gives perfect results in all situations, the results of the first phase will most probably contain errors, which will result in 2D or 3D artifacts in the synthesized stereoscopic movie. We propose to add a third phase where these artifacts are detected and removed in each stereoscopic image pair, while keeping the perceived quality of the stereoscopic movie close to the original.
2025, arXiv (Cornell University)
Generative Adversarial Networks (GANs) have recently been shown to successfully approximate complex data distributions. A relevant extension of this model is the conditional GAN (cGAN), where the introduction of external information allows specific representations of the generated images to be determined. In this work, we evaluate encoders that invert the mapping of a cGAN, i.e., map a real image into a latent space and a conditional representation. This allows, for example, real images of faces to be reconstructed and modified, conditioning on arbitrary attributes. Additionally, we evaluate the design of cGANs. The combination of an encoder with a cGAN, which we call the Invertible cGAN (IcGAN), enables re-generating real images with deterministic complex modifications.
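The IcGAN editing loop can be summarized in a few lines of Python; `encoder_z`, `encoder_y`, and `generator` are placeholder callables standing in for the trained networks, not an actual API from the paper.

```python
def edit_with_icgan(encoder_z, encoder_y, generator, image, new_attrs=None):
    """IcGAN-style editing sketch: encode a real image into a latent code z
    and attribute vector y, optionally replace y, and re-generate."""
    z = encoder_z(image)          # latent representation of the input
    y = encoder_y(image)          # inferred conditional attributes
    if new_attrs is not None:
        y = new_attrs             # e.g., toggled binary face attributes
    return generator(z, y)        # reconstructed or edited image
```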
2025, arXiv (Cornell University)
2025, Proceedings Second IEEE Workshop on Visual Surveillance (VS'99) (Cat. No.98-89223)
Robust and efficient monitoring of dynamically changing environments is one of the most important requirements for visual surveillance systems. This paper describes the development of a ubiquitous vision system for this monitoring purpose. The system, consisting of multiple omnidirectional vision sensors, is developed to address two specific surveillance tasks: (1) robust and accurate tracking and profiling of human activities, and (2) dynamic synthesis of virtual views for observing the environment from arbitrary vantage points.
2025
In recent years, Neural Radiance Fields and 3D Gaussian Splatting have attracted significant attention within the field of 3D graphics computing. This article introduces and compares these emerging visualization techniques with traditional 3D photogrammetric modeling. As the technology continues to evolve, particularly in the case of 3D Gaussian Splatting, this overview represents only a snapshot of the current state of the art. Meanwhile, researchers in 3D computer graphics are actively exploring new methods beyond those traditionally associated with polygon-based 3D visualization.
2024, Citeseer
The goal of the TEMICS project-team is the design and development of algorithms and practical solutions in the areas of analysis, modelling, coding, communication and watermarking of images and video signals. The TEMICS project-team activities are structured and organized around the following research directions: • 3D modelling and representations of multi-view video sequences.
2024
Depth mapping has become an integral tool in a wide range of industries, from autonomous systems to immersive 3D experiences. This paper highlights the necessity of achieving perfect depth mapping to support and enhance the next generation of systems, including integration with the .dotmx file format. By addressing challenges in resolution, dynamic range, and data fidelity, the study proposes a framework for refining depth maps through advanced neural modeling and empirical methodologies. Theoretical underpinnings and empirical evaluations illustrate how precise depth maps can improve outcomes in fields like 3D modeling, robotics, and media production. The paper concludes with a roadmap for incorporating scalable, modular depth mapping frameworks into evolving technological ecosystems.
2024, arXiv (Cornell University)
Novel view synthesis is a long-standing problem that revolves around rendering frames of scenes from novel camera viewpoints. Volumetric approaches provide a solution for modeling occlusions through the explicit 3D representation of the camera frustum. Multi-plane Images (MPI) are volumetric methods that represent the scene using fronto-parallel planes at distinct depths, but suffer from depth discretization, leading to a 2.5D scene representation. Another line of approach relies on implicit 3D scene representations. Neural Radiance Fields (NeRF) utilize neural networks to encapsulate the continuous 3D scene structure within the network weights, achieving photorealistic synthesis results; however, these methods are constrained to per-scene optimization settings, which are inefficient in practice. Multi-plane Neural Radiance Fields (MINE) open the door to combining implicit and explicit scene representations. It enables continuous 3D scene representations, especially in the depth dimension, while utilizing the input image features to avoid per-scene optimization. The main drawback of the current literature in this domain is the constraint to single-view input, limiting the synthesis ability to narrow viewpoint ranges. In this work, we thoroughly examine the performance, generalization, and efficiency of single-view multi-plane neural radiance fields. In addition, we propose a new multi-plane NeRF architecture that accepts multiple views to improve the synthesis results and expand the viewing range. Features from the input source frames are effectively fused through a proposed attention-aware fusion module to highlight important information from different viewpoints. Experiments show the effectiveness of attention-based fusion and the promising outcomes of our proposed method when compared to multi-view NeRF and MPI techniques.
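A minimal picture of attention-aware fusion of multi-view features at one spatial location, in scaled dot-product form; the shapes and the meaning of the query are our assumptions, not the proposed module's exact design.

```python
import numpy as np

def attention_fuse(view_feats, query):
    """Weight per-view feature vectors by their agreement with a query
    (e.g., a target-view embedding) and return the fused feature."""
    # view_feats: (V, C) features from V source views; query: (C,)
    scores = view_feats @ query / np.sqrt(view_feats.shape[1])
    w = np.exp(scores - scores.max())
    w /= w.sum()                          # softmax attention weights
    return w @ view_feats                 # (C,) fused feature
```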