Domain-Adaptive Single-View 3D Reconstruction

Learning Single-View 3D Reconstruction with Adversarial Training

arXiv, 2018

Single-view 3D shape reconstruction is an important but challenging problem, mainly for two reasons. First, since shape annotation is very expensive to acquire, current methods rely on synthetic data, for which ground-truth 3D annotation is easy to obtain; however, this creates a domain adaptation problem when the models are applied to natural images. Second, multiple shapes can explain a given 2D image. In this paper, we propose a framework that addresses these challenges using adversarial training. On one hand, we impose domain confusion between natural and synthetic image representations to reduce the distribution gap. On the other hand, we force the reconstruction to be `realistic' by constraining it to lie on a (learned) manifold of realistic object shapes. Our experiments show that these constraints improve performance by a large margin over a baseline reconstruction model. We achieve results competitive with the state of the art using only RGB images...
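As a concrete illustration of the two adversarial constraints described in this abstract (domain confusion on image features and a realism discriminator on reconstructed shapes), here is a minimal PyTorch-style training-step sketch. The module names, label conventions, loss weights and the alternating update scheme are assumptions for illustration, not the authors' exact formulation.

# Minimal sketch of the two adversarial constraints: domain confusion on image
# features and a "realism" discriminator on reconstructed shapes. All module names,
# label conventions and loss weights below are illustrative assumptions.
import torch
import torch.nn.functional as F

def training_step(encoder, decoder, d_domain, d_shape, opt_main, opt_disc,
                  synth_img, synth_vox, real_img, lam_dom=0.1, lam_shape=0.1):
    # Reconstruction on synthetic data, where 3D supervision is available.
    f_s, f_r = encoder(synth_img), encoder(real_img)
    recon = decoder(f_s)                       # occupancy probabilities in [0, 1]
    loss_rec = F.binary_cross_entropy(recon, synth_vox)

    # Discriminator step: separate synthetic (0) from natural (1) features,
    # and ground-truth shapes (1) from reconstructions (0).
    d_feat = torch.cat([d_domain(f_s.detach()), d_domain(f_r.detach())])
    feat_lbl = torch.cat([torch.zeros(len(f_s), 1), torch.ones(len(f_r), 1)])
    d_vox = torch.cat([d_shape(synth_vox), d_shape(recon.detach())])
    vox_lbl = torch.cat([torch.ones(len(synth_vox), 1), torch.zeros(len(recon), 1)])
    loss_d = (F.binary_cross_entropy_with_logits(d_feat, feat_lbl)
              + F.binary_cross_entropy_with_logits(d_vox, vox_lbl))
    opt_disc.zero_grad(); loss_d.backward(); opt_disc.step()

    # Generator step: make natural features look synthetic (domain confusion) and
    # make reconstructions look like real shapes (shape-manifold constraint).
    loss_conf = F.binary_cross_entropy_with_logits(d_domain(f_r), torch.zeros(len(f_r), 1))
    loss_real = F.binary_cross_entropy_with_logits(d_shape(recon), torch.ones(len(recon), 1))
    loss = loss_rec + lam_dom * loss_conf + lam_shape * loss_real
    opt_main.zero_grad(); loss.backward(); opt_main.step()
    return loss.item()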

Look, Evolve and Mold: Learning 3D Shape Manifold via Single-view Synthetic Data

2021

With daily observation and prior knowledge, it is easy for us humans to infer the stereo structure from a single view. However, equipping deep models with such an ability usually requires abundant supervision. Promisingly, without elaborate 3D annotation, we can simply profit from synthetic data, where pairwise ground truth is easy to access. Nevertheless, the domain gap is not negligible, considering the variation in texture, shape and context. To overcome these difficulties, we propose a domain-adaptive network for single-view 3D reconstruction, dubbed LEM, which generalizes towards the natural scenario by fulfilling several aspects: (1) Look: incorporating spatial structure from the single view to enhance the representation; (2) Evolve: leveraging semantic information with unsupervised contrastive mapping that draws on the shape priors; (3) Mold: transforming into the desired stereo manifold with discernment and semantic knowledge. Extensive experiments on several benchmarks...
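The "Evolve" step mentions an unsupervised contrastive mapping onto shape priors. A hedged sketch of what such a contrastive objective could look like is below: an InfoNCE-style loss between image embeddings and shape-prior embeddings. The row-wise pairing assumption and the temperature are illustrative, not taken from the paper.

# Hedged sketch of a contrastive mapping between image features and shape-prior
# embeddings (an InfoNCE-style loss); an illustration of the general technique,
# not the LEM authors' exact objective.
import torch
import torch.nn.functional as F

def contrastive_mapping_loss(img_emb, prior_emb, temperature=0.07):
    """img_emb, prior_emb: (B, D) embeddings assumed to be paired row-wise."""
    img_emb = F.normalize(img_emb, dim=1)
    prior_emb = F.normalize(prior_emb, dim=1)
    logits = img_emb @ prior_emb.t() / temperature   # (B, B) similarity matrix
    targets = torch.arange(img_emb.size(0), device=img_emb.device)
    # Pull each image towards its matching shape prior, push it away from the others.
    return F.cross_entropy(logits, targets)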

Efficient Geometry-aware 3D Generative Adversarial Networks

2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Unsupervised generation of high-quality multi-view-consistent images and 3D shapes using only collections of single-view 2D photographs has been a long-standing challenge. Existing 3D GANs are either compute-intensive or make approximations that are not 3D-consistent; the former limits quality and resolution of the generated images and the latter adversely affects multi-view consistency and shape quality. In this work, we improve the computational efficiency and image quality of 3D GANs without overly relying on these approximations. For this purpose, we introduce an expressive hybrid explicit-implicit network architecture that, together with other design choices, synthesizes not only high-resolution multi-view-consistent images in real time but also produces high-quality 3D geometry. By decoupling feature generation and neural rendering, our framework is able to leverage state-of-the-art 2D CNN generators, such as StyleGAN2, and inherit their efficiency and expressiveness. We demonstrate state-of-the-art 3D-aware synthesis with FFHQ and AFHQ Cats, among other experiments.
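The decoupling of 2D feature generation from neural rendering can be sketched as follows: the generator emits explicit feature planes, and a small implicit MLP decodes features sampled at 3D points before volume rendering. The tri-plane lookup and tiny decoder below are a simplified sketch with placeholder shapes, not the released EG3D architecture.

# Sketch of the hybrid explicit-implicit idea: explicit feature planes from a 2D
# generator plus a small implicit MLP decoder evaluated at 3D sample points.
import torch
import torch.nn.functional as F

def sample_triplane(planes, pts):
    """planes: (3, C, H, W) axis-aligned feature planes; pts: (N, 3) points in [-1, 1]."""
    projections = (pts[:, [0, 1]], pts[:, [0, 2]], pts[:, [1, 2]])
    feats = []
    for plane, coords in zip(planes, projections):
        grid = coords.view(1, -1, 1, 2)                              # (1, N, 1, 2) sample grid
        f = F.grid_sample(plane[None], grid, align_corners=False)    # (1, C, N, 1)
        feats.append(f[0, :, :, 0].t())                              # (N, C)
    return sum(feats)  # aggregate the three plane projections

class TinyDecoder(torch.nn.Module):
    """Implicit decoder mapping a per-point feature to (density, colour)."""
    def __init__(self, c_in=32, c_hidden=64):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(c_in, c_hidden), torch.nn.Softplus(),
            torch.nn.Linear(c_hidden, 1 + 3))
    def forward(self, feat):
        out = self.net(feat)
        return out[:, :1], torch.sigmoid(out[:, 1:])  # density, colour

Densities and colours decoded this way are composited along camera rays with standard volume rendering; because the feature planes come from a 2D CNN generator, the expensive part of the network runs once per image rather than once per 3D sample, which is the efficiency argument the abstract makes.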

IsMo-GAN: Adversarial Learning for Monocular Non-Rigid 3D Reconstruction

2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)

The majority of the existing methods for non-rigid 3D surface regression from a single 2D image require an object template or point tracks over multiple frames as an input, and are still far from real-time processing rates. In this work, we present the Isometry-Aware Monocular Generative Adversarial Network (IsMo-GAN), an approach for direct 3D reconstruction from a single image, trained for the deformation model in an adversarial manner on a lightweight synthetic dataset. IsMo-GAN reconstructs surfaces from real images under varying illumination, camera poses, textures and shading at over 250 Hz. In multiple experiments, it consistently outperforms existing approaches in reconstruction accuracy, runtime, generalisation to unknown surfaces and robustness to occlusions. In comparison to the state of the art, we reduce the reconstruction error by 10-30%, including in the textureless case, and our surfaces exhibit fewer artefacts qualitatively.

Human Mesh Reconstruction with Generative Adversarial Networks from Single RGB Images

Sensors, 2021

Applications related to smart cities require virtual cities in the experimental development stage. To build a virtual city that is close to a real city, a large number of human models of various types need to be created. To reduce the cost of acquiring models, this paper proposes a method to reconstruct 3D human meshes from single images captured using a normal camera. It reconstructs the complete mesh of the human body from a single RGB image using a generative adversarial network consisting of a newly designed shape–pose-based generator (based on deep convolutional neural networks) and an enhanced multi-source discriminator. Using this machine learning approach, the reliance on multiple sensors is reduced and 3D human meshes can be recovered using a single camera, thereby reducing the cost of building smart cities. The proposed method achieves an accuracy of 92.1% in body shape recovery; it can also process 34 images per second. The method proposed in this paper...

Shape Reconstruction by Learning Differentiable Surface Representations

2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Generative models that produce point clouds have emerged as a powerful tool to represent 3D surfaces, and the best current ones rely on learning an ensemble of parametric representations. Unfortunately, they offer no control over the deformations of the surface patches that form the ensemble and thus fail to prevent them from either overlapping or collapsing into single points or lines. As a consequence, computing shape properties such as surface normals and curvatures becomes difficult and unreliable. In this paper, we show that we can exploit the inherent differentiability of deep networks to leverage differential surface properties during training so as to prevent patch collapse and strongly reduce patch overlap. Furthermore, this lets us reliably compute quantities such as surface normals and curvatures. We will demonstrate on several tasks that this yields more accurate surface reconstructions than the state-of-the-art methods in terms of normals estimation and amount of collapsed and overlapped patches.
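The core idea, differentiating a learned patch mapping f(u, v) -> R^3 to obtain tangents, normals and a local area element, can be sketched with autograd as follows. The small area-based anti-collapse penalty is only an illustrative stand-in for the kind of regularisation the paper argues for, not its exact form.

# Sketch of using autograd to obtain differential surface properties from a learned
# parametric patch mapping; the anti-collapse penalty is an illustrative assumption.
import torch

def patch_normals_and_area(patch_fn, uv):
    """uv: (N, 2) patch parameters; returns unit normals and the local area element."""
    uv = uv.requires_grad_(True)
    pts = patch_fn(uv)                                   # (N, 3) surface points
    grads = []
    for i in range(3):                                   # Jacobian rows via autograd
        g, = torch.autograd.grad(pts[:, i].sum(), uv, create_graph=True)
        grads.append(g)                                  # (N, 2) = (dx_i/du, dx_i/dv)
    J = torch.stack(grads, dim=1)                        # (N, 3, 2) Jacobian
    du, dv = J[:, :, 0], J[:, :, 1]                      # tangent vectors
    n = torch.cross(du, dv, dim=1)                       # un-normalised normal
    area = n.norm(dim=1)                                 # local area element |du x dv|
    normals = n / (area[:, None] + 1e-8)
    return normals, area

def anti_collapse_penalty(area, min_area=1e-4):
    # Penalise patches whose local area shrinks towards zero (collapse to points/lines).
    return torch.relu(min_area - area).mean()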

D-OccNet: Detailed 3D Reconstruction Using Cross-Domain Learning

arXiv, 2021

Deep-learning-based 3D reconstruction from a single-view 2D image is becoming increasingly popular due to its wide range of real-world applications, but the task is inherently challenging because of the partial observability of an object from a single perspective. Recently, state-of-the-art probability-based Occupancy Networks have reconstructed 3D surfaces from three different types of input domains: single-view 2D images, point clouds and voxels. In this study, we extend the work on Occupancy Networks by exploiting cross-domain learning between the image and point cloud domains. Specifically, we first convert the single-view 2D image into a simpler point cloud representation, and then reconstruct a 3D surface from it. Our network, the Double Occupancy Network (D-OccNet), outperforms Occupancy Networks in terms of visual quality and the details captured in the 3D reconstruction.
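A hedged sketch of the two-stage, cross-domain pipeline the abstract describes: an image-to-point-cloud network followed by a point-cloud-conditioned occupancy network. The sub-networks and the mesh-extraction note are placeholders, not the released implementation.

# Two-stage cross-domain pipeline sketch: image -> point cloud -> occupancy field.
# All three sub-networks are placeholder modules supplied by the caller.
import torch

class DoubleOccupancyPipeline(torch.nn.Module):
    def __init__(self, img2pc, pc_encoder, occ_decoder):
        super().__init__()
        self.img2pc = img2pc            # image -> (B, N, 3) point cloud
        self.pc_encoder = pc_encoder    # point cloud -> latent code
        self.occ_decoder = occ_decoder  # (query points, latent) -> occupancy logits

    def forward(self, image, query_pts):
        point_cloud = self.img2pc(image)          # stage 1: simpler intermediate domain
        z = self.pc_encoder(point_cloud)          # stage 2: condition occupancy on the cloud
        logits = self.occ_decoder(query_pts, z)
        return torch.sigmoid(logits)              # occupancy probability per query point

# A mesh would then be extracted by thresholding the occupancy field (e.g. at 0.5)
# over a dense grid of query points and running marching cubes on the result.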

Adversarial Parametric Pose Prior

2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

The Skinned Multi-Person Linear (SMPL) model can represent a human body by mapping pose and shape parameters to body meshes. This has been shown to facilitate inferring 3D human pose and shape from images via different learning models. However, not all pose and shape parameter values yield physically plausible or even realistic body meshes. In other words, SMPL is under-constrained and may thus lead to invalid results when used to reconstruct humans from images, either by directly optimizing its parameters or by learning a mapping from the image to these parameters. In this paper, we therefore learn a prior that restricts the SMPL parameters to values that produce realistic poses via adversarial training. We show that our learned prior covers the diversity of the real-data distribution, facilitates optimization for 3D reconstruction from 2D keypoints, and yields better pose estimates when used for regression from images. We find that the prior based on a spherical distribution gives the best results. Furthermore, in all these tasks, it outperforms the state-of-the-art VAE-based approach to constraining the SMPL parameters.
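One common way to use such a learned adversarial prior is as a regulariser when fitting SMPL parameters to 2D keypoints; a hedged sketch is below. The smpl_joints and project functions, the discriminator and the weighting are hypothetical placeholders, not the authors' code.

# Sketch of fitting SMPL pose parameters to 2D keypoints with a learned adversarial
# pose prior as regulariser. `smpl_joints`, `project` and `pose_discriminator` are
# hypothetical placeholders supplied by the caller.
import torch

def fit_pose_to_keypoints(pose, shape, keypoints_2d, smpl_joints, project,
                          pose_discriminator, steps=200, lam_prior=0.01):
    pose = pose.clone().requires_grad_(True)
    opt = torch.optim.Adam([pose], lr=0.01)
    for _ in range(steps):
        joints_3d = smpl_joints(pose, shape)                      # SMPL forward kinematics
        reproj = ((project(joints_3d) - keypoints_2d) ** 2).mean()
        # Discriminator score acts as the prior: realistic poses score high.
        prior = -pose_discriminator(pose).mean()
        loss = reproj + lam_prior * prior
        opt.zero_grad(); loss.backward(); opt.step()
    return pose.detach()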

3D Topology Transformation with Generative Adversarial Networks

arXiv, 2020

Our proposed 3D Topology Transformation method disentangles the concepts of structural shape and volumetric topology style of objects in 3D space. The generator of our Vox2Vox network transforms 3D input models into novel 3D representations while retaining the original overall structure.

3D Reconstruction of Novel Object Shapes from Single Images

2021 International Conference on 3D Vision (3DV)

The key challenge in single-image 3D shape reconstruction is to ensure that deep models can generalize to shapes which were not part of the training set. This is difficult because the algorithm must infer the occluded portion of the surface by leveraging the shape characteristics of the training data, and can therefore be vulnerable to overfitting. Such generalization to unseen categories of objects is a function of architecture design and training approaches. This paper introduces SDFNet, a novel shape prediction architecture and training approach which supports effective generalization. We provide an extensive investigation of the factors which influence generalization accuracy and its measurement, ranging from the consistent use of 3D shape metrics to the choice of rendering approach and the large-scale evaluation on unseen shapes using ShapeNet-Core.v2 and ABC. We show that SDFNet provides state-of-the-art performance on seen and unseen shapes relative to existing baseline methods GenRe [49] and OccNet [27]. We provide the first large-scale experimental evaluation of generalization performance. The codebase released with this article will allow for the consistent evaluation and comparison of methods for single-image shape reconstruction.
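For reference, here is a sketch of two shape metrics commonly used in this kind of evaluation, Chamfer distance and F-score at a distance threshold; the exact metric definitions and thresholds used by SDFNet's evaluation protocol may differ.

# Chamfer distance and F-score between two point sets sampled from the predicted and
# ground-truth surfaces. The threshold tau is an illustrative default.
import torch

def chamfer_and_fscore(pred_pts, gt_pts, tau=0.01):
    """pred_pts: (N, 3), gt_pts: (M, 3) point samples from the two surfaces."""
    d = torch.cdist(pred_pts, gt_pts)          # (N, M) pairwise distances
    d_pred2gt = d.min(dim=1).values            # nearest-GT distance for each predicted point
    d_gt2pred = d.min(dim=0).values            # nearest-prediction distance for each GT point
    chamfer = d_pred2gt.mean() + d_gt2pred.mean()
    precision = (d_pred2gt < tau).float().mean()
    recall = (d_gt2pred < tau).float().mean()
    fscore = 2 * precision * recall / (precision + recall + 1e-8)
    return chamfer.item(), fscore.item()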