HoloGAN: Unsupervised Learning of 3D Representations From Natural Images
Related papers
Pix2Shape: Towards Unsupervised Learning of 3D Scenes from Images Using a View-Based Representation
International Journal of Computer Vision, 2020
We infer and generate three-dimensional (3D) scene information from a single input image and without supervision. This problem is under-explored, with most prior work relying on supervision from, e.g., 3D ground truth, multiple images of a scene, image silhouettes or keypoints. We propose Pix2Shape, an approach to solve this problem with four components: (i) an encoder that infers the latent 3D representation from an image, (ii) a decoder that generates an explicit 2.5D surfel-based reconstruction of a scene from the latent code, (iii) a differentiable renderer that synthesizes a 2D image from the surfel representation, and (iv) a critic network trained to discriminate between images generated by the decoder-renderer and those from a training distribution. Pix2Shape can generate complex 3D scenes that scale with the view-dependent on-screen resolution, unlike representations that capture world-space resolution, i.e., voxels or meshes. We show that Pix2Shape learns a consistent scene representation in its encoded latent space, and that the decoder can then be applied to this latent representation to synthesize the scene from a novel viewpoint. We evaluate Pix2Shape with experiments on the ShapeNet dataset as well as on a novel benchmark we developed, called 3D-IQTT, to evaluate models based on their ability to enable 3D spatial reasoning. Qualitative and quantitative evaluations demonstrate Pix2Shape's ability to solve scene reconstruction, generation and understanding tasks.
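A minimal sketch of the four-component pipeline the abstract describes, wired end to end: encoder to latent code, latent code to a 2.5D surfel map, a toy differentiable "renderer", and a critic. All layer sizes, the per-pixel surfel parameterization (depth + normal + albedo), and the Lambertian shading stand-in are illustrative assumptions, not the paper's actual code.

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):          # (i) image -> latent 3D code
    def __init__(self, z_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 4, 2, 1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, 2, 1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, z_dim))
    def forward(self, img):
        return self.net(img)

class SurfelDecoder(nn.Module):    # (ii) latent code -> 2.5D surfel map
    def __init__(self, z_dim=128, hw=32):
        super().__init__()
        self.hw = hw
        # per-pixel depth (1) + normal (3) + albedo (3) = 7 channels
        self.net = nn.Linear(z_dim, 7 * hw * hw)
    def forward(self, z):
        return self.net(z).view(-1, 7, self.hw, self.hw)

def render(surfels):               # (iii) toy differentiable renderer
    depth, normal, albedo = surfels[:, :1], surfels[:, 1:4], surfels[:, 4:7]
    light = torch.tensor([0., 0., 1.])  # fixed camera-space light direction
    shading = (normal * light.view(1, 3, 1, 1)).sum(1, keepdim=True).clamp(min=0)
    return albedo * shading             # Lambertian shading; depth unused here

class Critic(nn.Module):           # (iv) real vs. rendered discriminator
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 4, 2, 1), nn.LeakyReLU(0.2),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 1))
    def forward(self, img):
        return self.net(img)

enc, dec, critic = Encoder(), SurfelDecoder(), Critic()
img = torch.randn(4, 3, 64, 64)    # a batch of input images
fake = render(dec(enc(img)))       # encode -> decode -> render
score = critic(fake)               # adversarial training signal
```

In the paper's adversarial setup the critic's score would drive both encoder and decoder, while the renderer stays fixed and merely passes gradients through.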
Unsupervised Learning of Dense Shape Correspondence
2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019
Dense correspondence between articulated objects obtained with the proposed unsupervised loss, optimized on a single (unlabeled) example. Our method is compared with the state-of-the-art supervised network pre-trained on human shapes, as well as with two axiomatic methods, employing a post-processing algorithm [49] on the axiomatic results. See Section 5.1 for more details.
Holographic Neural Architectures
ArXiv, 2018
Representation learning is at the heart of what makes deep learning effective. In this work, we introduce a new framework for representation learning that we call "Holographic Neural Architectures" (HNAs). In the same way that an observer can experience the 3D structure of a holographed object by looking at its hologram from several angles, HNAs derive Holographic Representations from the training set. These representations can then be explored by moving along a continuous bounded single dimension. We show that HNAs can be used to build generative networks and state-of-the-art regression models, and that they are inherently highly resistant to noise. Finally, we argue that because of their denoising abilities and their capacity to generalize well from very few examples, models based on HNAs are particularly well suited for biological applications where training examples are rare or noisy.
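A minimal sketch of the single-dimension exploration the abstract describes: sweep one bounded latent coordinate through a decoder while holding the rest of the representation fixed, and observe how the output varies. The decoder is a toy stand-in; the HNA architecture itself is not specified by the abstract.

```python
import torch
import torch.nn as nn

decoder = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 32))
z = torch.zeros(1, 16)                        # fixed base representation
for t in torch.linspace(-1.0, 1.0, steps=5):  # bounded sweep of one axis
    z[0, 0] = t
    sample = decoder(z)                       # one "view" of the hologram
    print(float(t), sample.norm().item())
```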
Self-supervised Learning of Dense Shape Correspondence
arXiv (Cornell University), 2018
We introduce the first completely unsupervised correspondence learning approach for deformable 3D shapes. Key to our model is the understanding that natural deformations (such as changes in pose) approximately preserve the metric structure of the surface, yielding a natural criterion to drive the learning process toward distortion-minimizing predictions. On this basis, we overcome the need for annotated data and replace it by a purely geometric criterion. The resulting learning model is class-agnostic, and is able to leverage any type of deformable geometric data for the training phase. In contrast to existing supervised approaches which specialize on the class seen at training time, we demonstrate stronger generalization as well as applicability to a variety of challenging settings. We showcase our method on a wide selection of correspondence benchmarks, where we outperform other methods in terms of accuracy, generalization, and efficiency.
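A minimal sketch of the metric-distortion criterion the abstract describes: a soft correspondence P between shapes X and Y should map pairwise geodesic distances of Y onto those of X, so the learning signal is the residual between the two metrics. The softmax-over-descriptors construction of P and the toy random "geodesics" are illustrative assumptions; the paper's network and loss details may differ.

```python
import torch

def distortion_loss(P, D_x, D_y):
    """P: (n, m) soft correspondence; D_x: (n, n), D_y: (m, m) geodesic distances."""
    return ((P @ D_y @ P.T - D_x) ** 2).mean()

n, m, f = 100, 120, 16
feat_x, feat_y = torch.randn(n, f), torch.randn(m, f)  # learned descriptors
P = torch.softmax(feat_x @ feat_y.T, dim=1)            # row-stochastic soft map
D_x, D_y = torch.rand(n, n), torch.rand(m, m)          # stand-in geodesic matrices
D_x, D_y = (D_x + D_x.T) / 2, (D_y + D_y.T) / 2        # symmetrize the toy input
loss = distortion_loss(P, D_x, D_y)                    # purely geometric criterion
```

Because the loss depends only on intrinsic distances, it needs no annotated pairs, which is what makes the approach class-agnostic.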
Unite the People: Closing the Loop Between 3D and 2D Human Representations
2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017
3D models provide a common ground for different representations of human bodies. In turn, robust 2D estimation has proven to be a powerful tool to obtain 3D fits "in-the-wild". However, depending on the level of detail, it can be hard or impossible to acquire labeled data for training 2D estimators at large scale. We propose a hybrid approach to this problem: with an extended version of the recently introduced SMPLify method, we obtain high-quality 3D body model fits for multiple human pose datasets. Human annotators solely sort good and bad fits. This procedure leads to an initial dataset, UP-3D, with rich annotations. With a comprehensive set of experiments, we show how this data can be used to train discriminative models that produce results with an unprecedented level of detail: our models predict 31 segments and 91 landmark locations on the body. Using the 91-landmark pose estimator, we present state-of-the-art results for 3D human pose and shape estimation using an order of magnitude less training data and without assumptions about gender or pose in the fitting procedure. We show that UP-3D can be enhanced with these improved fits to grow in quantity and quality, which makes the system deployable at large scale. The data, code and models are available for research purposes.
Look, Evolve and Mold: Learning 3D Shape Manifold via Single-view Synthetic Data
2021
With daily observation and prior knowledge, it is easy for us humans to infer the stereo structure of a scene from a single view. Equipping deep models with such ability, however, usually requires abundant supervision. Promisingly, without elaborate 3D annotation we can simply profit from synthetic data, where pairwise ground truth is easy to access. Nevertheless, the domain gap is not negligible, given the varying texture, shape and context. To overcome these difficulties, we propose a domain-adaptive network for single-view 3D reconstruction, dubbed LEM, which generalizes toward the natural scenario by fulfilling several aspects: (1) Look: incorporating spatial structure from the single view to enhance the representation; (2) Evolve: leveraging semantic information with unsupervised contrastive mapping, drawing on shape priors; (3) Mold: transforming into the desired stereo manifold with discernment and semantic knowledge. Extensive experiments on several benchmarks…
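A minimal InfoNCE-style sketch of the contrastive mapping invoked in the "Evolve" step: pull an image embedding toward the embedding of its matching shape prior and push it away from the other priors in the batch. The image-to-prior pairing, the embedding sizes and the temperature are assumptions for illustration, not the paper's recipe.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(img_emb, prior_emb, temperature=0.07):
    img_emb = F.normalize(img_emb, dim=1)
    prior_emb = F.normalize(prior_emb, dim=1)
    logits = img_emb @ prior_emb.T / temperature  # (B, B) similarity matrix
    targets = torch.arange(img_emb.size(0))       # i-th image matches i-th prior
    return F.cross_entropy(logits, targets)

loss = contrastive_loss(torch.randn(8, 64), torch.randn(8, 64))
```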
Semi-Supervised Learning of Multi-Object 3D Scene Representations
2021
Representing scenes at the granularity of objects is a prerequisite for scene understanding and decision making. We propose a novel approach for learning multi-object 3D scene representations from images. A recurrent encoder regresses a latent representation of the 3D shape, pose and texture of each object from an input RGB image. The 3D shapes are represented continuously in function space as signed distance functions (SDFs), which we efficiently pre-train from example shapes in a supervised way. Through differentiable rendering we then train our model to decompose scenes from RGB-D images in a self-supervised manner. Our approach learns to decompose images into the constituent objects of the scene and to infer their shape, pose and texture from a single view. We evaluate the accuracy of our model in inferring the 3D scene layout and demonstrate its generative capabilities.
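A minimal sketch of the continuous function-space shape representation the abstract describes: an MLP conditioned on a per-object latent code that maps a 3D point to its signed distance, in the style of a DeepSDF decoder. Layer widths and the latent dimension are illustrative assumptions.

```python
import torch
import torch.nn as nn

class LatentSDF(nn.Module):
    def __init__(self, z_dim=64, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(z_dim + 3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1))                 # signed distance to the surface
    def forward(self, z, pts):
        # z: (B, z_dim) per-object codes; pts: (B, N, 3) query points
        z = z.unsqueeze(1).expand(-1, pts.size(1), -1)
        return self.net(torch.cat([z, pts], dim=-1)).squeeze(-1)

sdf = LatentSDF()
codes = torch.randn(2, 64)             # one latent code per object
points = torch.rand(2, 512, 3) * 2 - 1 # queries in the unit cube
d = sdf(codes, points)                 # (2, 512): negative inside the shape
```

Because the representation is a continuous function, it can be queried at arbitrary resolution, which is what lets a differentiable renderer supervise it from RGB-D views.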
2022
Getting machine learning algorithms to produce unambiguous results across different environments and conditions remains an unsolved problem. One way to approach it is to draw on the psychophysical, holographic character of human learning. Through focused attention, a person experientially trains vision, hearing, psyche and mind, in a holographic and resonant way, to perceive and recognize phenomena, processes, objects, subjects, meanings, music and other entities in varied environments and conditions, and learns to navigate those environments rationally. Holographic algorithms for such experience-based machine learning would help neural network ensembles recognize objects, subjects, music and texts unambiguously across environments and conditions, using a friend-or-foe style recognition model. In this view, machine learning simulates the holographic processes by which humans memorize entities, and searching for objects across different environments and conditions simulates the resonant, associative processes of human detection. By simulating holographic processes of the human psyche with machine learning based on the Fourier transform, using full parametric sequences of the necessary and sufficient hologram data of target objects, the problem of their unambiguous detection in different environments and conditions could be solved.
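A minimal sketch of Fourier-domain matched filtering, the classical signal-processing analogue of the holographic, Fourier-transform-based recognition the abstract invokes: correlate a scene with a stored template via the FFT and look for a sharp correlation peak at the target's location. This NumPy illustration is an assumption about the underlying mechanism, not the author's method.

```python
import numpy as np

def correlate_fft(scene, template):
    S = np.fft.fft2(scene)
    T = np.fft.fft2(template, s=scene.shape)     # zero-pad template to scene size
    return np.real(np.fft.ifft2(S * np.conj(T))) # circular cross-correlation

rng = np.random.default_rng(0)
template = rng.random((16, 16))
scene = rng.random((128, 128)) * 0.1
scene[40:56, 70:86] += template                  # embed the target in clutter
corr = correlate_fft(scene, template)
y, x = np.unravel_index(np.argmax(corr), corr.shape)
print(y, x)                                      # peaks at (40, 70), the target
```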
Neural networks for object orientation extraction in 3D
2021
Internal representations of objects and environments are necessary for animals to perform tasks such as navigation and semantic information extraction from visual scenes. Recent developments in hardware, data storage and computational power have enabled the deployment of deep learning statistical models to emulate cognition. Studying neural representations of 3D space in biological organisms can offer insight into new architectures and approaches for visual tasks, such as extracting object rotations in 3D by designing artificial networks that can comprehend and reproduce them. During this internship we employed state-of-the-art supervised learning approaches to extract orientation information from visual scenes, and compared diverse deep artificial neural network architectures and mathematical representations of rotations (such as rotation matrices, axis-angle and von Mises representations) in terms of their performance. After creating a very large dataset of images of Tetris-like objects with distinct random geometry and their rotation with respect to a fixed reference point, we trained a convolutional neural network that can calculate the relative rotations of an object in a virtual environment when presented with a set of images depicting orientations of that object, with an approximate mean error of 11.31 degrees. This work is part of a greater effort to map neural visual activities and replicate them while accounting for a problem complexity that approximates that of real-life systems. Generative models will also be used in the future to generate data from internal representations, coupled with unsupervised learning approaches to visual datasets.
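A minimal sketch of the geodesic angular error typically used to score this kind of rotation regression (e.g., a mean error around 11.31 degrees): the rotation angle of the relative rotation between prediction and ground truth, recovered from its trace. The scoring convention is an assumption; the report's exact metric may differ.

```python
import numpy as np

def geodesic_deg(R_pred, R_true):
    R_rel = R_pred.T @ R_true                        # relative rotation
    cos = np.clip((np.trace(R_rel) - 1.0) / 2.0, -1.0, 1.0)
    return np.degrees(np.arccos(cos))                # angle of R_rel in degrees

def rot_z(deg):                                      # helper: rotation about z
    a = np.radians(deg)
    return np.array([[np.cos(a), -np.sin(a), 0.0],
                     [np.sin(a),  np.cos(a), 0.0],
                     [0.0, 0.0, 1.0]])

print(geodesic_deg(rot_z(30.0), rot_z(41.31)))       # ~11.31 degrees
```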