Unsupervised Learning of Dense Shape Correspondence

Self-supervised Learning of Dense Shape Correspondence

arXiv (Cornell University), 2018

We introduce the first completely unsupervised correspondence learning approach for deformable 3D shapes. Key to our model is the understanding that natural deformations (such as changes in pose) approximately preserve the metric structure of the surface, yielding a natural criterion to drive the learning process toward distortion-minimizing predictions. On this basis, we overcome the need for annotated data and replace it with a purely geometric criterion. The resulting learning model is class-agnostic and can leverage any type of deformable geometric data for the training phase. In contrast to existing supervised approaches, which specialize to the class seen at training time, we demonstrate stronger generalization as well as applicability to a variety of challenging settings. We showcase our method on a wide selection of correspondence benchmarks, where we outperform other methods in terms of accuracy, generalization, and efficiency.
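The metric-preservation criterion can be illustrated with a small sketch, assuming each shape comes with a precomputed geodesic distance matrix (`dist_x`, `dist_y`) and per-point descriptors. The helper names, the softmax-based soft map, and the exact squared-error form are illustrative assumptions, not the authors' published loss: a soft map `P` pulls the distances of the second shape back onto the first, and the penalty grows as the predicted map distorts distances.

```python
import numpy as np

def soft_map(feat_x, feat_y, temperature=1.0):
    """Hypothetical helper: row-stochastic soft correspondence P (n_x x n_y) from descriptors."""
    sim = feat_x @ feat_y.T / temperature
    sim -= sim.max(axis=1, keepdims=True)   # subtract row max for numerical stability
    p = np.exp(sim)
    return p / p.sum(axis=1, keepdims=True)

def metric_distortion(p, dist_x, dist_y):
    """Mean squared gap between D_X and the distances of Y pulled back through P."""
    pulled_back = p @ dist_y @ p.T          # (n_x, n_x) distances as seen through the map
    return np.mean((pulled_back - dist_x) ** 2)

# Four points on a line; a map that swaps the first two points distorts distances.
d = np.abs(np.subtract.outer(np.arange(4), np.arange(4))).astype(float)
p_true = soft_map(np.eye(4), np.eye(4), temperature=0.01)
swap = np.eye(4)[[1, 0, 2, 3]]
print(metric_distortion(p_true, d, d), metric_distortion(swap, d, d))
```

No labels enter the loss: only the geometry of the two shapes and the predicted map, which is what allows training without annotated correspondences.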

Dense Human Body Correspondences Using Convolutional Networks

Figure 1: We introduce a deep learning framework for computing dense correspondences between human shapes in arbitrary, complex poses, and wearing varying clothing. Our approach can handle full 3D models as well as partial scans generated from a single depth map. The source and target shapes do not need to be the same subject, as highlighted in the left pair.

We propose a deep learning approach for finding dense correspondences between 3D scans of people. Our method requires only partial geometric information in the form of two depth maps or partial reconstructed surfaces, works for humans in arbitrary poses and wearing any clothing, does not require the two people to be scanned from similar viewpoints, and runs in real time. We use a deep convolutional neural network to train a feature descriptor on depth map pixels, but crucially, rather than training the network to solve the shape correspondence problem directly, we train it to solve a body region classification problem, modified to increase the smoothness of the learned descriptors near region boundaries. This approach ensures that nearby points on the human body are nearby in feature space, and vice versa, rendering the feature descriptor suitable for computing dense correspondences between the scans. We validate our method on real and synthetic data for both clothed and unclothed humans, and show that our correspondences are more robust than is possible with state-of-the-art unsupervised methods, and more accurate than those found using methods that require full watertight 3D geometry.
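Because nearby body points map to nearby descriptors, correspondences can be read off by nearest-neighbor search in feature space. The sketch below assumes the per-pixel descriptors have already been produced by some network (not shown); `match_descriptors` is a hypothetical helper, and plain Euclidean nearest neighbor is an assumed matching rule.

```python
import numpy as np

def match_descriptors(desc_src, desc_tgt):
    """For each source point, index of the closest target descriptor (Euclidean)."""
    # Squared distances via ||a||^2 - 2 a.b + ||b||^2, shape (n_src, n_tgt)
    d2 = (np.sum(desc_src ** 2, axis=1)[:, None]
          - 2.0 * desc_src @ desc_tgt.T
          + np.sum(desc_tgt ** 2, axis=1)[None, :])
    return np.argmin(d2, axis=1)

src = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
tgt = src[[2, 0, 1]]                    # same descriptors, reordered
print(match_descriptors(src, tgt))      # source i -> matching target index
```

The dense distance matrix is fine for small inputs; for full depth maps a spatial index (e.g. a k-d tree) would usually replace it.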

Deep Functional Maps: Structured Prediction for Dense Shape Correspondence

We introduce a new framework for learning dense correspondence between deformable 3D shapes. Existing learning-based approaches model shape correspondence as a labelling problem, where each point of a query shape receives a label identifying a point on some reference domain; the correspondence is then constructed a posteriori by composing the label predictions of two input shapes. We propose a paradigm shift and design a structured prediction model in the space of functional maps, linear operators that provide a compact representation of the correspondence. We model the learning process via a deep residual network which takes dense descriptor fields defined on two shapes as input, and outputs a soft map between the two given objects. The resulting correspondence is shown to be accurate on several challenging benchmarks comprising multiple categories, synthetic models, real scans with acquisition artifacts, topological noise, and partiality.
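A functional map can be estimated from descriptor fields by a least-squares solve in a truncated spectral basis. The sketch below shows the classical functional-map step, not the residual network described above. It assumes `phi_x` and `phi_y` hold the first k basis functions (typically Laplace-Beltrami eigenfunctions) as columns, and the function names are hypothetical. The map `C` sends spectral coefficients on X to coefficients on Y, and a pointwise map is then recovered by nearest neighbors between the rows of `phi_y @ C` and `phi_x`.

```python
import numpy as np

def estimate_functional_map(phi_x, phi_y, feat_x, feat_y):
    """Least-squares C (k x k) with C @ A ~= B, where A, B are spectral coefficients of descriptors."""
    a = np.linalg.pinv(phi_x) @ feat_x      # (k, q) descriptor coefficients on X
    b = np.linalg.pinv(phi_y) @ feat_y      # (k, q) descriptor coefficients on Y
    # C A = B  <=>  A^T C^T = B^T, solved column-wise by least squares
    c_t, *_ = np.linalg.lstsq(a.T, b.T, rcond=None)
    return c_t.T

def functional_to_pointwise(c, phi_x, phi_y):
    """For each point j on Y, the index i on X whose row phi_x[i] is closest to phi_y[j] @ C."""
    emb_y = phi_y @ c
    d2 = ((emb_y[:, None, :] - phi_x[None, :, :]) ** 2).sum(axis=2)
    return np.argmin(d2, axis=1)

# Toy check: Y is a reordered copy of X, so C should be the identity.
rng = np.random.default_rng(0)
phi_x, _ = np.linalg.qr(rng.standard_normal((30, 6)))  # stand-in orthonormal basis
perm = rng.permutation(30)
feat_x = rng.standard_normal((30, 10))
c = estimate_functional_map(phi_x, phi_x[perm], feat_x, feat_x[perm])
print(np.round(c, 6))
```

The random orthonormal basis only stands in for real eigenfunctions in the toy check. The compactness noted above shows up in the size of `C`, which is k x k regardless of how many points the shapes have.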

Representation and matching of articulated shapes

Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004.

We consider the problem of localizing the articulated and deformable shape of a walking person in a single view. We represent the non-rigid 2D body contour by a Bayesian graphical model whose nodes correspond to point positions along the contour. The deformability of the model is constrained by learned priors corresponding to two basic mechanisms: local non-rigid deformation, and rotational motion of the joints. Four types of image cues are combined to relate the model configuration to the observed image: an edge gradient map, a foreground/background mask, a skin color mask, and appearance consistency constraints. The constructed Bayes network is sparse and chain-like, enabling efficient spatial inference through Sequential Monte Carlo sampling methods. We evaluate the performance of the model on images taken in cluttered, outdoor scenes. The utility of each image cue is also empirically explored.
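Because the network is a chain, inference can proceed node by node with sequential importance resampling. The sketch below is a generic particle filter along a chain of 2D contour points, not the paper's model. The Gaussian transition (`prior_step`, `step_std`) stands in for the learned deformation and joint-rotation priors, and the `log_likelihood(t, pts)` callback stands in for the combined image cues. All of these names are hypothetical.

```python
import numpy as np

def smc_chain(n_nodes, n_particles, prior_step, step_std, log_likelihood, rng):
    """Sequential importance resampling along a chain x_0 -> x_1 -> ... of 2D points.

    prior_step[t]: expected displacement from node t-1 to node t, shape (n_nodes, 2);
    log_likelihood(t, pts): per-particle image-cue log score for node t (hypothetical).
    Returns the weighted-mean position estimate for each node.
    """
    particles = np.zeros((n_particles, n_nodes, 2))  # full paths, so resampling keeps history
    estimate = np.zeros((n_nodes, 2))
    for t in range(n_nodes):
        prev = particles[:, t - 1] if t > 0 else np.zeros((n_particles, 2))
        # Propagate: sample node t from the (assumed Gaussian) chain prior
        particles[:, t] = prev + prior_step[t] + rng.normal(0.0, step_std, (n_particles, 2))
        # Weight by the image-cue likelihood, normalised in log space for stability
        logw = log_likelihood(t, particles[:, t])
        w = np.exp(logw - logw.max())
        w /= w.sum()
        estimate[t] = w @ particles[:, t]
        # Multinomial resampling: duplicate high-weight paths, drop low-weight ones
        idx = rng.choice(n_particles, size=n_particles, p=w)
        particles = particles[idx]
    return estimate
```

The cost is linear in the number of nodes and particles, which is the efficiency a sparse, chain-like structure buys over inference on a densely connected model.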

Learning Dense Correspondence from Synthetic Environments

arXiv (Cornell University), 2022

Estimation of human shape and pose from a single image is a challenging task. It is an even more difficult problem to map the identified human shape onto a 3D human model. Existing methods map manually labelled human pixels in real 2D images onto the 3D surface, which is prone to human error, and the sparsity of available annotated data often leads to sub-optimal results. We propose to solve the problem of data scarcity by training 2D-3D human mapping algorithms using automatically generated synthetic data for which exact and dense 2D-3D correspondence is known. Such a learning strategy using synthetic environments has a high generalisation potential towards real-world data. Using variations in camera parameters, background, and lighting, we created precise ground-truth data that covers a wider distribution. We evaluate the performance of models trained on synthetic data using the COCO dataset and validation framework. Results show that training 2D-3D mapping network models on synthetic data is a viable alternative to using real data.
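Exact 2D-3D correspondence is available in synthetic data because the renderer knows which 3D point lands on which pixel. The sketch below shows only the underlying pinhole projection, assuming known intrinsics `k` and a world-to-camera pose `(r, t)`. The function name is hypothetical. Occlusion is not handled here: deciding which vertex is actually visible at a pixel would need a z-buffer, as a full renderer provides.

```python
import numpy as np

def project_vertices(vertices, k, r, t):
    """Pinhole projection of (n, 3) world-space vertices to (n, 2) pixel coordinates.

    k: 3x3 intrinsics, r: 3x3 rotation, t: (3,) translation (world -> camera).
    Row i of the output is the pixel of vertex i, so every projected pixel
    carries an exact vertex label.
    """
    cam = vertices @ r.T + t            # world -> camera coordinates
    uvw = cam @ k.T                     # camera -> homogeneous pixel coordinates
    return uvw[:, :2] / uvw[:, 2:3]     # perspective divide

k = np.array([[100.0, 0.0, 50.0], [0.0, 100.0, 50.0], [0.0, 0.0, 1.0]])
verts = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
print(project_vertices(verts, k, np.eye(3), np.array([0.0, 0.0, 5.0])))
```

Varying `k`, `r`, and `t` corresponds to the camera parameter variations mentioned above. The labels stay exact under every variation because they come from the projection itself rather than from annotation.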

CorrNet3D: Unsupervised End-to-end Learning of Dense Correspondence for 3D Point Clouds

2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021

Motivated by the intuition that one can transform two aligned point clouds to each other more easily and meaningfully than a misaligned pair, we propose CorrNet3D, the first unsupervised and end-to-end deep learning-based framework, to drive the learning of dense correspondence between 3D shapes by means of deformation-like reconstruction to overcome the need for annotated data. Specifically, CorrNet3D consists of a deep feature embedding module and two novel modules called correspondence indicator and symmetric deformer. Given a pair of raw point clouds, our model first learns the pointwise features and passes them into the indicator to generate a learnable correspondence matrix used to permute the input pair. The symmetric deformer, with an additional regularized loss, transforms the two permuted point clouds to each other to drive the unsupervised learning of the correspondence. Extensive experiments on both synthetic and real-world datasets of rigid and non-rigid 3D shapes show that CorrNet3D outperforms state-of-the-art methods by a large margin, including those taking meshes as input. CorrNet3D is a flexible framework in that it can be easily adapted to supervised learning if annotated data are available. The source code and pre-trained model will be available at https://github.com/ZENGYIMING-EAMON/CorrNet3D.git.
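A learnable correspondence matrix that permutes a point cloud can be sketched with Sinkhorn normalization. This is one common way to turn feature similarities into a near-permutation (doubly stochastic) matrix. Whether CorrNet3D's indicator uses exactly this normalization is an assumption here, and the function names, temperature, and iteration count are hypothetical. The symmetric deformer that consumes the permuted clouds is not shown.

```python
import numpy as np

def sinkhorn(scores, n_iters=50):
    """Alternately normalise rows and columns of exp(scores) toward a doubly stochastic matrix."""
    p = np.exp(scores - scores.max())    # shift for numerical stability
    for _ in range(n_iters):
        p /= p.sum(axis=1, keepdims=True)   # rows sum to 1
        p /= p.sum(axis=0, keepdims=True)   # columns sum to 1
    return p

def permute_by_features(feat_x, feat_y, points_y, temperature=0.05):
    """Soft-permute points_y into the point order of X using feature similarity."""
    p = sinkhorn(feat_x @ feat_y.T / temperature)
    return p, p @ points_y

points_x = np.array([[0.0, 0, 0], [1, 0, 0], [0, 2, 0], [0, 0, 3]])
perm = np.array([2, 0, 3, 1])
p, aligned = permute_by_features(np.eye(4), np.eye(4)[perm], points_x[perm])
print(np.round(aligned, 6))   # rows now follow the order of points_x
```

Since every step is differentiable, a reconstruction loss on the permuted clouds can back-propagate into the features, which is what lets the correspondence be learned without labels.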

Unsupervised learning of complex articulated kinematic structures combining motion and skeleton information

2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015

In this paper we present a novel framework for unsupervised kinematic structure learning of complex articulated objects from a single-view image sequence. In contrast to prior motion-information-based methods, which estimate relatively simple articulations, our method can generate arbitrarily complex kinematic structures with skeletal topology by a successive iterative merge process. The iterative merge process is guided by a skeleton distance function, which is generated by a novel method for generating object boundaries from sparse points. Our main contributions can be summarised as follows: (i) Unsupervised complex articulated kinematic structure learning by combining motion and skeleton information. (ii) Iterative fine-to-coarse merging strategy for adaptive motion segmentation and structure smoothing. (iii) Skeleton estimation from sparse feature points. (iv) A new highly articulated object dataset containing multi-stage complexity with ground truth. Our experiments show that the proposed method outperforms state-of-the-art methods both quantitatively and qualitatively.
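A fine-to-coarse merge can be illustrated with greedy agglomerative clustering: start from many small segments and repeatedly merge the closest pair until no pair is close enough. This is a generic sketch with average linkage, not the paper's procedure. The input `dist` is a hypothetical pairwise dissimilarity between initial segments (for example, derived from motion and skeleton cues), and `threshold` is an assumed stopping rule.

```python
import numpy as np

def merge_segments(dist, threshold):
    """Greedy fine-to-coarse merging with average linkage.

    dist: symmetric (n, n) dissimilarities between initial fine segments.
    Merging stops once the closest pair of clusters is farther apart than threshold.
    Returns a list of clusters, each a sorted list of segment indices.
    """
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average linkage: mean distance between members of the two clusters
                d = dist[np.ix_(clusters[a], clusters[b])].mean()
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        if d > threshold:
            break
        clusters[a] = sorted(clusters[a] + clusters[b])
        del clusters[b]                  # b > a, so index a stays valid
    return clusters

dist = np.array([[0.0, 0.1, 5.0, 5.0],
                 [0.1, 0.0, 5.0, 5.0],
                 [5.0, 5.0, 0.0, 0.2],
                 [5.0, 5.0, 0.2, 0.0]])
print(merge_segments(dist, threshold=1.0))
```

The threshold controls how coarse the final segmentation becomes. Recording the order of merges would additionally yield a hierarchy over the segments.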