Yixin Zhuang - Academia.edu

Papers by Yixin Zhuang

Visual Localization via Few-Shot Scene Region Classification

Visual (re)localization addresses the problem of estimating the 6-DoF (Degree of Freedom) camera pose of a query image captured in a known scene, which is a key building block of many computer vision and robotics applications. Recent advances in structure-based localization solve this problem by memorizing the mapping from image pixels to scene coordinates with neural networks to build 2D-3D correspondences for camera pose optimization. However, such memorization requires training on large amounts of posed images in each scene, which is heavy and inefficient. In contrast, few-shot images are usually sufficient to cover the main regions of a scene for a human operator to perform visual localization. In this paper, we propose a scene region classification approach to achieve fast and effective scene memorization with few-shot images. Our insight is to leverage a) a pre-learned feature extractor, b) a scene region classifier, and c) a meta-learning strategy to accelerate training while mitigating overfitting. We evaluate our method on both indoor and outdoor benchmarks. The experiments validate the effectiveness of our method in the few-shot setting, and the training time is significantly reduced to only a few minutes.

MDISN: Learning multiscale deformed implicit fields from single images

Decoupling Features and Coordinates for Few-shot RGB Relocalization

ArXiv, 2019

Cross-scene model adaption is a crucial feature for camera relocalization applied in real scenarios. It is preferable that a pre-learned model can be quickly deployed in a novel scene with as little training as possible. The existing state-of-the-art approaches, however, can hardly support few-shot scene adaption due to the entangling of image feature extraction and 3D coordinate regression, which requires large-scale training data. To address this issue, inspired by how humans relocalize, we approach camera relocalization with a decoupled solution where feature extraction, coordinate regression and pose estimation are performed separately. Our key insight is that robust and discriminative image features used for coordinate regression should be learned by removing the distracting factor of camera views, because coordinates in the world reference frame are obviously independent of local views. In particular, we employ a deep neural network to learn view-factorized pixel-wise fea...

Multimodal Shape Completion via Conditional Generative Adversarial Networks

Computer Vision – ECCV 2020, 2020

Several deep learning methods have been proposed for completing partial data from shape acquisition setups, i.e., filling the regions that were missing in the shape. These methods, however, only complete the partial shape with a single output, ignoring the ambiguity when reasoning about the missing geometry. Hence, we pose a multimodal shape completion problem, in which we seek to complete the partial shape with multiple outputs by learning a one-to-many mapping. We develop the first multimodal shape completion method that completes the partial shape via conditional generative modeling, without requiring paired training data. Our approach distills the ambiguity by conditioning the completion on a learned multimodal distribution of possible results. We extensively evaluate the approach on several datasets that contain varying forms of shape incompleteness, and compare against several baseline methods and variants of our method qualitatively and quantitatively, demonstrating the merit of our method in completing partial shapes with both diversity and quality.

PQ-NET: A Generative Part Seq2Seq Network for 3D Shapes

2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020

"The characterization of object perception provided by recognition-by-components (RBC) bears a close resemblance to some current views as to how speech is perceived." -Irving Biederman [5]

Deformation-driven topology-varying 3D shape correspondence

ACM Transactions on Graphics, 2015

We present a deformation-driven approach to topology-varying 3D shape correspondence. In this paradigm, the best correspondence between two shapes is the one that results in a minimal-energy, possibly topology-varying, deformation that transforms one shape to conform to the other while respecting the correspondence. Our deformation model, called GeoTopo transform, allows both geometric and topological operations such as part split, duplication, and merging, leading to fine-grained and piecewise continuous correspondence results. The key ingredient of our correspondence scheme is a deformation energy that penalizes geometric distortion, encourages structure preservation, and simultaneously allows topology changes. This is accomplished by connecting shape parts using structural rods, which behave similarly to virtual springs but simultaneously allow the encoding of energies arising from geometric, structural, and topological shape variations. Driven by the combined deformation energy, an optimal shape correspondence is obtained via a pruned beam search. We demonstrate our deformation-driven correspondence scheme on extensive sets of man-made models with rich geometric and topological variation and compare the results to state-of-the-art approaches.

Neural Implicit 3D Shapes from Single Images with Spatial Patterns

arXiv: Computer Vision and Pattern Recognition, Jun 6, 2021

3D shape reconstruction from a single image has been a longstanding problem in computer vision. Recent advances have led to 3D representation learning, wherein pixel-aligned 3D reconstruction methods show impressive performance. However, it is normally hard to exploit meaningful local image features to describe 3D point samplings from the aligned pixels when large variations of occlusions, views, and appearances exist. In this paper, we study a general kernel to encode local image features while considering geometric relationships of point samplings from the underlying surfaces. The kernel is derived from the proposed spatial pattern, in such a way that the kernel points are obtained as the 2D projections of a number of 3D pattern points around a sampling. Supported by the spatial pattern, the 2D kernel encodes geometric information that is essential for 3D reconstruction tasks, while traditional 2D kernels mainly consider appearance information. Furthermore, to enable the network to discover more adaptive spatial patterns for further capturing non-local contextual information, the spatial pattern is devised to be deformable. Experimental results on both synthetic datasets and real datasets demonstrate the superiority of the proposed method. Pre-trained models, codes, and data are available at https://github.com/yixin26/SVR-SP.

Anisotropic geodesics for live-wire mesh segmentation

Computer Graphics Forum, 2014

We present an interactive method for mesh segmentation that is inspired by the classical live-wire interaction for image segmentation. The core contribution of the work is the definition and computation of wires on surfaces that are likely to lie at segment boundaries. We define wires as geodesics in a new tensor-based anisotropic metric, which improves upon previous metrics in stability and feature-awareness. We further introduce a simple but effective mesh embedding approach that allows geodesic paths in an anisotropic metric to be computed efficiently using existing algorithms designed for Euclidean geodesics. Our tool is particularly suited for delineating segmentation boundaries that are aligned with features or curvature directions, and we demonstrate its use in creating artist-guided segmentations.

A general and efficient method for finding cycles in 3D curve networks

ACM Transactions on Graphics, 2013

Generating surfaces from 3D curve networks has been a longstanding problem in computer graphics. Recent attention to this area has resurfaced as a result of new sketch-based modeling systems. In this work we present a new algorithm for finding cycles that bound surface patches. Unlike prior art in this area, the output of our technique is unrestricted, generating both manifold and non-manifold geometry with arbitrary genus. The novel insight behind our method is to formulate our problem as finding local mappings at the vertices and curves of our network, where each mapping describes how incident curves are grouped into cycles. This approach lends us the efficiency necessary to present our system in an interactive design modeler, whereby the user can adjust patch constraints and change the manifold properties of curves while the system automatically re-optimizes the solution.

Deformation-Driven Topology-Varying 3D Shape Correspondence

We present a deformation-driven approach to topology-varying 3D shape correspondence. In this paradigm, the best correspondence between two shapes is the one that results in a minimal-energy, possibly topology-varying, deformation that transforms one shape to conform to the other while respecting the correspondence. Our deformation model, called GeoTopo transform, allows both geometric and topological operations such as part split, duplication, and merging, leading to fine-grained and piecewise continuous correspondence results. The key ingredient of our correspondence scheme is a deformation energy that penalizes geometric distortion, encourages structure preservation, and simultaneously allows topology changes. This is accomplished by connecting shape parts using structural rods, which behave similarly to virtual springs but simultaneously allow the encoding of energies arising from geometric, structural, and topological shape variations. Driven by the combined deformation energy, an optimal shape correspondence is obtained via a pruned beam search. We demonstrate our deformation-driven correspondence scheme on extensive sets of man-made models with rich geometric and topological variation and compare the results to state-of-the-art approaches.
