Yan-Pei Cao - Academia.edu

Papers by Yan-Pei Cao

IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022

Point cloud completion aims to predict the missing parts of incomplete 3D shapes. A common strategy is to generate a complete shape from the incomplete input. However, the unordered nature of point clouds degrades the generation of high-quality 3D shapes, because the detailed topology and structure of unordered points are hard to capture during a generative process that relies on an extracted latent code. We address this problem by formulating completion as a point cloud deformation process. Specifically, we design a novel neural network, named PMP-Net++, to mimic the behavior of an earth mover. It moves each point of the incomplete input to obtain a complete point cloud, where the total distance of the point moving paths (PMPs) should be the shortest. PMP-Net++ therefore predicts a unique PMP for each point according to the constraint on point moving distances. The network learns a strict and unique point-level correspondence, and thus improves the quality of the predicted complete shape. Moreover, since moving points relies heavily on the per-point features learned by the network, we further introduce a transformer-enhanced representation learning network, which significantly improves the completion performance of PMP-Net++. We conduct comprehensive experiments on shape completion, and further explore the application to point cloud up-sampling; both demonstrate a non-trivial improvement of PMP-Net++ over state-of-the-art point cloud completion and up-sampling methods.

ArXiv, 2022

Most existing point cloud completion methods suffer from the discrete nature of point clouds and the unstructured prediction of points in local regions, which makes it hard to reveal fine local geometric details. To resolve this issue, we propose SnowflakeNet with Snowflake Point Deconvolution (SPD) to generate complete point clouds. SPD models the generation of complete point clouds as the snowflake-like growth of points, where child points are progressively generated by splitting their parent points after each SPD. Our insight for revealing detailed geometry is to introduce a skip-transformer in SPD to learn the point splitting patterns that best fit local regions. The skip-transformer leverages an attention mechanism to summarize the splitting patterns used in the previous SPD layer to produce the splitting in the current SPD layer. The locally compact and structured point clouds generated by SPD precisely reveal the structural characteristics of the 3D shape in local patches, which enables us to...
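The parent-to-child splitting described above can be illustrated with a minimal sketch. It is not the SnowflakeNet implementation: in the paper the per-child offsets come from the skip-transformer, whereas here they are simply passed in, and the names `split_points`, `offsets`, and `up_factor` are hypothetical.

```python
def split_points(parents, offsets, up_factor=2):
    """Split every parent point into `up_factor` child points.

    parents: list of (x, y, z) tuples.
    offsets: list of (dx, dy, dz) tuples, `up_factor` per parent, ordered
             parent by parent. In a learned model these would be predicted;
             here they are given (assumption for illustration).
    """
    if len(offsets) != len(parents) * up_factor:
        raise ValueError("expected up_factor offsets per parent point")
    children = []
    for i, parent in enumerate(parents):
        for k in range(up_factor):
            offset = offsets[i * up_factor + k]
            # A child is its parent displaced by one offset vector.
            children.append(tuple(p + o for p, o in zip(parent, offset)))
    return children


parents = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)]
offsets = [(0.1, 0.0, 0.0), (-0.1, 0.0, 0.0),
           (0.0, 0.5, 0.0), (0.0, -0.5, 0.0)]
children = split_points(parents, offsets)
```

Applying the function repeatedly to its own output grows the point set by `up_factor` at every step, which is the progressive growth the abstract refers to.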

2021 IEEE/CVF International Conference on Computer Vision (ICCV), 2021

[Figure: visual comparison of point cloud completion results. Panels: (a) Input, (b) TopNet, (c) CDN, (d) NSFA, (e) SnowflakeNet, (f) GT]

Computer Vision – ECCV 2018, 2018

We present a data-driven approach to reconstructing high-resolution and detailed volumetric representations of 3D shapes. Although well studied, algorithms for volumetric fusion from multi-view depth scans are still prone to scanning noise and occlusions, making it hard to obtain high-fidelity 3D reconstructions. In this paper, inspired by recent advances in efficient 3D deep learning techniques, we introduce a novel cascaded 3D convolutional network architecture, which learns to reconstruct implicit surface representations from noisy and incomplete depth maps in a progressive, coarse-to-fine manner. To this end, we also develop an algorithm for end-to-end training of the proposed cascaded structure. Qualitative and quantitative experimental results on both simulated and real-world datasets demonstrate that the presented approach outperforms existing state-of-the-art work in terms of quality and fidelity of reconstructed models.

We introduce DoubleField, a novel representation combining the merits of both surface field and radiance field for high-fidelity human rendering. Within DoubleField, the surface field and radiance field are associated together by a shared feature embedding and a surface-guided sampling strategy. In this way, DoubleField has a continuous but disentangled learning space for geometry and appearance modeling, which supports fast training, inference, and finetuning. To achieve high-fidelity free-viewpoint rendering, DoubleField is further augmented to leverage ultra-high-resolution inputs, where a view-to-view transformer and a transfer learning scheme are introduced for more efficient learning and finetuning from sparse-view inputs at original resolutions. The efficacy of DoubleField is validated by the quantitative evaluations on several datasets and the qualitative results in a real-world sparse multi-view system, showing its superior capability for photo-realistic free-viewpoint human ...

2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021

In this paper, we present a novel unpaired point cloud completion network, named Cycle4Completion, to infer the complete geometries from a partial 3D object. Previous unpaired completion methods merely focus on the learning of geometric correspondence from incomplete shapes to complete shapes, and ignore the learning in the reverse direction, which makes them suffer from low completion accuracy due to the limited 3D shape understanding ability. To address this problem, we propose two simultaneous cycle transformations between the latent spaces of complete shapes and incomplete ones. Specifically, the first cycle transforms shapes from incomplete domain to complete domain, and then projects them back to the incomplete domain. This process learns the geometric characteristic of complete shapes, and maintains the shape consistency between the complete prediction and the incomplete input. Similarly, the inverse cycle transformation starts from complete domain to incomplete domain, and goes back to complete domain to learn the characteristic of incomplete shapes. We experimentally show that our model with the learned bidirectional geometry correspondence outperforms state-of-the-art unpaired completion methods. Code will be available at https://github.com/diviswen/Cycle4Completion.
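As a rough illustration of the cycle idea above, the sketch below measures how far a latent code drifts after being mapped to the other domain and back. It assumes a plain mean squared reconstruction error; the abstract does not state the actual losses, and `cycle_loss`, `forward`, and `backward` are hypothetical names standing in for the two learned latent-space mappings.

```python
def cycle_loss(latents, forward, backward):
    """Mean squared error between each latent code and its round trip.

    latents:  list of latent codes (lists of floats) from one domain.
    forward:  callable mapping a code to the other domain (assumed given).
    backward: callable mapping it back to the original domain.
    """
    total = 0.0
    for z in latents:
        z_cycle = backward(forward(z))
        total += sum((a - b) ** 2 for a, b in zip(z, z_cycle))
    return total / len(latents)


# Toy mappings: an exact inverse pair gives zero cycle loss.
to_complete = lambda z: [2.0 * x for x in z]
to_incomplete = lambda z: [x / 2.0 for x in z]
identity = lambda z: list(z)

perfect = cycle_loss([[1.0, 2.0]], to_complete, to_incomplete)
imperfect = cycle_loss([[1.0, 2.0]], to_complete, identity)
```

The second cycle described in the abstract would reuse the same function with the roles of the two mappings swapped and latent codes drawn from the complete domain.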

2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021

The task of point cloud completion aims to predict the missing part for an incomplete 3D shape. A widely used strategy is to generate a complete point cloud from the incomplete one. However, the unordered nature of point clouds will degrade the generation of high-quality 3D shapes, as the detailed topology and structure of discrete points are hard to capture by a generative process that uses only a latent code. In this paper, we address the above problem by reconsidering the completion task from a new perspective, where we formulate the prediction as a point cloud deformation process. Specifically, we design a novel neural network, named PMP-Net, to mimic the behavior of an earth mover. It moves each point of the incomplete input to complete the point cloud, where the total distance of point moving paths (PMP) should be shortest. Therefore, PMP-Net predicts a unique point moving path for each point according to the constraint of total point moving distances. As a result, the network learns a strict and unique point-level correspondence, and thus improves the quality of the predicted complete shape. We conduct comprehensive experiments on Completion3D and PCN datasets, which demonstrate our advantages over the state-of-the-art point cloud completion methods. Code will be available at https://github.com/diviswen/PMP-Net.
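The point-moving formulation can be sketched in a few lines. This is only an illustration under simplifying assumptions: the displacements are given rather than predicted by a network, and the total moving distance is taken as the sum of Euclidean displacement lengths, which may differ from the exact penalty used in the paper. The function names are hypothetical.

```python
import math


def move_points(points, displacements):
    """Apply one displacement vector to each point (one moving step)."""
    return [tuple(p + d for p, d in zip(point, disp))
            for point, disp in zip(points, displacements)]


def total_moving_distance(displacements):
    """Sum of Euclidean lengths of all displacement vectors.

    Penalizing this quantity encourages each point to take a short path,
    in the spirit of the earth mover analogy (assumed form of the penalty).
    """
    return sum(math.sqrt(sum(d * d for d in disp)) for disp in displacements)


incomplete = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
displacements = [(3.0, 4.0, 0.0), (0.0, 0.0, 0.0)]

completed = move_points(incomplete, displacements)
distance = total_moving_distance(displacements)
```

Because every output point is its input point plus a displacement, the input-to-output correspondence is one-to-one by construction, which is the point-level correspondence the abstract highlights.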

2019 International Conference on Robotics and Automation (ICRA), 2019

We present a real-time dense mapping system which uses the predicted 2D semantic labels for optimizing the geometric quality of reconstruction. With a combination of Convolutional Neural Networks (CNNs) for 2D labeling and a Simultaneous Localization and Mapping (SLAM) system for camera trajectory estimation, recent approaches have succeeded in incrementally fusing and labeling 3D scenes. However, the geometric quality of the reconstruction can be further improved by incorporating such semantic prediction results, which is not sufficiently exploited by existing methods. In this paper, we propose to use semantic information to improve two crucial modules in the reconstruction pipeline, namely tracking and loop detection, for obtaining mutual benefits in geometric reconstruction and semantic recognition. Specifically for tracking, we use a novel probabilistic projective association approach to efficiently pick out candidate correspondences, where the confidence of these correspondences is quantified concerning similarities on all available short-term invariant features. For the loop detection, we incorporate these semantic labels into the original encoding through Randomized Ferns to generate a more comprehensive representation for retrieving candidate loop frames. Evaluations on a publicly available synthetic dataset have shown the effectiveness of our approach that considers such semantic hints as a reliable feature for achieving higher geometric quality.

Computational Visual Media, 2021

Reconstructing dynamic scenes with commodity depth cameras has many applications in computer graphics, computer vision, and robotics. However, due to the presence of noise and erroneous observations from data capturing devices and the inherently ill-posed nature of non-rigid registration with insufficient information, traditional approaches often produce low-quality geometry with holes, bumps, and misalignments. We propose a novel 3D dynamic reconstruction system, named HDR-Net-Fusion, which learns to simultaneously reconstruct and refine the geometry on the fly with a sparse embedded deformation graph of surfels, using a hierarchical deep reinforcement (HDR) network. The latter comprises two parts: a global HDR-Net which rapidly detects local regions with large geometric errors, and a local HDR-Net serving as a local patch refinement operator to promptly complete and enhance such regions. Training the global HDR-Net is formulated as a novel reinforcement learning problem to implici...

ACM Transactions on Graphics, 2020

We present a two-stage approach to first constructing 3D panoramas and then stitching them for noise-resilient reconstruction of large-scale indoor scenes. Our approach requires multiple unsynchronized RGB-D cameras, mounted on a robot platform, which can perform in-place rotations at different locations in a scene. Such cameras rotate on a common (but unknown) axis, which provides a novel perspective for coping with unsynchronized cameras, without requiring sufficient overlap of their Field-of-View (FoV). Based on this key observation, we propose novel algorithms to track these cameras simultaneously. Furthermore, during the integration of raw frames onto an equirectangular panorama, we derive uncertainty estimates from multiple measurements assigned to the same pixels. This enables us to appropriately model the sensing noise and consider its influence, so as to achieve better noise resilience, and improve the geometric quality of each panorama and the accuracy of global inter-pano...

IEEE Transactions on Visualization and Computer Graphics, 2019

We present a learning-based approach to reconstructing high-resolution three-dimensional (3D) shapes with detailed geometry and high-fidelity textures. Albeit extensively studied, algorithms for 3D reconstruction from multi-view depth-and-color (RGB-D) scans are still prone to measurement noise and occlusions; limited scanning or capturing angles also often lead to incomplete reconstructions. Propelled by recent advances in 3D deep learning techniques, in this paper, we introduce a novel computation- and memory-efficient cascaded 3D convolutional network architecture, which learns to reconstruct implicit surface representations as well as the corresponding color information from noisy and imperfect RGB-D maps. The proposed 3D neural network performs reconstruction in a progressive and coarse-to-fine manner, achieving unprecedented output resolution and fidelity. Meanwhile, an algorithm for end-to-end training of the proposed cascaded structure is developed. We further introduce Human10, a newly created dataset containing both detailed and textured full-body reconstructions as well as corresponding raw RGB-D scans of 10 subjects. Qualitative and quantitative experimental results on both synthetic and real-world datasets demonstrate that the presented approach outperforms existing state-of-the-art work regarding visual quality and accuracy of reconstructed models.

ACM Transactions on Graphics, 2018

We present an integrated approach for reconstructing high-fidelity three-dimensional (3D) models using consumer RGB-D cameras. RGB-D registration and reconstruction algorithms are prone to errors from scanning noise, making it hard to perform 3D reconstruction accurately. The key idea of our method is to assign a probabilistic uncertainty model to each depth measurement, which then guides the scan alignment and depth fusion. This allows us to effectively handle inherent noise and distortion in depth maps while keeping the overall scan registration procedure under the iterative closest point framework for simplicity and efficiency. We further introduce a local-to-global, submap-based, and uncertainty-aware global pose optimization scheme to improve scalability and guarantee global model consistency. Finally, we have implemented the proposed algorithm on the GPU, achieving real-time 3D scanning frame rates and updating the reconstructed model on-the-fly. Experimental results on simula...
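To make the role of per-measurement uncertainty in depth fusion concrete, here is a minimal sketch using textbook inverse-variance weighting of two scalar depth readings. It assumes independent Gaussian noise with known variances; the uncertainty model actually used in the paper is not specified in the abstract, and the name `fuse_depths` is hypothetical.

```python
def fuse_depths(depth_a, var_a, depth_b, var_b):
    """Fuse two noisy depth readings of the same surface point.

    Each reading is weighted by the inverse of its variance, so the more
    certain measurement dominates. Returns (fused_depth, fused_variance).
    Assumes independent Gaussian noise (illustrative assumption).
    """
    weight_a = 1.0 / var_a
    weight_b = 1.0 / var_b
    fused_depth = (weight_a * depth_a + weight_b * depth_b) / (weight_a + weight_b)
    fused_variance = 1.0 / (weight_a + weight_b)
    return fused_depth, fused_variance


# Equal confidence: the result is the plain average with halved variance.
equal = fuse_depths(1.0, 1.0, 3.0, 1.0)

# A confident reading (variance 0.01) pulls the result toward itself.
skewed_depth, skewed_var = fuse_depths(1.0, 0.01, 2.0, 1.0)
```

The fused variance is always smaller than either input variance, which is why accumulating more observations of the same point steadily reduces its uncertainty.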

Computer Graphics Forum, 2016

Sharp edges are important shape features and their extraction has been extensively studied both on point clouds and surfaces. We consider the problem of extracting sharp edges from a sparse set of colour-and-depth (RGB-D) images. The noise-ridden depth measurements are challenging for existing feature extraction methods that work solely in the geometric domain (e.g. points or meshes). By utilizing both colour and depth information, we propose a novel feature extraction method that produces much cleaner and more coherent feature lines. We make two technical contributions. First, we show that intensity edges can augment the depth map to improve normal estimation and feature localization from a single RGB-D image. Second, we designed a novel algorithm for consolidating feature points obtained from multiple RGB-D images. By utilizing normals and ridge/valley types associated with the feature points, our algorithm is effective in suppressing noise without smearing nearby features.

Computer Graphics Forum, 2014

A recent trend in interactive modeling of 3D shapes from a single image is designing minimal interfaces, and accompanying algorithms, for modeling a specific class of objects. Expanding upon the range of shapes that existing minimal interfaces can model, we present an interactive image-guided tool for modeling shapes made up of extruded parts. An extruded part is represented by extruding a closed planar curve, called base, in the direction orthogonal to the base. To model each extruded part, the user only needs to sketch the projected base shape in the image. The main technical contribution is a novel optimization-based approach for recovering the 3D normal of the base of an extruded object by exploring both geometric regularity of the sketched curve and image contents. We developed a convenient interface for modeling multi-part shapes and a method for optimizing the relative placement of the parts. Our tool is validated using synthetic data and tested on real-world images.

IEEE Transactions on Visualization and Computer Graphics, 2015

With broader availability of large-scale 3D model repositories, the need for efficient and effective exploration becomes more and more urgent. Existing model retrieval techniques do not scale well with the size of the database since often a large number of very similar objects are returned for a query, and the possibilities to refine the search are quite limited. We propose an interactive approach where the user feeds an active learning procedure by labeling either entire models or parts of them as "like" or "dislike" such that the system can automatically update an active set of recommended models. To provide an intuitive user interface, candidate models are presented based on their estimated relevance for the current query. From the methodological point of view, our main contribution is to exploit not only the similarity between a query and the database models but also the similarities among the database models themselves. We achieve this by an offline pre-processing stage, where global and local shape descriptors are computed for each model and a sparse distance metric is derived that can be evaluated efficiently even for very large databases. We demonstrate the effectiveness of our method by interactively exploring a repository containing over 100K models.

Algorithms for rendering interreflection (or indirect illumination) effects often make assumptions about the frequency range of the materials' reflectance properties. For example, methods based on Virtual Point Lights (VPLs) and global photon maps perform well for diffuse and semi-glossy materials but not so for highly-glossy or specular materials; the situation is reversed for methods based on ray tracing and caustics photon maps. In this paper, we present a practical algorithm for rendering interreflection effects at all frequency scales. Our method builds upon a Spherical Gaussian representation of the BRDF. Our main contribution is a novel mathematical development of the interreflection equation. This allows us to efficiently compute the one-bounce interreflection from a triangle to a shading point through an analytic formula combined with a piecewise linear approximation. We show through evaluation that this method is accurate for a wide range of BRDF parameters. Our second contribution is a hierarchical integration method to handle a large number of triangles with bounded error. Finally, we have implemented the algorithm on the GPU, achieving near-interactive rendering speed for a variety of scenes.
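For readers unfamiliar with Spherical Gaussians, the sketch below evaluates a single lobe in a commonly used parameterization, G(v) = amplitude * exp(sharpness * (v · axis - 1)), for unit vectors v and axis. The abstract does not give the exact form used in the paper, so this parameterization and the name `spherical_gaussian` are assumptions for illustration.

```python
import math


def spherical_gaussian(direction, axis, sharpness, amplitude):
    """Evaluate one Spherical Gaussian lobe at a unit direction.

    direction, axis: unit 3-vectors given as tuples.
    sharpness: larger values give a narrower lobe (glossier material).
    amplitude: peak value, reached when direction equals axis.
    """
    cos_angle = sum(d * a for d, a in zip(direction, axis))
    return amplitude * math.exp(sharpness * (cos_angle - 1.0))


axis = (0.0, 0.0, 1.0)
peak = spherical_gaussian((0.0, 0.0, 1.0), axis, 10.0, 2.0)
side = spherical_gaussian((1.0, 0.0, 0.0), axis, 10.0, 2.0)
back = spherical_gaussian((0.0, 0.0, -1.0), axis, 10.0, 2.0)
```

A single sharpness parameter sweeps the lobe from nearly uniform (diffuse-like) to a tight spike (specular-like), which is what makes the representation attractive for covering a wide range of BRDFs.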

ACM Transactions on Graphics, 2014

Algorithms for rendering interreflection (or indirect illumination) effects often make assumptions about the frequency range of the materials' reflectance properties. For example, methods based on Virtual Point Lights (VPLs) perform well for diffuse and semi-glossy materials but not so for highly glossy or specular materials; the situation is reversed for methods based on ray tracing. In this article, we present a practical algorithm for rendering interreflection effects with all-frequency BRDFs. Our method builds upon a spherical Gaussian representation of the BRDF, based on which a novel mathematical development of the interreflection equation is made. This allows us to efficiently compute one-bounce interreflection from a triangle to a shading point, by using an analytic formula combined with a piecewise linear approximation. We show through evaluation that this method is accurate for a wide range of BRDFs. We further introduce a hierarchical integration method to handle comp...

Computer Graphics Forum, 2021

3D animation production for storytelling requires essential manual processes of virtual scene composition, character creation, and motion editing, etc. Although professional artists can favorably create 3D animations using software, it remains a complex and challenging task for novice users to handle and learn such tools for content creation. In this paper, we present Write‐An‐Animation, a 3D animation system that allows novice users to create, edit, preview, and render animations, all through text editing. Based on the input texts describing virtual scenes and human motions in natural languages, our system first parses the texts as semantic scene graphs, then retrieves 3D object models for virtual scene composition and motion clips for character animation. Character motion is synthesized with the combination of generative locomotions using neural state machine as well as template action motions retrieved from the dataset. Moreover, to make the virtual scene layout compatible with c...
