LASR: Learning Articulated Shape Reconstruction from a Monocular Video

Many existing approaches to nonrigid shape reconstruction rely heavily on category-specific 3D shape templates, such as SMPL for humans and SMAL for quadrupeds. In contrast, LASR jointly recovers the object shape, articulation, and camera parameters from a monocular video without using category-specific shape templates. By combining generic shape and motion priors with differentiable rendering, LASR applies to a wide range of nonrigid shapes and produces faithful 3D reconstructions.

Abstract

Remarkable progress has been made in 3D reconstruction of rigid structures from a video or a collection of images. However, it is still challenging to reconstruct nonrigid structures from RGB inputs, due to the under-constrained nature of this problem. While template-based approaches, such as parametric shape models, have achieved great success in modeling the "closed world" of known object categories, their ability to handle the "open world" of novel object categories and outlier shapes is still limited. In this work, we introduce a template-free approach for 3D shape learning from a single video. It adopts an analysis-by-synthesis strategy that forward-renders object silhouettes, optical flow, and pixel intensities to compare against video observations, generating gradient signals to adjust the camera, shape, and motion parameters. Without relying on a category-specific shape template, our method faithfully reconstructs nonrigid 3D structures from in-the-wild videos of humans, animals, and objects of unknown classes.
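To make the analysis-by-synthesis idea concrete, below is a minimal PyTorch-style sketch of such an optimization loop, not the paper's implementation. The renderer functions (render_silhouette, render_flow, render_rgb), the 6-DoF bone/camera parameterization, and all tensor sizes are illustrative placeholders; a real system would use a soft differentiable rasterizer and the shape and motion priors described in the paper.

```python
import torch

# Toy sizes and "observations" (stand-ins for real video data).
n_frames, n_verts, n_bones, num_iters = 8, 642, 20, 200
sil_obs  = torch.rand(n_frames, 64, 64)         # observed silhouettes
flow_obs = torch.rand(n_frames - 1, 64, 64, 2)  # observed optical flow
rgb_obs  = torch.rand(n_frames, 64, 64, 3)      # observed pixel intensities

# Learnable parameters: rest shape, per-frame articulation, per-frame cameras.
verts = torch.randn(n_verts, 3, requires_grad=True)
bones = torch.randn(n_frames, n_bones, 6, requires_grad=True)
cams  = torch.randn(n_frames, 6, requires_grad=True)

# Placeholder "differentiable renderers": in practice these would be a soft
# rasterizer; here they are dummy functions that only keep the computation
# graph connected so the sketch runs end to end.
def render_silhouette(v, b, c):
    return (v.sum() + b.sum() + c.sum()) * torch.ones(64, 64)

def render_flow(v, b0, b1, c0, c1):
    return (v.sum() + (b1 - b0).sum() + (c1 - c0).sum()) * torch.ones(64, 64, 2)

def render_rgb(v, b, c):
    return (v.sum() + b.sum() + c.sum()) * torch.ones(64, 64, 3)

optimizer = torch.optim.Adam([verts, bones, cams], lr=1e-3)

for it in range(num_iters):
    loss = 0.0
    for t in range(n_frames - 1):
        # Forward-render the current estimate and compare against observations.
        loss = loss + (render_silhouette(verts, bones[t], cams[t])
                       - sil_obs[t]).abs().mean()
        loss = loss + (render_flow(verts, bones[t], bones[t + 1],
                                   cams[t], cams[t + 1]) - flow_obs[t]).abs().mean()
        loss = loss + (render_rgb(verts, bones[t], cams[t])
                       - rgb_obs[t]).abs().mean()

    optimizer.zero_grad()
    loss.backward()   # gradients w.r.t. shape, articulation, and cameras
    optimizer.step()
```

The key property illustrated is that all three rendering losses backpropagate into the same shared parameters, so silhouette, flow, and photometric evidence jointly constrain the shape, articulation, and camera estimates.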

Bibtex

@inproceedings{yang2021lasr,
  title     = {LASR: Learning Articulated Shape Reconstruction from a Monocular Video},
  author    = {Yang, Gengshan and Sun, Deqing and Jampani, Varun and Vlasic, Daniel and Cole, Forrester and Chang, Huiwen and Ramanan, Deva and Freeman, William T and Liu, Ce},
  booktitle = {CVPR},
  year      = {2021}
}

Acknowledgments

This work was partially done during an internship at Google. Thanks to Xueting Li, Nilesh Kulkarni, and Benjamin Biggs for providing pre-trained models and implementations, to Tyler Zhu for providing detailed feedback on the manuscript, and to Angjoo Kanazawa, Tali Dekel, and Zhoutong Zhang for valuable suggestions.