Unsupervised learning of complex articulated kinematic structures combining motion and skeleton information (original) (raw)
Related papers
Highly Articulated Kinematic Structure Estimation combining Motion and Skeleton Information
IEEE Transactions on Pattern Analysis and Machine Intelligence
In this paper, we present a novel framework for unsupervised kinematic structure learning of complex articulated objects from a single-view 2D image sequence. In contrast to prior motion-based methods, which estimate relatively simple articulations, our method can generate arbitrarily complex kinematic structures with skeletal topology via a successive iterative merging strategy. The iterative merge process is guided by a density weighted skeleton map which is generated from a novel object boundary generation method from sparse 2D feature points. Our main contributions can be summarised as follows: (i) An unsupervised complex articulated kinematic structure estimation method that combines motion segments with skeleton information. (ii) An iterative fine-to-coarse merging strategy for adaptive motion segmentation and structural topology embedding. (iii) A skeleton estimation method based on a novel silhouette boundary generation from sparse feature points using an adaptive model selection method. (iv) A new highly articulated object dataset with ground truth annotation. We have verified the effectiveness of our proposed method in terms of computational time and estimation accuracy through rigorous experiments with multiple datasets. Our experiments show that the proposed method outperforms state-of-the-art methods both quantitatively and qualitatively.
Image and Vision Computing, 2004
Though a large body of research has focused on tracking and identifying objects from the domain of colour or grey-scale images, there is a relative dearth in the literature on complex articulated/non-rigid motion reconstruction from a collection of low density feature points. In this paper, we propose a segment-based articulated matching algorithm to establish a crucial self-initialising identification in model-based point-feature tracking of articulated motion with near-rigid segments. We avoid common assumptions such as pose similarity or small motion with respect to the model, and assume no prior knowledge of a specific movement from which to restrict pose identification. Experimental results based on synthetic pose and real-world human motion capture data demonstrate the ability of the algorithm to perform the identification task. q
Recovering Articulated Object Models from 3D Range Data
2004
We address the problem of unsupervised learning of complex articulated object models from 3D range data. We describe an algorithm whose input is a set of meshes corresponding to different configurations of an articulated object. The algorithm automatically recovers a decomposition of the object into approximately rigid parts, the location of the parts in the different object instances, and the articulated object skeleton linking the parts. Our algorithm first registers all the meshes using an unsupervised non-rigid technique described in a companion paper. It then segments the meshes using a graphical model that captures the spatial contiguity of parts. The segmentation is done using the EM algorithm, iterating between finding a decomposition of the object into rigid parts, and finding the location of the parts in the object instances. Although the graphical model is densely connected, the object decomposition step can be performed optimally and efficiently, allowing us to identify a large number of object parts while avoiding local maxima. We demonstrate the algorithm on real world datasets, recovering a 15-part articulated model of a human puppet from just 7 different puppet configurations, as well as a 4 part model of a flexing arm where significant non-rigid deformation was present.
Towards Understanding Articulated Objects
Robots operating in home environments must be able to interact with articulated objects such as doors or drawers. Ideally, robots are able to autonomously infer articulation models by observation. In this paper, we present an approach to learn kinematic models by inferring the connectivity of rigid parts and the articulation models for the corresponding links. Our method uses a mixture of parameterized and parameter-free representations. To obtain parameter-free models, we seek for low-dimensional manifolds of latent action variables in order to provide the best explanation of the given observations. The mapping from the constrained manifold of an articulated link to the work space is learned by means of Gaussian process regression. Our approach has been implemented and evaluated using real data obtained in various home environment settings. Finally, we discuss the limitations and possible extensions of the proposed method.
Learning kinematic models for articulated objects
2009
Robots operating in home environments must be able to interact with articulated objects such as doors or drawers. Ideally, robots are able to autonomously infer articulation models by observation. In this paper, we present an approach to learn kinematic models by inferring the connectivity of rigid parts and the articulation models for the corresponding links. Our method uses a mixture of parameterized and parameter-free (Gaussian process) representations and finds low-dimensional manifolds that provide the best explanation of the given observations. Our approach has been implemented and evaluated using real data obtained in various realistic home environment settings.
Articulated motion reconstruction from feature points
Pattern Recognition, 2008
A fundamental task of reconstructing non-rigid articulated motion from sequences of unstructured feature points is to solve the problem of feature correspondence and motion estimation. This problem is challenging in high-dimensional configuration spaces. In this paper, we propose a general model-based dynamic point matching algorithm to reconstruct freeform non-rigid articulated movements from data presented solely by sparse feature points. The algorithm integrates key-frame-based self-initialising hierarchial segmental matching with inter-frame tracking to achieve computation effectiveness and robustness in the presence of data noise. A dynamic scheme of motion verification, dynamic keyframe-shift identification and backward parent-segment correction, incorporating temporal coherency embedded in inter-frames, is employed to enhance the segment-based spatial matching. Such a spatial-temporal approach ultimately reduces the ambiguity of identification inherent in a single frame. Performance evaluation is provided by a series of empirical analyses using synthetic data. Testing on motion capture data for a common articulated motion, namely human motion, gave feature-point identification and matching without the need for manual intervention, in buffered real-time. These results demonstrate the proposed algorithm to be a candidate for feature-based real-time reconstruction tasks involving self-resuming tracking for articulated motion. essential or intermediate correspondence towards the endproduct of motion and structure recovery .
Category-Level Articulated Object Pose Estimation
2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020
This paper addresses the task of category-level pose estimation for articulated objects from a single depth image. We present a novel category-level approach that correctly accommodates object instances previously unseen during training. We introduce Articulation-aware Normalized Coordinate Space Hierarchy (ANCSH)-a canonical representation for different articulated objects in a given category. As the key to achieve intra-category generalization, the representation constructs a canonical object space as well as a set of canonical part spaces. The canonical object space normalizes the object orientation, scales and articulations (e.g. joint parameters and states) while each canonical part space further normalizes its part pose and scale. We develop a deep network based on PointNet++ that predicts ANCSH from a single depth point cloud, including part segmentation, normalized coordinates, and joint parameters in the canonical object space. By leveraging the canonicalized joints, we demonstrate: 1) improved performance in part pose and scale estimations using the induced kinematic constraints from joints; 2) high accuracy for joint parameter estimation in camera space.
IEEE Transactions on Pattern Analysis and Machine Intelligence, 2008
Recovering articulated shape and motion, especially human body motion, from video is a challenging problem with a wide range of applications in medical study, sport analysis, animation, and so forth. Previous work on articulated motion recovery generally requires prior knowledge of the kinematic chain and usually does not concern the recovery of the articulated shape. The nonrigidity of some articulated part, for example, human body motion with nonrigid facial motion, is completely ignored. We propose a factorizationbased approach to recover the shape, motion, and kinematic chain of an articulated object with nonrigid parts altogether directly from video sequences under a unified framework. The proposed approach is based on our modeling of the articulated nonrigid motion as a set of intersecting motion subspaces. A motion subspace is the linear subspace of the trajectories of an object. It can model a rigid or nonrigid motion. The intersection of two motion subspaces of linked parts models the motion of an articulated joint or axis. Our approach consists of algorithms for motion segmentation, kinematic chain building, and shape recovery. It handles outliers and can be automated. We test our approach through synthetic and real experiments and demonstrate how to recover an articulated structure with nonrigid parts via a single-view camera without prior knowledge of its kinematic chain.
Combined discriminative and generative articulated pose and non-rigid shape estimation
2007
Estimation of three-dimensional articulated human pose and motion from images is a central problem in computer vision. Much of the previous work has been limited by the use of crude generative models of humans represented as articulated collections of simple parts such as cylinders. Automatic initialization of such models has proved difficult and most approaches assume that the size and shape of the body parts are known a priori. In this paper we propose a method for automatically recovering a detailed parametric model of non-rigid body shape and pose from monocular imagery. Specifically, we represent the body using a parameterized triangulated mesh model that is learned from a database of human range scans. We demonstrate a discriminative method to directly recover the model parameters from monocular images using a conditional mixture of kernel regressors. This predicted pose and shape are used to initialize a generative model for more detailed pose and shape estimation. The resulting approach allows fully automatic pose and shape recovery from monocular and multi-camera imagery. Experimental results show that our method is capable of robustly recovering articulated pose, shape and biometric measurements (e.g. height, weight, etc.) in both calibrated and uncalibrated camera environments.