Observable subspaces for 3D human motion recovery
Related papers
Articulated Pose Estimation in a Learned Smooth Space of Feasible Solutions
2005
A learning-based framework is proposed for estimating human body pose from a single image. Given a differentiable function that maps from pose space to image feature space, the goal is to invert the process: estimate the pose given only image features. The inversion is an ill-posed problem, as the inverse mapping is one-to-many and multiple solutions therefore exist. It is desirable to restrict the solution space to a smaller subset of feasible solutions. Since the space of feasible solutions may not admit a closed-form description, the proposed framework seeks to learn an approximation over such a space using Gaussian Process Latent Variable Models (GPLVM). The scaled conjugate gradient method is used to find the best-matching pose in the learned space. The formulation allows easy incorporation of various constraints for more accurate pose estimation. The performance of the proposed approach is evaluated on the task of upper-body pose estimation from silhouettes and compared with the Specialized Mapping Architecture. The proposed approach achieves better estimation accuracy on synthetic data and qualitatively better results on real video of humans performing gestures.
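The latent-space search described above can be sketched in simplified form. The snippet below is an illustrative stand-in, not the paper's GPLVM: it uses a plain RBF kernel regressor from a toy 2-D latent space to a feature space, and inverts it by minimizing reconstruction error with a conjugate-gradient optimizer (an analogue of the scaled conjugate gradient step). All names and data are invented for illustration.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# Hypothetical stand-in for a learned latent model: latent points Z with
# associated image features Y, connected by an RBF kernel regressor.
Z = rng.normal(size=(50, 2))  # 2-D latent coordinates
Y = np.column_stack([np.sin(Z[:, 0]), np.cos(Z[:, 1]), Z.sum(axis=1)])

def kern(A, B, gamma=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

K = kern(Z, Z) + 1e-6 * np.eye(len(Z))
alpha = np.linalg.solve(K, Y)            # kernel regression weights

def predict_features(z):
    return kern(z[None, :], Z) @ alpha   # latent -> feature-space mean

def objective(z, y_obs):
    # Reconstruction error in feature space; extra constraints could be
    # added here as penalty terms, as the abstract suggests.
    return np.sum((predict_features(z) - y_obs) ** 2)

y_obs = predict_features(np.array([0.3, -0.5])) + 0.01 * rng.normal(size=3)
res = minimize(objective, x0=np.zeros(2), args=(y_obs,), method="CG")
z_best = res.x                            # best-matching latent point
```

The optimizer searches only over the low-dimensional latent space, which is what restricts the solution to the learned subset of feasible poses.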
Monocular Human Motion Capture with a Mixture of Regressors
2005
We address 3D human motion capture from monocular images, taking a learning-based approach to construct a probabilistic pose estimation model from a set of labelled human silhouettes. To compensate for ambiguities in the pose reconstruction problem, our model explicitly calculates several possible pose hypotheses. It uses locality on a manifold in the input space and connectivity in the output space to identify regions of multi-valuedness in the mapping from silhouette to 3D pose. This information is used to fit a mixture of regressors on the input manifold, giving us a global model capable of predicting the possible poses with corresponding probabilities. These are then used in a dynamical-model-based tracker that automatically detects tracking failures and re-initializes in a probabilistically correct manner. The system is trained on conventional motion capture data, using both the corresponding real human silhouettes and silhouettes synthesized artificially from several different models for improved robustness to inter-person variations. Static pose estimation is illustrated on a variety of silhouettes. The robustness of the method is demonstrated by tracking on a real image sequence requiring multiple automatic re-initializations.
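The core idea of a mixture of regressors for a multi-valued mapping can be sketched on a toy problem. This is not the paper's silhouette model: the "branch" structure is identified here by a crude output-space split standing in for the manifold/connectivity analysis, and each branch gets its own linear regressor, so every input yields several hypotheses with prior probabilities.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy multivalued data: the same input x can come from two "pose branches",
# y = +sqrt(x) or y = -sqrt(x), mimicking silhouette ambiguity.
x = rng.uniform(0.1, 1.0, size=(200, 1))
branch = rng.integers(0, 2, size=200)
y = np.where(branch == 1, np.sqrt(x[:, 0]), -np.sqrt(x[:, 0]))

# Crude stand-in for the paper's multi-valuedness detection: split by
# output sign, then fit one linear regressor per branch.
labels = (y > 0).astype(int)
models, priors = [], []
for k in (0, 1):
    Xk = np.column_stack([x[labels == k, 0], np.ones((labels == k).sum())])
    w, *_ = np.linalg.lstsq(Xk, y[labels == k], rcond=None)
    models.append(w)
    priors.append((labels == k).mean())

def predict_hypotheses(x_new):
    """Return each regressor's pose hypothesis with its prior probability."""
    X = np.array([x_new, 1.0])
    return [(float(X @ w), p) for w, p in zip(models, priors)]

hyps = predict_hypotheses(0.49)   # two hypotheses, one per branch
```

A downstream tracker can then weigh all hypotheses instead of committing to a single (possibly wrong) pose.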
Dynamical binary latent variable models for 3D human pose tracking
2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2010
We introduce a new class of probabilistic latent variable model called the Implicit Mixture of Conditional Restricted Boltzmann Machines (imCRBM) for use in human pose tracking. Key properties of the imCRBM are as follows: (1) learning is linear in the number of training exemplars so it can be learned from large datasets; (2) it learns coherent models of multiple activities; (3) it automatically discovers atomic "movemes"; and (4) it can infer transitions between activities, even when such transitions are not present in the training set. We describe the model and how it is learned and we demonstrate its use in the context of Bayesian filtering for multi-view and monocular pose tracking. The model handles difficult scenarios including multiple activities and transitions among activities. We report state-of-the-art results on the HumanEva dataset.
Cornell University - arXiv, 2022
We propose a new representation of human body motion which encodes a full motion in a sequence of latent motion primitives. Recently, task-generic motion priors have been introduced that propose a coherent representation of human motion based on a single latent code, with encouraging results for many tasks. Extending these methods to longer motions of varying duration and framerate is far from straightforward, as a single latent code proves inefficient at encoding longer-term variability. Our hypothesis is that long motions are better represented as a succession of actions than in a single block. By leveraging a sequence-to-sequence architecture, we propose a model that simultaneously learns a temporal segmentation of motion and a prior on the motion segments. To provide flexibility with temporal resolution and motion duration, our representation is continuous in time and can be queried for any timestamp. We show experimentally that our method leads to a significant improvement over state-of-the-art motion priors on a spatio-temporal completion task on sparse pointclouds. Code will be made available upon publication.
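The segment-based, continuous-in-time idea can be illustrated with a minimal sketch. Everything here is invented (the decoder, the segment times, the latent codes); it only shows the interface the abstract describes: one latent code per primitive, queryable at any timestamp regardless of framerate.

```python
import numpy as np

# Invented example: a motion split into three primitives, each with its
# own latent code, decoded as a continuous function of segment phase.
seg_starts = np.array([0.0, 1.2, 2.5])        # segment start times (s)
seg_ends   = np.array([1.2, 2.5, 4.0])
latents    = np.array([[0.5], [-1.0], [2.0]]) # one latent code per segment

def decode(z, phase):
    """Toy decoder: pose as a function of the segment latent and phase."""
    return z * np.sin(np.pi * phase)

def query(t):
    """Query the motion at an arbitrary continuous timestamp t."""
    i = np.searchsorted(seg_ends, t, side="left")
    i = min(i, len(latents) - 1)
    phase = (t - seg_starts[i]) / (seg_ends[i] - seg_starts[i])
    return decode(latents[i], phase)

pose = query(1.85)   # mid-way through the second primitive
```

Because `query` works for any `t`, the representation is decoupled from the framerate of the training sequences, which is the flexibility the abstract claims.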
Human Motion Reconstruction and Synthesis of Human Skills
Advances in Robot Kinematics: Motion in Man and Machine, 2010
Reconstructing human motion dynamics in real time is a challenging problem since it requires accurate motion sensing, subject-specific models, and efficient reconstruction algorithms. A promising approach is to construct accurate human models and control them to behave the same way the subject does. Here, we demonstrate that the whole-body control approach can efficiently reconstruct a subject's motion dynamics in real-world task space, given a scaled model and marker-based motion capture data. We scaled a biomechanically realistic musculoskeletal model to a subject, captured motion with suitably placed markers, and used an operational space controller to directly track the motion of the markers with the model. Our controller tracked the positions, velocities, and accelerations of many markers in parallel by assigning them to tasks with different priority levels based on how free their parent limbs were. We executed lower-priority marker tracking tasks in the successive null spaces of the higher-priority tasks to resolve their interdependencies. The controller accurately reproduced the subject's full-body dynamics while executing a throwing motion in near real time. Its reconstruction closely matched the marker data, and its performance was consistent for the entire motion. Our findings suggest that the direct marker tracking approach is an attractive tool to reconstruct and synthesize the dynamic motion of humans and other complex articulated body systems in a computationally efficient manner.
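Executing a lower-priority task in the null space of a higher-priority one has a standard textbook form, sketched below (generic task-priority resolution at the velocity level, not the authors' operational space controller; the Jacobians and velocities are random placeholders for two marker-tracking tasks).

```python
import numpy as np

rng = np.random.default_rng(2)
J1 = rng.normal(size=(3, 7))        # primary-task Jacobian (3-D marker, 7 DoF)
J2 = rng.normal(size=(3, 7))        # secondary-task Jacobian
dx1 = np.array([0.1, 0.0, -0.05])   # desired primary marker velocity
dx2 = np.array([0.0, 0.2, 0.0])     # desired secondary marker velocity

J1_pinv = np.linalg.pinv(J1)
N1 = np.eye(7) - J1_pinv @ J1       # null-space projector of task 1

# Joint velocities: exact primary tracking, plus a secondary correction
# restricted to motions that do not disturb the primary task.
dq = J1_pinv @ dx1 + np.linalg.pinv(J2 @ N1) @ (dx2 - J2 @ J1_pinv @ dx1)
```

The secondary term lies in the null space of `J1`, so the primary marker velocity is reproduced exactly; with enough redundant degrees of freedom the secondary task is also met, which is how many markers can be tracked in parallel at different priorities.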
2004
We describe a mixture density propagation algorithm to estimate 3D human motion in monocular video sequences, based on observations encoding the appearance of image silhouettes. Our approach is discriminative rather than generative, therefore it does not require the probabilistic inversion of a predictive observation model. Instead, it uses a large human motion capture database and a 3D computer graphics human model to synthesize training pairs of typical human configurations, together with their realistically rendered 2D silhouettes. These are used to directly learn the conditional state distributions required for 3D body pose tracking, and thus avoid using the 3D model for inference (the learned distributions obtained using a discriminative approach can also be used, complementarily, as importance samplers, in order to improve mixing or initialize generative inference algorithms). We aim for probabilistically motivated tracking algorithms and for models that can estimate complex multivalued mappings common in inverse, uncertain perception inferences. Our paper has three contributions: (1) we clarify the assumptions and derive the density propagation rules for discriminative inference in continuous, temporal chain models; (2) we propose flexible representations and algorithms for learning multimodal conditional state distributions, based on compact Bayesian mixture of experts models; and (3) we demonstrate our algorithms by presenting empirical results on real and motion capture-based test sequences and by comparing against nearest-neighbor and regression methods.
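A mixture-of-experts conditional of the kind mentioned in contribution (2) can be sketched in a few lines. The parameters below are hand-set for illustration rather than learned as in the paper: a softmax gate weighs linear experts, so the conditional p(pose | features) is multimodal and can represent one-to-many mappings.

```python
import numpy as np

# Hand-set toy parameters (illustrative only): two experts on a 1-D input.
gate_W = np.array([[2.0], [-2.0]])     # gating weights, one row per expert
gate_b = np.array([0.0, 0.0])
exp_W  = np.array([[1.5], [-1.5]])     # expert regression weights
exp_b  = np.array([0.2, -0.2])
exp_s  = np.array([0.1, 0.1])          # expert output standard deviations

def conditional(x):
    """Return mixture components (weight, mean, std) of p(y | x)."""
    logits = gate_W @ x + gate_b
    w = np.exp(logits - logits.max())  # numerically stable softmax
    w /= w.sum()
    mu = exp_W @ x + exp_b
    return list(zip(w, mu, exp_s))

comps = conditional(np.array([0.0]))   # ambiguous input: two equal modes
```

At an ambiguous input the gate splits its weight across experts, yielding two distinct pose modes instead of averaging them into an implausible middle pose; the component list is exactly what a mixture density propagation step would carry forward in time.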
Human Motion Reconstruction by Direct Control of Marker Trajectories
Advances in Robot Kinematics: Analysis and Design, 2008
Understanding the basis of human movement and reproducing it in robotic environments is a compelling challenge that has engaged a multidisciplinary audience. In addressing this challenge, an important initial step involves reconstructing motion from experimental motion capture data. To this end we propose a new algorithm to reconstruct human motion from motion capture data through direct control of captured marker trajectories. This algorithm is based on a task/posture decomposition and prioritized control approach. This approach ensures smooth tracking of desired marker trajectories as well as the extraction of joint angles in real-time without the need for inverse kinematics. It also provides flexibility over traditional inverse kinematic approaches. Our algorithm was validated on a sequence of tai chi motions. The results demonstrate the efficacy of the direct marker control approach for motion reconstruction from experimental marker data.
Inferring 3D body pose from silhouettes using activity manifold learning
2004
We aim to infer 3D body pose directly from human silhouettes. Given a visual input (silhouette), the objective is to recover the intrinsic body configuration, recover the viewpoint, reconstruct the input, and detect any spatial or temporal outliers. In order to recover the intrinsic body configuration (pose) from the visual input (silhouette), we explicitly learn view-based representations of activity manifolds, as well as mapping functions between such central representations and both the visual input space and the 3D body pose space. The body pose can be recovered in closed form in two steps: projecting the visual input onto the learned representation of the activity manifold, i.e., finding the point on the learned manifold representation corresponding to the visual input, followed by interpolating the 3D pose.
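The two-step recovery can be illustrated on a toy 1-D activity manifold (a stand-in for a learned walking-cycle manifold; the feature and pose mappings below are invented): step 1 projects the input to the nearest manifold point, step 2 reads off the 3D pose at that point.

```python
import numpy as np

# Toy 1-D activity manifold, densely sampled over one cycle.
t_grid = np.linspace(0.0, 1.0, 200, endpoint=False)

# Hypothetical learned mappings: manifold point -> silhouette features,
# and manifold point -> pose (a single joint angle here, for brevity).
feat_of_t = np.column_stack([np.cos(2 * np.pi * t_grid),
                             np.sin(2 * np.pi * t_grid)])
pose_of_t = 30.0 * np.sin(2 * np.pi * t_grid)   # degrees

def recover_pose(features):
    # Step 1: project the input onto the manifold (nearest sampled point).
    i = np.argmin(((feat_of_t - features) ** 2).sum(axis=1))
    # Step 2: read off (or interpolate) the pose at that manifold point.
    return pose_of_t[i]

obs = np.array([np.cos(0.5), np.sin(0.5)])      # observation at phase 0.5 rad
angle = recover_pose(obs)
```

Because both steps are direct lookups/interpolations over a learned low-dimensional manifold, no iterative optimization is needed at inference time, which is what makes the recovery closed-form.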
Latent structured models for human pose estimation
2011 International Conference on Computer Vision, 2011
We present an approach for automatic 3D human pose reconstruction from monocular images, based on a discriminative formulation with latent segmentation inputs. We advance the field of structured prediction and human pose reconstruction on several fronts. First, by working with a pool of figure-ground segment hypotheses, the prediction problem is formulated in terms of combined learning and inference over segment hypotheses and 3D human articular configurations. Besides constructing tractable formulations for the combined segment selection and pose estimation problem, we propose new augmented kernels that can better encode complex dependencies between output variables. Furthermore, we provide primal linear re-formulations based on Fourier kernel approximations, in order to scale up the non-linear latent structured prediction methodology. The proposed models are shown to be competitive on the HumanEva benchmark and are also illustrated on a clip collected from a Hollywood movie, where the model can infer human poses from monocular images captured in complex environments.
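The "Fourier kernel approximation" used to obtain primal linear re-formulations is commonly realized with random Fourier features, sketched below (a generic version of the trick, with invented dimensions, not the paper's exact construction): an explicit feature map whose inner products approximate an RBF kernel, so a non-linear kernel method can be trained as a linear model.

```python
import numpy as np

rng = np.random.default_rng(3)
d, D, gamma = 5, 4000, 0.5            # input dim, feature dim, kernel width

# Random projections drawn from the Fourier transform of the RBF kernel.
W = rng.normal(scale=np.sqrt(2 * gamma), size=(D, d))
b = rng.uniform(0, 2 * np.pi, size=D)

def phi(x):
    """Random Fourier feature map: phi(x).phi(y) ~ exp(-gamma * ||x-y||^2)."""
    return np.sqrt(2.0 / D) * np.cos(W @ x + b)

x, y = rng.normal(size=d), rng.normal(size=d)
exact  = np.exp(-gamma * np.sum((x - y) ** 2))
approx = phi(x) @ phi(y)
```

Once inputs are mapped through `phi`, the structured predictor only needs linear operations in the D-dimensional primal space, which is what makes the method scale to large training sets.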