Félix G . Harvey | École Polytechnique de Montréal (original) (raw)

Drafts by Félix G . Harvey

Manually authoring transition animations for a complete locomotion system can be a tedious and ti... more Manually authoring transition animations for a complete locomotion system can be a tedious and time-consuming task, especially for large games that allow complex and constrained locomotion movements, where the number of transitions grows exponentially with the number of states.
In this paper, we present a novel approach, based on deep recurrent neural networks, to automatically generate such transitions given a past context of a few frames and a target character state to reach. We present the Recurrent Transition Network (RTN), based on a modified version of the Long-Short-Term-Memory (LSTM) network, designed specifically for transition generation and trained without any gait, phase, contact or action labels. We further propose a simple yet principled way to initialize the hidden states of the LSTM layer for a given sequence which improves the performance and generalization to new motions. We both quantitatively and qualitatively evaluate our system and show that making the network terrain-aware by adding a local terrain representation to the input yields better performance for rough-terrain navigation on long transitions. Our system produces realistic and fluid transitions that rival the quality of Motion Capture-based ground-truth motions, even before applying any inverse-kinematics postprocess.
Direct benefits of our approach could be to accelerate the creation of transition variations for large coverage, or even to entirely replace transition nodes in an animation graph. We further explore applications of this model in a animation super-resolution setting where we temporally decompress animations saved at 1 frame per second and show that the network is able to reconstruct motions that are hard to distinguish from un-compressed locomotion sequences.

Recent work on sequence to sequence translation using Recurrent Neural Networks (RNNs) based on L... more Recent work on sequence to sequence translation using Recurrent Neural Networks (RNNs) based on Long Short Term Memory (LSTM) architectures has shown great potential for learning useful representations of sequential data. A one-to-many encoder-decoder(s) scheme allows for a single encoder to provide representations serving multiple purposes. In our case, we present an LSTM encoder network able to produce representations used by two decoders: one that reconstructs, and one that classifies if the training sequence has an associated label. This allows the network to learn representations that are useful for both discriminative and reconstructive tasks at the same time. This paradigm is well suited for semi-supervised learning with sequences and we test our proposed approach on an action recognition task using motion capture (MOCAP) sequences. We find that semi-supervised feature learning can improve state-of-the-art movement classification accuracy on the HDM05 action dataset. Further, we find that even when using only labeled data and a primarily discriminative objective the addition of a reconstructive decoder can serve as a form of regularization that reduces over-fitting and improves test set accuracy.

Papers by Félix G . Harvey

Special Interest Group on Computer Graphics and Interactive Techniques Conference Real-Time Live!

arXiv (Cornell University), Nov 20, 2015

We explore recurrent encoder multi-decoder neural network architectures for semi-supervised seque... more We explore recurrent encoder multi-decoder neural network architectures for semi-supervised sequence classification and reconstruction. We find that the use of multiple reconstruction modules helps models generalize in a classification task when only a small amount of labeled data is available, which is often the case in practice. Such models provide useful high-level representations of motions allowing clustering, searching and faster labeling of new sequences. We also propose a new, realistic partitioning of a well-known, high quality motion-capture dataset for better evaluations. We further explore a novel formulation for future-predicting decoders based on conditional recurrent generative adversarial networks, for which we propose both soft and hard constraints for transition generation derived from desired physical properties of synthesized future movements and desired animation goals. We find that using such constraints allow to stabilize the training of recurrent adversarial architectures for animation generation.

arXiv (Cornell University), Jun 3, 2021

Our work focuses on the development of a learnable neural representation of human pose for advanc... more Our work focuses on the development of a learnable neural representation of human pose for advanced AI assisted animation tooling. Specifically, we tackle the problem of constructing a full static human pose based on sparse and variable user inputs (e.g. locations and/or orientations of a subset of body joints). To solve this problem, we propose a novel neural architecture that combines residual connections with prototype encoding of a partially specified pose to create a new complete pose from the learned latent space. We show that our architecture outperforms a baseline based on Transformer, both in terms of accuracy and computational efficiency. Additionally, we develop a user interface to integrate our neural model in Unity, a real-time 3D development platform. Furthermore, we introduce two new datasets representing the static human pose modeling problem, based on high-quality human motion capture data. Our code is publically available here: https://github.com/boreshkinai/protores.

arXiv (Cornell University), Aug 6, 2019

In multi-agent reinforcement learning, discovering successful collective behaviors is challenging... more In multi-agent reinforcement learning, discovering successful collective behaviors is challenging as it requires exploring a joint action space that grows exponentially with the number of agents. While the tractability of independent agent-wise exploration is appealing, this approach fails on tasks that require elaborate group strategies. We argue that coordinating the agents' policies can guide their exploration and we investigate techniques to promote such an inductive bias. We propose two policy regularization methods: TeamReg, which is based on interagent action predictability and CoachReg that relies on synchronized behavior selection. We evaluate each approach on four challenging continuous control tasks with sparse rewards that require varying levels of coordination as well as on the discrete action Google Research Football environment. Our experiments show improved performance across many cooperative multi-agent problems. Finally, we analyze the effects of our proposed methods on the policies that our agents learn and show that our methods successfully enforce the qualities that we propose as proxies for coordinated behaviors.

arXiv (Cornell University), Jun 3, 2021

Realism in video games and in movies with Computer Generated Imagery (CGI) is often key for a pos... more Realism in video games and in movies with Computer Generated Imagery (CGI) is often key for a positive, immersive experience. When animating computer-generated characters on screen, motion capture (MOCAP) is often considered the highest standard to obtain such realism. Indeed, MOCAP technologies excel at accurately reproducing human movements in a 3D environment as they are based on precise high-speed and high precision 3D marker tracking in space and time. Although some recent techniques rely less on marker-based capture (mostly in cinema) the data used in this thesis mostly comes from marker-based motion capture using a Vicon system 1. Given such a widespread use motion capture over the years, companies have gathered large amounts of such high quality data, opening the door to data-driven approaches for analysis, classification, generative modeling and others. Meanwhile, over the last decade, deep learning approaches have shown that they are able to surpass many other data-driven approaches for a myriad of tasks when the available data is sufficient. Naturally, interest for applying such approaches on 3D motion data has thus emerged in the animation domain.

arXiv (Cornell University), Aug 16, 2022

Figure 1: Our proposed morphology-aware inverse kinematics approach unlocks novel artistic workfl... more Figure 1: Our proposed morphology-aware inverse kinematics approach unlocks novel artistic workflows such as the one depicted above. Animator takes a photo of a multi-person scene or grabs one of the many pictures available on the web and uses it to initialize a 3D scene editable through a number of advanced animation tools. Our approach enables applying a multiperson 3D scene acquired from a RGB picture to custom user-defined characters and editing their respective 3D poses with the state-of-the-art machine learning inverse kinematics tool integrated in a real-time 3D development software.

$Research paper thumbnail of Motion In-Betweening via Deep <math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi mathvariant="normal">Δ</mi></mrow><annotation encoding="application/x-tex">\Delta</annotation></semantics></math>Δ-Interpolator$

IEEE Transactions on Visualization and Computer Graphics

We show that the task of synthesizing missing middle frames, commonly known as motion inbetweenin... more We show that the task of synthesizing missing middle frames, commonly known as motion inbetweening in the animation industry, can be solved more accurately and effectively if a deep learning interpolator operates in the delta mode, using the spherical linear interpolator as a baseline. We demonstrate our empirical findings on the publicly available LaFAN1 dataset. We further generalize this result by showing that the ∆-regime is viable with respect to the reference of the last known frame (also known as the zero-velocity model). This supports the more general conclusion that deep in-betweening in the reference frame local to input frames is more accurate and robust than in-betweening in the global (world) reference frame advocated in previous work. Our code is publicly available at https://github.com/boreshkinai/ delta-interpolator.

ArXiv, 2015

ACM Transactions on Graphics, 2020

SIGGRAPH Asia 2018 Technical Briefs, 2018

Image and Vision Computing

Robust Motion In-betweening, 2020

In this work we present a novel, robust transition generation technique that can serve as a new t... more In this work we present a novel, robust transition generation technique that can serve as a new tool for 3D animators, based on adversarial recurrent neural networks. The system synthesizes high-quality motions that use temporally-sparse keyframes as animation constraints. This is reminiscent of the job of in-betweening in traditional animation pipelines, in which an animator draws motion frames between provided keyframes. We first show that a state-of-the-art motion prediction model cannot be easily converted into a robust transition generator when only adding conditioning information about future keyframes. To solve this problem, we then propose two novel additive embedding modifiers that are applied at each timestep to latent representations encoded inside the network's architecture. One modifier is a time-to-arrival embedding that allows variations of the transition length with a single model. The other is a scheduled target noise vector that allows the system to be robust to target distortions and to sample different transitions given fixed keyframes. To qualitatively evaluate our method, we present a custom MotionBuilder plugin that uses our trained model to perform in-betweening in production scenarios. To quantitatively evaluate performance on transitions and generalizations to longer time horizons , we present well-defined in-betweening benchmarks on a subset of the widely used Human3.6M dataset and on LaFAN1, a novel high quality motion capture dataset that is more appropriate for transition generation. We are releasing this new dataset along with this work, with accompanying code for reproducing our baseline results.

Special Interest Group on Computer Graphics and Interactive Techniques Conference Real-Time Live!

arXiv (Cornell University), Nov 20, 2015

arXiv (Cornell University), Jun 3, 2021

arXiv (Cornell University), Aug 6, 2019

arXiv (Cornell University), Jun 3, 2021

arXiv (Cornell University), Aug 16, 2022

IEEE Transactions on Visualization and Computer Graphics

ArXiv, 2015

ACM Transactions on Graphics, 2020

SIGGRAPH Asia 2018 Technical Briefs, 2018

Image and Vision Computing

Robust Motion In-betweening, 2020