A Review of Deep Learning Techniques for Markerless Human Motion on Synthetic Datasets

A Survey on Deep Learning-Based 2D Human Pose Estimation Models

In this article, a comprehensive survey of deep learning-based (DL-based) human pose estimation (HPE) is presented to help researchers in the domain of computer vision. HPE is among the fastest-growing research domains of computer vision and is used in solving several problems of human endeavour. After a detailed introduction, three different human body models are presented, followed by the main stages of HPE and the two pipelines of two-dimensional (2D) HPE. The details of the four components of HPE are also presented. The keypoint output formats of two popular 2D HPE datasets and the most cited DL-based HPE articles since the year of the field's breakthrough are both shown in tabular form. This study highlights the limitations of published reviews and surveys with respect to presenting a systematic review of current DL-based solutions to the 2D HPE problem, and provides a detailed survey that will guide new and existing researchers on DL-based 2D HPE models. Finally, some future research directions in the field of HPE, such as the limited data on disabled persons and multi-training DL-based models, are identified to encourage researchers and promote the growth of HPE research.

Deep learning based 2D human pose estimation: A survey

Tsinghua Science and Technology, 2019

Human pose estimation has received significant attention recently due to its various applications in the real world. As the performance of state-of-the-art human pose estimation methods can be improved by deep learning, this paper presents a comprehensive survey of deep learning based human pose estimation methods and analyzes the methodologies employed. We summarize and discuss recent works with a methodology-based taxonomy. Single-person and multi-person pipelines are first reviewed separately. Then, the deep learning techniques applied in these pipelines are compared and analyzed. The datasets and metrics used in this task are also discussed and compared. The aim of this survey is to make every step in the estimation pipelines interpretable and to provide readers a readily comprehensible explanation. Moreover, the unsolved problems and challenges for future research are discussed.

Structured Prediction of 3D Human Pose with Deep Neural Networks

Proceedings of the British Machine Vision Conference, 2016

Most recent approaches to monocular 3D pose estimation rely on Deep Learning. They either train a Convolutional Neural Network to directly regress from image to 3D pose, which ignores the dependencies between human joints, or model these dependencies via a max-margin structured learning framework, which involves a high computational cost at inference time. In this paper, we introduce a Deep Learning regression architecture for structured prediction of 3D human pose from monocular images that relies on an overcomplete auto-encoder to learn a high-dimensional latent pose representation and account for joint dependencies. We demonstrate that our approach outperforms state-of-the-art ones both in terms of structure preservation and prediction accuracy.
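As a rough illustration of the pipeline this abstract describes — regressing image features to an overcomplete latent pose code, then decoding it to a 3D pose — the shapes involved can be sketched in numpy. This is a toy sketch with hypothetical dimensions and random, untrained linear maps; the actual model is a trained nonlinear auto-encoder.

```python
import numpy as np

rng = np.random.default_rng(0)

N_JOINTS, POSE_DIM = 17, 17 * 3   # 3D pose flattened to 51 values (toy choice)
LATENT_DIM = 200                  # overcomplete: latent is larger than the pose
FEAT_DIM = 128                    # assumed size of the image-feature vector

# Random linear maps standing in for the trained encoder/decoder and regressor.
W_enc = rng.standard_normal((LATENT_DIM, POSE_DIM)) * 0.1
W_dec = rng.standard_normal((POSE_DIM, LATENT_DIM)) * 0.1
W_reg = rng.standard_normal((LATENT_DIM, FEAT_DIM)) * 0.1

def decode(latent):
    # Latent code -> full 3D pose; joint dependencies live in the decoder.
    return W_dec @ latent

def regress(features):
    # Image features map to the latent code, not directly to the pose.
    return W_reg @ features

features = rng.standard_normal(FEAT_DIM)   # stand-in for CNN image features
latent = regress(features)
pose_3d = decode(latent).reshape(N_JOINTS, 3)
```

The key point the sketch makes concrete is that the regression target is the high-dimensional latent code, so the decoder (learned from poses alone) imposes joint-dependency structure on the final output.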

Generation of an Annotated Dataset of Human Poses for Deep Learning Networks Based on a Motion Tracking System

2020

In this paper, we propose an original method for the relatively fast generation of an annotated dataset of human poses, for training deep neural networks, based on a 3D motion capture system. Compared to default pose-detection DNNs trained on commonly used open datasets, the method makes it possible to recognize specific poses and actions more accurately and decreases the need for additional image-processing operations aimed at correcting the various detection errors inherent to these DNNs. We used a preinstalled IR motion capture system with reflective passive tags not to capture movement itself but to extract human keypoints in 3D space, and recorded video at the corresponding timestamps. The obtained 3D trajectories were synchronized in time and space with streams from several cameras using mutual camera calibration and photogrammetry. This allowed us to accurately project keypoints from 3D space onto the 2D video frame plane, generate human pose annotations for the recorded video, and train a deep neural network on this dataset.
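The projection step this abstract relies on — mapping calibrated 3D keypoints onto a 2D video frame — follows the standard pinhole camera model. A minimal numpy sketch, with an assumed intrinsic matrix and identity extrinsics (the paper's actual calibration values are not given):

```python
import numpy as np

def project_points(points_3d, K, R, t):
    """Project Nx3 world-space keypoints to Nx2 pixel coordinates (pinhole model)."""
    cam = points_3d @ R.T + t        # world -> camera coordinates
    uvw = cam @ K.T                  # apply camera intrinsics
    return uvw[:, :2] / uvw[:, 2:3]  # perspective divide

# Hypothetical camera: focal length 500 px, principal point (320, 240),
# camera at the world origin looking down +Z.
K = np.array([[500.0,   0.0, 320.0],
              [  0.0, 500.0, 240.0],
              [  0.0,   0.0,   1.0]])
R, t = np.eye(3), np.zeros(3)

keypoints_3d = np.array([[0.0, 0.0, 2.0],    # on the optical axis, 2 m away
                         [1.0, 0.0, 2.0]])   # 1 m to the right, 2 m away
keypoints_2d = project_points(keypoints_3d, K, R, t)
```

A point on the optical axis lands on the principal point, and lateral offsets scale by focal length over depth, which is exactly the relation used to transfer mocap keypoints into per-frame 2D annotations.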

ProtoRes: Proto-Residual Architecture for Deep Modeling of Human Pose

arXiv, 2021

Our work focuses on the development of a learnable neural representation of human pose for advanced AI-assisted animation tooling. Specifically, we tackle the problem of constructing a full static human pose from sparse and variable user inputs (e.g. locations and/or orientations of a subset of body joints). To solve this problem, we propose a novel neural architecture that combines residual connections with prototype encoding of a partially specified pose to create a new complete pose from the learned latent space. We show that our architecture outperforms a Transformer-based baseline in terms of both accuracy and computational efficiency. Additionally, we develop a user interface to integrate our neural model in Unity, a real-time 3D development platform. Furthermore, we introduce two new datasets representing the static human pose modeling problem, based on high-quality human motion capture data. Our code is publicly available at https://github.com/boreshkinai/protores.
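The architectural idea here — encoding a variable number of user constraints into one fixed-size "prototype" and refining it with residual blocks before decoding the complete pose — can be sketched in numpy. All sizes, weight matrices, and the mean-pooling choice below are toy assumptions for illustration, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(1)

IN_DIM, HID = 7, 64        # assumed per-constraint input size and hidden width
N_JOINTS = 22              # toy skeleton size
OUT_DIM = N_JOINTS * 3     # full pose: a 3D position per joint

W_in = rng.standard_normal((HID, IN_DIM)) * 0.1
W_res = [rng.standard_normal((HID, HID)) * 0.1 for _ in range(3)]
W_out = rng.standard_normal((OUT_DIM, HID)) * 0.1

def relu(x):
    return np.maximum(x, 0.0)

def predict_pose(user_inputs):
    """user_inputs: (k, IN_DIM) array for ANY number k of specified effectors."""
    per_input = relu(user_inputs @ W_in.T)  # encode each constraint independently
    prototype = per_input.mean(axis=0)      # pool to a fixed-size prototype code
    h = prototype
    for W in W_res:                         # residual refinement blocks
        h = h + relu(W @ h)
    return (W_out @ h).reshape(-1, 3)       # decode the complete pose

pose_a = predict_pose(rng.standard_normal((3, IN_DIM)))   # 3 constraints
pose_b = predict_pose(rng.standard_normal((10, IN_DIM)))  # 10 constraints
```

The pooling step is what makes the input "sparse and variable": three constraints and ten constraints both yield a complete, fixed-size output pose.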

Synthetic Training for Accurate 3D Human Pose and Shape Estimation in the Wild

ArXiv, 2020

This paper addresses the problem of monocular 3D human shape and pose estimation from an RGB image. Despite great progress in this field in terms of pose prediction accuracy, state-of-the-art methods often predict inaccurate body shapes. We suggest that this is primarily due to the scarcity of in-the-wild training data with diverse and accurate body shape labels. Thus, we propose STRAPS (Synthetic Training for Real Accurate Pose and Shape), a system that utilises proxy representations, such as silhouettes and 2D joints, as inputs to a shape and pose regression neural network, which is trained with synthetic training data (generated on-the-fly during training using the SMPL statistical body model) to overcome data scarcity. We bridge the gap between synthetic training inputs and noisy real inputs, which are predicted by keypoint detection and segmentation CNNs at test-time, by using data augmentation and corruption during training. In order to evaluate our approach, we curate and pro...
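The bridging idea in this abstract — corrupting clean synthetic proxy inputs during training so they resemble noisy detector outputs at test time — can be sketched for 2D joints in numpy. The function name, parameters, and exact corruption recipe below are illustrative assumptions, not the paper's specific augmentation scheme.

```python
import numpy as np

def corrupt_keypoints(joints_2d, drop_prob=0.1, noise_std=4.0, rng=None):
    """Simulate detector errors on clean synthetic 2D joints:
    Gaussian localisation jitter plus random joint drop-out (occlusion)."""
    if rng is None:
        rng = np.random.default_rng()
    noisy = joints_2d + rng.normal(0.0, noise_std, joints_2d.shape)
    visible = rng.random(len(joints_2d)) >= drop_prob  # False = "occluded"
    noisy[~visible] = 0.0                              # zero out dropped joints
    return noisy, visible

rng = np.random.default_rng(42)
clean = rng.uniform(0, 256, size=(17, 2))              # 17 synthetic 2D joints
corrupted, visible = corrupt_keypoints(clean, drop_prob=0.2, noise_std=4.0, rng=rng)
```

Because the synthetic poses are generated on the fly, a fresh corruption is sampled every iteration, so the regressor never sees the same clean input twice.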

Body Pose Estimation using Deep Learning

International Journal for Research in Applied Science & Engineering Technology (IJRASET), 2023

Healthcare, sports analysis, gaming, and entertainment are just some of the many fields that could benefit from solving the challenging issue of real-time human pose detection and recognition in computer vision. Capturing human motion, analysing physical exercise, and giving feedback on performance can all benefit from reliable detection and recognition of body poses. The recent progress in deep learning has made it possible to create real-time systems that can accurately and quickly recognise and identify human poses.

Unveiling the Landscape of Human Pose Estimation

Research Square, 2024

This paper presents a comprehensive survey and methodology for deep learning-based solutions in articulated human pose estimation (HPE). Recent advances in deep learning have revolutionized the HPE field, with the capturing system transitioning from multi-modal sensors to a regular color camera and from multiple views to a monocular view, opening up numerous applications. However, the increasing variety of deep network architectures has resulted in a vast literature on the topic, making it challenging to identify commonalities and differences among diverse HPE approaches. Therefore, this paper serves two objectives: firstly, it provides a thorough survey of over 100 research papers published since 2015, focusing on deep learning-based solutions for monocular HPE; secondly, it develops a comprehensive methodology that systematically combines existing works and summarizes a unified framework for the HPE problem and its modular components. Unlike previous surveys, this study places emphasis on methodology development in order to provide better insights and learning opportunities for researchers in the field of computer vision. The paper also summarizes and discusses the quantitative performance of the reviewed methods on popular datasets, while highlighting the challenges involved, such as occlusion and viewpoint variation. Finally, future research directions, such as incorporating temporal information and 3D pose estimation, along with potential solutions to address the remaining challenges in HPE, are presented.

3D human pose estimation from depth maps using a deep combination of poses

Journal of Visual Communication and Image Representation

Many real-world applications require the estimation of human body joints for higher-level tasks such as human behaviour understanding. In recent years, depth sensors have become a popular means of obtaining three-dimensional information. The depth maps generated by these sensors provide information that can be employed to disambiguate the poses observed in two-dimensional images. This work addresses the problem of 3D human pose estimation from depth maps using a Deep Learning approach. We propose a model, named Deep Depth Pose (DDP), which receives a depth map containing a person and a set of predefined 3D prototype poses, and returns the 3D positions of the person's body joints. In particular, DDP is defined as a ConvNet that computes the specific weights needed to linearly combine the prototypes for the given input. We have thoroughly evaluated DDP on the challenging ITOP and UBC3V datasets, which depict realistic and synthetic samples respectively, defining a new state of the art on both.
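The core output step of DDP — a network emitting weights that linearly combine a fixed bank of prototype poses — can be sketched in numpy. The sizes are toy values, and the softmax normalisation of the raw weights is an assumption made here for a well-behaved convex combination; the abstract only states that the combination is linear.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax; an ASSUMED normalisation of the raw weights.
    e = np.exp(x - x.max())
    return e / e.sum()

def combine_prototypes(weights_raw, prototypes):
    """prototypes: (K, J, 3) predefined 3D poses; weights_raw: (K,) network output.
    Returns the (J, 3) estimated pose as a weighted sum of the prototypes."""
    w = softmax(weights_raw)
    return np.tensordot(w, prototypes, axes=1)

rng = np.random.default_rng(0)
K, J = 30, 15                                  # toy: 30 prototypes, 15 joints
prototypes = rng.standard_normal((K, J, 3))

# In DDP the weights come from a ConvNet on the depth map; here they are random.
pose = combine_prototypes(rng.standard_normal(K), prototypes)
```

With a near-one-hot weight vector the output collapses onto a single prototype, which is the degenerate case; the learned weights interpolate between prototypes to fit the observed depth map.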