Human Pose Estimation Using Convolutional Neural Networks
Related papers
Body joints regression using deep convolutional neural networks
2016 IEEE International Conference on Systems, Man, and Cybernetics (SMC), 2016
Human pose estimation is a well-known computer vision problem that receives intensive research interest. The reason for such interest is the wide range of applications that successful estimation of human pose offers. Articulated pose estimation involves real-time acquisition, analysis, processing, and understanding of high-dimensional visual information. Ensemble learning methods operating on hand-engineered features have commonly been used to address this task. Deep learning exploits representation learning to learn multiple levels of representation from raw input data, alleviating the need for hand-crafted features. Deep convolutional neural networks achieve state-of-the-art results in visual object recognition, localisation, and detection. In this paper, the pose estimation task is formulated as an offset joint regression problem: the 3D joint positions are accurately detected from a single raw depth image using a deep convolutional neural network. The presented method relies on a state-of-the-art data generation pipeline to produce a large, realistic, and highly varied set of synthetic training images. Analysis and experimental results demonstrate the generalisation performance and the successful real-time application of the proposed method.
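As a hedged illustration of the offset joint regression formulation described above, the sketch below shows a small PyTorch network that maps a single-channel depth image to per-joint 3D offsets. The layer sizes, the 16-joint skeleton, and all names are assumptions for illustration, not the paper's actual architecture.

```python
# Illustrative sketch (assumed architecture, not the paper's network):
# a small CNN that regresses per-joint 3D offsets from a depth image.
import torch
import torch.nn as nn

class OffsetJointRegressor(nn.Module):
    def __init__(self, num_joints=16):  # hypothetical skeleton size
        super().__init__()
        self.num_joints = num_joints
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        # Regress an (x, y, z) offset for each joint.
        self.head = nn.Linear(128, num_joints * 3)

    def forward(self, depth):                  # depth: (B, 1, H, W)
        f = self.features(depth).flatten(1)    # (B, 128)
        return self.head(f).view(-1, self.num_joints, 3)

model = OffsetJointRegressor()
offsets = model(torch.randn(2, 1, 224, 224))   # (2, 16, 3)
# Training would compare offsets against ground-truth joints, e.g.:
loss = nn.functional.smooth_l1_loss(offsets, torch.zeros_like(offsets))
```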
A Survey on Human Pose Estimation
IRJET, 2022
Human pose estimation (HPE) depicts the posture of an individual using semantic keypoints on the human body. In recent years, deep learning methods for HPE have displaced the traditional computer vision techniques that were extensively used in the past. HPE has a wide range of applications, including virtual fitness trainers, surveillance, motion-sensing gaming consoles (Xbox Kinect), action recognition, tracking, and many more. This survey intends to fill the gaps left by previous surveys as well as provide an update on recent developments in the field. An introduction to HPE is given first, followed by a brief overview of previous surveys. We then look into various classifications of HPE (single pose, multiple poses, 2D, 3D, top-down, bottom-up, etc.) and the datasets commonly used in this field. While both 2D and 3D HPE are covered, the main focus lies on pose estimation in 2D space. Next, various deep learning based HPE approaches are presented, focusing largely on those optimised for inference on edge devices. Finally, we conclude with the challenges and obstacles faced in this field as well as some potential research opportunities.
HEMlets Pose: Learning Part-Centric Heatmap Triplets for Accurate 3D Human Pose Estimation
2019 IEEE/CVF International Conference on Computer Vision (ICCV), 2019
Estimating 3D human pose from a single image is a challenging task. This work addresses the uncertainty of lifting detected 2D joints to 3D space by introducing an intermediate state, Part-Centric Heatmap Triplets (HEMlets), which shortens the gap between the 2D observation and the 3D interpretation. The HEMlets utilize three joint heatmaps to represent the relative depth information of the end-joints of each skeletal body part. In our approach, a Convolutional Network (ConvNet) is first trained to predict HEMlets from the input image, followed by a volumetric joint-heatmap regression. We leverage the integral operation to extract the joint locations from the volumetric heatmaps, guaranteeing end-to-end learning. Despite the simplicity of the network design, quantitative comparisons show a significant performance improvement over the best-of-grade method (about 20% on Human3.6M). The proposed method naturally supports training with "in-the-wild" images, where only weakly annotated relative depth information of skeletal joints is available. This further improves the generalization ability of our model, as validated by qualitative comparisons on outdoor images.
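The integral operation the abstract mentions can be made concrete with a soft-argmax over a volumetric heatmap: a differentiable expectation over voxel coordinates replaces a hard argmax, which is what permits end-to-end learning. The sketch below is a generic version of this operation; shapes and names are illustrative assumptions, not the authors' code.

```python
# Generic integral (soft-argmax) operation over volumetric heatmaps.
import torch

def integral_3d(heatmaps):
    """heatmaps: (B, J, D, H, W) -> joint coords (B, J, 3) in voxel units."""
    B, J, D, H, W = heatmaps.shape
    # Normalize each joint's volume into a probability distribution.
    probs = torch.softmax(heatmaps.view(B, J, -1), dim=-1).view(B, J, D, H, W)
    zs = torch.arange(D, dtype=probs.dtype)
    ys = torch.arange(H, dtype=probs.dtype)
    xs = torch.arange(W, dtype=probs.dtype)
    # Expected coordinate along each axis (marginalize, then take expectation).
    z = (probs.sum(dim=(3, 4)) * zs).sum(-1)
    y = (probs.sum(dim=(2, 4)) * ys).sum(-1)
    x = (probs.sum(dim=(2, 3)) * xs).sum(-1)
    return torch.stack([x, y, z], dim=-1)

coords = integral_3d(torch.randn(2, 17, 16, 64, 64))  # (2, 17, 3)
```

Because every step is differentiable, gradients flow from a coordinate-space loss back into the heatmap predictor, which is the property the abstract relies on for end-to-end training.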
Human Pose Detection using Deep Learning
Human pose detection is a crucial problem within the field of computer vision. Imagine having the ability to trace a person's every small movement and perform a biomechanical analysis in real time. This technology has huge implications both now and in the future. Applications based on human pose detection include video surveillance, assisted living, healthy-lifestyle monitoring, and sports analysis. Formally speaking, pose estimation or detection is the prediction of the part or joint positions of an individual from a picture or a video. This technology is made possible by combining two major computer technologies, i.e., Artificial Intelligence (A.I.) and Computer Graphics.
Deep learning based 2D human pose estimation: A survey
Tsinghua Science and Technology, 2019
Human pose estimation has received significant attention recently due to its various real-world applications. As the performance of state-of-the-art human pose estimation methods has been greatly improved by deep learning, this paper presents a comprehensive survey of deep learning based human pose estimation methods and analyzes the methodologies employed. We summarize and discuss recent works with a methodology-based taxonomy. Single-person and multi-person pipelines are first reviewed separately. Then, the deep learning techniques applied in these pipelines are compared and analyzed. The datasets and metrics used in this task are also discussed and compared. The aim of this survey is to make every step in the estimation pipelines interpretable and to provide readers with a readily comprehensible explanation. Moreover, the unsolved problems and challenges for future research are discussed.
Human Pose Estimation Using Depth-Wise Separable Convolutional Neural Networks
Zenodo (CERN European Organization for Nuclear Research), 2022
Dynamic human pose estimation is the process of identifying human joints in an image or video and determining their position in space, so that the dynamic position of the human body can be more accurately estimated and evaluated. This goal can be achieved with various computer vision strategies used in a number of industries, such as gaming, robotics training, and animation. In this article, we propose a method for dynamic human pose estimation using convolutional neural networks (CNNs). The method is intended as a form of physical therapy rehabilitation that can be performed in a remote setting: by assessing the patient's postures, the physical therapist can determine whether or not the patient is performing the assigned exercises correctly and can adapt the therapy sessions to the progress the patient is making in the recovery process.
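For context, the depth-wise separable convolution named in the title factorizes a standard convolution into a per-channel (depthwise) 3x3 convolution followed by a 1x1 pointwise convolution, which greatly reduces parameters and compute. A minimal PyTorch sketch of such a block follows; the exact layer configuration is an assumption, not the paper's network.

```python
# Minimal sketch of a depth-wise separable convolution block.
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        # Depthwise: one 3x3 filter per input channel (groups=in_ch).
        self.depthwise = nn.Conv2d(in_ch, in_ch, 3, stride=stride,
                                   padding=1, groups=in_ch, bias=False)
        # Pointwise: 1x1 conv mixes channels.
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(self.bn(self.pointwise(self.depthwise(x))))

block = DepthwiseSeparableConv(32, 64)
y = block(torch.randn(1, 32, 64, 64))  # (1, 64, 64, 64)
```

The parameter saving is roughly a factor of k^2 for a k x k kernel, which is why such blocks are popular in pose models aimed at resource-constrained deployment.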
PoseAnalyser: A Survey on Human Pose Estimation
SN Computer Science
Human pose estimation is the process of detecting the body keypoints of a person, which can then be used to classify different poses. Many researchers have proposed ways to build accurate 2D and 3D human pose estimators applicable to various types of applications. This paper reviews state-of-the-art architectures for human pose estimation; the papers reviewed are organised by the type of computer vision and machine learning algorithm used, such as feed-forward neural networks, convolutional neural networks (CNNs), OpenPose, MediaPipe, and many more. These different approaches are compared on various parameters, like the type of dataset used, the evaluation metric, etc. Different human pose datasets, such as the COCO and MPII activity datasets with keypoints, as well as specific application-based datasets, are reviewed in this survey. Researchers may use these architectures and datasets in a range of domains, which are also discussed. The paper analyzes several approaches and architectures and can serve as a guide for other researchers developing better techniques to achieve high accuracy.
Multi-Scale Supervised Network for Human Pose Estimation
2018 25th IEEE International Conference on Image Processing (ICIP), 2018
Human pose estimation is an important topic in computer vision with many applications, including gesture and activity recognition. However, pose estimation from images is challenging due to appearance variations, occlusion, cluttered backgrounds, and complex activities. To alleviate these problems, we develop a robust pose estimation method based on recent deep conv-deconv modules with two improvements: (1) multi-scale supervision of body keypoints, and (2) a global regression to improve the structural consistency of keypoints. We refine keypoint detection heatmaps using layer-wise multi-scale supervision to better capture local contexts. Pose inference via keypoint association is optimized globally using a regression network at the end. Our method can effectively disambiguate keypoint matches in close proximity, including mismatches of left and right body parts, and better infer occluded parts. Experimental results show that our method achieves competitive performance among state-of-the-art methods on the MPII and FLIC datasets.
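A hedged sketch of the multi-scale supervision idea, improvement (1) above: each intermediate decoder stage predicts keypoint heatmaps, and each prediction is supervised against the ground truth resized to that stage's resolution. The function below is illustrative; the loss choice and shapes are assumptions, not the authors' implementation.

```python
# Illustrative layer-wise multi-scale heatmap supervision loss.
import torch
import torch.nn.functional as F

def multiscale_supervision_loss(stage_heatmaps, gt_heatmaps):
    """stage_heatmaps: list of (B, K, h_i, w_i) predictions from different
    decoder depths; gt_heatmaps: (B, K, H, W) full-resolution targets."""
    loss = 0.0
    for pred in stage_heatmaps:
        # Resize the ground truth to this stage's resolution and supervise.
        target = F.interpolate(gt_heatmaps, size=pred.shape[-2:],
                               mode='bilinear', align_corners=False)
        loss = loss + F.mse_loss(pred, target)
    return loss

gt = torch.rand(2, 16, 64, 64)
preds = [torch.rand(2, 16, s, s) for s in (16, 32, 64)]
print(multiscale_supervision_loss(preds, gt))
```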
Localizing salient body motion in multi-person scenes using convolutional neural networks
Neurocomputing, 2019
With modern computer vision techniques being successfully developed for a variety of tasks, extracting meaningful knowledge from complex scenes with multiple people still poses problems. Consequently, experiments with application-specific motion, such as gesture recognition scenarios, are often constrained to single-person scenes in the literature. In this paper we therefore address the challenging task of detecting salient body motion in scenes with more than one person. We propose a neural architecture that reacts only to a specific kind of motion in the scene: a limited set of body gestures. The model is trained end-to-end, thereby avoiding hand-crafted features and the strong reliance on pre-processing that is prevalent in similar studies. The presented model implements a saliency mechanism that reacts to body motion cues which have not been included in previous computational saliency systems. Our architecture consists of a 3D Convolutional Neural Network that receives a frame sequence as its input and localizes active gesture movement. To train our network with a large variety of data, we introduce an approach that combines Kinect recordings of one person into artificial scenes with multiple people, yielding a large diversity of scene configurations in our dataset. We performed experiments using these sequences and show that the proposed model is able to localize the salient body motion of our gesture set. We found that 3D convolutions and a baseline model with 2D convolutions perform surprisingly similarly on our task. Our experiments revealed the influence of gesture characteristics on how well they can be learned by our model. Given a distinct gesture set and computational restrictions, we conclude that using 2D convolutions may often perform equally well.
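To make the architecture concrete, here is a minimal sketch of a 3D CNN that takes a short frame sequence and outputs a per-pixel saliency map localizing body motion, in the spirit of the model described above. All layer sizes and names are illustrative assumptions, not the paper's configuration.

```python
# Illustrative 3D CNN for spatial motion-saliency maps from a frame clip.
import torch
import torch.nn as nn

class MotionSaliencyNet(nn.Module):
    def __init__(self):
        super().__init__()
        # 3D convolutions see space *and* time jointly.
        self.conv3d = nn.Sequential(
            nn.Conv3d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv3d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
        )
        self.head = nn.Conv2d(32, 1, kernel_size=1)  # per-pixel saliency

    def forward(self, clip):                   # clip: (B, 3, T, H, W)
        feats = self.conv3d(clip).mean(dim=2)  # pool over time -> (B, 32, H, W)
        return torch.sigmoid(self.head(feats))

net = MotionSaliencyNet()
saliency = net(torch.randn(1, 3, 8, 64, 64))   # (1, 1, 64, 64)
```

A 2D baseline, as the abstract's comparison suggests, would replace the Conv3d layers with Conv2d layers applied per frame (or to stacked frames), trading temporal modelling for lower cost.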
Learning Human Pose Estimation Features with Convolutional Networks
This paper introduces a new architecture for human pose estimation using a multilayer convolutional network and a modified learning technique that learns low-level features and a higher-level weak spatial model. Unconstrained human pose estimation is one of the hardest problems in computer vision, and our new architecture and learning scheme show significant improvement over the current state of the art. The main contribution of this paper is showing, for the first time, that a specific variation of deep learning is able to outperform all existing traditional architectures on this task. The paper also discusses several lessons learned while researching alternatives; most notably, that it is possible to learn strong low-level feature detectors on features that might cover only a few pixels in the image. Higher-level spatial models somewhat improve the overall result, but to a much lesser extent than expected. Many researchers previously argued that kinematic structure and top-down information are crucial for this domain, but with our purely bottom-up, weak spatial model we improve on other, more complicated architectures that currently produce the best results. This mirrors the experience of researchers in speech recognition, object recognition, and other domains.
Figure 1: The green cross is our new technique's wrist locator; the red cross is the state-of-the-art CVPR13 MODEC detector [36], on the FLIC database.
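The bottom-up pipeline the abstract describes can be sketched in two stages: a ConvNet part detector that predicts one heatmap per joint from local evidence, followed by a single wide convolution standing in for the weak spatial model that re-weights each joint's heatmap using the others. Everything below (joint count, kernel sizes, names) is a hypothetical illustration, not the authors' implementation.

```python
# Hypothetical two-stage sketch: part detector + weak spatial model.
import torch
import torch.nn as nn

NUM_JOINTS = 14  # hypothetical skeleton size

part_detector = nn.Sequential(      # bottom-up, low-level feature detector
    nn.Conv2d(3, 64, 5, padding=2), nn.ReLU(),
    nn.Conv2d(64, 64, 5, padding=2), nn.ReLU(),
    nn.Conv2d(64, NUM_JOINTS, 1),   # one heatmap per joint
)
spatial_model = nn.Conv2d(          # weak spatial model: a large receptive
    NUM_JOINTS, NUM_JOINTS, 15, padding=7)  # field mixing joint heatmaps

img = torch.randn(1, 3, 128, 128)
heatmaps = part_detector(img)       # (1, 14, 128, 128)
refined = spatial_model(heatmaps)   # (1, 14, 128, 128)
```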