Human Activity Recognition in the Context of Industrial Human-Robot Interaction

Human activity recognition for domestic robots

Capabilities of domestic service robots could be further improved if the robot were equipped with the ability to recognize activities performed by humans within its sensory range. For example, in a simple scenario, a floor-cleaning robot could vacuum the kitchen floor after recognizing the human activity "cooking in the kitchen". Most complex human activities can be subdivided into simple activities, which can later be used to recognize the complex ones. An activity such as "taking medication" can be subdivided into simple activities such as "opening a pill container" and "drinking water". However, even recognizing simple activities is highly challenging because of inter-activity similarities and intra-activity variations across different people, body poses, and orientations. Even a simple activity such as "drinking water" can be performed while the subject is sitting, standing, or walking. Building machine learning techniques that recognize human activities under such variability is therefore non-trivial. To address this issue, we propose a human activity recognition technique that uses 3D skeleton features produced by a depth camera. The algorithm assigns importance weights to the 3D skeleton joints according to the activity being performed, allowing it to ignore confusing or irrelevant features while relying on informative ones. These weighted joint features are then combined to train Dynamic Bayesian Networks (DBNs), which are used to infer human activities from their likelihoods. The proposed technique was tested on a publicly available dataset and on UTS experiments, achieving overall accuracies of 85% and 90%, respectively.
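To make the joint-weighting idea concrete, the following minimal Python sketch scales each skeleton joint's 3D coordinates by a per-activity importance weight and scores the result against per-activity models. The joint names, weight values, and the diagonal-Gaussian scorer are illustrative assumptions; the paper itself trains Dynamic Bayesian Networks for inference.

```python
import numpy as np

# Hypothetical per-activity importance weights for a few skeleton joints.
# In the paper the weights depend on the activity being performed; the
# values and the joint set here are illustrative only.
JOINTS = ["head", "right_hand", "left_hand", "torso"]
WEIGHTS = {
    "drinking": np.array([0.1, 0.6, 0.1, 0.2]),  # right hand dominates
    "cooking":  np.array([0.1, 0.4, 0.4, 0.1]),  # both hands informative
}

def weighted_features(frames, weights):
    """Scale each joint's (x, y, z) coordinates by its importance weight.

    frames: array of shape (T, num_joints, 3) of 3D joint positions.
    Returns a (T, num_joints * 3) feature matrix.
    """
    scaled = frames * weights[None, :, None]
    return scaled.reshape(len(frames), -1)

def log_likelihood(features, mean, var):
    """Diagonal-Gaussian log-likelihood, a simple stand-in for the
    DBN inference used in the paper."""
    return -0.5 * np.sum((features - mean) ** 2 / var + np.log(2 * np.pi * var))

def classify(frames, models):
    """Pick the activity whose weighted-feature model scores highest.

    models: {activity: (mean, var)} fitted on training data.
    """
    scores = {}
    for activity, (mean, var) in models.items():
        feats = weighted_features(frames, WEIGHTS[activity])
        scores[activity] = log_likelihood(feats, mean, var)
    return max(scores, key=scores.get)
```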

A Two-Phase Algorithm for Recognizing Human Activities in the Context of Industry 4.0 and Human-Driven Processes

Advances in Intelligent Systems and Computing, 2019

Future industrial systems, a revolution known as Industry 4.0, are envisioned to integrate people into the cyber world as prosumers (service providers and consumers). In this context, human-driven processes become an essential reality, and instruments are required to create feedback loops between the social subsystem (people) and the cyber subsystem (technological components). Although many different instruments have been proposed, pattern recognition techniques are currently the most promising ones. However, these solutions present some important open problems: for example, they depend on the hardware selected to acquire information from users, and the precision of the recognition process is limited. To address this situation, this paper proposes a two-phase algorithm to integrate people into Industry 4.0 systems and human-driven processes. The algorithm defines complex actions as compositions of simple movements. Complex actions are recognized using Hidden Markov Models, and simple movements are recognized using Dynamic Time Warping. In this way, only the movement-recognition phase depends on the hardware devices employed to capture information, and the precision of complex-action recognition is greatly increased. A real experimental validation is also carried out to evaluate and compare the performance of the proposed solution.
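A minimal sketch of the two-phase idea, assuming 1D sensor signals: Dynamic Time Warping labels raw segments as simple movements (phase 1), and a discrete-HMM forward pass scores the resulting label sequence against each complex-action model (phase 2). Template names, HMM parameters, and signal shapes here are hypothetical.

```python
import numpy as np

def dtw_distance(a, b):
    """Classic dynamic-time-warping distance between two 1D signals."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def recognize_movement(signal, templates):
    """Phase 1: label a raw sensor segment with the closest template.

    templates: {movement_name: reference signal} learnt per device.
    """
    return min(templates, key=lambda name: dtw_distance(signal, templates[name]))

def forward_log_prob(obs, start, trans, emit):
    """Phase 2: HMM forward algorithm over a sequence of movement labels.

    obs: list of integer movement labels; start (S,), trans (S, S),
    emit (S, V). Unscaled, so suitable for short sequences only.
    """
    alpha = start * emit[:, obs[0]]
    for o in obs[1:]:
        alpha = (alpha @ trans) * emit[:, o]
    return np.log(alpha.sum())
```

To classify a complex action, one would run `recognize_movement` over successive segments and pick the action whose HMM gives the largest `forward_log_prob` for the resulting label sequence.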

Human Activity Recognition

International Journal for Research in Applied Science & Engineering Technology (IJRASET), 2022

Human Activity Recognition (HAR) is one of the active research areas in computer vision and human-computer interaction. However, it remains a very complex task due to unresolved challenges such as sensor motion, sensor placement, cluttered backgrounds, and the inherent variability in the way different humans perform activities. Human activity recognition is the ability to interpret human body gestures or motion via sensors and to determine the human activity or action. Many daily human tasks can be simplified or automated if they can be recognized by a HAR system. Typically, a HAR system is either supervised or unsupervised: a supervised HAR system requires prior training with dedicated datasets, while an unsupervised HAR system is configured with a set of rules during development. HAR is considered an important component in various research contexts, e.g., surveillance, healthcare, and human-computer interaction.
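As a concrete illustration of a supervised HAR system trained on a dedicated dataset, the sketch below extracts simple statistics from sliding windows of a synthetic, stand-in 3-axis accelerometer stream and fits a scikit-learn classifier; a real system would use labeled recordings and richer features.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def window_features(signal, width=50, step=25):
    """Slide a window over a (T, channels) sensor stream and compute
    simple per-window statistics (mean and standard deviation)."""
    feats = []
    for start in range(0, len(signal) - width + 1, step):
        w = signal[start:start + width]
        feats.append(np.concatenate([w.mean(axis=0), w.std(axis=0)]))
    return np.array(feats)

# Stand-in training data: an unlabeled random stream with dummy labels.
rng = np.random.default_rng(0)
stream = rng.normal(size=(1000, 3))                # 3-axis accelerometer
X = window_features(stream)
y = rng.integers(0, 4, size=len(X))                # 4 dummy activity labels

clf = RandomForestClassifier(n_estimators=100).fit(X, y)
print(clf.predict(window_features(stream[:100])))  # label new windows
```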

Human Activity-Understanding: A Multilayer Approach Combining Body Movements and Contextual Descriptors Analysis

International Journal of Advanced Robotic Systems, 2015

A deep understanding of human activity is key to successful human-robot interaction (HRI). Translating sensed human behavioural signals/cues and context descriptors into an encoded human activity remains a challenge because of the complex nature of human actions. In this paper, we propose a multilayer framework for understanding human activity, to be implemented in a mobile robot. It consists of a perception layer that exploits the output of RGB-D-based skeleton tracking to drive a physical model of virtual human dynamics, compensating for the inaccuracy and inconsistency of the raw data. A multi-support vector machine (MSVM) model, trained with features describing human motor coordination over temporal segments together with environment descriptors (object affordances), is used to recognize each sub-activity (classification layer). The interpretation of sequences of classified elementary actions is based on discrete hidden Markov models (DHMMs) (...)
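A simplified sketch of the two upper layers, assuming precomputed segment features: a multi-class SVM labels each temporal segment with a sub-activity (classification layer), and a discrete-HMM Viterbi decoder interprets the resulting label sequence. The features, label set, and HMM parameters are placeholders, not the paper's actual descriptors.

```python
import numpy as np
from sklearn.svm import SVC

# Classification layer: a multi-class SVM labels each temporal segment.
# Features are random placeholders for the motor-coordination and
# object-affordance descriptors used in the paper.
rng = np.random.default_rng(1)
X_train = rng.normal(size=(200, 10))
y_train = rng.integers(0, 3, size=200)            # 3 dummy sub-activities
msvm = SVC(kernel="rbf", decision_function_shape="ovr").fit(X_train, y_train)

def viterbi(obs, start, trans, emit):
    """Interpretation layer: decode the most likely activity-state
    sequence from the stream of classified sub-activity labels."""
    T, S = len(obs), len(start)
    delta = np.log(start) + np.log(emit[:, obs[0]])
    back = np.zeros((T, S), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + np.log(trans)   # (prev, current) scores
        back[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + np.log(emit[:, obs[t]])
    path = [int(delta.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(back[t, path[-1]])
    return path[::-1]
```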

Real-time recognition of human gestures for collaborative robots on assembly-line

We present a framework and preliminary experimental results for real-time recognition of human operator actions. The goal, for a collaborative industrial robot operating on the same assembly line as workers, is to allow adaptation of its behavior and speed for smooth human-robot cooperation. To this end, the robot must monitor and understand the behavior of the humans around it. Real-time motion capture is performed using a "MoCap suit" of 12 inertial sensors estimating joint angles of the upper half of the human body (neck, wrists, elbows, shoulders, etc.). In our experiment, we consider one particular assembly operation on car doors, which we have subdivided into 4 successive steps: removing the adhesive protection from the waterproofing sheet, positioning the waterproofing sheet on the door, pre-sticking the sheet on the door, and finally installing the window "sealing strip". Gesture recognition is achieved continuously in real time, using a technique that combines automatic time rescaling, similar to Dynamic Time Warping (DTW), with a Hidden Markov Model (HMM) estimating the respective probabilities of the 4 learnt actions. A preliminary evaluation, conducted in the real world on an experimental assembly cell of the car manufacturer PSA, shows a very promising correct recognition rate of 96% over several repetitions of the same assembly operation by a single operator. Ongoing work aims at evaluating our framework on the same actions performed over more executions by a larger pool of human operators, and at estimating false recognition rates on unrelated gestures. Another interesting perspective is using workers' motion capture to estimate effort and stress, helping to prevent physical causes of some musculoskeletal disorders.
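The continuous-recognition idea can be sketched as one streaming HMM per learnt action, each updating a normalized forward probability as joint-angle frames arrive; the action with the highest running log-evidence wins. The Gaussian emissions and all parameters below are illustrative assumptions, and the paper's DTW-like time rescaling is omitted.

```python
import numpy as np

class StreamingHMM:
    """Keeps a running (normalized) forward probability for one learnt
    action, updated as each new motion-capture frame arrives."""

    def __init__(self, start, trans, means, var):
        self.alpha = start.copy()     # state distribution, shape (S,)
        self.trans = trans            # transition matrix, shape (S, S)
        self.means = means            # per-state feature means, (S, D)
        self.var = var                # shared diagonal variance, (D,)
        self.log_evidence = 0.0

    def step(self, frame):
        # Unnormalized Gaussian likelihood of the joint-angle frame
        # under each state's emission model.
        diff = frame[None, :] - self.means
        emit = np.exp(-0.5 * np.sum(diff ** 2 / self.var, axis=1))
        self.alpha = (self.alpha @ self.trans) * emit
        norm = self.alpha.sum()
        self.alpha /= norm            # rescale to avoid underflow
        self.log_evidence += np.log(norm)
        return self.log_evidence      # higher = better match so far
```

In use, one would instantiate four of these (one per assembly step), feed every incoming frame to all of them, and report the action whose model currently has the greatest log-evidence.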

A Novel Approach for Machine Learning-Based Identification of Human Activities

IRJET, 2023

Human activity recognition (HAR) is a rapidly growing field of research that uses machine learning to automatically identify and classify human activities from sensor data. This data can be collected from a variety of sources, such as wearable sensors, smartphones, and video cameras. HAR has a wide range of potential applications, including healthcare, sports, and security. In this paper, we present a comprehensive overview of the state of the art in dataset-driven HAR using machine learning. We discuss the various feature extraction techniques that can be applied and the different machine learning algorithms that can be used for model training. We also survey the recent literature on HAR using machine learning and discuss the challenges and opportunities that lie ahead in this field. Our findings suggest that dataset-driven HAR using machine learning is a promising approach for a variety of applications. However, a number of challenges still need to be addressed to improve the accuracy and robustness of HAR systems, including the need for more accurate and efficient feature extraction techniques, the development of more powerful machine learning algorithms, and the creation of larger and more diverse datasets. This paper provides a comprehensive overview of the state of the art and identifies the challenges and opportunities that lie ahead. We hope it will help accelerate the development of more accurate and reliable HAR systems that can improve people's lives in a variety of ways.

The HA4M dataset: Multi-Modal Monitoring of an assembly task for Human Action recognition in Manufacturing

Scientific Data, 2022

This paper introduces the Human Action Multi-Modal Monitoring in Manufacturing (HA4M) dataset, a collection of multi-modal data relative to actions performed by different subjects building an Epicyclic Gear Train (EGT). In particular, 41 subjects executed several trials of the assembly task, which consists of 12 actions. Data were collected in a laboratory scenario using a Microsoft® Azure Kinect, which integrates a depth camera, an RGB camera, and InfraRed (IR) emitters. To the best of the authors' knowledge, the HA4M dataset is the first multi-modal dataset about an assembly task containing six types of data: RGB images, depth maps, IR images, RGB-to-depth-aligned images, point clouds, and skeleton data. These data represent a good foundation for developing and testing advanced action recognition systems in several fields, including computer vision and machine learning, and in application domains such as smart manufacturing and human-robot collaboration.

Background & Summary

Human action recognition is an active topic of research in computer vision [1,2] and machine learning [3,4], and vast research work has been carried out in the last decade, as seen in the existing literature [5]. Moreover, the recent spread of low-cost video camera systems, including depth cameras [6], has strengthened the development of observation systems in a variety of application domains such as video surveillance, safety and smart home security, ambient assisted living, health care, and so on. However, little work has been done in human action recognition for manufacturing assembly [7-9], and the poor availability of public datasets limits the study, development, and comparison of new methods. This is mainly due to challenging issues such as between-action similarity, the complexity of actions, the manipulation of tools and parts, and the presence of fine motions and intricate operations. The recognition of human actions in the context of intelligent manufacturing is of great importance for various purposes: to improve operational efficiency [8]; to promote human-robot cooperation [10]; to assist operators [11]; to support employee training [9,12]; to increase productivity and safety [13]; or to promote workers' good mental health [14]. In this paper, we present the Human Action Multi-Modal Monitoring in Manufacturing (HA4M) dataset, a multi-modal dataset acquired by an RGB-D camera during the assembly of an Epicyclic Gear Train (EGT) (see Fig. 1). The HA4M dataset provides a good base for developing, validating, and testing techniques and methodologies to recognize assembly actions. The literature is rich in RGB-D datasets for human action recognition [15-17], prevalently acquired in indoor/outdoor unconstrained settings. They are mostly related to daily actions (such as walking, jumping, waving, and bending), medical conditions (such as headache, back pain, and staggering), two-person interactions (such as hugging, taking a photo, finger-pointing, and giving an object), or gaming actions (such as forward punching, tennis serving, and golf swinging). Table 1 reports some of the most well-known and commonly used RGB-D datasets on human action recognition, describing their principal peculiarities.
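A hypothetical container for one HA4M time-step might look like the sketch below; the field names and shapes are assumptions for illustration, and the dataset's documentation defines the actual file layout.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class HA4MSample:
    """One time-step of the six modalities described for HA4M.
    Field names and shapes are illustrative, not the dataset's schema."""
    rgb: np.ndarray            # (H, W, 3) colour image
    depth: np.ndarray          # (H, W) depth map in millimetres
    ir: np.ndarray             # (H, W) infrared image
    rgb_aligned: np.ndarray    # RGB re-projected into the depth frame
    point_cloud: np.ndarray    # (N, 3) 3D points
    skeleton: np.ndarray       # (J, 3) 3D joint positions
    action_label: int          # one of the 12 assembly actions
```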

Real-time Activity Recognition by Discerning Qualitative Relationships Between Randomly Chosen Visual Features

Proceedings of the British Machine Vision Conference 2014, 2014

In this paper, we present a novel method to explore semantically meaningful visual information and identify discriminative spatio-temporal relationships between visual features for real-time activity recognition. Our approach infers human activities from continuous egocentric (first-person-view) videos of object manipulations in an industrial setup. To achieve this goal, we propose a random forest that unifies randomization, discriminative relationship mining, and a Markov temporal structure. Discriminative relationship mining helps us model the relations that distinguish different activities, while randomization allows us to handle the large feature space and prevents over-fitting. The Markov temporal structure provides temporally consistent decisions during testing. The proposed random forest uses discriminative Markov decision trees, where every non-terminal node is a discriminative classifier and the Markov structure is applied at the leaf nodes. The proposed approach outperforms state-of-the-art methods on a new challenging video dataset of assembling a pump system.
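The relation-encoding idea can be sketched as follows: for every pair of tracked visual features, record whether the pair moves closer, stays stable, or moves apart over a clip, and feed these qualitative codes to a standard random forest. This is a simplified stand-in; the paper's forest uses discriminative Markov decision trees with temporal smoothing rather than scikit-learn's classifier.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def qualitative_relations(tracks):
    """Encode pairwise qualitative spatial relations between tracked
    visual features: does each pair get closer (-1), stay stable (0),
    or move apart (+1) over the clip?

    tracks: array (T, K, 2) of K tracked 2D feature positions.
    """
    first, last = tracks[0], tracks[-1]
    rels = []
    K = tracks.shape[1]
    for i in range(K):
        for j in range(i + 1, K):
            d0 = np.linalg.norm(first[i] - first[j])
            d1 = np.linalg.norm(last[i] - last[j])
            rels.append(np.sign(d1 - d0))
    return np.array(rels)

# Stand-in training clips, each with 6 randomly chosen tracked features.
rng = np.random.default_rng(2)
clips = rng.normal(size=(50, 30, 6, 2))           # 50 clips, 30 frames
X = np.array([qualitative_relations(c) for c in clips])
y = rng.integers(0, 5, size=50)                   # 5 dummy activities
forest = RandomForestClassifier(n_estimators=200).fit(X, y)
```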

Hierarchical Human Action Recognition to Measure the Performance of Manual Labor

IEEE Access

Measuring manual-labor performance has been a key element of work scheduling and resource management in many industries. It is typically performed using a standard data system called Time and Motion Study (TMS). Many industries still rely on direct human effort to execute the TMS methodology, which can be time-consuming, error-prone, and expensive. In this paper, we introduce an automatic replacement for the TMS technique that works at two levels of abstraction: primitive actions and activities. We leverage recent advances in deep learning and employ an encoder-decoder-based classifier to recognize primitives and a continuous-time hidden Markov model to recognize activities. We show that our system yields results competitive with those obtained with several common human action recognition models. We also show how the proposed system can support operational decisions by computing productivity indicators such as worker availability, worker performance, and overall labor effectiveness.
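Once primitives and activities are recognized, the productivity indicators can be computed from the resulting timelines; the sketch below follows the common OEE-style convention (availability multiplied by performance), which may differ from the paper's exact definitions.

```python
def labor_effectiveness(scheduled_s, available_s, units_done, std_cycle_s):
    """Compute productivity indicators from recognized activity timelines.

    scheduled_s: total scheduled work time (seconds)
    available_s: time the worker was actually working (seconds)
    units_done:  completed work cycles counted by the recognizer
    std_cycle_s: standard time for one cycle (seconds)
    """
    availability = available_s / scheduled_s
    performance = (units_done * std_cycle_s) / available_s
    overall = availability * performance
    return availability, performance, overall

# Example: 8h shift, 6.5h active, 52 cycles of a 7-minute standard task.
a, p, o = labor_effectiveness(8 * 3600, 6.5 * 3600, 52, 7 * 60)
print(f"availability={a:.2f} performance={p:.2f} OLE={o:.2f}")
```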