Using context from inside-out vision for improved activity recognition
Related papers
A dataset for complex activity recognition with micro and macro activities in a cooking scenario
ArXiv, 2020
Recognition of complex activities can benefit from understanding the steps that compose them. Current datasets, however, are annotated with one label only, hindering research in this direction. In this paper, we describe a new dataset for sensor-based activity recognition featuring macro and micro activities in a cooking scenario. Three sensing systems recorded simultaneously: a motion capture system tracking 25 points on the body; two smartphone accelerometers, one on the hip and one on the forearm; and two smartwatches, one on each wrist. The dataset is labeled for both the recipes (macro activities) and the steps (micro activities). We summarize the results of a baseline classification using traditional activity recognition pipelines. The dataset is designed to be easily used to test and develop activity recognition approaches.
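As a rough illustration of what a "traditional activity recognition pipeline" over such accelerometer streams typically looks like, the sketch below segments a wrist or hip accelerometer signal into fixed-length windows, computes simple per-axis statistics, and trains an off-the-shelf classifier on the micro-activity labels. The sampling rate, window length, feature set, and classifier are illustrative assumptions, not parameters taken from the paper.

```python
# Minimal sketch of a traditional activity recognition baseline:
# sliding windows + statistical features + off-the-shelf classifier.
# Sampling rate, window size, and classifier are illustrative assumptions.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

FS = 50          # assumed sampling rate in Hz
WIN = 2 * FS     # 2-second windows
HOP = FS         # 50% overlap between consecutive windows

def features(seg):
    """Simple per-axis statistics commonly used as accelerometer features."""
    return np.concatenate([
        seg.mean(axis=0),
        seg.std(axis=0),
        seg.min(axis=0),
        seg.max(axis=0),
    ])

def windows(acc, labels):
    """Slice a (T, 3) accelerometer stream into overlapping windows.

    Each window gets the majority micro-activity label of its samples.
    """
    X, y = [], []
    for start in range(0, len(acc) - WIN + 1, HOP):
        seg = acc[start:start + WIN]
        lab = labels[start:start + WIN]
        X.append(features(seg))
        y.append(np.bincount(lab).argmax())
    return np.array(X), np.array(y)

# Hypothetical arrays standing in for one recording:
# acc has shape (T, 3); micro has shape (T,) with integer step labels.
acc = np.random.randn(5000, 3)
micro = np.random.randint(0, 5, size=5000)

X, y = windows(acc, micro)
clf = RandomForestClassifier(n_estimators=100).fit(X, y)
print(clf.score(X, y))  # training accuracy only; evaluate on held-out subjects in practice
```

In practice the same windowing would be applied per sensor (hip, forearm, both wrists) and the feature vectors concatenated before classification.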
2008
Over the past decade, researchers in computer graphics, computer vision, and robotics have begun to work with significantly larger collections of data. A number of sizable databases have been collected and made available to researchers: faces, motion capture, natural scenes, and changes in weather and lighting. These and other databases have done a great deal to facilitate research and to provide standardized test datasets for new algorithms; however, they are limited by the constrained settings within which they are collected. We propose a focused effort to capture detailed (high spatial and temporal resolution) human data in the kitchen while cooking several recipes. The database contains multimodal measures of the human activity of subjects performing the tasks involved in cooking and food preparation. Currently we record video from five external cameras and one wearable camera, audio from five balanced microphones and a wearable watch, motion capture with a 12 camera ...
Recognising the actions during cooking task (Cooking task dataset)
2011
The dataset contains the data of acceleration sensors attached to a person during the execution of a kitchen task. It consists of 7 datasets that describe the execution of preparing and having a meal: preparing the ingredients, cooking, serving the meal, having the meal, cleaning the table, and washing the dishes. The aim of the experiment is to investigate the ability of activity recognition approaches to recognise fine-grained user activities based on acceleration data. Results obtained on the dataset can be found in the PLOS ONE paper "Computational State Space Models for Activity and Intention Recognition. A Feasibility Study" by Krüger et al.
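The referenced work frames recognition as decoding a sequence of fine-grained activity states rather than classifying each window in isolation. A minimal way to get the flavour of that sequence-level decoding is Viterbi smoothing over per-window class scores, as sketched below. The transition matrix, emission scores, and number of activities are made-up placeholders, not the state space model of Krüger et al.

```python
# Toy Viterbi decoding over per-window activity likelihoods.
# Transition and emission values are illustrative placeholders only.
import numpy as np

def viterbi(log_emit, log_trans, log_prior):
    """Most likely activity sequence given per-window log-likelihoods.

    log_emit:  (T, K) log-likelihood of each of K activities per window
    log_trans: (K, K) log transition probabilities between activities
    log_prior: (K,)   log prior over the first window's activity
    """
    T, K = log_emit.shape
    delta = log_prior + log_emit[0]
    back = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + log_trans          # (K, K): from -> to
        back[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + log_emit[t]
    path = np.zeros(T, dtype=int)
    path[-1] = delta.argmax()
    for t in range(T - 2, -1, -1):
        path[t] = back[t + 1, path[t + 1]]
    return path

# Example with 3 hypothetical fine-grained activities over 20 windows.
rng = np.random.default_rng(0)
log_emit = np.log(rng.dirichlet(np.ones(3), size=20))   # fake per-window classifier scores
trans = np.full((3, 3), 0.1) + np.eye(3) * 0.7          # self-transitions favoured
log_trans = np.log(trans / trans.sum(axis=1, keepdims=True))
log_prior = np.log(np.ones(3) / 3)
print(viterbi(log_emit, log_trans, log_prior))
```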
A Database for Fine Grained Activity Detection of Cooking Activities
2012
While activity recognition is a current focus of research, the challenging problem of fine-grained activity recognition is largely overlooked. We thus propose a novel database of 65 cooking activities, continuously recorded in a realistic setting. Activities are distinguished by fine-grained body motions that have low inter-class variability and high intra-class variability due to diverse subjects and ingredients. We benchmark two approaches on our dataset, one based on articulated pose tracks and the second using holistic video features ...
First Person Vision for Activity Prediction Using Probabilistic Modeling
October 2018
Identifying activities of daily living is an important area of research with applications in smart homes and healthcare for elderly people. It is challenging due to factors such as human self-occlusion, complex natural environments, and the variability of human behavior when performing a complicated task. From psychological studies, we know that human gaze is closely linked with the thought process and that we tend to "look" at objects before acting on them. Hence, we use the object information present in gaze images as context, forming the basis for activity prediction. Our system is based on HMMs (Hidden Markov Models) trained using ANNs (Artificial Neural Networks). We begin by extracting motion information from TPV (Third Person Vision) streams and object information from FPV (First Person Vision) cameras. The advantage of having FPV is that the object information forms the context of the scene. When context is included as input to the HMM for activity recognition, the precision incre...
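A minimal sketch of the context-fusion idea described above, assuming the FPV stream yields one detected gaze-object label per window and the TPV stream yields a motion feature vector: the object identity is one-hot encoded and concatenated with the motion features before classification. The object vocabulary, feature dimensions, and the MLP classifier are assumptions for illustration; the paper's actual HMM/ANN formulation is not reproduced here.

```python
# Sketch of fusing FPV (gaze) object context with TPV motion features.
# Object vocabulary, feature sizes, and the MLP are illustrative assumptions.
import numpy as np
from sklearn.neural_network import MLPClassifier

OBJECTS = ["cup", "kettle", "knife", "none"]   # hypothetical gaze-object vocabulary

def one_hot(obj):
    """Encode a detected gaze object as a one-hot context vector."""
    v = np.zeros(len(OBJECTS))
    v[OBJECTS.index(obj)] = 1.0
    return v

def fuse(motion_feat, gaze_obj):
    """Concatenate a TPV motion feature vector with the FPV object context."""
    return np.concatenate([motion_feat, one_hot(gaze_obj)])

# Fake training data: 200 windows, 16-dim motion features, random gaze objects.
rng = np.random.default_rng(1)
motion = rng.standard_normal((200, 16))
objects = rng.choice(OBJECTS, size=200)
labels = rng.integers(0, 4, size=200)          # hypothetical activity labels

X = np.stack([fuse(m, o) for m, o in zip(motion, objects)])
clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500).fit(X, labels)
print(clf.predict(X[:5]))
```

The per-window predictions (or class probabilities) produced this way could then be smoothed with a temporal model such as an HMM, in the spirit of the pipeline the abstract describes.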