RGB-D Human Detection and Tracking for Industrial Environments

Fast RGB-D people tracking for service robots

Autonomous Robots, 2014

Service robots have to robustly follow and interact with humans. In this paper, we propose a very fast multi-people tracking algorithm designed for mobile service robots. Our approach exploits RGB-D data and can run in real time at a very high frame rate on a standard laptop, without the need for a GPU implementation. It also features a novel depth-based sub-clustering method that allows people to be detected within groups or even when standing near walls. Moreover, to limit drift and track ID switches, an online-learning appearance classifier is proposed, featuring a three-term joint likelihood. We compared the performance of our system with a number of state-of-the-art tracking algorithms on two public datasets, acquired with three static Kinects and a moving stereo pair, respectively. To validate the 3D accuracy of our system, we created a new dataset in which RGB-D data are acquired by a moving robot. We have made this dataset publicly available; it is not only annotated by hand, but the ground-truth positions of the people and the robot are also acquired with a motion capture system, so that tracking accuracy and precision can be evaluated in 3D coordinates. Results of experiments on these datasets are presented, showing that, even without the need for a GPU, our approach achieves state-of-the-art accuracy and superior speed.
Fig. 1 Example of our system output: (a) a 3D bounding box is drawn for every tracked person on the RGB image; (b) the corresponding 3D point cloud is reported, together with the estimated people trajectories.
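The depth-based sub-clustering idea, separating people who have merged into a single Euclidean cluster, can be illustrated with a minimal sketch. This is not the authors' implementation; it simply splits a cluster wherever consecutive sorted depth values leave a gap wider than a chosen threshold, which is enough to separate two people standing at clearly different distances.

```python
import numpy as np

def depth_subcluster(depths, max_gap=0.2):
    """Split one merged spatial cluster into sub-clusters along depth.

    Illustrative sketch only: people in a group (or a person near a
    wall) often fall into one Euclidean cluster, but their depth values
    form separate modes. Cut wherever consecutive sorted depths are
    farther apart than `max_gap` (metres; value is illustrative).
    """
    d = np.sort(np.asarray(depths, dtype=float))
    cuts = np.where(np.diff(d) > max_gap)[0]
    return np.split(d, cuts + 1)
```

A cluster containing a person at roughly 1.5 m and another at roughly 3 m would come back as two groups, which can then be processed as separate detection candidates.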

Multiple Human Tracking in RGB-D Data: A Survey

ArXiv, 2016

Multiple human tracking (MHT) is a fundamental task in many computer vision applications. Appearance-based approaches, primarily formulated on RGB data, are constrained and affected by problems arising from occlusions and/or illumination variations. In recent years, the arrival of cheap RGB-Depth (RGB-D) devices has led to many new approaches to MHT, and many of these integrate color and depth cues to improve each and every stage of the process. In this survey, we present the common processing pipeline of these methods and review their methodology based (a) on how they implement this pipeline and (b) on what role depth plays within each stage of it. We identify and introduce existing, publicly available, benchmark datasets and software resources that fuse color and depth data for MHT. Finally, we present a brief comparative evaluation of the performance of those works that have applied their methods to these datasets.

Towards a 3D Pipeline for Monitoring and Tracking People in an Indoor Scenario using Multiple RGBD Sensors

Proceedings of the 10th International Conference on Computer Vision Theory and Applications, 2015

Human monitoring and tracking has been a prominent research area for many scientists around the globe. Several algorithms have been introduced and improved over the years, eliminating false positives and enhancing monitoring quality. While the majority of approaches are restricted to the 2D and 2.5D domain, 3D still remains an unexplored field. Microsoft Kinect is a low-cost commodity sensor extensively used by the industry and research community for several indoor applications. Within this framework, an accurate and fast-to-implement pipeline is introduced, working in two main directions: pure 3D foreground extraction of moving people in the scene, and interpretation of the human movement using an ellipsoid as a mathematical reference model. The proposed work is part of an industrial transportation research project whose aim is to monitor the behavior of people and make a distinction between normal and abnormal behaviors in public train wagons. Ground truth was generated by the OpenNI human skeleton tracker and used for evaluating the performance of the proposed method.
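An ellipsoid reference model for a person's point cloud can be approximated by a second-moment fit: the centre is the cloud's mean, and the axes and radii come from the eigendecomposition of its covariance. The sketch below is a hypothetical illustration under that assumption, not the paper's actual formulation.

```python
import numpy as np

def fit_ellipsoid(points):
    """Fit a covariance ellipsoid to a 3-D point cloud of a person.

    Illustrative sketch: centre = mean of the points, axes = covariance
    eigenvectors, radii = 2-sigma extents along those axes. Eigenvalues
    from eigh come back in ascending order, so radii[-1] is the longest
    axis (typically the body's vertical axis for a standing person).
    """
    pts = np.asarray(points, dtype=float)
    centre = pts.mean(axis=0)
    cov = np.cov(pts - centre, rowvar=False)
    eigvals, axes = np.linalg.eigh(cov)
    radii = 2.0 * np.sqrt(np.maximum(eigvals, 0.0))
    return centre, radii, axes
```

Tracking how the fitted centre and axis directions evolve over frames then gives a compact description of the person's movement.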

Real-Time Multiple Human Perception with Color-Depth Cameras on a Mobile Robot

The ability to perceive humans is an essential requirement for safe and efficient human-robot interaction. In real-world applications, the need for a robot to interact in real time with multiple humans in a dynamic, 3-D environment presents a significant challenge. The recent availability of commercial color-depth cameras allows for the creation of a system that makes use of the depth dimension, thus enabling a robot to observe its environment and perceive in 3-D space. Here we present a system for 3-D multiple human perception in real time from a moving robot equipped with a color-depth camera and a consumer-grade computer. Our approach reduces computation time to achieve real-time performance through a unique combination of new ideas and established techniques. We remove the ground and ceiling planes from the 3-D point cloud input to separate candidate point clusters. We introduce a novel information concept, depth of interest, which we use to identify candidates for detection and which avoids the computationally expensive scanning-window methods of other approaches. We utilize a cascade of detectors to distinguish humans from objects, in which we make intelligent reuse of intermediary features in successive detectors to reduce computation. Because of the high computational cost of some methods, we represent our candidate tracking algorithm with a decision directed acyclic graph, which allows us to use the most computationally intense techniques only where necessary. We detail the successful implementation of our novel approach on a mobile robot and examine its performance in scenarios with real-world challenges, including occlusion, robot motion, non-upright humans, humans leaving and reentering the field of view (i.e., the reidentification challenge), and human-object and human-human interaction.
We conclude with the observation that, by incorporating depth information and using modern techniques in new ways, we are able to create an accurate system for real-time 3-D perception of humans by a mobile robot.
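The ground and ceiling removal step can be sketched with a simple height filter, assuming the cloud is already expressed in a gravity-aligned frame with z up. The paper's approach, which must cope with a moving robot, is more general than this illustration; all parameter values below are assumptions.

```python
import numpy as np

def strip_floor_and_ceiling(cloud, floor_z=0.0, ceil_z=2.5, margin=0.1):
    """Drop points within `margin` of assumed floor and ceiling planes.

    Illustrative sketch: removing these planes disconnects the remaining
    points, so candidate clusters (people, furniture) fall apart into
    separate connected components. Assumes a gravity-aligned frame with
    z up; a real system would fit the planes instead of assuming them.
    """
    z = cloud[:, 2]
    keep = (z > floor_z + margin) & (z < ceil_z - margin)
    return cloud[keep]
```

After this filter, a Euclidean clustering pass over the surviving points yields the candidate clusters that the detector cascade then classifies.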

Color-based 3D particle filtering for robust tracking in heterogeneous environments

2008 Second ACM/IEEE International Conference on Distributed Smart Cameras, 2008

Most multi-camera 3D tracking and positioning systems rely on several independent 2D tracking modules applied over individual camera streams, fused using geometrical relationships across cameras and/or the observed appearance of objects. However, 2D tracking systems suffer inherent difficulties due to point-of-view limitations (perceptually similar foreground and background regions causing fragmentation of moving objects, occlusions, etc.), and therefore 3D tracking based on partially erroneous 2D tracks is likely to fail when handling multiple-people interaction. In this paper, we propose a Bayesian framework for combining 2D low-level cues from multiple cameras directly into the 3D world through 3D Particle Filters. This novel method (direct 3D operation) allows the estimation of the probability of a certain volume being occupied by a moving object, using 2D motion detection and color features as state observations of the Particle Filter framework. For this purpose, an efficient color descriptor has been implemented, which automatically adapts itself to image noise and is able to deal with changes in illumination and shape variations. The ability of the proposed framework to correctly track multiple 3D objects over time is tested on a real indoor scenario, showing satisfactory results.
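A single predict/update/resample cycle of such a direct 3-D particle filter might look like the sketch below. Here `likelihood` stands in for the paper's combination of 2-D motion and colour observations projected from each camera; the random-walk motion model and all parameter values are illustrative assumptions, not the paper's choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def pf_step(particles, weights, likelihood, motion_std=0.05):
    """One predict/update/resample cycle of a 3-D particle filter.

    particles: (N, 3) array of candidate 3-D positions.
    likelihood: callable scoring one 3-D position against the fused
    per-camera observations (simplified to a black box here).
    """
    # Predict: random-walk motion model in 3-D.
    particles = particles + rng.normal(0.0, motion_std, particles.shape)
    # Update: reweight each particle by its observation likelihood.
    weights = weights * np.array([likelihood(p) for p in particles])
    weights = weights / weights.sum()
    # Resample: draw particles proportional to weight to fight degeneracy.
    idx = rng.choice(len(particles), size=len(particles), p=weights)
    return particles[idx], np.full(len(particles), 1.0 / len(particles))
```

Iterating this cycle concentrates the particle set in the volume most consistent with the multi-camera evidence, which is exactly the occupancy-probability estimate the direct 3D operation aims for.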

People detection and tracking using a network of low-cost depth cameras

2014

Automatic people detection is a widely adopted technology that has applications in retail stores, crowd management and surveillance. The goal of this work is to create a general purpose people detection framework. Studies on people detection, tracking and re-identification are reviewed. The emphasis is on people detection from depth images. Furthermore, an approach based on a network of smart depth cameras is presented. The performance is evaluated with four image sequences, totalling over 20 000 depth images. Experimental results show that simple and lightweight algorithms are very useful in practical applications.

Human detection and tracking using a Kinect camera for an autonomous service robot

This paper presents a novel method for people detection and tracking using depth images provided by a Kinect camera. The depth image captured by a Kinect camera is analysed using its histogram, allowing the depth image to be divided into slices and making the retrieval of regions of interest a simple and computationally light process compared to point clouds. These regions are then classified as human or not using a template matching technique. Template matching is performed with an efficient gradient descent method based on the RPROP algorithm, and tracking is performed by comparing the color image histogram of each region of interest across consecutive frames. The proposed method is viable for online detection and tracking of people and has been tested on a mobile platform in an unconstrained environment.
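The frame-to-frame matching step, comparing the colour histogram of each region of interest across consecutive frames, needs a histogram similarity measure. One common choice is the Bhattacharyya coefficient, sketched below; the paper does not state which measure it uses, so this is an assumption for illustration.

```python
import numpy as np

def bhattacharyya(hist_a, hist_b):
    """Bhattacharyya coefficient between two colour histograms.

    Both histograms are normalised to sum to 1 first; the result is 1.0
    for identical distributions and 0.0 for fully disjoint ones, so a
    tracker can match each region of interest to the candidate in the
    next frame with the highest coefficient.
    """
    a = np.asarray(hist_a, dtype=float)
    b = np.asarray(hist_b, dtype=float)
    a = a / a.sum()
    b = b / b.sum()
    return float(np.sum(np.sqrt(a * b)))
```

Because the comparison runs only on a handful of regions of interest per frame rather than the full image, it keeps the tracking stage computationally light, in line with the paper's goal.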

RGB-D Based Tracking of Complex Objects

2016

Tracking the pose of objects is a relevant topic in computer vision, which potentially allows to recover meaningful information for other applications such as task supervision, robot manipulation or activity recognition. In the last years, RGB-D cameras have been widely adopted for this problem with impressive results. However, there are certain objects whose surface properties or complex shapes prevents the depth sensor from returning good depth measurements, and only color-based methods can be applied. In this work, we show how the depth information of the surroundings of the object can still be useful in the object pose tracking with RGB-D even in this situation. Specifically, we propose using the depth information to handle occlusions in a state of the art region-based object pose tracking algorithm. Experiments with recordings of humans naturally interacting with difficult objects have been performed, showing the advantages of our contribution in several image sequences.

Tracking persons using a network of RGBD cameras

2014

A computer vision system that employs an RGBD camera network to track multiple humans is presented. The acquired views are used to volumetrically and photometrically reconstruct and track the humans robustly and in real time. Given the frequent and accurate monitoring of humans in space and time, their locations and walk-through trajectory can be robustly tracked in real-time.

Real-time multisensor people tracking for human-robot spatial interaction

2015

All currently used mobile robot platforms are able to navigate safely through their environment, avoiding static and dynamic obstacles. However, in human-populated environments mere obstacle avoidance is not sufficient to make humans feel comfortable and safe around robots. To this end, a large community is currently producing human-aware navigation approaches to create a more socially acceptable robot behaviour. A major building block for all Human-Robot Spatial Interaction is the ability to detect and track humans in the vicinity of the robot. We present a fully integrated people perception framework, designed to run in real-time on a mobile robot. This framework employs detectors based on laser and RGB-D data and a tracking approach able to fuse multiple detectors using different versions of data association and Kalman filtering. The resulting trajectories are transformed into Qualitative Spatial Relations based on a Qualitative Trajectory Calculus, to learn and classify diff...
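The Kalman filtering at the heart of the fusion stage can be illustrated with a minimal 2-D constant-velocity filter for a single track, where each laser or RGB-D detection enters as a position measurement. Data association across detectors is omitted, and the class name and all noise parameters are illustrative, not taken from the paper.

```python
import numpy as np

class ConstantVelocityKF:
    """Minimal 2-D constant-velocity Kalman filter for one person track.

    State is [x, y, vx, vy]; each detector contributes (x, y) position
    measurements. dt, q (process noise) and r (measurement noise) are
    illustrative values.
    """
    def __init__(self, xy, dt=0.1, q=0.5, r=0.05):
        self.x = np.array([xy[0], xy[1], 0.0, 0.0])
        self.P = np.eye(4)
        self.F = np.eye(4)
        self.F[0, 2] = self.F[1, 3] = dt      # position += velocity * dt
        self.H = np.zeros((2, 4))
        self.H[0, 0] = self.H[1, 1] = 1.0     # we observe position only
        self.Q = q * np.eye(4)
        self.R = r * np.eye(2)

    def predict(self):
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:2]

    def update(self, z):
        y = np.asarray(z, dtype=float) - self.H @ self.x
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.x = self.x + K @ y
        self.P = (np.eye(4) - K @ self.H) @ self.P
        return self.x[:2]
```

In a multi-detector setup, each incoming detection would first be associated with an existing track (for example by gated nearest neighbour) and then fed to that track's `update`; the `predict` step bridges the gaps between asynchronous detectors.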