RGB-D Based Tracking of Complex Objects

Real-time pose estimation of rigid objects using RGB-D imagery

2013 IEEE 8th Conference on Industrial Electronics and Applications (ICIEA), 2013

Using full-scale (480×640) RGB-D imagery, we present an approach for tracking the 6D pose of rigid objects at a runtime frequency of up to 15 fps. This approach is useful for robotic perception systems that must efficiently track an object's pose during camera movements in tabletop manipulation tasks, with a high detection rate and real-time performance. Specifically, appearance-based feature correspondences are used for initial object detection. We make use of Oriented FAST and Rotated BRIEF (ORB) feature keypoints to perform fast segmentation of object candidates in the 3D point cloud. The task of 6D pose estimation is handled in Cartesian space by finding an interest window around the segmented object and applying 3D geometry operations. The interest window is then used for feature extraction in subsequent camera frames to speed up the object detection process. This also allows for efficient pose tracking in scenes where there is a large number of false matches between feature correspondences due to scene clutter. The approach is tested using an RGB-D dataset comprising scenes from video sequences of tabletops with multiple objects in household environments. Experiments show that our approach is capable of performing 3D segmentation followed by 6D pose tracking at high frame rates.
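The interest-window idea in the abstract above can be illustrated with a minimal sketch. The abstract does not give the window construction, so everything here is an assumption: the hypothetical helper `interest_window` takes the pixel coordinates of keypoints matched to the object, pads their bounding box by an assumed `margin`, and clamps it to an image assumed to be 640 pixels wide and 480 pixels high. The hypothetical `in_window` then restricts feature extraction in the next frame to that region.

```python
import math
from typing import Iterable, Tuple

# (x_min, y_min, x_max, y_max) in pixels, inclusive bounds
Window = Tuple[int, int, int, int]


def interest_window(keypoints: Iterable[Tuple[float, float]],
                    margin: int = 20,
                    width: int = 640,
                    height: int = 480) -> Window:
    """Padded bounding box around object keypoints, clamped to the image.

    Assumption: margin, width and height are illustrative defaults,
    not values taken from the paper.
    """
    pts = list(keypoints)
    if not pts:
        # No detection yet: fall back to searching the whole frame.
        return (0, 0, width - 1, height - 1)
    xs = [p[0] for p in pts]
    ys = [p[1] for p in pts]
    x_min = max(0, int(math.floor(min(xs))) - margin)
    y_min = max(0, int(math.floor(min(ys))) - margin)
    x_max = min(width - 1, int(math.ceil(max(xs))) + margin)
    y_max = min(height - 1, int(math.ceil(max(ys))) + margin)
    return (x_min, y_min, x_max, y_max)


def in_window(point: Tuple[float, float], window: Window) -> bool:
    """True if a pixel lies inside the interest window."""
    x, y = point
    x_min, y_min, x_max, y_max = window
    return x_min <= x <= x_max and y_min <= y <= y_max


# Usage: keep only next-frame keypoints that fall inside the window.
window = interest_window([(100, 200), (150, 260)])
next_frame = [(90, 190), (400, 50), (160, 270)]
kept = [p for p in next_frame if in_window(p, window)]
```

Restricting extraction to the window reduces the number of features to match per frame and discards matches from background clutter outside it.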

Robust object tracking based on RGB-D camera

Proceeding of the 11th World Congress on Intelligent Control and Automation, 2014

A novel object tracking method based on an RGB-D camera is proposed to handle fast appearance change, occlusion, and background clutter, which may arise in vision-based robot navigation. It makes use of appearance and depth information, which are complementary to each other in visual perception, to achieve robust tracking. First, the RGB image and depth information are captured by the RGB-D camera. Then, an online-updating appearance model is created with features extracted from the RGB image. A motion model is created on a plan-view map that is drawn from depth information and camera parameters. The estimation of object position and scale is performed on the motion model. Finally, appearance features are combined with position and scale information to track the target. The performance of our method is compared with a state-of-the-art video tracking method. The comparison shows that our tracking method is more stable and accurate, and is clearly superior when there is a large appearance change. A vision-based robot using our tracking method can navigate in cluttered environments successfully.
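A plan-view map of the kind mentioned above can be sketched as a top-down occupancy histogram. This is a minimal illustration and not the paper's construction: it assumes a pinhole camera with hypothetical intrinsics `fx` and `cx`, an optical axis parallel to the ground (no tilt correction), depth in metres, and illustrative grid ranges and cell size.

```python
from typing import List, Sequence, Tuple


def plan_view_map(depth: Sequence[Sequence[float]],
                  fx: float,
                  cx: float,
                  cell: float = 0.05,
                  x_range: Tuple[float, float] = (-2.0, 2.0),
                  z_range: Tuple[float, float] = (0.0, 4.0)) -> List[List[int]]:
    """Count depth pixels per ground-plane cell (rows = depth Z, cols = lateral X).

    Assumption: the camera looks along the ground plane, so the lateral
    coordinate X and the depth Z directly give the top-down position.
    """
    cols = int(round((x_range[1] - x_range[0]) / cell))
    rows = int(round((z_range[1] - z_range[0]) / cell))
    grid = [[0] * cols for _ in range(rows)]
    for row in depth:
        for u, z in enumerate(row):
            if z <= 0:
                continue  # invalid or missing depth reading
            x = (u - cx) * z / fx  # pinhole back-projection of the column index
            if not (x_range[0] <= x < x_range[1] and z_range[0] <= z < z_range[1]):
                continue
            c = min(cols - 1, int((x - x_range[0]) / cell))
            r = min(rows - 1, int((z - z_range[0]) / cell))
            grid[r][c] += 1
    return grid


# Usage on a tiny 2x3 depth image (metres); 0.0 marks a missing reading.
tiny_depth = [[1.0, 1.0, 0.0],
              [1.0, 2.0, 1.0]]
grid = plan_view_map(tiny_depth, fx=100.0, cx=1.0, cell=0.5,
                     x_range=(-1.0, 1.0), z_range=(0.0, 4.0))
```

Because image height is collapsed, an upright object forms a compact blob in this map, and its position and extent on the ground plane can be tracked independently of appearance.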

Object Reconstruction and Recognition leveraging an RGB-D camera

mva-org.jp

Recently, sensing devices capable of delivering real-time color and depth information have become available. We show how they can benefit 3D object model acquisition, detection, and pose estimation in the context of robotic manipulation. On the modeling side, we propose a volume carving algorithm capable of reconstructing a rough 3D shape at a low processing cost. On the detection side, we find that little robustness can be directly added to classical feature-based techniques, but we propose an interesting combination with traditionally less robust techniques such as histogram comparison. Finally, we observe that 3D pose estimates can also be greatly improved using the depth measurements.

Multiple Human Tracking in RGB-D Data: A Survey

ArXiv, 2016

Multiple human tracking (MHT) is a fundamental task in many computer vision applications. Appearance-based approaches, primarily formulated on RGB data, are constrained and affected by problems arising from occlusions and/or illumination variations. In recent years, the arrival of cheap RGB-Depth (RGB-D) devices has led to many new approaches to MHT, and many of these integrate color and depth cues to improve each and every stage of the process. In this survey, we present the common processing pipeline of these methods and review their methodology based (a) on how they implement this pipeline and (b) on what role depth plays within each stage of it. We identify and introduce existing, publicly available, benchmark datasets and software resources that fuse color and depth data for MHT. Finally, we present a brief comparative evaluation of the performance of those works that have applied their methods to these datasets.

Object Assembly Guidance in Child-Robot Interaction using RGB-D based 3D Tracking

2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2018

This work examines how and to what benefit an autonomous humanoid robot can supervise a child in an object assembly task. In order to understand the child's actions, a novel 3D object tracking algorithm for RGB-D data is employed. The tracker consists of two stages: the first performs a tracking-by-detection scheme on the color stream, to locate the objects on the image plane, while the second uses a particle filter that operates on the depth data stream to refine the first stage output and infer the objects' rotations. Given the six degrees-of-freedom of the assembly part poses, the system is able to recognize which connections have been completed at any given time. This information is then used to select an appropriate verbal or gestural response for the robot. Experimental results show that (a) the tracking algorithm is accurate, fast and robust to severe occlusions and fast movements, (b) the proposed method of assembly state estimation is indeed effective, and (c) the r...

3D pose estimation of daily objects using an RGB-D camera

2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, 2012

In this paper, we present an object pose estimation algorithm exploiting both depth and color information. While many approaches assume that a target region is cleanly segmented from the background, our approach does not rely on that assumption, and thus it can estimate the pose of a target object in heavy clutter. Recently, an oriented point pair feature was introduced as a low-dimensional description of object surfaces. The feature has been employed in a voting scheme to find a set of possible 3D rigid transformations between object model and test scene features. While several approaches using the pair features require an accurate 3D CAD model as training data, our approach relies only on several scanned views of a target object, and hence it is straightforward to learn new objects. In addition, we argue that exploiting color information significantly enhances the performance of the voting process in terms of both time and accuracy. To exploit the color information, we define a color point pair feature, which is employed in a voting scheme for more effective pose estimation. We show extensive quantitative results of comparative experiments between our approach and a state-of-the-art approach.
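The abstract does not spell out the feature, so the sketch below uses the commonly cited four-component form of the oriented point pair feature: the distance between two surface points and three angles involving their normals. Appending the two point colors to obtain a color variant is an assumption made for illustration, as are the function names.

```python
import math
from typing import Sequence, Tuple

Vec3 = Sequence[float]


def _sub(a: Vec3, b: Vec3) -> Tuple[float, float, float]:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _norm(a: Vec3) -> float:
    return math.sqrt(a[0] ** 2 + a[1] ** 2 + a[2] ** 2)


def _angle(a: Vec3, b: Vec3) -> float:
    """Angle in [0, pi] between two vectors; 0.0 if either has zero length."""
    na, nb = _norm(a), _norm(b)
    if na == 0.0 or nb == 0.0:
        return 0.0
    cos = (a[0] * b[0] + a[1] * b[1] + a[2] * b[2]) / (na * nb)
    return math.acos(max(-1.0, min(1.0, cos)))  # clamp against rounding error


def point_pair_feature(p1: Vec3, n1: Vec3, p2: Vec3, n2: Vec3) -> Tuple[float, ...]:
    """(||d||, angle(n1, d), angle(n2, d), angle(n1, n2)) with d = p2 - p1."""
    d = _sub(p2, p1)
    return (_norm(d), _angle(n1, d), _angle(n2, d), _angle(n1, n2))


def color_point_pair_feature(p1: Vec3, n1: Vec3, c1: Sequence[float],
                             p2: Vec3, n2: Vec3, c2: Sequence[float]) -> Tuple[float, ...]:
    """Assumed color variant: the geometric feature followed by both point colors."""
    return point_pair_feature(p1, n1, p2, n2) + tuple(c1) + tuple(c2)


# Usage: two points on a flat surface, one metre apart, normals pointing up.
feature = point_pair_feature((0, 0, 0), (0, 0, 1), (1, 0, 0), (0, 0, 1))
```

In a voting scheme the feature is typically quantized and used as a hash key, so that scene pairs look up model pairs with a similar geometry; adding color makes the key more selective, which is consistent with the time and accuracy gains the abstract claims.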

Pose Estimation For A Partially Observable Human Body From RGB-D Cameras

HAL (Le Centre pour la Communication Scientifique Directe), 2015

Human pose estimation in realistic world conditions raises multiple challenges such as foreground extraction, background update, and occlusion by scene objects. Most existing approaches were demonstrated in controlled environments. In this paper, we propose a framework to improve the performance of existing tracking methods to cope with these problems. To this end, a robust and scalable framework is provided, composed of three main stages. In the first one, a probabilistic occupancy grid updated with a Hidden Markov Model is used to maintain an up-to-date background and to extract moving persons. The second stage uses component labelling to identify and track persons in the scene. The last stage uses a hierarchical particle filter to estimate the body pose for each moving person. Occlusions are handled by querying the occupancy grid to identify hidden body parts so that they can be discarded from the pose estimation process. We provide a parallel implementation that runs on CPU and GPU at 4 frames per second. We also validate the approach on our own dataset, which consists of synchronized motion capture and single RGB-D camera data of a person performing actions in challenging situations with severe occlusions generated by scene objects. We make this dataset available online.

Real-time Human Pose Estimation using RGB-D images and Deep Learning

2020

Human Pose Estimation (HPE), which localizes human body joints, has high potential for high-level applications in the field of computer vision. The main challenges of real-time HPE are occlusion, illumination change, and diversity of pose appearance. A single RGB image can be fed into an HPE framework to reduce the computation cost by using a depth-independent device such as a common camera, webcam, or phone camera. However, HPE based on a single RGB image is not able to solve the above challenges due to the inherent characteristics of color or texture. On the other hand, depth information, which is fed into an HPE framework to detect human body parts in 3D coordinates, can be used to address the above challenges. However, depth-based HPE requires a depth-dependent device, which has space constraints and is costly. In particular, the result of depth-based HPE is less reliable because it requires pose initialization and its frame-to-frame tracking is less stable. Therefore, this paper proposes a new HPE method that is robust to self-occlusion. Many human body parts can be occluded by other body parts; however, this paper focuses only on head self-occlusion. The new method is a combination of the RGB image-based HPE framework and the depth information-based HPE framework. We evaluated the performance of the proposed method using the COCO Object Keypoint Similarity (OKS) library. By taking advantage of both the RGB image-based and the depth information-based HPE methods, our RGB-D-based HPE method achieved an mAP of 0.903 and an mAR of 0.938, outperforming both the RGB-based HPE and the depth-based HPE.
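For readers unfamiliar with the metric behind the mAP and mAR figures above, the following is a simplified sketch of Object Keypoint Similarity for a single person. It follows the standard COCO definition, in which each labeled keypoint contributes exp(-d² / (2·s²·k²)), with d the pixel distance to the ground truth, s² the object area, and k twice the per-keypoint sigma. The function name and the example values are assumptions for illustration; the official evaluation code additionally handles edge cases that are omitted here.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def object_keypoint_similarity(pred: Sequence[Point],
                               gt: Sequence[Point],
                               visibility: Sequence[int],
                               area: float,
                               sigmas: Sequence[float]) -> float:
    """Mean per-keypoint similarity over labeled keypoints (visibility > 0).

    Simplification: assumes area > 0 and returns 0.0 when nothing is labeled.
    """
    total = 0.0
    labeled = 0
    for (px, py), (gx, gy), v, sigma in zip(pred, gt, visibility, sigmas):
        if v <= 0:
            continue  # unlabeled keypoints do not count
        d2 = (px - gx) ** 2 + (py - gy) ** 2
        k = 2.0 * sigma  # COCO per-keypoint constant
        total += math.exp(-d2 / (2.0 * area * k * k))
        labeled += 1
    return total / labeled if labeled else 0.0


# Usage: two labeled keypoints, one predicted exactly and one off by 2 pixels.
score = object_keypoint_similarity(
    pred=[(2.0, 0.0), (5.0, 5.0)],
    gt=[(0.0, 0.0), (5.0, 5.0)],
    visibility=[2, 2],
    area=1.0,
    sigmas=[1.0, 1.0],
)
```

A prediction counts as correct when its OKS exceeds a chosen threshold, and average precision and recall are then computed over a range of such thresholds.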

RGB-D Human Detection and Tracking for Industrial Environments

Reliably detecting and tracking the movements of nearby workers on the factory floor is crucial to the safety of advanced manufacturing automation in which humans and robots share the same workspace. In this work, we address the problem of multiple people detection and tracking in industrial environments by proposing algorithms which exploit both color and depth data to robustly track people in real time. For people detection, a cascade organization of these algorithms is proposed, while tracking is performed with a particle filter which can interpolate sparse detection results by exploiting color histograms of people. Tracking results of different combinations of the proposed methods are evaluated on a novel dataset collected with a consumer RGB-D sensor in an industrial-like environment. Our techniques obtain good tracking performance even in an industrial setting and reach an update rate of more than 30 Hz. All these algorithms have been released as open source as part of the ROS-Industrial project.
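One common way for a particle filter to exploit color histograms is to weight each particle by the similarity between a reference histogram of the person and the histogram observed at the particle's location. The sketch below uses the Bhattacharyya coefficient and a Gaussian likelihood. This is a generic technique and not necessarily the weighting used in the paper; the function names and the `sigma` value are assumptions.

```python
import math
from typing import List, Sequence


def normalize(hist: Sequence[float]) -> List[float]:
    """Scale a histogram to sum to 1 (an all-zero histogram stays all zero)."""
    total = float(sum(hist))
    if total == 0.0:
        return [0.0 for _ in hist]
    return [h / total for h in hist]


def bhattacharyya(p: Sequence[float], q: Sequence[float]) -> float:
    """Similarity in [0, 1] between two normalized histograms."""
    return sum(math.sqrt(a * b) for a, b in zip(p, q))


def particle_weights(reference: Sequence[float],
                     candidates: Sequence[Sequence[float]],
                     sigma: float = 0.2) -> List[float]:
    """Normalized particle weights from color-histogram similarity.

    Assumption: likelihood = exp(-(1 - BC) / (2 * sigma^2)), a common choice
    in color-based particle filtering; sigma is an illustrative value.
    """
    ref = normalize(reference)
    raw = []
    for hist in candidates:
        dist2 = max(0.0, 1.0 - bhattacharyya(ref, normalize(hist)))
        raw.append(math.exp(-dist2 / (2.0 * sigma ** 2)))
    total = sum(raw)
    return [w / total for w in raw]


# Usage: the first candidate matches the reference, the second does not
# overlap it at all, and the third overlaps it partially.
weights = particle_weights([4, 0, 0, 0],
                           [[4, 0, 0, 0], [0, 4, 0, 0], [2, 2, 0, 0]])
```

Between detections, particles whose local histogram resembles the tracked person keep high weights, which is one way a filter can bridge frames in which the detector produces no output.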

Enhancing Probabilistic Appearance-Based Object Tracking with Depth Information: Object Tracking under Occlusion

Object tracking has attracted recent attention because of the high demand for it in everyday applications. Handling occlusions, especially in cluttered environments, introduces new challenges to the tracking problem: identity loss, splitting/merging, shape changes, shadows, and other appearance artifacts trouble appearance-based tracking techniques. Depth maps provide the necessary clues to retrieve occluded objects after they reappear, recombine split groups of objects, compensate for drastic appearance changes, and reduce the effect of appearance artifacts. In this study, we not only propose a consistent way of integrating color and depth information in a particle filter framework to efficiently perform the tracking task, but also enhance previous color-based particle filtering to achieve trajectory independence and consistency with respect to the target scale. We also exploit local characteristics to represent the target objects and propose a novel confidence measure for them. The method is applied to simple tracking problems, and its performance is discussed thoroughly.