Contextual object category recognition for RGB-D scene labeling

Robotics and Autonomous Systems (Elsevier), Volume 62, Issue 2, February 2014, Pages 241-256

Recent advances in computer vision on the one hand and in imaging technologies on the other have opened up a number of interesting possibilities for robust 3D scene labeling. This paper presents contributions in several directions to improve the state of the art in RGB-D scene labeling. First, we present a novel combination of depth and color features to recognize different object categories in isolation. Then, we use a context model that exploits detection results for other objects in the scene to jointly optimize the labels of co-occurring objects. Finally, we investigate the use of social media mining to develop the context model, and we analyze its convergence. We perform thorough experiments on both the publicly available RGB-D Dataset from the University of Washington and the NYU scene dataset. An analysis of the results yields interesting insights into contextual object category recognition and its benefits.
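The joint optimization over co-occurring labels can be illustrated with a small sketch. The Python snippet below is a minimal illustration under stated assumptions, not the paper's actual model: it takes hypothetical per-segment detector scores (unary) and a hypothetical log co-occurrence matrix (cooc, which in the paper's setting would be estimated, e.g., via social media mining), and refines the labels of co-occurring objects with iterated conditional modes, a simple coordinate-ascent inference scheme chosen here for brevity.

    import numpy as np

    def joint_labels(unary, cooc, iters=10):
        # unary: (n_segments, n_classes) log-scores from an independent detector
        # cooc:  (n_classes, n_classes) symmetric log co-occurrence affinities
        labels = unary.argmax(axis=1)  # start from independent per-object decisions
        for _ in range(iters):
            changed = False
            for i in range(len(labels)):
                # context term: affinity of each candidate class with the
                # current labels of all other segments (subtract the self-pair)
                ctx = cooc[:, labels].sum(axis=1) - cooc[:, labels[i]]
                best = int((unary[i] + ctx).argmax())
                if best != labels[i]:
                    labels[i], changed = best, True
            if not changed:
                break  # converged: no label changed in a full sweep
        return labels

    # Toy run: two segments, three classes; classes 0 and 1 co-occur strongly,
    # so the second segment's weak preference for class 1 is reinforced.
    unary = np.log(np.array([[0.6, 0.3, 0.1],
                             [0.4, 0.5, 0.1]]))
    cooc = np.array([[0.0, 1.0, 0.1],
                     [1.0, 0.0, 0.1],
                     [0.1, 0.1, 0.0]])
    print(joint_labels(unary, cooc))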

Dense Real-Time Mapping of Object-Class Semantics from RGB-D Video

We propose a real-time approach to learning semantic maps from moving RGB-D cameras. Our method models geometry, appearance, and the semantic labeling of surfaces. We recover camera pose using simultaneous localization and mapping while concurrently recognizing and segmenting object classes in the images. Our object-class segmentation approach is based on random decision forests and yields a dense probabilistic labeling of each image. We implemented it on the GPU to achieve a high frame rate. The probabilistic segmentations are fused in octree-based 3D maps within a Bayesian framework. In this way, image segmentations from various viewpoints are integrated within a 3D map, which improves segmentation quality. We evaluate our system on a large benchmark dataset and demonstrate state-of-the-art recognition performance of our object-class segmentation and semantic mapping approaches.
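The Bayesian fusion step can be sketched compactly. The snippet below is a simplified sketch under stated assumptions, not the paper's implementation: a flat dictionary of voxel keys stands in for the octree, and N_CLASSES is a hypothetical class count. Per-view categorical class probabilities are fused per voxel by accumulating log-probabilities, which is the recursive Bayes update under independent observations and a uniform prior, and normalizing on read-out.

    import numpy as np
    from collections import defaultdict

    N_CLASSES = 5  # hypothetical number of object classes

    # voxel key -> accumulated per-class log-probabilities (zeros = uniform prior)
    log_map = defaultdict(lambda: np.zeros(N_CLASSES))

    def fuse_observation(voxel, class_probs, eps=1e-6):
        # Bayes update for one voxel from one view: with independent
        # observations, likelihoods multiply, so log-probabilities add.
        log_map[voxel] += np.log(np.asarray(class_probs) + eps)

    def voxel_posterior(voxel):
        logp = log_map[voxel]
        p = np.exp(logp - logp.max())  # subtract max for numerical stability
        return p / p.sum()

    # Two views of the same voxel agree on class 0; fusion sharpens the estimate.
    fuse_observation((3, 1, 4), [0.5, 0.2, 0.1, 0.1, 0.1])
    fuse_observation((3, 1, 4), [0.6, 0.1, 0.1, 0.1, 0.1])
    print(voxel_posterior((3, 1, 4)))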

Labeling 3D scenes for Personal Assistant Robots

2011

Inexpensive RGB-D cameras that provide an RGB image together with depth data have become widely available. We use these data to build 3D point clouds of full scenes. In this paper, we address the task of labeling objects in the 3D point cloud of a complete indoor scene such as an office. We propose a graphical model that captures various features and contextual relations, including local visual appearance and shape cues, object co-occurrence relationships, and geometric relationships. With a large number of object classes and relations, the model's parsimony becomes important, and we address this by using multiple types of edge potentials. The model admits efficient approximate inference, and we train it using a maximum-margin learning approach. In our experiments over a total of 52 3D scenes of homes and offices (composed of about 550 views, with 2495 segments labeled with 27 object classes), we achieve a performance of 84.06% in labeling 17 object classes for offices, and 73.38% in labeling 17 object classes for home scenes. Finally, we applied these algorithms successfully on a mobile robot for the task of finding an object in a large cluttered room.
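To make the model form concrete, the sketch below writes out the scored-labeling shape such a pairwise graphical model takes. It is a toy illustration under stated assumptions, not the paper's implementation: w_node and w_edge are hypothetical learned weights (in the paper they would be trained with the maximum-margin approach), and MAP inference is done here by brute-force enumeration, which is only feasible for tiny graphs; the paper relies on efficient approximate inference instead.

    import numpy as np
    from itertools import product

    def labeling_score(y, node_feats, edges, w_node, w_edge):
        # y: tuple of class indices, one per segment
        # node_feats: (n, d_n) local appearance/shape features per segment
        # edges: list of (i, j, f) with f a (d_e,) geometric-relation feature
        # w_node: (n_classes, d_n); w_edge: (n_classes, n_classes, d_e)
        s = sum(w_node[y[i]] @ node_feats[i] for i in range(len(y)))
        s += sum(w_edge[y[i], y[j]] @ f for i, j, f in edges)
        return s

    def map_by_enumeration(n_classes, node_feats, edges, w_node, w_edge):
        # Exact MAP by exhaustive search -- exponential in the number of
        # segments, so usable only as a reference on toy graphs.
        n = len(node_feats)
        return max(product(range(n_classes), repeat=n),
                   key=lambda y: labeling_score(y, node_feats, edges,
                                                w_node, w_edge))

    # Toy graph: two segments, one edge, two classes, random weights.
    rng = np.random.default_rng(0)
    node_feats = rng.normal(size=(2, 4))
    edges = [(0, 1, rng.normal(size=3))]
    w_node = rng.normal(size=(2, 4))
    w_edge = rng.normal(size=(2, 2, 3))
    print(map_by_enumeration(2, node_feats, edges, w_node, w_edge))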
