Contextual object category recognition for RGB-D scene labeling
Related papers
OLT: A Toolkit for Object Labeling applied to robotic RGB-D datasets
2015 European Conference on Mobile Robots (ECMR), 2015
In this work we present the Object Labeling Toolkit (OLT), a set of publicly available software components that help with the management and labeling of sequential RGB-D observations collected by a mobile robot. Such a robot can be equipped with an arbitrary number of RGB-D devices, possibly integrating other sensors (e.g. odometry, 2D laser scanners, etc.). OLT first merges the robot observations to generate a 3D reconstruction of the scene, from which object segmentation and labeling is conveniently accomplished. The annotated labels are automatically propagated by the toolkit to each RGB-D observation in the collected sequence, providing a dense labeling of both intensity and depth images. The resulting object labels can be exploited for many robotics-oriented applications, including high-level decision making, semantic mapping, or contextual object recognition. Software components within OLT are highly customizable and expandable, facilitating the integration of already-developed algorithms. To illustrate the toolkit's suitability, we describe its application to robotic RGB-D sequences taken in a home environment.
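As an aside on the propagation step described above, the sketch below shows one way labels from a reconstructed, labeled point cloud could be projected back into a single RGB-D view using a pinhole camera model and a z-buffer. This is an illustrative numpy-only example, not OLT's actual interface; all function and parameter names are hypothetical.

```python
import numpy as np

def propagate_labels(points_w, labels, T_cw, K, height, width):
    """Project labeled world-frame points into one RGB-D view.

    points_w : (N, 3) labeled 3D points from the reconstruction (world frame)
    labels   : (N,) integer object labels
    T_cw     : (4, 4) world-to-camera transform for this observation
    K        : (3, 3) pinhole intrinsics
    Returns a (height, width) label image (-1 = unlabeled) built with a z-buffer.
    """
    # Transform points into the camera frame.
    pts_h = np.hstack([points_w, np.ones((len(points_w), 1))])
    pts_c = (T_cw @ pts_h.T).T[:, :3]

    # Keep points in front of the camera and project with the pinhole model.
    valid = pts_c[:, 2] > 1e-6
    pts_c, lab = pts_c[valid], labels[valid]
    uvw = (K @ pts_c.T).T
    u = np.round(uvw[:, 0] / uvw[:, 2]).astype(int)
    v = np.round(uvw[:, 1] / uvw[:, 2]).astype(int)

    inside = (u >= 0) & (u < width) & (v >= 0) & (v < height)
    u, v, z, lab = u[inside], v[inside], pts_c[inside, 2], lab[inside]

    label_img = np.full((height, width), -1, dtype=int)
    depth_buf = np.full((height, width), np.inf)
    for ui, vi, zi, li in zip(u, v, z, lab):
        if zi < depth_buf[vi, ui]:          # nearest point wins
            depth_buf[vi, ui] = zi
            label_img[vi, ui] = li
    return label_img
```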
Labeling 3D scenes for Personal Assistant Robots
2011
Inexpensive RGB-D cameras that provide an RGB image together with depth data have become widely available. We use this data to build 3D point clouds of a full scene. In this paper, we address the task of labeling objects in the 3D point cloud of a complete indoor scene such as an office. We propose a graphical model that captures various features and contextual relations, including local visual appearance and shape cues, object co-occurrence relationships, and geometric relationships.
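The co-occurrence and geometric relations mentioned above typically enter such a graphical model as pairwise terms alongside per-segment appearance scores. The following toy sketch scores a candidate labeling with unary plus pairwise log-compatibilities; it is a generic illustration of that structure, not the authors' model, and all names and numbers are invented.

```python
import numpy as np

def labeling_score(unary, pairs, cooccurrence, labeling):
    """Score one candidate labeling of scene segments.

    unary        : (S, C) per-segment appearance/shape scores for C classes (log space)
    pairs        : list of (i, j) index pairs of nearby segments
    cooccurrence : (C, C) log-compatibility of class pairs (e.g. monitor-keyboard)
    labeling     : (S,) class index assigned to each segment
    """
    score = unary[np.arange(len(labeling)), labeling].sum()
    for i, j in pairs:
        score += cooccurrence[labeling[i], labeling[j]]
    return score

# Toy usage: 3 segments, 2 classes; exhaustive search over all labelings.
unary = np.log(np.array([[0.7, 0.3], [0.2, 0.8], [0.6, 0.4]]))
cooc = np.log(np.array([[0.5, 0.9], [0.9, 0.5]]))   # different classes co-occur more often
pairs = [(0, 1), (1, 2)]
best = max(
    (np.array(l) for l in np.ndindex(2, 2, 2)),
    key=lambda l: labeling_score(unary, pairs, cooc, l),
)
print(best)   # -> [0 1 0]: unaries and contextual terms agree here
```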
This paper presents a novel 3-D object recognition framework for a service robot to eliminate false detections in cluttered office environments, where objects come in a great diversity of shapes and are difficult to represent with exact models. Laser point clouds are first converted to bearing-angle images, and a GentleBoost-based approach is then deployed for multiclass object detection. To handle variable object scales in detection, a scale-coordination technique is adopted in every subscene segmented from the whole scene according to the spatial distribution of 3-D laser points. Moreover, semantic information (e.g., ceilings, floors, and walls) extracted from raw 3-D laser points is utilized to eliminate false object detections. K-means clustering and the Mahalanobis distance are finally deployed to accurately segment objects in the 3-D laser point cloud. Experiments were conducted on a real mobile robot to show the validity and performance of the proposed method.
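The final segmentation step combines K-means with a Mahalanobis-distance check. Below is a minimal sketch of that combination, assuming scikit-learn's KMeans and a simple per-cluster covariance test; the threshold and helper names are illustrative, not taken from the paper.

```python
import numpy as np
from sklearn.cluster import KMeans

def segment_points(points, k, maha_thresh=3.0):
    """Cluster 3D laser points with K-means, then prune outliers per cluster
    using the Mahalanobis distance under the cluster's own covariance."""
    km = KMeans(n_clusters=k, n_init=10).fit(points)
    kept_labels = np.full(len(points), -1, dtype=int)

    for c in range(k):
        idx = np.where(km.labels_ == c)[0]
        cluster = points[idx]
        mean = cluster.mean(axis=0)
        cov = np.cov(cluster.T) + 1e-6 * np.eye(3)   # regularize thin clusters
        inv_cov = np.linalg.inv(cov)
        diff = cluster - mean
        d_maha = np.sqrt(np.einsum("ij,jk,ik->i", diff, inv_cov, diff))
        kept_labels[idx[d_maha < maha_thresh]] = c   # keep only points close to the cluster
    return kept_labels

# Toy usage: two well-separated blobs plus one stray point (pruned to label -1).
pts = np.vstack([np.random.randn(100, 3), np.random.randn(100, 3) + 5.0, [[20.0, 20.0, 20.0]]])
print(np.unique(segment_points(pts, k=2)))
```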
Semantic RGB-D Perception for Cognitive Service Robots
RGB-D Image Analysis and Processing, 2019
Cognitive robots need to understand their surroundings not only in terms of geometry, but they also need to categorize surfaces, detect objects, estimate their pose, etc. Due to their nature, RGB-D sensors are ideally suited to many of these problems, which is why we developed efficient RGB-D methods to address these tasks. In this chapter, we outline the continuous development and usage of RGB-D methods, spanning three applications: our cognitive service robot Cosero, which participated with great success in the international RoboCup@Home competitions, an industrial kitting application, and cluttered bin picking for warehouse automation. We learn semantic segmentation using convolutional neural networks and random forests, and aggregate the surface categories in 3D via RGB-D SLAM. We use deep learning methods to categorize surfaces, to recognize objects, and to estimate their pose. Efficient RGB-D registration methods are the basis for the manipulation of known objects. They have been extended to non-rigid registration, which allows for transferring manipulation skills to novel objects.
Robotics and Autonomous Systems, 2008
This article introduces the definition of Network Robot Systems (NRS) as it is understood in Europe, the USA, and Japan. Moreover, it describes some of the NRS projects in Europe and Japan and presents a summary of the papers in this Special Issue.
Semantic 3D scene segmentation for robotic assembly process execution
arXiv (Cornell University), 2023
Adapting robot programmes to changes in the environment is a well-known industry problem, and it is the reason why many tedious tasks are not automated in small and medium-sized enterprises (SMEs). A semantic world model of a robot's previously unknown environment, created from point clouds, is one way for these companies to automate assembly tasks that are typically performed by humans. The semantic segmentation of point clouds for robot manipulators or cobots in industrial environments has received little attention due to a lack of suitable datasets. This paper describes a pipeline for creating synthetic point clouds for specific use cases in order to train a model for point cloud semantic segmentation. We show that models trained with our data achieve high per-class accuracy (>90%) for semantic point cloud segmentation on unseen real-world data. Our approach is applicable not only to the 3D camera used in training-data generation but also to other depth cameras based on different technologies. The application tested in this work is an industry-related peg-in-the-hole process. With our approach, the need for user assistance during a robot's commissioning can be reduced to a minimum.
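To make the idea of synthetic training data concrete, the sketch below generates one labeled toy scene (a work surface plus a peg) with numpy. It only illustrates the general recipe of sampling labeled points from known geometry and adding sensor-like noise; the paper's actual pipeline, object models, and noise model are not reproduced here.

```python
import numpy as np

def synth_scene(n_plane=2000, n_peg=500, noise=0.002, seed=0):
    """Generate one synthetic labeled point cloud: a flat work surface (label 0)
    and a cylindrical peg standing on it (label 1), with Gaussian noise."""
    rng = np.random.default_rng(seed)

    # Work surface: uniform samples on a 0.5 m x 0.5 m plane at z = 0.
    plane = np.column_stack([
        rng.uniform(-0.25, 0.25, n_plane),
        rng.uniform(-0.25, 0.25, n_plane),
        np.zeros(n_plane),
    ])

    # Peg: points on a cylinder of radius 1 cm and height 5 cm at the origin.
    theta = rng.uniform(0, 2 * np.pi, n_peg)
    peg = np.column_stack([
        0.01 * np.cos(theta),
        0.01 * np.sin(theta),
        rng.uniform(0.0, 0.05, n_peg),
    ])

    points = np.vstack([plane, peg]) + rng.normal(0.0, noise, (n_plane + n_peg, 3))
    labels = np.concatenate([np.zeros(n_plane, dtype=int), np.ones(n_peg, dtype=int)])
    return points, labels

pts, lab = synth_scene()
print(pts.shape, np.bincount(lab))   # (2500, 3) [2000  500]
```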
Human-robot collaboration for semantic labeling of the environment
Today's robots are able to perform more and more complex tasks, which usually require a high degree of interaction with the environment they operate in. As a consequence, robotic systems should have a deep and specific knowledge of their workspaces, which goes far beyond the simple metric representation a robotic system can build up through SLAM (Simultaneous Localization and Mapping). In this paper, we present a novel human-robot collaboration approach, designed to extract the 3D shapes associated with objects of interest pointed out by a human operator. The information regarding the segmented objects is then integrated into a metric map built by the robot, providing a high-level representation of the environment that embodies all the knowledge required by a robot to actually execute complex tasks.
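One common way to turn a pointed-out location into a 3D object segment is greedy region growing from a seed point, sketched below with numpy. This is a generic illustration under that assumption, not the authors' method; the radius and function name are hypothetical.

```python
import numpy as np

def grow_from_seed(points, seed_xyz, radius=0.02, max_iters=200):
    """Greedy region growing: starting from a seed point indicated by the operator,
    repeatedly absorb any point within `radius` of the current region."""
    seed_idx = np.argmin(np.linalg.norm(points - seed_xyz, axis=1))
    in_region = np.zeros(len(points), dtype=bool)
    in_region[seed_idx] = True

    for _ in range(max_iters):
        region = points[in_region]
        # Distance from every point to its nearest point already in the region.
        d = np.min(np.linalg.norm(points[:, None, :] - region[None, :, :], axis=2), axis=1)
        new = (~in_region) & (d < radius)
        if not new.any():
            break
        in_region |= new
    return np.where(in_region)[0]

# Toy usage: a tight blob (the pointed-out object) next to a distant background.
pts = np.vstack([np.random.randn(200, 3) * 0.01, np.random.randn(50, 3) + 2.0])
print(len(grow_from_seed(pts, seed_xyz=np.zeros(3))))   # ~200 points of the blob
```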
Semantic Mapping Using Object-Class Segmentation of RGB-D Images
For task planning and execution in unstructured environments, a robot needs the ability to recognize and localize relevant objects. When this information is made persistent in a semantic map, it can be used, e.g., to communicate with humans. In this paper, we propose a novel approach to learning such maps. Our approach registers measurements of RGB-D cameras by means of simultaneous localization and mapping. We employ random decision forests to segment object classes in images and exploit dense depth measurements to obtain scale invariance. Our object recognition method integrates shape and texture seamlessly. The probabilistic segmentation from multiple views is filtered in a voxel-based 3D map using a Bayesian framework. We report on the quality of our object-class segmentation method and demonstrate the benefits in accuracy when fusing multiple views in a semantic map.
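The multi-view fusion described above can be illustrated as a recursive Bayes update of a per-voxel categorical distribution. The sketch below shows that filtering scheme in its simplest form with numpy; it is a deliberate simplification of the paper's voxel-based framework, with invented class counts and likelihoods.

```python
import numpy as np

class VoxelClassFilter:
    """Per-voxel Bayesian fusion of class probabilities from multiple views.

    Each voxel keeps a categorical distribution over C classes; every new
    observation (a per-class likelihood from one segmented view) is multiplied
    in and renormalized, i.e. a recursive Bayes update with a uniform prior.
    """
    def __init__(self, num_voxels, num_classes):
        self.log_prob = np.zeros((num_voxels, num_classes))   # uniform prior in log space

    def update(self, voxel_ids, class_likelihoods, eps=1e-6):
        # class_likelihoods: (M, C) soft segmentation outputs projected into M voxels
        # (assumes each voxel appears at most once per update)
        self.log_prob[voxel_ids] += np.log(class_likelihoods + eps)

    def posterior(self):
        p = np.exp(self.log_prob - self.log_prob.max(axis=1, keepdims=True))
        return p / p.sum(axis=1, keepdims=True)

# Toy usage: two views agree on voxel 0 but partly disagree on voxel 1.
f = VoxelClassFilter(num_voxels=2, num_classes=3)
f.update([0, 1], np.array([[0.1, 0.8, 0.1], [0.6, 0.3, 0.1]]))
f.update([0, 1], np.array([[0.2, 0.7, 0.1], [0.2, 0.7, 0.1]]))
post = f.posterior()
print(post.argmax(axis=1), post.max(axis=1))   # both fuse to class 1, voxel 1 less confidently
```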
Object Classification for Robotic Platforms
Advances in Intelligent Systems and Computing, 2019
Computer vision has been revolutionised in recent years by increased research in convolutional neural networks (CNNs); however, many challenges remain to be addressed in order to ensure fast and accurate image processing when applying these techniques to robotics. These challenges consist of handling extreme changes in scale, illumination, noise, and viewing angles of a moving object. The project's main contribution is to provide insight into how to properly train a convolutional neural network (CNN), a specific type of deep neural network (DNN), for object tracking in the context of industrial robotics. The proposed solution aims to use a combination of documented approaches to replicate a pick-and-place task with an industrial robot, using computer vision feeding a YOLOv3 CNN. Experimental tests, designed to investigate the requirements of training the CNN in this context, were performed using a variety of objects that differed in shape and size in a controlled environment. The general focus was to detect the objects based on their shape; as a result, a suitable and secure grasp could be selected by the robot. The findings in this article reflect the challenges of training the CNN through brute force. The article also highlights the different methods of annotating images and the ensuing results obtained after training the neural network.
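For a pick-and-place pipeline of this kind, a detected bounding box still has to be turned into a 3D grasp target. Below is a minimal sketch of that back-projection step, assuming a depth image aligned with the RGB image and known intrinsics; the detector output format and function name are assumptions, not taken from the article.

```python
import numpy as np

def pick_point_from_detection(bbox, depth_image, K):
    """Turn one 2D detection into a 3D grasp target in the camera frame.

    bbox        : (x_min, y_min, x_max, y_max) in pixels, e.g. from a YOLOv3 detector
    depth_image : (H, W) depth in meters, aligned with the RGB image
    K           : (3, 3) camera intrinsics
    """
    x_min, y_min, x_max, y_max = bbox
    u = int((x_min + x_max) / 2)
    v = int((y_min + y_max) / 2)

    # Use the median depth inside the box to be robust to holes and background pixels.
    patch = depth_image[y_min:y_max, x_min:x_max]
    z = np.median(patch[patch > 0])

    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]
    # Back-project the box center with the pinhole model.
    return np.array([(u - cx) * z / fx, (v - cy) * z / fy, z])
```

The resulting camera-frame point would still have to be mapped into the robot's base frame (e.g. via a hand-eye calibration) before a grasp can be executed.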