Semantic Mapping with Low-Density Point-Clouds for Service Robots in Indoor Environments

RIU-Net: Embarrassingly simple semantic segmentation of 3D LiDAR point cloud

2019

This paper proposes RIU-Net (for Range-Image U-Net), the adaptation of a popular semantic segmentation network for the semantic segmentation of a 3D LiDAR point cloud. The point cloud is turned into a 2D range-image by exploiting the topology of the sensor. This image is then used as input to a U-Net, an architecture that has already proved its efficiency for the semantic segmentation of medical images. We demonstrate how it can also be used for the accurate semantic segmentation of a 3D LiDAR point cloud and how it represents a valid bridge between image processing and 3D point cloud processing. Our model is trained on range-images built from the KITTI 3D object detection dataset. Experiments show that RIU-Net, despite being very simple, offers results that are comparable to the state of the art among range-image-based methods. Finally, we demonstrate that this architecture is able to operate at 90 fps on a single GPU, which enables deployment for real-time segmentation.
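To make the range-image idea concrete, here is a minimal sketch of the spherical projection that turns a LiDAR scan into a 2D image. The sensor parameters (64 beams, roughly [-25, 3] degrees of vertical field of view, 512 columns) are assumptions chosen to resemble the Velodyne sensor used in KITTI, not values taken from the RIU-Net paper.

```python
import numpy as np

def pointcloud_to_range_image(points, h=64, w=512,
                              fov_up_deg=3.0, fov_down_deg=-25.0):
    """Project an (N, 3) LiDAR point cloud onto an (h, w) range image.

    Sensor parameters are illustrative assumptions (Velodyne-like),
    not taken from the paper.
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.linalg.norm(points, axis=1) + 1e-8

    yaw = np.arctan2(y, x)        # azimuth in [-pi, pi]
    pitch = np.arcsin(z / r)      # elevation

    fov_up = np.deg2rad(fov_up_deg)
    fov_down = np.deg2rad(fov_down_deg)
    fov = fov_up - fov_down

    # Normalise the angles to pixel coordinates.
    u = 0.5 * (1.0 - yaw / np.pi) * w            # column index
    v = (1.0 - (pitch - fov_down) / fov) * h     # row index
    u = np.clip(np.floor(u), 0, w - 1).astype(np.int32)
    v = np.clip(np.floor(v), 0, h - 1).astype(np.int32)

    image = np.zeros((h, w), dtype=np.float32)
    # Assign far points first so the closest return wins each pixel.
    order = np.argsort(r)[::-1]
    image[v[order], u[order]] = r[order]
    return image
```

The resulting image can then be fed to any 2D segmentation network (a U-Net in the paper), and the per-pixel labels mapped back to the points that produced them.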

Efficient Object-Level Semantic Mapping with RGB-D Cameras

Research Square (Research Square), 2023

To autonomously navigate in real-world environments, mobile robots require a dense map, such as a 3D occupancy map, to guarantee safety. However, this map lacks semantic information for scene understanding. On the other hand, semantic objects can be introduced to the map with the help of deep neural networks, but the resulting systems may suffer from critical run-time issues due to heavy processing components. In this paper, we present an efficient semantic mapping system to incrementally build a voxel-based map with individual objects. Firstly, a frame-wise object segmentation scheme is adopted to segment 3D objects from RGB-D images. Then, a new object association strategy with geometric and semantic descriptors is proposed to track and update object information. Finally, these objects are integrated into a CPU-based voxel mapping approach to incrementally build a global object-level volumetric map. Experiments on publicly available indoor datasets show that the proposed system achieves good semantic mapping performance. Moreover, our method outperforms other object-level mapping algorithms in terms of segmentation results and computational efficiency. Furthermore, the system is evaluated on a logistics robot platform to demonstrate its use in real-world applications.
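The following sketch illustrates one simple way such a frame-to-map object association could work: candidate detections are matched to existing map objects by class label, centroid distance, and a similarity over a generic feature vector. The dictionary keys, the cosine-similarity descriptor, and the thresholds are all assumptions for illustration; the paper's actual geometric and semantic descriptors are not reproduced here.

```python
import numpy as np

def associate_objects(detections, map_objects,
                      max_centroid_dist=0.5, min_desc_sim=0.6):
    """Greedy frame-to-map object association (illustrative sketch).

    Each element of `detections` and `map_objects` is a dict with keys
    'label' (str), 'centroid' (3,) and 'descriptor' (D,). Thresholds
    and descriptor choice are assumptions, not the paper's values.
    """
    matches, unmatched, used = [], [], set()
    for i, det in enumerate(detections):
        best_j, best_score = None, -1.0
        for j, obj in enumerate(map_objects):
            if j in used or det['label'] != obj['label']:
                continue
            dist = np.linalg.norm(det['centroid'] - obj['centroid'])
            if dist > max_centroid_dist:
                continue
            sim = float(np.dot(det['descriptor'], obj['descriptor']) /
                        (np.linalg.norm(det['descriptor']) *
                         np.linalg.norm(obj['descriptor']) + 1e-8))
            if sim >= min_desc_sim and sim > best_score:
                best_j, best_score = j, sim
        if best_j is None:
            unmatched.append(i)          # later spawns a new map object
        else:
            used.add(best_j)
            matches.append((i, best_j))  # later updates the map object
    return matches, unmatched
```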

LU-Net: An Efficient Network for 3D LiDAR Point Cloud Semantic Segmentation Based on End-to-End-Learned 3D Features and U-Net

2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW), 2019

We propose LU-Net (for LiDAR U-Net), a new method for the semantic segmentation of a 3D LiDAR point cloud. Instead of applying a global 3D segmentation method such as PointNet, we propose an end-to-end architecture for LiDAR point cloud semantic segmentation that efficiently solves the problem as an image processing problem. We first extract high-level 3D features for each point given its 3D neighbors. Then, these features are projected into a 2D multichannel range-image by considering the topology of the sensor. Thanks to these learned features and this projection, we can finally perform the segmentation using a simple U-Net segmentation network, which performs very well while being very efficient. In this way, we exploit both the 3D nature of the data and the specificity of the LiDAR sensor. As our experiments show, this approach outperforms the state of the art by a large margin on the KITTI dataset. Moreover, it operates at 24 fps on a single GPU, above the acquisition rate of common LiDAR sensors, which makes it suitable for real-time applications.
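A rough illustration of the multichannel projection step follows. LU-Net learns per-point features from 3D neighbourhoods before projecting; this sketch simply stacks raw point attributes (x, y, z, intensity, range) as channels, which is a simplification made to keep the example short, with the same assumed Velodyne-like sensor parameters as above.

```python
import numpy as np

def multichannel_range_image(points, intensity, h=64, w=512,
                             fov_up_deg=3.0, fov_down_deg=-25.0):
    """Build a (5, h, w) image with x, y, z, intensity and range channels.

    `points` is (N, 3), `intensity` is (N,). Channel choice and sensor
    parameters are assumptions; LU-Net instead projects learned
    per-point features.
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.linalg.norm(points, axis=1) + 1e-8
    yaw, pitch = np.arctan2(y, x), np.arcsin(z / r)

    fov_up, fov_down = np.deg2rad(fov_up_deg), np.deg2rad(fov_down_deg)
    u = np.clip(np.floor(0.5 * (1.0 - yaw / np.pi) * w), 0, w - 1).astype(int)
    v = np.clip(np.floor((1.0 - (pitch - fov_down) / (fov_up - fov_down)) * h),
                0, h - 1).astype(int)

    image = np.zeros((5, h, w), dtype=np.float32)
    order = np.argsort(r)[::-1]          # closest point wins each pixel
    for c, values in enumerate((x, y, z, intensity, r)):
        image[c, v[order], u[order]] = values[order]
    return image
```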

Combining Lidar Slam and Deep Learning-Based People Detection for Autonomous Indoor Mapping in a Crowded Environment

The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences

In this paper, we present a mapping system based on an autonomous mobile robot equipped with a LiDAR device and a camera that can deal with the presence of people. Thanks to a deep learning approach, the positions of humans are identified and a new surveying path is planned that brings the robot to scan occluded areas, so as to obtain a complete point cloud of the environment. Experiments are performed with a wheeled mobile robot in different crowded scenarios, showing the applicability of the proposed approach to performing an autonomous survey while avoiding occlusions and automatically removing from the map noisy and spurious objects caused by the presence of people.
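One simple way to remove person-induced points from the map, given 2D person detections from the camera, is to discard LiDAR points that project inside a detected bounding box. The projection matrix, box format, and thresholds below are assumptions for illustration; the paper's actual filtering and replanning pipeline may differ.

```python
import numpy as np

def filter_person_points(points, boxes, P):
    """Drop LiDAR points that project inside detected person boxes.

    `points` is (N, 3) in the LiDAR frame, `P` an assumed 3x4 projection
    matrix (LiDAR frame -> pixel coordinates), and `boxes` a list of
    (x_min, y_min, x_max, y_max) person detections from an image detector.
    """
    homo = np.hstack([points, np.ones((points.shape[0], 1))])  # (N, 4)
    proj = homo @ P.T                                           # (N, 3)
    in_front = proj[:, 2] > 0.1
    u = proj[:, 0] / np.maximum(proj[:, 2], 1e-8)
    v = proj[:, 1] / np.maximum(proj[:, 2], 1e-8)

    keep = np.ones(points.shape[0], dtype=bool)
    for (x0, y0, x1, y1) in boxes:
        inside = (u >= x0) & (u <= x1) & (v >= y0) & (v <= y1) & in_front
        keep &= ~inside
    return points[keep]
```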

Contextually Guided Semantic Labeling and Search for 3D Point Clouds

arXiv (Cornell University), 2011

RGB-D cameras, which give an RGB image together with depths, are becoming increasingly popular for robotic perception. In this paper, we address the task of detecting commonly found objects in the 3D point cloud of indoor scenes obtained from such cameras. Our method uses a graphical model that captures various features and contextual relations, including the local visual appearance and shape cues, object co-occurrence relationships and geometric relationships. With a large number of object classes and relations, the model's parsimony becomes important and we address that by using multiple types of edge potentials. We train the model using a maximum-margin learning approach. In our experiments over a total of 52 3D scenes of homes and offices (composed from about 550 views), we get a performance of 84.06% and 73.38% in labeling office and home scenes respectively for 17 object classes each. We also present a method for a robot to search for an object using the learned model and the contextual information available from the current labelings of the scene. We applied this algorithm successfully on a mobile robot for the task of finding 12 object classes in 10 different offices and achieved a precision of 97.56% with 78.43% recall.
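As a minimal sketch of the kind of model the abstract describes, the function below scores one candidate labeling of a segmented scene as a sum of per-segment (node) potentials and pairwise (edge) potentials over related segments. The feature shapes and weight tensors are assumptions; the actual work uses several edge potential types and max-margin training, neither of which is reproduced here.

```python
import numpy as np

def labeling_score(node_feats, edges, edge_feats, labels, w_node, w_edge):
    """Score a candidate labeling under a simple log-linear model.

    node_feats: (N, F) per-segment appearance/shape features
    edges:      list of (i, j) index pairs of related segments
    edge_feats: (len(edges), G) geometric/co-occurrence features
    labels:     (N,) integer class assignment being evaluated
    w_node:     (K, F) per-class node weights (assumed shapes)
    w_edge:     (K, K, G) per-label-pair edge weights (assumed shapes)
    """
    score = 0.0
    for i, f in enumerate(node_feats):
        score += float(w_node[labels[i]] @ f)          # node potential
    for (i, j), g in zip(edges, edge_feats):
        score += float(w_edge[labels[i], labels[j]] @ g)  # edge potential
    return score
```

Inference then amounts to searching for the labeling that maximizes this score, and learning to choosing weights so that ground-truth labelings outscore all others by a margin.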

Building Semantic Object Maps from Sparse and Noisy 3D Data

IEEE/RSJ Int. Conf. Intelligent Robots and Systems (IROS 2013), Tokyo, Japan, 2013

We present an approach to create a semantic map of an indoor environment, based on a series of 3D point clouds captured by a mobile robot using a Kinect camera. The proposed system reconstructs the surfaces in the point clouds, detects different types of furniture and estimates their poses. The result is a consistent mesh representation of the environment enriched by CAD models corresponding to the detected pieces of furniture. We evaluate our approach directly on each individual frame of two datasets totaling over 800 frames.

Contextually guided semantic labeling and search for three-dimensional point clouds

The International Journal of Robotics Research, 2012

RGB-D cameras, which give an RGB image together with depths, are becoming increasingly popular for robotic perception. In this paper, we address the task of detecting commonly found objects in the three-dimensional (3D) point cloud of indoor scenes obtained from such cameras. Our method uses a graphical model that captures various features and contextual relations, including the local visual appearance and shape cues, object co-occurrence relationships and geometric relationships. With a large number of object classes and relations, the model’s parsimony becomes important and we address that by using multiple types of edge potentials. We train the model using a maximum-margin learning approach. In our experiments concerning a total of 52 3D scenes of homes and offices (composed from about 550 views), we get a performance of 84.06% and 73.38% in labeling office and home scenes respectively for 17 object classes each. We also present a method for a robot to search for an object using ...

Robot@Home, a robotic dataset for semantic mapping of home environments

The International Journal of Robotics Research, 2017

This paper presents the Robot-at-Home dataset (Robot@Home), a collection of raw and processed sensory data from domestic settings aimed at serving as a benchmark for semantic mapping algorithms through the categorization of objects and/or rooms. The dataset contains 87,000+ time-stamped observations gathered by a mobile robot endowed with a rig of four RGB-D cameras and a 2D laser scanner. Raw observations have been processed to produce different outcomes also distributed with the dataset, including 3D reconstructions and 2D geometric maps of the inspected rooms, both annotated with the ground truth categories of the surveyed rooms and objects. The proposed dataset is particularly suited as a testbed for object and/or room categorization systems, but it can also be exploited for a variety of tasks, including robot localization, 3D map building, SLAM, and object segmentation.

Semantic RGB-D Perception for Cognitive Service Robots

RGB-D Image Analysis and Processing, 2019

Cognitive robots need to understand their surroundings not only in terms of geometry; they also need to categorize surfaces, detect objects, estimate their poses, and so on. Due to their nature, RGB-D sensors are ideally suited to many of these problems, which is why we developed efficient RGB-D methods to address these tasks. In this chapter, we outline the continuous development and usage of RGB-D methods, spanning three applications: our cognitive service robot Cosero, which participated with great success in the international RoboCup@Home competitions, an industrial kitting application, and cluttered bin picking for warehouse automation. We learn semantic segmentation using convolutional neural networks and random forests, and aggregate the surface categories in 3D via RGB-D SLAM. We use deep learning methods to categorize surfaces, to recognize objects and to estimate their pose. Efficient RGB-D registration methods are the basis for the manipulation of known objects. They have been extended to non-rigid registration, which allows for transferring manipulation skills to novel objects.
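A minimal sketch of aggregating per-frame semantic predictions into a 3D map is shown below: each voxel keeps a class distribution that is multiplicatively updated with the segmentation network's per-pixel probabilities and renormalised. The dictionary-of-voxels representation and the Bayesian-style update are assumptions standing in for whatever aggregation the chapter's RGB-D SLAM pipeline actually uses.

```python
import numpy as np

def fuse_semantics(voxel_probs, voxel_key, frame_probs, num_classes):
    """Fuse one frame's class probabilities into a voxel's label belief.

    voxel_probs: dict mapping a voxel key (e.g. an (i, j, k) tuple) to a
                 running (num_classes,) class distribution
    frame_probs: (num_classes,) softmax output of the 2D segmentation
                 network for a pixel whose 3D point falls in that voxel
    """
    prior = voxel_probs.get(voxel_key,
                            np.full(num_classes, 1.0 / num_classes))
    posterior = prior * frame_probs            # multiplicative update
    posterior /= posterior.sum() + 1e-12       # renormalise
    voxel_probs[voxel_key] = posterior
    return posterior
```

The voxel's label is then simply the argmax of its accumulated distribution, queried whenever the semantic map is rendered or used for planning.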

RangeNet ++: Fast and Accurate LiDAR Semantic Segmentation

2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2019

Perception in autonomous vehicles is often carried out through a suite of different sensing modalities. Given the massive amount of openly available labeled RGB data and the advent of high-quality deep learning algorithms for image-based recognition, high-level semantic perception tasks are predominantly solved using high-resolution cameras. As a result, other sensor modalities potentially useful for this task are often ignored. In this paper, we push the state of the art in LiDAR-only semantic segmentation forward in order to provide another independent source of semantic information to the vehicle. Our approach can accurately perform full semantic segmentation of LiDAR point clouds at sensor frame rate. We exploit range images as an intermediate representation in combination with a Convolutional Neural Network (CNN) exploiting the rotating LiDAR sensor model. To obtain accurate results, we propose a novel post-processing algorithm that deals with problems arising from this intermediate representation, such as discretization errors and blurry CNN outputs. We implemented and thoroughly evaluated our approach, including several comparisons to the state of the art. Our experiments show that our approach outperforms state-of-the-art approaches while still running online on a single embedded GPU. The code can be accessed at https://github.com/PRBonn/lidar-bonnetal.
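To illustrate the post-processing idea, the sketch below re-labels each 3D point from a small range-image window around its pixel: the k candidates with the most similar range vote on the point's label, which reduces the bleeding of labels across depth discontinuities. Window size, k, and the range cutoff are illustrative; the released implementation is GPU-based and differs in details (e.g. weighted voting), so this is only a simplified CPU version of the same idea.

```python
import numpy as np

def knn_point_labels(point_uv, point_range, image_labels, image_range,
                     k=5, win=5, cutoff=1.0):
    """Re-label each 3D point from its range-image neighbourhood.

    point_uv:     (N, 2) integer pixel coordinates (u = column, v = row)
    point_range:  (N,) range of every point in metres
    image_labels: (H, W) per-pixel class predictions from the CNN
    image_range:  (H, W) per-pixel range values
    """
    h, w = image_labels.shape
    half = win // 2
    out = np.empty(point_uv.shape[0], dtype=image_labels.dtype)
    for n, ((u, v), r) in enumerate(zip(point_uv, point_range)):
        v0, v1 = max(v - half, 0), min(v + half + 1, h)
        u0, u1 = max(u - half, 0), min(u + half + 1, w)
        cand_lab = image_labels[v0:v1, u0:u1].ravel()
        cand_rng = image_range[v0:v1, u0:u1].ravel()
        # Keep the k candidates whose range is closest to the point's range.
        nearest = np.argsort(np.abs(cand_rng - r))[:k]
        nearest = nearest[np.abs(cand_rng[nearest] - r) <= cutoff]
        votes = cand_lab[nearest]
        out[n] = (np.bincount(votes).argmax() if votes.size
                  else image_labels[v, u])
    return out
```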