Review on Indoor RGB-D Semantic Segmentation with Deep Convolutional Neural Networks
Related papers
Indoor Semantic Segmentation using depth information
arXiv preprint arXiv:1301.3572, 2013
This work addresses multi-class segmentation of indoor scenes with RGB-D inputs. While this area of research has gained much attention recently, most works still rely on hand-crafted features. In contrast, we apply a multiscale convolutional network to learn features directly from the images and the depth information. We obtain state-of-the-art performance on the NYU-v2 depth dataset with an accuracy of 64.5%. We illustrate the labeling of indoor scenes in video sequences that could be processed in real time using appropriate hardware such as an FPGA.
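The multiscale network described above processes the same RGB-D frame at several resolutions so that both fine detail and scene context reach the classifier. A minimal sketch of building such an input pyramid (scale factors and function names are illustrative, not the authors'):

```python
import numpy as np

def downsample(img, factor):
    """Naive box-filter downsampling by an integer factor (illustrative only)."""
    h, w = img.shape[:2]
    h2, w2 = h // factor, w // factor
    img = img[:h2 * factor, :w2 * factor]
    return img.reshape(h2, factor, w2, factor, -1).mean(axis=(1, 3))

def build_pyramid(rgbd, factors=(1, 2, 4)):
    """Return the RGB-D frame at several scales; coarse-scale features are
    later upsampled and combined before per-pixel classification."""
    return [downsample(rgbd, f) if f > 1 else rgbd for f in factors]

rgbd = np.random.rand(240, 320, 4)   # RGB + depth stacked as channels
pyramid = build_pyramid(rgbd)
print([p.shape for p in pyramid])    # [(240, 320, 4), (120, 160, 4), (60, 80, 4)]
```

The depth channel is simply stacked alongside RGB here; the actual network may normalize or encode depth separately.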
Efficient RGB-D Semantic Segmentation for Indoor Scene Analysis
2021 IEEE International Conference on Robotics and Automation (ICRA), 2021
Analyzing scenes thoroughly is crucial for mobile robots acting in different environments. Semantic segmentation can enhance various subsequent tasks, such as (semantically assisted) person perception, (semantic) free space detection, (semantic) mapping, and (semantic) navigation. In this paper, we propose an efficient and robust RGB-D segmentation approach that can be optimized to a high degree using NVIDIA TensorRT and, thus, is well suited as a common initial processing step in a complex system for scene analysis on mobile robots. We show that RGB-D segmentation is superior to processing RGB images solely and that it can still be performed in real time if the network architecture is carefully designed. We evaluate our proposed Efficient Scene Analysis Network (ESANet) on the common indoor datasets NYUv2 and SUNRGB-D and show that it reaches state-of-the-art performance when considering both segmentation performance and runtime. Furthermore, our evaluation on the outdoor dataset Cityscapes shows that our approach is suitable for other areas of application as well. Finally, instead of presenting benchmark results only, we show qualitative results in one of our indoor application scenarios.
Toward real-time indoor semantic segmentation using depth information
2014
This work addresses multi-class segmentation of indoor scenes with RGB-D inputs. While this area of research has gained much attention recently, most works still rely on hand-crafted features. In contrast, we apply a multiscale convolutional network to learn features directly from the images and the depth information. Using frame-by-frame labeling, we obtain nearly state-of-the-art performance on the NYU-v2 depth dataset with an accuracy of 64.5%. We then show that the labeling can be further improved by exploiting the temporal consistency in the video sequence of the scene. To that end, we present a method producing temporally consistent superpixels from a streaming video. Among the different methods producing superpixel segmentations of an image, the graph-based approach of Felzenszwalb and Huttenlocher is broadly employed. One of its interesting properties is that the regions are computed in a greedy manner in quasi-linear time by using a minimum spanning tree. In a framework e...
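The greedy minimum-spanning-tree segmentation the abstract refers to can be sketched as follows: sort the 4-neighbour grid edges by weight, then merge components whenever the connecting edge is no heavier than each side's internal variation plus a size-dependent tolerance k/|C|. This is a simplified union-find version with an illustrative threshold parameter, not the reference implementation:

```python
import numpy as np

def felzenszwalb_like(gray, k=10.0):
    """Greedy graph-based segmentation in the spirit of Felzenszwalb &
    Huttenlocher. Simplified sketch on a grayscale image."""
    h, w = gray.shape
    n = h * w
    parent = np.arange(n)
    size = np.ones(n, dtype=int)
    internal = np.zeros(n)          # max internal edge weight per component

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    # Build 4-connected grid edges (right and down neighbours).
    edges = []
    for y in range(h):
        for x in range(w):
            i = y * w + x
            if x + 1 < w:
                edges.append((abs(gray[y, x] - gray[y, x + 1]), i, i + 1))
            if y + 1 < h:
                edges.append((abs(gray[y, x] - gray[y + 1, x]), i, i + w))
    edges.sort()                    # greedy: lightest edges first

    for wgt, a, b in edges:
        ra, rb = find(a), find(b)
        if ra == rb:
            continue
        # Merge only if the connecting edge is weak relative to both sides.
        if wgt <= min(internal[ra] + k / size[ra],
                      internal[rb] + k / size[rb]):
            parent[rb] = ra
            size[ra] += size[rb]
            internal[ra] = max(internal[ra], internal[rb], wgt)

    return np.array([find(i) for i in range(n)]).reshape(h, w)

# Two flat halves separated by a strong vertical edge stay two components.
img = np.zeros((8, 8)); img[:, 4:] = 1.0
labels = felzenszwalb_like(img, k=1.0)
print(len(np.unique(labels)))   # 2
```

The temporal-consistency extension described in the paper builds on this per-frame procedure; it is not shown here.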
Incorporating depth into both CNN and CRF for indoor semantic segmentation
2017 8th IEEE International Conference on Software Engineering and Service Science (ICSESS), 2017
In this paper, we address the problem of indoor semantic segmentation by incorporating depth information into both the convolutional neural network and the conditional random field of a neural network architecture. The architecture combines an RGB-D fully convolutional neural network (DFCN) with a depth-sensitive fully-connected conditional random field (DCRF). In the DFCN module, the depth information is incorporated into the early layers using a fusion structure, which is followed by several dilated convolution layers for contextual reasoning. In the DCRF module, the depth-sensitive fully-connected conditional random field refines the preliminary DFCN output. Comparative experiments show that the proposed DFCN-DCRF architecture achieves competitive performance compared with state-of-the-art methods.
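The "depth-sensitive" part of the CRF amounts to extending the usual position/colour Gaussian pairwise kernel of a dense CRF with a depth-difference term, so that pixels at different depths are less likely to share a label. A sketch of such an affinity between two pixels, with kernel widths that are illustrative assumptions rather than the paper's values:

```python
import numpy as np

def depth_sensitive_affinity(p_i, p_j, c_i, c_j, d_i, d_j,
                             theta_pos=3.0, theta_col=10.0, theta_dep=0.1):
    """Pairwise affinity in the spirit of a depth-sensitive dense CRF:
    position + colour Gaussian kernel, extended with a depth term.
    p_* are 2D positions, c_* RGB colours, d_* depths in metres."""
    pos = np.sum((p_i - p_j) ** 2) / (2 * theta_pos ** 2)
    col = np.sum((c_i - c_j) ** 2) / (2 * theta_col ** 2)
    dep = (d_i - d_j) ** 2 / (2 * theta_dep ** 2)
    return np.exp(-(pos + col + dep))

# Same colour and position, different depth -> much weaker affinity.
near = depth_sensitive_affinity(np.zeros(2), np.zeros(2),
                                np.zeros(3), np.zeros(3), 1.0, 1.05)
far = depth_sensitive_affinity(np.zeros(2), np.zeros(2),
                               np.zeros(3), np.zeros(3), 1.0, 2.0)
print(near > far)   # True
```

In a full dense CRF this kernel would enter the pairwise potential during mean-field inference; the snippet only illustrates how depth reshapes the affinity.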
Multi-Scale Convolutional Architecture for Semantic Segmentation
2015
Advances in 3D sensing technologies have made RGB and depth information more readily available than before, which can greatly assist in the semantic segmentation of 2D scenes. Many works in the literature perform semantic segmentation in such scenes, but few address environments with a high degree of clutter, e.g. indoor scenes. In this paper, we explore the use of depth information along with RGB and a deep convolutional network for indoor scene understanding through semantic labeling. Our work exploits the geocentric encoding of a depth image and uses a multi-scale deep convolutional neural network architecture that captures high- and low-level features of a scene to generate rich semantic labels. We apply our method to indoor RGB-D images from the NYUD2 dataset [1] and achieve a competitive performance of 70.45% accuracy in labeling four object classes compared with some prior approaches. The results show our system is capable of generating a pixe...
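A geocentric encoding replaces the raw depth value with quantities defined relative to the world, such as height above the floor. One channel of such an encoding can be sketched under simple assumptions: a pinhole camera mounted level at a known height (all constants below are illustrative, not from the paper):

```python
import numpy as np

def height_above_ground(depth, fy, cy, camera_height=1.2):
    """Per-pixel height above the floor from a depth map, assuming a level
    camera at a known height and a pinhole model. fy/cy are the vertical
    focal length and principal point in pixels."""
    h, w = depth.shape
    v = np.arange(h).reshape(-1, 1)     # pixel row index
    # Back-project: image y grows downward, so rows below cy lie below
    # the optical axis.
    y_cam = (v - cy) * depth / fy       # metres below the optical axis
    return camera_height - y_cam        # height above the floor

depth = np.full((480, 640), 2.0)        # flat wall 2 m away
hag = height_above_ground(depth, fy=570.0, cy=239.5)
print(hag[0, 0] > hag[-1, 0])           # True: top of image is higher up
```

A full geocentric encoding would add further channels (e.g. disparity and surface angle with gravity), which require surface-normal estimation and are omitted here.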
Fast Semantic Segmentation of RGB-D Scenes with GPU-Accelerated Deep Neural Networks
Lecture Notes in Computer Science, 2014
In semantic scene segmentation, every pixel of an image is assigned a category label. This task can be made easier by incorporating depth information, which structured light sensors provide. Depth, however, has very different properties from RGB image channels. In this paper, we present a novel method to provide depth information to convolutional neural networks. For this purpose, we apply a simplified version of the histogram of oriented depth (HOD) descriptor to the depth channel. We evaluate the network on the challenging NYU Depth V2 dataset and show that with our method, we can reach competitive performance at a high frame rate.
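The HOD descriptor mirrors HOG but operates on the depth channel: depth gradients are binned by orientation and weighted by magnitude within a cell. A minimal sketch of one cell, intended only to convey the idea (the paper's exact variant may differ):

```python
import numpy as np

def hod_cell(depth_patch, n_bins=8):
    """Simplified histogram-of-oriented-depth descriptor for one cell:
    bin depth gradients by unsigned orientation, weighted by magnitude,
    then L2-normalize."""
    gy, gx = np.gradient(depth_patch)
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)        # unsigned orientation
    bins = np.minimum((ang / np.pi * n_bins).astype(int), n_bins - 1)
    hist = np.bincount(bins.ravel(), weights=mag.ravel(), minlength=n_bins)
    norm = np.linalg.norm(hist)
    return hist / norm if norm > 0 else hist

# A ramp in x produces gradients concentrated in one orientation bin.
patch = np.tile(np.linspace(0.0, 1.0, 16), (16, 1))
hist = hod_cell(patch)
print(np.argmax(hist))   # 0  (horizontal gradient -> orientation near 0)
```

In the full pipeline such cell histograms, computed over a grid, would be fed to the network alongside the RGB channels.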
Fully Convolutional Networks for Semantic Segmentation from RGB-D images
2016
In recent years, new trends such as Industry 4.0 have boosted research and development in the field of autonomous systems and robotics. Robots collaborate with humans and even take over complete tasks. But the high degree of automation requires high reliability even in complex and changing environments. Those challenging conditions make it hard to rely on static models of the real world. In addition to adaptable maps, mobile robots require a local and current understanding of the scene. The Bosch Start-Up Company is developing robots for intra-logistic systems, which could highly benefit from such a detailed scene understanding. The aim of this work is to research and develop such a system for warehouse environments. While the possible field of application is in general very broad, this work will focus on the detection and localization of warehouse-specific objects such as pallets. In order to provide a meaningful perception of the surroundings, an RGB-D camera is used. A pre-trained conv...
Semantic Segmentation Leveraging Simultaneous Depth Estimation
Sensors
Semantic segmentation is one of the most widely studied problems in computer vision communities, which makes a great contribution to a variety of applications. A lot of learning-based approaches, such as Convolutional Neural Network (CNN), have made a vast contribution to this problem. While rich context information of the input images can be learned from multi-scale receptive fields by convolutions with deep layers, traditional CNNs have great difficulty in learning the geometrical relationship and distribution of objects in the RGB image due to the lack of depth information, which may lead to an inferior segmentation quality. To solve this problem, we propose a method that improves segmentation quality with depth estimation on RGB images. Specifically, we estimate depth information on RGB images via a depth estimation network, and then feed the depth map into the CNN which is able to guide the semantic segmentation. Furthermore, in order to parse the depth map and RGB images simul...
Depth and Height Aware Semantic RGB-D Perception with Convolutional Neural Networks
Convolutional neural networks are popular for image labeling tasks because of their built-in translation invariance. They do not adapt well to scale changes, however, and cannot easily adjust to classes which regularly appear in certain scene regions. This is especially true when the network is applied in a sliding window. When depth data is available, we can address both problems. We propose to adjust the size of processed windows to the depth and to supply inferred height above ground to the network, which significantly improves object-class segmentation results on the NYU depth dataset.
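The depth-adjusted sliding window described above scales the processed window inversely with depth, so that an object covers a roughly constant number of network input pixels regardless of its distance. A minimal sketch with illustrative constants (the paper's actual scaling rule and bounds may differ):

```python
import numpy as np

def window_size_for_depth(depth_m, base_size=64, reference_depth=1.0,
                          min_size=16, max_size=256):
    """Window side length (pixels) for a point at depth_m metres: objects
    twice as far away appear half as large, so halve the window. Clamped
    to sensible bounds; the crop is later rescaled to the network input."""
    size = int(round(base_size * reference_depth / depth_m))
    return int(np.clip(size, min_size, max_size))

print(window_size_for_depth(1.0))   # 64
print(window_size_for_depth(2.0))   # 32  (twice as far -> half the window)
```

The second cue from the paper, height above ground, is an additional input channel computed from depth and camera geometry rather than a windowing rule, and is not shown here.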