Dense Hybrid Recurrent Multi-view Stereo Net with Dynamic Consistency Checking

DSCnet: Replicating Lidar Point Clouds With Deep Sensor Cloning

2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)

Convolutional neural networks (CNNs) have become increasingly popular for solving a variety of computer vision tasks, ranging from image classification to image segmentation. Recently, autonomous vehicles have created a demand for depth information, which is often obtained using hardware sensors such as Light Detection and Ranging (LIDAR). Although it can provide precise distance measurements, most LIDARs are still far too expensive to sell in mass-produced consumer vehicles, which has motivated methods to generate depth information from commodity automotive sensors like cameras. In this paper, we propose an approach called Deep Sensor Cloning (DSC). The idea is to use Convolutional Neural Networks in conjunction with inexpensive sensors to replicate the 3D point-clouds that are created by expensive LIDARs. To accomplish this, we develop a new dataset (DSDepth) and a new family of CNN architectures (DSCnets). While previous tasks such as KITTI depth prediction use interpolated RGB-D images as ground truth for training, we instead use DSCnets to directly predict LIDAR point-clouds. When we compare the output of our models to a $75,000 LIDAR, we find that our most accurate DSCnet achieves a relative error of 5.77% using a single camera and 4.69% using stereo cameras.
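
As a rough illustration of the relative-error figure quoted above, the following sketch computes a mean relative error between predicted ranges and a LIDAR reference; the function name and the per-beam layout are assumptions for illustration, not the authors' code.

```python
import numpy as np

def mean_relative_error(pred_range, lidar_range):
    """Mean relative error between predicted and LIDAR-measured ranges.

    pred_range, lidar_range: per-beam ranges in metres; zeros in lidar_range
    mark beams with no return and are ignored (an assumed convention).
    """
    valid = lidar_range > 0
    rel = np.abs(pred_range[valid] - lidar_range[valid]) / lidar_range[valid]
    return rel.mean()

# Hypothetical usage: a DSCnet-style model would predict one range per LIDAR beam.
pred = np.array([10.2, 25.1, 0.9, 40.0])
gt   = np.array([10.0, 24.0, 1.0, 0.0])   # last beam had no return
print(mean_relative_error(pred, gt))       # ~0.055, i.e. about 5.5% relative error
```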

PU-Net: Point Cloud Upsampling Network

2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition

Learning and analyzing 3D point clouds with deep networks is challenging due to the sparseness and irregularity of the data. In this paper, we present a data-driven point cloud upsampling technique. The key idea is to learn multilevel features per point and expand the point set via a multi-branch convolution unit implicitly in feature space. The expanded feature is then split into a multitude of features, which are reconstructed into an upsampled point set. Our network is applied at the patch level, with a joint loss function that encourages the upsampled points to remain on the underlying surface with a uniform distribution. We conduct various experiments using synthetic and scanned data to evaluate our method and demonstrate its superiority over some baseline methods and an optimization-based method. Results show that our upsampled points have better uniformity and are located closer to the underlying surfaces.
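
A minimal sketch of the feature-space expansion idea described above, assuming PyTorch-style per-point features of shape (batch, channels, points); the branch count, layer sizes and coordinate head are illustrative assumptions, not PU-Net's actual configuration.

```python
import torch
import torch.nn as nn

class FeatureExpansion(nn.Module):
    """Illustrative multi-branch expansion in feature space: each branch is an
    independent 1x1 convolution over per-point features, and the branch outputs
    are concatenated to yield r x N upsampled features, then mapped to coordinates."""
    def __init__(self, in_channels, out_channels, up_ratio=4):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Sequential(nn.Conv1d(in_channels, out_channels, 1), nn.ReLU())
            for _ in range(up_ratio)
        )
        self.to_xyz = nn.Conv1d(out_channels, 3, 1)  # reconstruct point coordinates

    def forward(self, feats):               # feats: (B, C, N) per-point features
        expanded = torch.cat([b(feats) for b in self.branches], dim=2)  # (B, C', r*N)
        return self.to_xyz(expanded)        # (B, 3, r*N) upsampled point set

x = torch.randn(2, 64, 1024)                # a batch of per-point features
print(FeatureExpansion(64, 128, up_ratio=4)(x).shape)  # torch.Size([2, 3, 4096])
```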

LU-Net: An Efficient Network for 3D LiDAR Point Cloud Semantic Segmentation Based on End-to-End-Learned 3D Features and U-Net

2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW)

We propose LU-Net (for LiDAR U-Net), a new method for the semantic segmentation of a 3D LiDAR point cloud. Instead of applying a global 3D segmentation method such as PointNet, we propose an end-to-end architecture for LiDAR point cloud semantic segmentation that efficiently solves the problem as an image processing problem. We first extract high-level 3D features for each point given its 3D neighbors. Then, these features are projected into a 2D multichannel range image by considering the topology of the sensor. Thanks to these learned features and this projection, we can finally perform the segmentation using a simple U-Net segmentation network, which performs very well while being very efficient. In this way, we can exploit both the 3D nature of the data and the specificity of the LiDAR sensor. As our experiments show, this approach outperforms the state of the art by a large margin on the KITTI dataset. Moreover, it operates at 24 fps on a single GPU, which is above the acquisition rate of common LiDAR sensors and makes it suitable for real-time applications.
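
The projection into a multichannel range image can be illustrated with the usual spherical (sensor-topology) projection; the sketch below assumes a 64 x 2048 image and a typical vertical field of view, not the exact parameters used by LU-Net.

```python
import numpy as np

def lidar_to_range_image(points, h=64, w=2048, fov_up=3.0, fov_down=-25.0):
    """Project an (N, 3) LiDAR scan into an H x W range image using a standard
    spherical projection; the field-of-view values are typical for a 64-beam
    sensor and are assumptions, not taken from the paper."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.linalg.norm(points, axis=1) + 1e-8
    yaw = np.arctan2(y, x)                      # azimuth angle
    pitch = np.arcsin(z / r)                    # elevation angle
    fov_up, fov_down = np.radians(fov_up), np.radians(fov_down)
    u = ((1.0 - (yaw / np.pi + 1.0) / 2.0) * w).astype(int) % w
    v = ((fov_up - pitch) / (fov_up - fov_down) * h).clip(0, h - 1).astype(int)
    image = np.zeros((h, w), dtype=np.float32)
    image[v, u] = r                             # keep one range value per pixel
    return image

scan = np.random.uniform(-50, 50, size=(1000, 3))
print(lidar_to_range_image(scan).shape)        # (64, 2048)
```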

RoofN3D: A Database for 3D Building Reconstruction with Deep Learning

Photogrammetric Engineering & Remote Sensing, 2019

Machine learning methods, in particular those based on deep learning, have gained in importance through the latest developments in artificial intelligence and computer hardware. However, the direct application of deep learning methods to improve the results of 3D building reconstruction is often not possible due, for example, to the lack of suitable training data. To address this issue, we present RoofN3D, which provides a three-dimensional (3D) point cloud training dataset that can be used to train machine learning models for different tasks in the context of 3D building reconstruction. The details about RoofN3D and the framework developed to automatically derive such training data are described in this paper. Furthermore, we provide an overview of other available 3D point cloud training data and approaches from the current literature in which solutions for the application of deep learning to 3D point cloud data are presented. Finally, we demonstrate by example how the provided data can...

Deep FusionNet for Point Cloud Semantic Segmentation

Computer Vision – ECCV 2020

Many point cloud segmentation methods rely on transferring irregular points into a voxel-based regular representation. Although voxel-based convolutions are useful for feature aggregation, they produce ambiguous or wrong predictions if a voxel contains points from different classes. Other approaches (such as PointNets and point-wise convolutions) can take irregular points for feature learning, but their high memory and computational costs (such as for neighborhood search and ball querying) limit their ability and accuracy for large-scale point cloud processing. To address these issues, we propose a deep fusion network architecture (FusionNet) with a unique voxel-based "mini-PointNet" point cloud representation and a new feature aggregation module (fusion module) for large-scale 3D semantic segmentation. Our FusionNet can learn more accurate point-wise predictions when compared to voxel-based convolutional networks. It can realize more effective feature aggregation with lower memory and computational complexity for large-scale point cloud segmentation when compared to the popular point-wise convolutions. Our experimental results show that FusionNet can take more than one million points on one GPU for training and achieves state-of-the-art accuracy on the large-scale SemanticKITTI benchmark. The code will be available at https://github.com/feihuzhang/LiDARSeg.
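
A minimal sketch of the "mini-PointNet per voxel" idea, assuming per-point features already produced by a shared MLP and simple max-pooling inside each occupied voxel; this is illustrative only, not the FusionNet implementation.

```python
import numpy as np

def mini_pointnet_voxel_features(points, feats, voxel_size=0.2):
    """Pool the features of all points falling in the same voxel with an
    element-wise max, giving one feature vector per occupied voxel (a sketch of
    the general idea; voxel size and layout are assumptions)."""
    # integer voxel coordinates for each point
    coords = np.floor(points / voxel_size).astype(np.int64)
    # map each unique voxel to a contiguous index
    _, inverse = np.unique(coords, axis=0, return_inverse=True)
    n_voxels, c = inverse.max() + 1, feats.shape[1]
    pooled = np.full((n_voxels, c), -np.inf, dtype=feats.dtype)
    np.maximum.at(pooled, inverse, feats)       # max-pool features per voxel
    return pooled, inverse                      # voxel features + point-to-voxel map

pts = np.random.rand(500, 3).astype(np.float32)
f = np.random.rand(500, 32).astype(np.float32)  # e.g. output of a shared point MLP
voxel_feats, point2voxel = mini_pointnet_voxel_features(pts, f)
print(voxel_feats.shape)                        # (num_occupied_voxels, 32)
```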

Segmentation and Reconstruction of 3D Models from a Point Cloud with Deep Neural Networks

2018 International Conference on Information and Communication Technology Convergence (ICTC)

The need to model visual information with compact representations has existed since the early days of computer vision. In the past, we implemented a segmentation and model recovery method for range images, which is unfortunately too slow for the current size of 3D point clouds and type of applications. Recently, neural networks have become the popular choice for quick and effective processing of visual data. In this article we demonstrate that, with a convolutional neural network, we can achieve comparable results, that is, determine and model all objects in a given 3D point cloud scene. We started with a simple architecture that predicts the parameters of a single object in a scene. We then expanded it with an architecture similar to Faster R-CNN that can predict the parameters for any number of objects in a scene. The results of the initial neural network were satisfactory. The second network, which also performs segmentation, still gave decent results comparable to the original method, but performed somewhat worse than the initial one. The results are encouraging, but further experiments are needed to build CNNs that can replace the state-of-the-art method.

SE-MD: a single-encoder multiple-decoder deep network for point cloud reconstruction from 2D images

Pattern Analysis and Applications

3D model generation from single 2D RGB images is a challenging and actively researched computer vision task. Various techniques using conventional network architectures have been proposed for this task. However, the body of research work is limited and there are various issues, such as the use of inefficient 3D representation formats, weak 3D model generation backbones, the inability to generate dense point clouds, dependence on post-processing to generate dense point clouds, and dependence on silhouettes in RGB images. In this paper, a novel 2D RGB image to point cloud conversion technique is proposed, which improves the state of the art in the field through an efficient, robust and simple model that uses the concept of parallelization in the network architecture. It not only uses the efficient and rich 3D representation of point clouds, but also uses a novel and robust point cloud generation backbone to address the prevalent issues. This involves using a single-encoder multiple-decoder
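
A minimal sketch of what a single-encoder multiple-decoder layout could look like: one image encoder feeding several parallel point decoders whose outputs are concatenated into a denser cloud. All layer choices and sizes are assumptions, since the abstract above is truncated.

```python
import torch
import torch.nn as nn

class SingleEncoderMultiDecoder(nn.Module):
    """Illustrative single-encoder multiple-decoder network: a small CNN encoder
    maps the RGB image to a latent code, and several parallel decoders each emit
    a subset of points that are concatenated into one dense point cloud."""
    def __init__(self, latent=512, n_decoders=4, pts_per_decoder=1024):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, latent),
        )
        self.decoders = nn.ModuleList(
            nn.Sequential(nn.Linear(latent, 1024), nn.ReLU(),
                          nn.Linear(1024, pts_per_decoder * 3))
            for _ in range(n_decoders)
        )
        self.pts_per_decoder = pts_per_decoder

    def forward(self, image):                       # image: (B, 3, H, W)
        z = self.encoder(image)
        parts = [d(z).view(-1, self.pts_per_decoder, 3) for d in self.decoders]
        return torch.cat(parts, dim=1)              # (B, n_decoders * pts, 3)

print(SingleEncoderMultiDecoder()(torch.randn(1, 3, 128, 128)).shape)  # [1, 4096, 3]
```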

DPRNet: Deep 3D Point based Residual Network for Semantic Segmentation and Classification of 3D Point Clouds

Point clouds are an important type of geometric data obtained from a variety of 3D sensors. They do not have an explicit neighborhood structure, and therefore researchers often perform a voxelization step to obtain a structured 3D neighborhood. This, however, comes with certain disadvantages: e.g., it makes the data unnecessarily voluminous, requires additional computational effort, and can introduce quantization errors that may hinder not only the extraction of implicit 3D shape information but also the capture of the data invariances essential for the segmentation and recognition task at hand. In this context, this paper addresses the highly challenging problem of semantic segmentation and 3D object recognition using raw, unstructured 3D point cloud data. Specifically, a deep network architecture is proposed that consists of a cascaded combination of 3D point-based residual networks for simultaneous semantic scene segmentation and object classification. It exploits 3D point-based convolutions for representation learning from raw, unstructured 3D point cloud data. The proposed architecture has a simple design, an easier implementation, and performance that is better than existing state-of-the-art architectures, particularly for semantic scene segmentation, over three public datasets. The implementation and evaluation are made public at https://github.com/saira05/DPRNet.
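
A minimal sketch of a point-based residual block operating directly on unstructured per-point features (shared 1x1 convolutions plus a skip connection); the layer choices are assumptions, not the DPRNet code.

```python
import torch
import torch.nn as nn

class PointResidualBlock(nn.Module):
    """Illustrative point-wise residual block: a shared MLP (1x1 convolutions over
    the point dimension) whose output is added back to its input, working on raw
    per-point features without any voxelization step."""
    def __init__(self, channels):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Conv1d(channels, channels, 1), nn.BatchNorm1d(channels), nn.ReLU(),
            nn.Conv1d(channels, channels, 1), nn.BatchNorm1d(channels),
        )
        self.act = nn.ReLU()

    def forward(self, feats):          # feats: (B, C, N) features for N raw points
        return self.act(feats + self.mlp(feats))

x = torch.randn(4, 64, 2048)
print(PointResidualBlock(64)(x).shape)   # torch.Size([4, 64, 2048])
```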

FCDSN-DC: An Accurate and Lightweight Convolutional Neural Network for Stereo Estimation with Depth Completion

arXiv, 2022

We propose an accurate and lightweight convolutional neural network for stereo estimation with depth completion. We name this method the fully-convolutional deformable similarity network with depth completion (FCDSN-DC). This method extends FC-DCNN by improving the feature extractor, adding a network structure for training highly accurate similarity functions, and adding a network structure for filling in inconsistent disparity estimates. The whole method consists of three parts. The first part consists of fully-convolutional densely connected layers that compute expressive features of rectified image pairs. The second part of our network learns highly accurate similarity functions between these learned features. It consists of densely connected convolution layers with a deformable convolution block at the end to further improve the accuracy of the results. After this step an initial disparity map is created and a left-right consistency check is performed in order to remove inconsistent points. The last part of the network then uses this input together with the corresponding left RGB image in order to train a network that fills in the missing measurements. Consistent depth estimates are gathered around invalid points and are passed together with the RGB values into a shallow CNN in order to recover the missing values. We evaluate our method on challenging real-world indoor and outdoor scenes, in particular Middlebury, KITTI and ETH3D, where it produces competitive results. We furthermore show that this method generalizes well and is well suited for many applications without the need for further training.
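
The left-right consistency check mentioned above is a standard step; below is a minimal sketch assuming dense disparity maps and a one-pixel tolerance (both assumptions, not the paper's exact settings).

```python
import numpy as np

def left_right_consistency(disp_left, disp_right, max_diff=1.0):
    """Keep a left-image disparity only if the disparity found at the matching
    position in the right image agrees within max_diff pixels; inconsistent
    pixels are set to -1 so a completion network can later fill them in."""
    h, w = disp_left.shape
    xs = np.arange(w)[None, :].repeat(h, axis=0)
    ys = np.arange(h)[:, None].repeat(w, axis=1)
    # position each left pixel maps to in the right image
    xr = np.clip((xs - np.round(disp_left)).astype(int), 0, w - 1)
    diff = np.abs(disp_left - disp_right[ys, xr])
    checked = disp_left.copy()
    checked[diff > max_diff] = -1.0          # mark inconsistent estimates invalid
    return checked

dl = np.random.uniform(0, 64, size=(10, 20)).astype(np.float32)
dr = np.random.uniform(0, 64, size=(10, 20)).astype(np.float32)
print((left_right_consistency(dl, dr) < 0).mean())  # fraction of removed pixels
```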

CorrNet3D: Unsupervised End-to-end Learning of Dense Correspondence for 3D Point Clouds

2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Motivated by the intuition that one can transform two aligned point clouds to each other more easily and meaningfully than a misaligned pair, we propose CorrNet3D, the first unsupervised, end-to-end deep learning-based framework to drive the learning of dense correspondence between 3D shapes by means of deformation-like reconstruction, overcoming the need for annotated data. Specifically, CorrNet3D consists of a deep feature embedding module and two novel modules called the correspondence indicator and the symmetric deformer. Fed a pair of raw point clouds, our model first learns pointwise features and passes them into the indicator to generate a learnable correspondence matrix used to permute the input pair. The symmetric deformer, with an additional regularized loss, transforms the two permuted point clouds to each other to drive the unsupervised learning of the correspondence. Extensive experiments on both synthetic and real-world datasets of rigid and non-rigid 3D shapes show that our CorrNet3D outperforms state-of-the-art methods by a large margin, including those taking meshes as input. CorrNet3D is a flexible framework in that it can be easily adapted to supervised learning if annotated data are available. The source code and pre-trained model will be available at https://github.com/ZENGYIMING-EAMON/CorrNet3D.git.
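
A minimal sketch of how a correspondence indicator could turn pairwise feature similarity into a soft permutation matrix used to permute one point cloud towards the other; the temperature, shapes and function names are assumptions, not CorrNet3D's implementation.

```python
import torch

def soft_correspondence(feats_a, feats_b, temperature=0.05):
    """Turn pairwise feature similarity into a (soft) permutation matrix with a
    row-wise softmax; rows become approximately one-hot as temperature decreases."""
    sim = feats_a @ feats_b.transpose(1, 2)          # (B, N, N) similarities
    return torch.softmax(sim / temperature, dim=-1)

B, N, C = 2, 1024, 64
fa, fb = torch.randn(B, N, C), torch.randn(B, N, C)  # pointwise features of A and B
points_b = torch.randn(B, N, 3)
P = soft_correspondence(fa, fb)
permuted_b = P @ points_b        # (B, N, 3): points of B reordered to align with A
print(permuted_b.shape)
```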