Monocular Depth Estimation Using State-of-the-art Algorithms: A Review

Towards Real-Time Monocular Depth Estimation for Robotics: A Survey

arXiv: Robotics, 2021

As an essential component of many autonomous driving and robotic tasks such as ego-motion estimation, obstacle avoidance and scene understanding, monocular depth estimation (MDE) has attracted great attention from the computer vision and robotics communities. Over the past decades, a large number of methods have been developed. To the best of our knowledge, however, there is no comprehensive survey of MDE. This paper aims to bridge this gap by reviewing 197 relevant articles published between 1970 and 2021. In particular, we provide a comprehensive survey of MDE covering various methods, introduce the popular performance evaluation metrics and summarize publicly available datasets. We also summarize available open-source implementations of some representative methods and compare their performance. Furthermore, we review the application of MDE in some important robotic tasks. Finally, we conclude this paper by presenting some promising directions for future research. This survey is expected to help readers navigate this research field.

Progress and Proposals: A Case Study of Monocular Depth Estimation

2021

Deep learning has achieved great results and made rapid progress over the past few years, particularly in the field of computer vision. Deep learning models are composed of artificial neural networks and a supervised, semi-supervised, or unsupervised learning scheme. Larger models have neural network architectures with more parameters, often resulting from more/wider layers. In this paper, we perform a case study in the domain of monocular depth estimation and contribute both a new model and a new dataset. We propose PixelBins, a simplification of AdaBins, the existing state-of-the-art model, and obtain performance comparable to state-of-the-art methods. Our method achieves a ~20× reduction in model size as well as an absolute relative error of 0.057 on the popular KITTI benchmark. Furthermore, we conceptualize and justify the need for truly open datasets. Consequently, we introduce a modern, extensible dataset consisting of high quality, cross-calibrated image+point cloud pai...

The Monocular Depth Estimation Challenge

arXiv, 2022

This paper summarizes the results of the first Monocular Depth Estimation Challenge (MDEC) organized at WACV 2023. This challenge evaluated the progress of self-supervised monocular depth estimation on the challenging SYNS-Patches dataset. The challenge was organized on CodaLab and received submissions from 4 valid teams. Participants were provided a devkit containing updated reference implementations for 16 State-of-the-Art algorithms and 4 novel techniques. The threshold for acceptance of novel techniques was to outperform every one of the 16 SotA baselines. All participants outperformed the baseline in traditional metrics such as MAE or AbsRel. However, point-cloud reconstruction metrics were challenging to improve upon. We found predictions were characterized by interpolation artefacts at object boundaries and errors in relative object positioning. We hope this challenge is a valuable contribution to the community and encourage authors to participate in future editions.
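The two error metrics named above are standard in monocular depth evaluation and can be sketched in a few lines. This is a generic formulation that masks out pixels without ground truth, not the challenge's exact evaluation code:

```python
import numpy as np

def depth_metrics(pred, gt):
    """Compute two common monocular-depth error metrics on valid pixels.

    MAE    = mean |pred - gt|
    AbsRel = mean |pred - gt| / gt
    """
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    mask = gt > 0  # ignore pixels with no ground-truth depth
    err = np.abs(pred[mask] - gt[mask])
    return {"MAE": err.mean(), "AbsRel": (err / gt[mask]).mean()}

# Toy example: 2x2 predicted vs. ground-truth depth maps (metres)
pred = [[2.0, 4.0], [6.0, 8.0]]
gt   = [[2.0, 5.0], [6.0, 10.0]]
m = depth_metrics(pred, gt)  # MAE = 0.75, AbsRel = 0.1
```

AbsRel divides each error by the true depth, so it penalizes mistakes on nearby objects more heavily than MAE does, which is one reason both are usually reported together.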

CNN Based Monocular Depth Estimation

E3S Web of Conferences, 2021

In several applications, such as scene interpretation and reconstruction, precise depth measurement from images is a significant challenge. Current depth estimation techniques frequently produce fuzzy, low-resolution estimates. Using transfer learning, this research implements a convolutional neural network for generating a high-resolution depth map from a single RGB image. With a typical encoder-decoder architecture, we initialize the encoder with features extracted from high-performing pre-trained networks, together with augmentation and training procedures that lead to more accurate outcomes. We demonstrate how, even with a very basic decoder, our approach can produce complete high-resolution depth maps. A wide number of deep learning approaches have recently been presented, and they have shown significant promise in dealing with this classic ill-posed problem. The studies are carried out using KITTI and NYU Depth v2, two widely utilized public datasets. We also examine...

A Comparative Study of Models for Monocular Depth Estimation in 2D Images

International Journal of Advanced Trends in Computer Science and Engineering, 2021

Monocular depth estimation has been a challenging topic in the field of computer vision. There have been multiple approaches based on stereo and geometric concepts to estimate the depth of objects in a two-dimensional field such as a plain photograph. While stereo- and lidar-based approaches have their own merits, one issue recurs in them: the vanishing point problem. An improved approach to solving this issue involves using deep neural networks to train a model to estimate depth. Even this solution admits multiple approaches: a general supervised approach, an unsupervised approach (using autoencoders) and a semi-supervised approach (using transfer learning). This paper presents a comparative account of the three learning models and their performance evaluation.

An Efficient Approach to Monocular Depth Estimation for Autonomous Vehicle Perception Systems

Sustainability

Depth estimation is critical for autonomous vehicles (AVs) to perceive their surrounding environment. However, the majority of current approaches rely on costly sensors, making wide-scale deployment or integration with present-day transportation difficult. This issue highlights the camera as the most affordable and readily available sensor for AVs. To overcome this limitation, this paper uses monocular depth estimation as a low-cost, data-driven strategy for approximating depth from an RGB image. To achieve low complexity, we approximate the distance of vehicles within the frontal view in two stages: firstly, the YOLOv7 algorithm is utilized to detect vehicles and their front and rear lights; secondly, a nonlinear model maps this detection to the corresponding radial depth information. It is also demonstrated how the attention mechanism can be used to enhance detection precision. Our simulation results show an excellent blend of accuracy and speed, with the mean squared error conver...
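The second stage, mapping a detection to radial depth, can be illustrated with a simple pinhole-camera approximation. The paper's actual nonlinear model is fitted to data, so the constants below (`real_height_m`, `focal_px`) are purely illustrative assumptions, not values from the paper:

```python
def bbox_to_depth(bbox_height_px: float,
                  real_height_m: float = 1.5,
                  focal_px: float = 700.0) -> float:
    """Pinhole approximation: an object of real height H appearing h pixels
    tall under focal length f lies at radial depth Z = f * H / h.

    All constants are illustrative; a data-fitted nonlinear model, as in
    the paper, would replace this closed form.
    """
    if bbox_height_px <= 0:
        raise ValueError("bounding-box height must be positive")
    return focal_px * real_height_m / bbox_height_px

# A 70-pixel-tall vehicle detection maps to 700 * 1.5 / 70 = 15 m.
z = bbox_to_depth(70.0)
```

The appeal of this design is that depth falls out of quantities a detector like YOLOv7 already produces, so no dense depth network is needed at inference time.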

Enhancement of Consistent Depth Estimation for Monocular Videos Approach

Depth estimation has made great progress in the last few years due to its applications in robotics and computer vision. Various methods have been developed and implemented to estimate depth without flicker or missing holes. Despite this progress, it remains one of the main challenges for researchers, especially for video applications, which carry additional difficulties such as the complexity of the neural network, which affects the run time. Moreover, using monocular video as input for depth estimation is an attractive idea, particularly for hand-held devices such as mobile phones, which are now very popular for capturing pictures and videos. In this work, we focus on enhancing the existing consistent depth estimation for monocular videos approach to use less memory and fewer parameters without a significant reduction in the quality of the depth estimation.

DME: Unveiling the Bias for Better Generalized Monocular Depth Estimation

Proceedings of the ... AAAI Conference on Artificial Intelligence, 2024

This paper aims to design monocular depth estimation models with better generalization abilities. To this end, we have conducted quantitative analysis and discovered two important insights. First, the Simulation Correlation phenomenon, commonly seen in long-tailed classification problems, also exists in monocular depth estimation, indicating that the imbalanced depth distribution in training data may be the cause of limited generalization ability. Second, the imbalanced and long-tail distribution of depth values extends beyond the dataset scale, and also manifests within each individual image, further exacerbating the challenge of monocular depth estimation. Motivated by the above findings, we propose the Distance-aware Multi-Expert (DME) depth estimation model. Unlike prior methods that handle different depth ranges indiscriminately, DME adopts a divide-and-conquer philosophy where each expert is responsible for depth estimation of regions within a specific depth range. As such, the depth distribution seen by each expert is more uniform and can be more easily predicted. A pixel-level routing module is further designed and learned to stitch the predictions of all experts into the final depth map. Experiments show that DME achieves state-of-the-art performance on both NYU-Depth v2 and KITTI, and also delivers favorable zero-shot generalization capability on unseen datasets.
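The divide-and-conquer idea, per-range experts whose outputs are stitched by a pixel-level router, can be sketched with NumPy. The expert and router callables below are toy stand-ins for the paper's learned networks, included only to show the combination mechanics:

```python
import numpy as np

def stitch_experts(features, experts, router):
    """Combine K per-range depth experts with a pixel-wise softmax router.

    features : (H, W, C) array fed to each expert and to the router.
    experts  : list of K callables, each returning an (H, W) depth map.
    router   : callable returning (H, W, K) routing logits.
    """
    preds = np.stack([e(features) for e in experts], axis=-1)  # (H, W, K)
    logits = router(features)                                  # (H, W, K)
    w = np.exp(logits - logits.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)                         # softmax over K
    return (preds * w).sum(axis=-1)                            # (H, W)

# Toy stand-ins: a "near" expert (2 m everywhere), a "far" expert (10 m),
# and a router that prefers the near expert in the left column only.
feats = np.zeros((2, 2, 3))
near = lambda f: np.full(f.shape[:2], 2.0)
far = lambda f: np.full(f.shape[:2], 10.0)
route = lambda f: np.stack([np.array([[5.0, -5.0], [5.0, -5.0]]),
                            np.array([[-5.0, 5.0], [-5.0, 5.0]])], axis=-1)
depth = stitch_experts(feats, [near, far], route)
```

Because the routing weights are a soft per-pixel mixture rather than a hard assignment, the stitched depth map stays smooth across expert boundaries.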

Realtime Object-aware Monocular Depth Estimation in Onboard Systems

International Journal of Control, Automation and Systems, 2021

This paper proposes real-time object depth estimation using only a monocular camera on an onboard computer with a low-cost GPU. Our algorithm estimates scene depth from a sparse feature-based visual odometry algorithm and detects/tracks objects' bounding boxes by utilizing an existing object detection algorithm in parallel. Both algorithms share their results, i.e., features, motion, and bounding boxes, to handle static and dynamic objects in the scene. We validate the scene depth accuracy of sparse features quantitatively with KITTI and its ground-truth depth map made from LiDAR observations, and the depth of detected objects qualitatively with the Hyundai driving datasets and satellite maps. We compare the depth map of our algorithm with the results of (un)supervised monocular depth estimation algorithms. The validation shows that our performance is comparable to that of monocular depth estimation algorithms that train depth indirectly (or directly) from stereo image pairs (or depth images), and better than that of algorithms trained with monocular images only, in terms of error and accuracy. We also confirm that our computational load is much lighter than that of the learning-based methods, while showing comparable performance.

Learning depth from single monocular images

2005

We consider the task of depth estimation from a single monocular image. We take a supervised learning approach to this problem, in which we begin by collecting a training set of monocular images (of unstructured outdoor environments which include forests, trees, buildings, etc.) and their corresponding ground-truth depthmaps. Then, we apply supervised learning to predict the depthmap as a function of the image.
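The core recipe here, regressing depth from image features with paired ground truth, can be sketched in its simplest possible form: a linear least-squares fit on synthetic per-pixel features. The actual work used far richer features and structured models; this toy, with made-up feature vectors and weights, only illustrates the supervised setup:

```python
import numpy as np

# Hypothetical sketch: treat each pixel as a feature vector x and learn a
# linear map w so that depth ~= x @ w, mirroring the supervised recipe of
# "collect (image, depthmap) pairs, then fit a predictor".
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))           # 500 "pixels", 4 image features each
w_true = np.array([2.0, -1.0, 0.5, 3.0])  # made-up generating weights
y = X @ w_true                          # synthetic ground-truth depths

# Fit by ordinary least squares; with noiseless data this recovers w_true.
w_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
```

Swapping the linear map for a deep network, and the synthetic features for learned ones, is essentially the step the later deep-learning entries in this list take.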