Focusing on Shadows for Predicting Heightmaps from Single Remotely Sensed RGB Images with Deep Learning

IMG2nDSM: Height Estimation from Single Airborne RGB Images with Deep Learning

2021

Estimating the height of buildings and vegetation in single aerial images is a challenging problem. A task-focused Deep Learning (DL) model that combines architectural features from successful DL models (U-NET and Residual Networks) and learns the mapping from a single aerial image to a normalized Digital Surface Model (nDSM) was proposed. The model was trained on aerial images whose corresponding Digital Surface Models (DSM) and Digital Terrain Models (DTM) were available and was then used to infer the nDSM of images with no elevation information. The model was evaluated with a dataset covering a large area of Manchester, UK, as well as the 2018 IEEE GRSS Data Fusion Contest LiDAR dataset. The results suggest that the proposed DL architecture is suitable for the task and surpasses other state-of-the-art DL approaches by a large margin.
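The nDSM target used here is, in essence, the surface model minus the terrain model. A minimal sketch of deriving such a training target, assuming co-registered DSM and DTM rasters already loaded as NumPy arrays in metres (the function name and clipping choice are illustrative, not taken from the paper):

```python
import numpy as np

def normalized_dsm(dsm: np.ndarray, dtm: np.ndarray) -> np.ndarray:
    """Above-ground height: subtract the terrain model from the surface model;
    small negative residuals from misalignment are clipped to zero."""
    ndsm = dsm - dtm
    return np.clip(ndsm, 0.0, None)

# Hypothetical usage: ndsm = normalized_dsm(dsm, dtm) becomes the
# supervision target for the RGB -> nDSM regression network.
```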

Generating Elevation Surface from a Single RGB Remotely Sensed Image Using Deep Learning

2020

Generating Digital Elevation Models (DEM) from satellite imagery or other data sources constitutes an essential tool for a plethora of applications and disciplines, ranging from 3D flight planning and simulation, autonomous driving and satellite navigation, such as GPS, to modeling water flow, precision farming and forestry. The task of extracting this 3D geometry from a given surface hitherto requires a combination of appropriately collected corresponding samples and/or specialized equipment, as inferring the elevation from single image data is out of reach for contemporary approaches. On the other hand, Artificial Intelligence (AI) and Machine Learning (ML) algorithms have experienced unprecedented growth in recent years as they can extrapolate rules in a data-driven manner and retrieve convoluted, nonlinear one-to-one mappings, such as an approximate mapping from satellite imagery to DEMs. Therefore, we propose an end-to-end Deep Learning (DL) approach to construct this mapping a...

Large-Scale Vegetation Height Mapping from Sentinel Data Using Deep Learning

2020

The deep learning revolution in computer vision has enabled a potential for creating new value chains for Earth observation that significantly enhance the analysis of satellite data for tasks like land cover mapping, change analysis, and object detection. We demonstrate a deep learning-based value chain for the task of mapping vegetation height in the Liwale region in Tanzania using Sentinel-1 and Sentinel-2 data. As ground truth data we use lidar measurements, which are processed to provide the average vegetation height per Sentinel-2 pixel grid (10 m). We apply U-Net, a widely used neural network for segmentation tasks in computer vision, to estimate average vegetation height from the Sentinel data. Preliminary results show that we are able to map the forest extent with high accuracy, with an RMSE of 3.5 m for Sentinel-2 data and 4.6 m for Sentinel-1 data.
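A sketch of the headline evaluation metric, assuming the predicted and lidar-derived height rasters are aligned NumPy arrays and that invalid lidar pixels carry a no-data flag (the flag value is an assumption):

```python
import numpy as np

def height_rmse(pred: np.ndarray, lidar: np.ndarray, nodata: float = -9999.0) -> float:
    """Root-mean-square error between predicted and lidar-derived vegetation
    heights, ignoring pixels flagged as no-data in the lidar raster."""
    mask = lidar != nodata
    diff = pred[mask] - lidar[mask]
    return float(np.sqrt(np.mean(diff ** 2)))
```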

Deep Neural Networks for Determining the Parameters of Buildings from Single-Shot Satellite Imagery

Journal of Computer and Systems Sciences International

The height of a building is a basic characteristic needed for analytical services. It can be used to evaluate the population and functional zoning of a region. The analysis of the height structure of urban territories can be useful for understanding population dynamics. In this paper, a novel method for determining a building's height from a single-shot Earth remote sensing oblique image is proposed. The height is evaluated by a simulation algorithm that uses the masks of shadows and the visible parts of the walls. The image is segmented using convolutional neural networks, which makes it possible to extract the masks of roofs, shadows, and building walls. The segmentation models are integrated into a completely automatic system for mapping buildings and evaluating their heights. A test dataset containing a labeled set of various buildings is described. The proposed method is tested on this dataset and demonstrates a mean absolute error of less than 4 meters.
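The paper's simulation algorithm operates on shadow and wall masks in oblique imagery; the simplest version of the underlying geometry, for flat terrain and a known sun elevation angle, is the textbook relation H = L · tan(elevation). A hedged sketch of that relation only (not the paper's algorithm):

```python
import math

def height_from_shadow(shadow_length_m: float, sun_elevation_deg: float) -> float:
    """Flat-terrain approximation: a building of height H casts a shadow of
    length L = H / tan(elevation), hence H = L * tan(elevation)."""
    return shadow_length_m * math.tan(math.radians(sun_elevation_deg))
```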

Beyond Measurement: Extracting Vegetation Height from High Resolution Imagery with Deep Learning

Remote Sensing

Measuring and monitoring the height of vegetation provides important insights into forest age and habitat quality. These are essential for the accuracy of applications that are highly reliant on up-to-date and accurate vegetation data. Current vegetation sensing practices involve ground survey, photogrammetry, synthetic aperture radar (SAR), and airborne light detection and ranging (LiDAR) sensors. While these methods provide high resolution and accuracy, their hardware and collection effort prohibit highly recurrent and widespread collection. In response to the limitations of current methods, we designed Y-NET, a novel deep learning model to generate high resolution models of vegetation from highly recurrent multispectral aerial imagery and elevation data. Y-NET’s architecture uses convolutional layers to learn correlations between different input features and vegetation height, generating an accurate vegetation surface model (VSM) at 1×1 m resolution. We evaluated Y-NET on 235 km...
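A minimal two-branch ("Y"-shaped) regressor in PyTorch illustrating the idea of fusing an imagery encoder with an elevation encoder before a shared head; all layer sizes are placeholders and not the published Y-NET configuration:

```python
import torch
import torch.nn as nn

class YNetSketch(nn.Module):
    """Illustrative Y-shaped regressor: one encoder for multispectral imagery,
    one for elevation data, fused before a small head that outputs a
    single-channel vegetation height map."""
    def __init__(self, spectral_bands: int = 4):
        super().__init__()
        self.img_branch = nn.Sequential(
            nn.Conv2d(spectral_bands, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
        )
        self.elev_branch = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 16, 3, padding=1), nn.ReLU(),
        )
        self.head = nn.Sequential(
            nn.Conv2d(48, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 1),  # single-channel height map
        )

    def forward(self, imagery, elevation):
        fused = torch.cat([self.img_branch(imagery), self.elev_branch(elevation)], dim=1)
        return self.head(fused)
```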

Treepedia 2.0: Applying Deep Learning for Large-Scale Quantification of Urban Tree Cover

2018 IEEE International Congress on Big Data (BigData Congress), 2018

Recent advances in deep learning have made it possible to quantify urban metrics at fine resolution, and over large extents using street-level images. Here, we focus on measuring urban tree cover using Google Street View (GSV) images. First, we provide a small-scale labelled validation dataset and propose standard metrics to compare the performance of automated estimations of street tree cover using GSV. We apply state-of-the-art deep learning models, and compare their performance to a previously established benchmark of an unsupervised method. Our training procedure for deep learning models is novel; we utilize the abundance of openly available and similarly labelled street-level image datasets to pre-train our model. We then perform additional training on a small training dataset consisting of GSV images. We find that deep learning models significantly outperform the unsupervised benchmark method. Our semantic segmentation model increased mean intersection-over-union (IoU) from 44.10% to 60.42% relative to the unsupervised method and our end-to-end model decreased Mean Absolute Error from 10.04% to 4.67%. We also employ a recently developed method called gradient-weighted class activation map (Grad-CAM) to interpret the features learned by the end-to-end model. This technique confirms that the end-to-end model has accurately learned to identify tree cover area as key features for predicting percentage tree cover. Our paper provides an example of applying advanced deep learning techniques on a large-scale, geo-tagged and image-based dataset to efficiently estimate important urban metrics. The results demonstrate that deep learning models are highly accurate, can be interpretable, and can also be efficient in terms of data-labelling effort and computational resources.
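The two reported metrics are standard and straightforward to reproduce; a sketch assuming integer label maps for the segmentation model and per-image percentage cover estimates for the end-to-end model:

```python
import numpy as np

def mean_iou(pred: np.ndarray, target: np.ndarray, num_classes: int = 2) -> float:
    """Mean intersection-over-union over classes for integer label maps."""
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, target == c).sum()
        union = np.logical_or(pred == c, target == c).sum()
        if union > 0:
            ious.append(inter / union)
    return float(np.mean(ious))

def mae_percent_cover(pred_pct: np.ndarray, true_pct: np.ndarray) -> float:
    """Mean absolute error of per-image percentage tree cover estimates."""
    return float(np.mean(np.abs(pred_pct - true_pct)))
```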

Deep Building Footprint Extraction for Urban Risk Assessment – Remote Sensing and Deep Learning Based Approach

The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences

Mapping building footprints can play a crucial role in urban dynamics monitoring, risk assessment and disaster management. Freely available building footprints, like OpenStreetMap, provide manually annotated building footprint information for some urban areas; however, they frequently do not entirely cover urban areas in many parts of the world and are not always available. The huge potential for meaningful ground information extraction from high-resolution Remote Sensing imagery can be considered an alternative and reliable source of data for building footprint generation. Therefore, the aim of this study is to explore the use of satellite imagery data and some state-of-the-art deep learning tools to fully automate building footprint extraction. To better understand the usability and generalization ability of those approaches, this study proposes a comparative analysis of the performances and characteristics of two of the most recent deep learning models, U-Net and Attention U-Net, for building footprint generation.
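The main architectural difference between the two compared models is the attention gate that Attention U-Net places on each skip connection; a hedged PyTorch sketch of such a gate (channel sizes and naming are illustrative, not taken from this study):

```python
import torch
import torch.nn as nn

class AttentionGate(nn.Module):
    """Additive attention gate in the style of Attention U-Net: a gating
    signal from the decoder re-weights the encoder skip features before
    they are concatenated into the decoder."""
    def __init__(self, skip_ch: int, gate_ch: int, inter_ch: int):
        super().__init__()
        self.w_x = nn.Conv2d(skip_ch, inter_ch, 1)
        self.w_g = nn.Conv2d(gate_ch, inter_ch, 1)
        self.psi = nn.Conv2d(inter_ch, 1, 1)

    def forward(self, skip, gate):
        # gate is assumed to be resized to the skip feature's spatial size
        attn = torch.sigmoid(self.psi(torch.relu(self.w_x(skip) + self.w_g(gate))))
        return skip * attn
```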

Predicting Vegetation Stratum Occupancy from Airborne LiDAR Data with Deep Learning

2022

We propose a new deep learning-based method for estimating the occupancy of vegetation strata from airborne 3D LiDAR point clouds. Our model predicts rasterized occupancy maps for three vegetation strata corresponding to lower, medium, and higher cover. Our weakly-supervised training scheme allows our network to only be supervised with vegetation occupancy values aggregated over cylindrical plots containing thousands of points. Such ground truth is easier to produce than pixel-wise or point-wise annotations. Our method outperforms handcrafted and deep learning baselines in terms of precision by up to 30%, while simultaneously providing visual and interpretable predictions. We provide an open-source implementation along with a dataset of 199 agricultural plots to train and evaluate weakly supervised occupancy regression algorithms.
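A sketch of what such weak supervision could look like in PyTorch: per-pixel occupancy predictions are averaged over the pixels belonging to a plot and regressed against the single value reported for that plot (function and variable names are assumptions, not the released implementation):

```python
import torch

def plot_aggregated_loss(pred_map: torch.Tensor, plot_mask: torch.Tensor,
                         plot_value: torch.Tensor) -> torch.Tensor:
    """Average the per-pixel occupancy predictions inside a plot's footprint
    and penalise the squared error against the plot-level ground truth."""
    plot_mean = (pred_map * plot_mask).sum() / plot_mask.sum().clamp(min=1)
    return (plot_mean - plot_value) ** 2
```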

PCL–PTD Net: Parallel Cross-Learning-Based Pixel Transferred Deconvolutional Network for Building Extraction in Dense Building Areas With Shadow

IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing

Urban building segmentation from remotely sensed imagery is challenging because building features vary widely. Furthermore, very high spatial resolution imagery captures many details of urban buildings, such as styles, small gaps between buildings, and building shadows. Hence, achieving satisfactory accuracy when detecting and extracting urban features from such highly detailed images remains difficult. Deep learning semantic segmentation with baseline networks works well for building extraction; however, their ability to extract buildings in shadowed areas, with unclear building features, or separated by narrow gaps in dense building zones is still limited. In this article, we propose the parallel cross-learning-based pixel transferred deconvolutional network (PCL-PTD net), which is then used to segment urban buildings from aerial photographs. The proposed method is evaluated and compared with traditional baseline networks. PCL-PTD net is composed of a parallel network, cross-learning functions, residual units in the encoder, and PTD in the decoder. The network is applied to three datasets (the Inria aerial dataset, the International Society for Photogrammetry and Remote Sensing Potsdam dataset, and a UAV building dataset) to evaluate its accuracy and robustness. We found that PCL-PTD net can improve the ability of supervised learning models to differentiate buildings in dense areas and to extract buildings covered by shadows. Compared to the baselines, the proposed network shows superior performance against all eight networks (SegNet, U-Net, pyramid scene parsing network, PixelDCL, DeepLabV3+, U-Net++, context feature enhancement network, and improved
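Of the components listed, the residual unit in the encoder is the most standard; a generic sketch of such a unit (the abstract does not specify the PCL-PTD configuration, so the layer choices below are assumptions):

```python
import torch.nn as nn

class ResidualUnit(nn.Module):
    """Generic residual unit: two 3x3 convolutions whose output is added
    back to the input, easing optimisation of deep encoders."""
    def __init__(self, channels: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
        )
        self.act = nn.ReLU()

    def forward(self, x):
        return self.act(x + self.body(x))
```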

Beyond RGB: Very high resolution urban remote sensing with multimodal deep networks

ISPRS Journal of Photogrammetry and Remote Sensing, 2018

In this work, we investigate various methods to deal with semantic labeling of very high resolution multi-modal remote sensing data. In particular, we study how deep fully convolutional networks can be adapted to deal with multi-modal and multi-scale remote sensing data for semantic labeling. Our contributions are threefold: a) we present an efficient multi-scale approach to leverage both a large spatial context and the high resolution data, b) we investigate early and late fusion of Lidar and multispectral data, c) we validate our methods on two public datasets with state-of-the-art results. Our results indicate that late fusion makes it possible to recover from errors stemming from ambiguous data, while early fusion allows for better joint-feature learning, but at the cost of higher sensitivity to missing data.
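A minimal sketch of the two fusion strategies under comparison, assuming two pre-built segmentation networks and tensors shaped (batch, channels, height, width); the paper's actual fusion modules are more elaborate than this:

```python
import torch

def early_fusion(rgb: torch.Tensor, lidar: torch.Tensor, net) -> torch.Tensor:
    """Early fusion: stack modalities as extra input channels so a single
    network learns joint features from the start."""
    return net(torch.cat([rgb, lidar], dim=1))

def late_fusion(rgb: torch.Tensor, lidar: torch.Tensor, net_rgb, net_lidar) -> torch.Tensor:
    """Late fusion: run one network per modality and average the class
    scores, letting one stream compensate for the other's ambiguities."""
    return 0.5 * (net_rgb(rgb) + net_lidar(lidar))
```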