Multi-Scale Feature Map Aggregation and Supervised Domain Adaptation of Fully Convolutional Networks for Urban Building Footprint Extraction
Related papers
2021
Urban areas consume over two-thirds of the world’s energy and account for more than 70% of global CO2 emissions. As stated in IPCC’s Global Warming of 1.5 °C report, achieving carbon neutrality by 2050 requires a scalable approach that can be applied in a global context. Conventional methods of collecting data on energy use and emissions of buildings are extremely expensive and require specialized geometry information that not all cities have readily available. High-quality building footprint generation from satellite images can accelerate this predictive process and empower municipal decision-making at scale. However, previous deep learning-based approaches use supplemental data such as point cloud data, building height information, and multi-band imagery, which have limited availability and are difficult to produce. In this paper, we propose a modified DeeplabV3+ module with a Dilated ResNet backbone to generate masks of building footprints from only three-channel RGB satellite image...
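The abstract names only the high-level pieces, so below is a minimal sketch (not the authors' modified module) of a comparable setup in PyTorch: torchvision's stock DeepLabV3 with a dilated ResNet-50 backbone, re-headed for a single building-mask channel.

    # Minimal sketch, assuming torchvision's stock DeepLabV3; the paper's
    # modified DeeplabV3+ module is not reproducible from this abstract.
    import torch
    from torchvision.models.segmentation import deeplabv3_resnet50

    # torchvision dilates the last ResNet stages (output stride 8), which
    # matches the "Dilated ResNet backbone" idea described above.
    model = deeplabv3_resnet50(weights=None, weights_backbone=None, num_classes=1)
    model.eval()

    rgb = torch.randn(2, 3, 512, 512)         # three-channel RGB tiles only
    with torch.no_grad():
        logits = model(rgb)["out"]            # (2, 1, 512, 512)
    mask = torch.sigmoid(logits) > 0.5        # binary building-footprint mask
    print(mask.shape)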
Building Footprint Segmentation Using Transfer Learning: A Case Study of the City of Melbourne
ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences
Earth observation data, including very high-resolution (VHR) imagery from satellites and unmanned aerial vehicles (UAVs), are the primary sources for highly accurate building footprint segmentation and extraction. However, with the increase in spatial resolution, smaller objects become prominently visible in the images, and intelligent approaches like deep learning (DL) suffer from several problems. In this paper, we outline four prominent problems with DL-based methods: (P1) lack of contextual features, (P2) requirement of a large training dataset, (P3) the domain-shift problem, and (P4) computational expense. In tackling P1, we modify a commonly used DL architecture called U-Net to increase the contextual feature information. Likewise, for P2 and P3, we use transfer learning to fine-tune the DL model on a smaller dataset, utilising the knowledge previously gained from a larger dataset. For P4, we study the trade-off between the network's performance and computational expense with reduced training parameters and optimum learning rates. Our experiments on a case study from the City of Melbourne show that the modified U-Net is considerably more robust than the original U-Net and SegNet, and the dataset we develop is significantly more robust than an existing benchmark dataset. Furthermore, the overall method of fine-tuning the modified U-Net reduces the number of training parameters by 300 times and training time by 2.5 times while preserving the precision of segmentation.
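As a hedged sketch of the P2/P3 fine-tuning strategy (the library, encoder, and learning rate are stand-ins, not the paper's choices), one can freeze a pretrained encoder so that only the decoder parameters are updated:

    # Sketch only: freeze a pretrained encoder and train the decoder alone,
    # shrinking the trainable-parameter count as in the fine-tuning idea.
    import torch
    import segmentation_models_pytorch as smp   # assumed helper library

    model = smp.Unet(encoder_name="resnet34", encoder_weights="imagenet",
                     in_channels=3, classes=1)  # downloads ImageNet weights

    for p in model.encoder.parameters():        # keep knowledge gained from
        p.requires_grad = False                 # the larger source dataset

    trainable = [p for p in model.parameters() if p.requires_grad]
    optimizer = torch.optim.Adam(trainable, lr=1e-4)  # "optimum" LR is tuned
    print(sum(p.numel() for p in trainable), "trainable parameters")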
A Novel Adaptive Deep Network for Building Footprint Segmentation
ArXiv, 2021
Building footprint segmentation for high-resolution images is increasingly demanded for many remote sensing applications. With emerging deep learning approaches, segmentation networks have made significant advances in the semantic segmentation of objects. However, these advances, and the increased access to satellite images, require the generation of accurate object boundaries in satellite images. In the current paper, we propose a novel network based on the Pix2Pix methodology to solve the problem of the inaccurate boundaries obtained when converting satellite images into maps using segmentation networks in order to segment building footprints. To define the new network, named G2G, our framework includes two generators, where the first generator extracts localization features that are merged with the boundary features extracted from the second generator to segment all detailed building edges. Moreover, different strategies are implemented to enhance the quality of the proposed networ...
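Since the full G2G design is truncated above, the following is only an illustrative PyTorch sketch of its central idea, merging features from a localization generator and a boundary generator before a shared segmentation head; all module names and channel sizes are hypothetical.

    # Hypothetical sketch of the two-generator fusion idea, not the paper's code.
    import torch
    import torch.nn as nn

    class TinyGenerator(nn.Module):
        def __init__(self, out_ch=16):
            super().__init__()
            self.body = nn.Sequential(
                nn.Conv2d(3, out_ch, 3, padding=1), nn.ReLU(),
                nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU())
        def forward(self, x):
            return self.body(x)

    class G2GSketch(nn.Module):
        def __init__(self):
            super().__init__()
            self.g_local = TinyGenerator()      # localization features
            self.g_boundary = TinyGenerator()   # boundary/edge features
            self.head = nn.Conv2d(32, 1, 1)     # fuse by channel concatenation
        def forward(self, x):
            fused = torch.cat([self.g_local(x), self.g_boundary(x)], dim=1)
            return self.head(fused)

    print(G2GSketch()(torch.randn(1, 3, 64, 64)).shape)  # (1, 1, 64, 64)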
The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences
Mapping building footprints can play a crucial role in urban dynamics monitoring, risk assessment, and disaster management. Freely available building footprints, like those of OpenStreetMap, provide manually annotated building footprint information for some urban areas; however, they frequently do not entirely cover urban areas in many parts of the world and are not always available. The huge potential for meaningful ground-information extraction from high-resolution remote sensing imagery makes it an alternative and reliable source of data for building footprint generation. Therefore, the aim of this study is to explore the use of satellite imagery data and some state-of-the-art deep learning tools to fully automate building footprint extraction. To better understand the usability and generalization ability of those approaches, this study proposes a comparative analysis of the performances and characteristics of two recent deep learning models, Unet and Attention-Unet, for building footprint generation.
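For readers unfamiliar with the second model in this comparison, here is a minimal sketch of the attention gate that distinguishes Attention-Unet from plain Unet (after Oktay et al.); the channel sizes are illustrative, not values from this study.

    # Minimal attention-gate sketch; the gate signal is assumed already
    # upsampled to the skip connection's resolution.
    import torch
    import torch.nn as nn

    class AttentionGate(nn.Module):
        def __init__(self, ch_skip, ch_gate, ch_mid):
            super().__init__()
            self.w_x = nn.Conv2d(ch_skip, ch_mid, 1)
            self.w_g = nn.Conv2d(ch_gate, ch_mid, 1)
            self.psi = nn.Conv2d(ch_mid, 1, 1)
        def forward(self, skip, gate):
            att = torch.sigmoid(self.psi(torch.relu(self.w_x(skip) + self.w_g(gate))))
            return skip * att               # suppress irrelevant skip regions

    skip, gate = torch.randn(1, 64, 32, 32), torch.randn(1, 128, 32, 32)
    print(AttentionGate(64, 128, 32)(skip, gate).shape)  # (1, 64, 32, 32)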
Sensors
Building segmentation is crucial for applications extending from map production to urban planning. It nevertheless remains a challenge, due to CNNs’ inability to model global context and Transformers’ high memory needs. In this study, 10 CNN and Transformer models were implemented and compared. Alongside our proposed Residual-Inception U-Net (RIU-Net), U-Net, Residual U-Net, and Attention Residual U-Net, four CNN architectures (Inception, Inception-ResNet, Xception, and MobileNet) were implemented as encoders in U-Net-based models. Lastly, two Transformer-based approaches (Trans U-Net and Swin U-Net) were also used. The Massachusetts Buildings Dataset and the Inria Aerial Image Labeling Dataset were used for training and evaluation. On the Inria dataset, RIU-Net achieved the highest IoU score, F1 score, and test accuracy, with 0.6736, 0.7868, and 92.23%, respectively. On the Massachusetts Small dataset, Attention Residual U-Net achieved the highest IoU and F1 scores, with 0.6218 and 0.7...
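The exact RIU-Net block is not given in the abstract; the sketch below shows one plausible residual-Inception block, multi-kernel branches concatenated under a residual shortcut, with assumed branch widths.

    # Illustrative residual-Inception block in the spirit of RIU-Net; the
    # published block's exact branches and channels may differ.
    import torch
    import torch.nn as nn

    class ResidualInceptionBlock(nn.Module):
        def __init__(self, ch):
            super().__init__()
            b = ch // 4
            self.b1 = nn.Conv2d(ch, b, 1)
            self.b3 = nn.Conv2d(ch, b, 3, padding=1)
            self.b5 = nn.Conv2d(ch, b, 5, padding=2)
            self.bp = nn.Sequential(nn.MaxPool2d(3, 1, 1), nn.Conv2d(ch, b, 1))
            self.merge = nn.Conv2d(4 * b, ch, 1)
        def forward(self, x):
            y = torch.cat([self.b1(x), self.b3(x), self.b5(x), self.bp(x)], 1)
            return torch.relu(x + self.merge(y))   # residual shortcut

    print(ResidualInceptionBlock(64)(torch.randn(1, 64, 32, 32)).shape)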
A Multi-Task Deep Learning Framework for Building Footprint Segmentation
2021 IEEE International Geoscience and Remote Sensing Symposium IGARSS, 2021
The task of building footprint segmentation has been well-studied in the context of remote sensing (RS), as it provides valuable information in many respects. However, difficulties brought by the nature of RS images, such as variations in spatial arrangement and inconsistent constructional patterns, require further study, since they often cause poorly classified segmentation maps. We address this need by designing a joint optimization scheme for the task of building footprint delineation and introducing two auxiliary tasks, image reconstruction and building footprint boundary segmentation, with the intent of revealing the common underlying structure and advancing the classification accuracy of a single-task model with the help of the auxiliary tasks. In particular, we propose a deep multi-task learning (MTL) based unified fully convolutional framework which operates in an end-to-end manner by making use of a joint loss function with learnable loss weights that account for the homoscedastic uncertainty of each task loss. Experimental results conducted on the SpaceNet6 dataset demonstrate the potential of the proposed MTL framework, as it improves the classification accuracy greatly compared to single-task models and models with fewer combined tasks.
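The learnable loss weights described here follow the homoscedastic-uncertainty formulation of Kendall et al.; a minimal PyTorch sketch of a common simplification of that weighting (with the abstract's three tasks assumed as inputs) looks like this:

    # Sketch: total = sum_i exp(-s_i) * L_i + s_i, where s_i = log(sigma_i^2)
    # is a learnable per-task log-variance; the paper's exact losses are assumed.
    import torch
    import torch.nn as nn

    class UncertaintyWeightedLoss(nn.Module):
        def __init__(self, n_tasks=3):
            super().__init__()
            self.log_vars = nn.Parameter(torch.zeros(n_tasks))
        def forward(self, losses):
            total = 0.0
            for s, loss in zip(self.log_vars, losses):
                total = total + torch.exp(-s) * loss + s  # weight + regularizer
            return total

    crit = UncertaintyWeightedLoss(n_tasks=3)
    l_seg, l_edge, l_rec = [torch.rand((), requires_grad=True) for _ in range(3)]
    print(crit([l_seg, l_edge, l_rec]))   # single joint loss to backpropagate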
The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences
Building semantic segmentation is an exceedingly important issue in the field of remote sensing. A new building dataset was created, consisting of very high-resolution optical satellite images provided by the Center for Satellite Communications and Remote Sensing (CSCRS). The imagery was obtained by the Pleiades satellite and has a resolution of 0.5 meters. Segmentation results were obtained using post-FCN architectures. The architectures examined in this work fall under one of four categories. The first category is the Encoder-Decoder Network: an encoder reduces the spatial resolution of the data, and a decoder upsamples the encoder's lower-resolution result to recreate the input resolution. The second category is the Feature Pyramid Network, in which scene information is aggregated across pyramid structures to produce more comprehensive results. The third category is the Dilated Network, whose atrous structure (holes in the filter) allows any layer to be computed at any desired resolution. The final category is the Attention-Based Network, in which certain aspects of the data are emphasized while other aspects are ignored. The results show that, according to several metrics, Dilated and Attention-Based Networks perform better than their counterparts. After training for 100 epochs on the dataset, the Dilated and Attention-Based architectures obtained IoU values above 0.90.
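To make the "holes in the filter" remark concrete, this short PyTorch example shows how dilation enlarges the receptive field (a 3x3 kernel with dilation 4 covers a 9x9 area) while preserving spatial resolution:

    # Minimal illustration of atrous (dilated) convolution, the property
    # the Dilated Network category relies on; sizes are arbitrary examples.
    import torch
    import torch.nn as nn

    x = torch.randn(1, 1, 64, 64)
    conv_d1 = nn.Conv2d(1, 1, 3, padding=1, dilation=1)  # 3x3 receptive field
    conv_d4 = nn.Conv2d(1, 1, 3, padding=4, dilation=4)  # effective 9x9 field
    print(conv_d1(x).shape, conv_d4(x).shape)  # both keep the 64x64 resolution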
ARC-Net: An Efficient Network for Building Extraction From High-Resolution Aerial Images
IEEE Access
Automatic building extraction based on high-resolution aerial images has important applications in urban planning and environmental management. In recent years, advances and performance improvements have been achieved in building extraction through the use of deep learning methods. However, existing models focus on improving accuracy through an excessive number of parameters and complex structural designs, resulting in large computational costs during the learning phase and low inference speed. To address these issues, we propose a new, efficient end-to-end model called ARC-Net. The model includes residual blocks with asymmetric convolution (RBAC) to reduce the computational cost and to shrink the model size. In addition, dilated convolutions and multi-scale pyramid pooling modules are utilized to enlarge the receptive field and to enhance accuracy. We verify the performance and efficiency of the proposed ARC-Net on the INRIA Aerial Image Labeling dataset and the WHU building dataset. Compared to available deep learning models, the proposed ARC-Net demonstrates better segmentation performance with lower computational costs. This indicates that the proposed ARC-Net is both effective and efficient in automatic building extraction from high-resolution aerial images.
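A hedged sketch of the RBAC idea follows: a kxk convolution is factorized into kx1 and 1xk convolutions inside a residual block, cutting parameters from k^2 to 2k per channel pair. The actual ARC-Net block layout (channel widths, normalization, activation order) is assumed here, not reproduced.

    # Illustrative residual block with asymmetric convolution, not ARC-Net's code.
    import torch
    import torch.nn as nn

    class RBAC(nn.Module):
        def __init__(self, ch, k=3):
            super().__init__()
            p = k // 2
            self.asym = nn.Sequential(
                nn.Conv2d(ch, ch, (k, 1), padding=(p, 0)), nn.ReLU(),
                nn.Conv2d(ch, ch, (1, k), padding=(0, p)))
        def forward(self, x):
            return torch.relu(x + self.asym(x))   # residual shortcut

    print(RBAC(32)(torch.randn(1, 32, 64, 64)).shape)  # (1, 32, 64, 64)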
IEEE Access
Automatic extraction of buildings from remote sensing imagery plays a significant role in many applications, such as urban planning and monitoring changes to land cover. Various building segmentation methods have been proposed for visible remote sensing images, especially state-of-the-art methods based on convolutional neural networks (CNNs). However, high-accuracy building segmentation from high-resolution remote sensing imagery is still a challenging task due to the potentially complex texture of buildings and of the image background. Repeated pooling and striding operations used in CNNs reduce feature resolution, causing a loss of detailed information. To address this issue, we propose a lightweight deep learning model integrating spatial pyramid pooling with an encoder-decoder structure. The proposed model takes advantage of a spatial pyramid pooling module to capture and aggregate multi-scale contextual information, and of the ability of encoder-decoder networks to restore losses of information. The proposed model is evaluated on two publicly available datasets: the Massachusetts roads and buildings dataset and the INRIA Aerial Image Labeling Dataset. The experimental results on these datasets show qualitative and quantitative improvement against established image segmentation models, including SegNet, FCN, U-Net, Tiramisu, and FRRN. For instance, compared to the standard U-Net, the overall accuracy gain is 1.0% (0.913 vs. 0.904) and 3.6% (0.909 vs. 0.877) with a maximal increase of 3.6% in model-training time on these two datasets. These results demonstrate that the proposed model has the potential to deliver automatic building segmentation from high-resolution remote sensing images at an accuracy that makes it a useful tool for practical application scenarios.
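As an illustration of the described spatial pyramid pooling module (the pool scales and channel widths are assumptions, not the paper's values), a minimal PyTorch version pools at several grid sizes, projects each branch, and concatenates them back at full resolution:

    # Sketch of a spatial-pyramid-pooling module for multi-scale context.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class SPPModule(nn.Module):
        def __init__(self, ch, scales=(1, 2, 4, 8)):
            super().__init__()
            self.branches = nn.ModuleList(
                nn.Sequential(nn.AdaptiveAvgPool2d(s), nn.Conv2d(ch, ch // 4, 1))
                for s in scales)
            self.project = nn.Conv2d(ch + len(scales) * (ch // 4), ch, 1)
        def forward(self, x):
            h, w = x.shape[-2:]
            feats = [x] + [F.interpolate(b(x), size=(h, w), mode="bilinear",
                                         align_corners=False)
                           for b in self.branches]
            return self.project(torch.cat(feats, dim=1))  # aggregated context

    print(SPPModule(64)(torch.randn(1, 64, 32, 32)).shape)  # (1, 64, 32, 32)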
Boundary Regularized Building Footprint Extraction from Satellite Images Using Deep Neural Networks
ISPRS Annals of Photogrammetry, Remote Sensing and Spatial Information Sciences, 2020
In recent years, an ever-increasing number of remote sensing satellites have been orbiting the Earth, streaming vast amounts of visual data to support a wide range of civil, public, and military applications. One key use of satellite imagery, owing to its wide coverage and high-resolution data, is producing and updating spatial maps of the built environment. However, reconstructing spatial maps from satellite imagery is not a trivial vision task, as it requires reconstructing a scene or object with a high-level representation such as primitives. Over the last decade, significant advancement in object detection and representation using visual data has been achieved, but primitive-based object representation still remains a challenging vision task. Thus, high-quality spatial maps are mainly produced through complex, labour-intensive processes. In this paper, we propose a novel deep neural network that jointly detects building instances and regularizes noisy building boundary shapes from a single satellite image. The proposed deep learning method consists of a two-stage object detection network to produce region of interest (RoI) features and a building boundary extraction network using graph models to learn geometric information of the polygon shapes. Extensive experiments show that our model can simultaneously accomplish the tasks of object localization, recognition, semantic labelling, and geometric shape extraction. In terms of building extraction accuracy, computation efficiency, and boundary regularization performance, our model outperforms the state-of-the-art baseline models.
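The paper's graph-based network cannot be reproduced from the abstract alone; as a plain classical stand-in that shows what "boundary regularization" means, the Douglas-Peucker simplification in shapely reduces a noisy footprint polygon to a few clean vertices (the coordinates and tolerance below are invented for illustration):

    # Not the paper's method: a classical Douglas-Peucker stand-in for
    # regularizing a noisy building-footprint polygon.
    from shapely.geometry import Polygon

    noisy = Polygon([(0, 0), (4.1, 0.1), (4.0, 3.9), (2.1, 4.05), (0.05, 4.0)])
    regular = noisy.simplify(0.2, preserve_topology=True)  # tolerance in pixels
    print(len(noisy.exterior.coords), "->", len(regular.exterior.coords))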