Automatic Pixelwise Object Labeling for Aerial Imagery Using Stacked U-Nets
Related papers
Semantic Segmentation of Aerial Images Using U-Net Architecture
Iraqi Journal for Electrical and Electronic Engineering, 2022
Aerial images are very high resolution. Automating map generation and the semantic segmentation of aerial images are challenging problems. The semantic segmentation process does not give us precise details of the remote sensing images due to the low resolution of the aerial images. Hence, we propose an algorithm based on the U-Net architecture to solve this problem. It consists of two paths. The compression path (also called the encoder) is the first path and is used to capture the image’s context. The encoder is just a stack of convolutional and max-pooling layers. The symmetric expanding path (also called the decoder) is the second path, which enables exact localization through transposed convolutions. This task is commonly referred to as dense prediction, because a label is predicted for every pixel. Thus it is an end-to-end fully convolutional network (FCN), i.e. it only contains convol...
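The encoder and decoder paths described in this abstract can be illustrated by tracing feature-map shapes through the network. The sketch below is not the authors' implementation: it assumes 'same'-padded convolutions, 2x2 max pooling, stride-2 transposed convolutions, and a hypothetical starting width of 64 channels with four levels, none of which are stated in the abstract.

```python
def unet_shapes(height, width, base_channels=64, depth=4):
    """Trace (channels, height, width) through a U-Net-style encoder and decoder.

    Assumptions (not taken from the abstract): 'same'-padded convolutions,
    2x2 max pooling in the encoder, stride-2 transposed convolutions in the
    decoder, and channel counts that double at every encoder level.
    """
    if height % (2 ** depth) or width % (2 ** depth):
        raise ValueError("height and width must be divisible by 2**depth")

    encoder = []
    channels, h, w = base_channels, height, width
    for _ in range(depth):
        # This feature map is kept so the decoder can reuse it (skip connection).
        encoder.append((channels, h, w))
        # Max pooling halves the spatial size; the next level doubles the channels.
        channels, h, w = channels * 2, h // 2, w // 2

    bottleneck = (channels, h, w)

    decoder = []
    for _ in range(depth):
        # A transposed convolution doubles the spatial size and halves the channels.
        channels, h, w = channels // 2, h * 2, w * 2
        decoder.append((channels, h, w))

    return encoder, bottleneck, decoder


encoder, bottleneck, decoder = unet_shapes(256, 256)
print(encoder)     # shapes entering each pooling step
print(bottleneck)  # coarsest feature map
print(decoder)     # shapes after each upsampling step
```

Each decoder shape matches the encoder shape at the same level, which is what allows the skip connections to combine context from the encoder with the localization recovered by the decoder.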
Aerial pictures semantic segmentation applying deep learning
International Journal Of Trendy Research In Engineering And Technology, 2021
A marked increase in the amount of satellite data available in recent years has made interpreting this data at scale a difficult problem. Deriving useful insights from such images requires a rich understanding of the information they contain. AI is currently used to maintain accurate automated regional maps in order to respond to real-time, natural, and disaster-recovery challenges. These tasks need near-real-time, accurate, automated mapping directly from aerial and satellite images. In this project, we apply Mask R-CNN and conditional adversarial network techniques to extract building footprints. The problem is treated as a supervised learning problem. We experiment with learning parameters and algorithms, apply data augmentation, use transfer learning, and use RGB data to achieve high-precision results. The resulting pipeline incorporates image pre-processing algorithms that allow it to adapt to input images of varying quality, resolution, and channels.
Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi)
Class imbalance is a serious problem that disrupts semantic segmentation of urban satellite imagery in Earth remote sensing. Because large objects dominate the segmentation process, small objects are consequently given little weight, so solutions based on optimizing overall accuracy are often unsatisfactory. To address this class imbalance, we developed the concept of a Down-Sampling Block (DownBlock) to obtain contextual information and an Up-Sampling Block (UpBlock) to restore the original resolution. We propose an end-to-end deep convolutional neural network (DenseU-Net) architecture for pixel-wise urban remote sensing image segmentation. This method is intended to segment small objects in satellite imagery. The accuracy of the small object class in this study was further improved using our proposed method. This study used data from the Massachusetts Buildings dataset using Dense U-Net method and obtained an...
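The point that optimizing overall accuracy is unsatisfactory under class imbalance can be shown with a small worked example. The label map below is hypothetical (95 background pixels and 5 small-object pixels) and does not come from the Massachusetts Buildings dataset; the helper names are illustrative only.

```python
def overall_accuracy(truth, pred):
    """Fraction of pixels whose predicted label equals the true label."""
    correct = sum(t == p for t, p in zip(truth, pred))
    return correct / len(truth)


def class_recall(truth, pred, cls):
    """Fraction of the pixels of class `cls` that the prediction recovers."""
    total = sum(t == cls for t in truth)
    hits = sum(t == cls and p == cls for t, p in zip(truth, pred))
    return hits / total if total else 0.0


# Hypothetical flattened label map: 0 = background, 1 = small object.
truth = [0] * 95 + [1] * 5
# A degenerate model that predicts background for every pixel.
pred = [0] * 100

print(overall_accuracy(truth, pred))  # 0.95
print(class_recall(truth, pred, 1))   # 0.0
```

The degenerate prediction scores 95% overall accuracy while recovering none of the small-object pixels, which is why per-class measures are usually reported alongside overall accuracy.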
IGARSS 2018 - 2018 IEEE International Geoscience and Remote Sensing Symposium, 2018
In recent years, there has been increasing interest in large-scale classification of remote sensing images. In this context, the Inria Aerial Image Labeling Benchmark was released online in December 2016. In this paper, we discuss the outcomes of the first year of the benchmark contest, which consisted of dense labeling of aerial images into building / not building classes, covering areas of five cities not present in the training set. We present the four methods with the highest numerical accuracies, all four being convolutional neural network approaches. It is remarkable that three of these methods use the U-net architecture, which has thus become a new standard in dense image labeling.
Detecting Spatial Information from Satellite Imagery using Deep Learning for Semantic Segmentation
Detecting Spatial Information from Satellite Imagery using Deep Learning for Semantic Segmentation, 2023
Detecting spatial information from satellite imagery using deep learning for semantic segmentation is a rapidly growing field because of its importance in applications such as the automated generation of vector maps, urban planning, and geographic information systems. In this research, the use of deep learning for the semantic segmentation of spatial information from satellite imagery is explored. The objective is to devise an efficient and precise method for detecting and categorizing diverse features on the Earth's surface, including road networks, building footprints, water bodies, vegetation, and land cover, which can be used in automatic map production. The proposed technique entails training a deep convolutional neural network to detect spatial features from a small dataset of satellite imagery, followed by a segmentation process to classify the various spatial features. This study conducts various experiments on satellite imagery to achieve high accuracy rates that outperform traditional image processing techniques. In addition, this project compares models with a U-shaped architecture, namely U-Net and a modified U-Net (Inception ResNetV2U-Net), on various spatial features. Both implemented models achieved higher results than other relevant research papers. Although the Inception ResNetV2U-Net model produced slightly better results than U-Net, with a validation accuracy of 87.5% and a validation coefficient of 87%, the U-Net model also achieved a high validation accuracy and coefficient of 86.5% and 84%, respectively. Additionally, the U-Net model exhibited significantly better training and validation loss than Inception ResNetV2U-Net. Furthermore, the U-Net model showed a shorter average prediction time on satellite imagery. Therefore, the U-Net model is judged to be more suitable for detecting spatial information from small satellite datasets.
ARC-Net: An Efficient Network for Building Extraction From High-Resolution Aerial Images
IEEE Access
Automatic building extraction based on high-resolution aerial images has important applications in urban planning and environmental management. In recent years, advances and performance improvements have been achieved in building extraction through the use of deep learning methods. However, the design of existing models focuses on improving accuracy through an excessive number of parameters and complex structures, resulting in large computational costs during the learning phase and low inference speed. To address these issues, we propose a new, efficient end-to-end model called ARC-Net. The model includes residual blocks with asymmetric convolution (RBAC) to reduce the computational cost and to shrink the model size. In addition, dilated convolutions and multi-scale pyramid pooling modules are used to enlarge the receptive field and to enhance accuracy. We verify the performance and efficiency of the proposed ARC-Net on the INRIA Aerial Image Labeling dataset and the WHU building dataset. Compared to available deep learning models, the proposed ARC-Net demonstrates better segmentation performance at lower computational cost. This indicates that the proposed ARC-Net is both effective and efficient in automatic building extraction from high-resolution aerial images.
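The two savings mentioned in this abstract, fewer parameters from asymmetric convolution and a larger receptive field from dilation, follow from simple arithmetic. The sketch below is not the RBAC block itself, whose exact layout is not given here. It assumes the common factorization of a k x k convolution into a 1 x k convolution followed by a k x 1 convolution with the same intermediate width, and the channel count of 64 is a hypothetical value.

```python
def conv_params(kernel_h, kernel_w, c_in, c_out, bias=True):
    """Number of learnable parameters in a standard 2D convolution."""
    return kernel_h * kernel_w * c_in * c_out + (c_out if bias else 0)


def asymmetric_pair_params(k, c_in, c_out, bias=True):
    """Parameters of a 1 x k convolution followed by a k x 1 convolution.

    Assumption: the intermediate feature map has c_out channels.
    """
    return conv_params(1, k, c_in, c_out, bias) + conv_params(k, 1, c_out, c_out, bias)


def effective_kernel_size(k, dilation):
    """Extent covered by a k-tap kernel whose taps are `dilation` pixels apart."""
    return k + (k - 1) * (dilation - 1)


print(conv_params(3, 3, 64, 64))          # 36928
print(asymmetric_pair_params(3, 64, 64))  # 24704
print(effective_kernel_size(3, 2))        # 5
print(effective_kernel_size(3, 4))        # 9
```

Under these assumptions the asymmetric pair uses about two thirds of the parameters of the full 3 x 3 convolution, and a dilation of 4 lets a 3-tap kernel span 9 pixels without adding parameters.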
Sensors
Building segmentation is crucial for applications ranging from map production to urban planning. It remains a challenge because of CNNs’ inability to model global context and Transformers’ high memory requirements. In this study, 10 CNN and Transformer models were built and compared. Alongside our proposed Residual-Inception U-Net (RIU-Net), U-Net, Residual U-Net, and Attention Residual U-Net, four CNN architectures (Inception, Inception-ResNet, Xception, and MobileNet) were implemented as encoders for U-Net-based models. Lastly, two Transformer-based approaches (Trans U-Net and Swin U-Net) were also used. The Massachusetts Buildings Dataset and the Inria Aerial Image Labeling Dataset were used for training and evaluation. On the Inria dataset, RIU-Net achieved the highest IoU score, F1 score, and test accuracy, with 0.6736, 0.7868, and 92.23%, respectively. On Massachusetts Small dataset, Attention Residual U-Net achieved the highest IoU and F1 scores, with 0.6218 and 0.7...
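The IoU, F1, and accuracy scores reported in this abstract are all derived from pixel-level confusion counts. The following sketch shows the standard definitions for a single binary mask. It does not reproduce the study's evaluation protocol (for example, how scores are averaged over images), and the example masks are made up for illustration.

```python
def confusion_counts(truth, pred):
    """Return (tp, fp, fn, tn) for two flat binary masks of equal length."""
    tp = sum(1 for t, p in zip(truth, pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(truth, pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(truth, pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(truth, pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def iou(truth, pred):
    """Intersection over union of the positive (building) class."""
    tp, fp, fn, _ = confusion_counts(truth, pred)
    union = tp + fp + fn
    return tp / union if union else 1.0  # both masks empty: treat as a perfect match


def f1_score(truth, pred):
    """Harmonic mean of precision and recall for the positive class."""
    tp, fp, fn, _ = confusion_counts(truth, pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 1.0


def accuracy(truth, pred):
    """Fraction of pixels labeled correctly, over both classes."""
    tp, fp, fn, tn = confusion_counts(truth, pred)
    return (tp + tn) / (tp + fp + fn + tn)


# Hypothetical flattened masks: 1 = building, 0 = not building.
truth = [1, 1, 1, 0, 0, 0, 0, 0]
pred = [1, 1, 0, 1, 0, 0, 0, 0]

print(iou(truth, pred))                 # 0.5
print(round(f1_score(truth, pred), 4))  # 0.6667
print(accuracy(truth, pred))            # 0.75
```

Accuracy counts true negatives while IoU and F1 do not, so accuracy tends to be the highest of the three when background pixels dominate, as in the figures quoted above.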
Deep Learning for Understanding Satellite Imagery: An Experimental Survey
Frontiers in Artificial Intelligence, 2020
Translating satellite imagery into maps requires intensive effort and time, which often leads to inaccurate maps of the affected regions during disasters and conflicts. The combination of recently available datasets and advances in computer vision made through deep learning has paved the way toward automated satellite image translation. To facilitate research in this direction, we introduce the Satellite Imagery Competition using a modified SpaceNet dataset. Participants had to come up with different segmentation models to detect the positions of buildings in satellite images. In this work, we present five approaches based on improvements of U-Net and Mask R-CNN (Mask Region-based Convolutional Neural Network) models, coupled with unique training adaptations using boosting algorithms, morphological filters, Conditional Random Fields, and custom losses. The good results (as high as AP=0.937 and AR=0.959) from these models demonstrate the feasibility of deep learning in automated satellite image annotation.
U-Net Utilization on segmentation of Aerial Captured Images
IEEE, 2023
A convolutional neural network based on the U-Net architecture was implemented for object segmentation on aerial images. The architecture was modified by adding several dropout layers. Combinations of several activation functions, optimizers, and loss functions were tested on the network, and the best combination gave a Sørensen-Dice coefficient score of 0.86656 on the test set.
Semantic segmentation of high-resolution aerial imagery using a fully convolutional network
HAL (Le Centre pour la Communication Scientifique Directe), 2022
Semantic segmentation applied to aerial imagery allows the extraction of terrestrial objects such as roads, buildings, and even vegetation. Having large, detailed datasets of navigable roads is of paramount importance in several application fields, namely urban planning, automatic navigation, and disaster management. To reach this goal, extracting all roads in a given territory is the first step. This paper presents a modern method to semantically segment aerial images for road network extraction. We employ an encoder-decoder architecture to address the problem of disconnected road regions faced by some existing methods. Using an FCN approach, the localization information was combined with the semantic information to enable the proposed model to reconstruct the road while remaining consistent with the spatial alignment. The method was implemented and evaluated on the public Massachusetts Roads dataset. Results appear to be in full agreement with the theoretical predictions and show a significant improvement in road connectivity over some previous works; the proposed network achieved a precision of 87.86% and a recall of 87.89%.
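The problem of disconnected road regions mentioned in this abstract can be made concrete by counting connected components in a predicted road mask. This is only one simple proxy for connectivity, not the measure used in the paper, and the masks below are hypothetical.

```python
from collections import deque


def count_components(mask, connectivity=4):
    """Count connected groups of nonzero pixels in a 2D list via breadth-first search."""
    if connectivity == 4:
        offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    elif connectivity == 8:
        offsets = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1) if (dy, dx) != (0, 0)]
    else:
        raise ValueError("connectivity must be 4 or 8")

    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    count = 0

    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # An unvisited road pixel starts a new component; flood-fill from it.
            count += 1
            seen[r][c] = True
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for dy, dx in offsets:
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < rows and 0 <= nx < cols and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
    return count


# Hypothetical predictions for one road running left to right.
broken_road = [[1, 1, 0, 1, 1]]    # a gap splits the road in two
repaired_road = [[1, 1, 1, 1, 1]]  # the same road without the gap

print(count_components(broken_road))    # 2
print(count_components(repaired_road))  # 1
```

A prediction whose component count is closer to that of the ground-truth mask has fewer spurious breaks, which pixel-level precision and recall alone do not reveal.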