Scalable Retrieval of Similar Landscapes in Optical Satellite Imagery Using Unsupervised Representation Learning
Related papers
Self-Supervision, Remote Sensing and Abstraction: Representation Learning Across 3 Million Locations
ArXiv, 2022
Self-supervision-based deep learning classification approaches have received considerable attention in the academic literature. However, the performance of such methods on remote sensing imagery remains under-explored. In this work, we explore contrastive representation learning methods on the task of imagery-based city classification, an important problem in urban computing. We use satellite and map imagery across 2 domains, 3 million locations and more than 1500 cities. We show that self-supervised methods can build a generalizable representation from as few as 200 cities, with representations achieving over 95% accuracy in unseen cities with minimal additional training. We also find that, for remote sensing imagery, the domain discrepancy between natural imagery and abstract imagery induces a significant performance gap between such methods and supervised methods. We compare all analyses against existing supervised models from the academic literature and open-source our models for broader usage and further criticism.
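The "minimal additional training" evaluation this abstract describes is typically a probe on frozen self-supervised features. As a hedged sketch only (not the paper's protocol), a nearest-class-centroid rule on precomputed feature vectors illustrates the idea; all data and names below are hypothetical:

```python
# Sketch: evaluating frozen representations with a nearest-centroid
# classifier, standing in for a learned linear probe. Toy data only.

def centroid(vectors):
    """Component-wise mean of a list of equal-length feature vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def nearest_centroid_predict(x, centroids):
    """Return the label whose centroid is closest to x (squared Euclidean)."""
    def sqdist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return min(centroids, key=lambda label: sqdist(x, centroids[label]))

# Toy frozen features for two "cities" (illustrative values).
train = {
    "city_a": [[0.9, 0.1], [1.1, 0.0]],
    "city_b": [[0.0, 1.0], [0.2, 1.2]],
}
centroids = {label: centroid(vs) for label, vs in train.items()}
print(nearest_centroid_predict([1.0, 0.05], centroids))  # -> city_a
```

Because the encoder stays frozen, only the tiny classifier above is fit per city, which is what makes generalization to unseen cities cheap to measure.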
Remote Sensing
Remote sensing data has been widely used for various Earth Observation (EO) missions such as land use and cover classification, weather forecasting, agricultural management, and environmental monitoring. Most existing models based on remote sensing data rely on supervised learning, which requires large and representative human-labeled datasets for model training, a costly and time-consuming process. The recent introduction of self-supervised learning (SSL) enables models to learn a representation from orders of magnitude more unlabeled data. The success of SSL is heavily dependent on a pre-designed pretext task, which introduces an inductive bias into the model from a large amount of unlabeled data. Since remote sensing imagery has rich spectral information beyond the standard RGB color space, pretext tasks established in computer vision for RGB images may not extend straightforwardly to the multi/hyperspectral domain. To address this challenge, this work proposed a gener...
In-domain representation learning for remote sensing
ArXiv, 2019
Given the importance of remote sensing, surprisingly little attention has been paid to it by the representation learning community. To address this gap and to establish baselines and a common evaluation protocol in this domain, we provide simplified access to 5 diverse remote sensing datasets in a standardized form. Specifically, we investigate in-domain representation learning to develop generic remote sensing representations and explore which characteristics are important for a dataset to be a good source for remote sensing representation learning. The established baselines achieve state-of-the-art performance on these datasets.
Accelerating Ecological Sciences from Above: Spatial Contrastive Learning for Remote Sensing
2021
The rise of neural networks has opened the door for automatic analysis of remote sensing data. A challenge to using this machinery for computational sustainability is the necessity of massive labeled data sets, which can be cost-prohibitive for many non-profit organizations. The primary motivation for this work is one such problem: the efficient management of invasive species, invading flora and fauna that are estimated to cause damages in the billions of dollars annually. As an ongoing collaboration with the New York Natural Heritage Program, we consider the use of unsupervised deep learning techniques for dimensionality reduction of remote sensing images, which can reduce sample complexity for downstream tasks and decrease the need for large labeled data sets. We consider spatially augmenting contrastive learning by training neural networks to correctly classify two nearby patches of a landscape as such. We demonstrate that this approach improves upon previous methods and naive ...
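The spatial augmentation described above treats two nearby patches of the same landscape as a positive pair for contrastive training. A minimal sketch of just the pairing criterion (the function name, coordinate convention, and distance threshold are assumptions for illustration, not the paper's implementation):

```python
import math

def is_positive_pair(center_a, center_b, max_dist=1.0):
    """Spatial contrastive labelling: two patches whose centers lie
    within max_dist of each other are treated as the 'same landscape'
    (a positive pair); all other pairs are negatives."""
    dx = center_a[0] - center_b[0]
    dy = center_a[1] - center_b[1]
    return math.hypot(dx, dy) <= max_dist

# Patch centers in arbitrary map units (illustrative).
print(is_positive_pair((0.0, 0.0), (0.5, 0.5)))  # -> True (nearby patches)
print(is_positive_pair((0.0, 0.0), (5.0, 0.0)))  # -> False (distant patches)
```

A standard contrastive loss (e.g. InfoNCE) would then pull embeddings of positive pairs together and push negatives apart; the labelling rule above is the only remote-sensing-specific ingredient.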
Deep Unsupervised Embedding for Remote Sensing Image Retrieval Using Textual Cues
Applied Sciences
Compared to image-image retrieval, text-image retrieval has been less investigated in the remote sensing community, possibly because of the complexity of appropriately tying textual data to respective visual representations. Moreover, a single image may be described via multiple sentences according to the perception of the human labeler and the structure/body of the language they use, which magnifies the complexity even further. In this paper, we propose an unsupervised method for text-image retrieval in remote sensing imagery. In the method, image representation is obtained via visual Big Transfer (BiT) Models, while textual descriptions are encoded via a bidirectional Long Short-Term Memory (Bi-LSTM) network. The training of the proposed retrieval architecture is optimized using an unsupervised embedding loss, which aims to make an image's features closest to those of its corresponding textual description and distinct from other images' features, and vice versa. To demonstrate the perfo...
Investigations on Feature Similarity and the Impact of Training Data for Land Cover Classification
ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences, 2021
Fully convolutional neural networks (FCN) are successfully used for pixel-wise land cover classification, the task of identifying the physical material of the Earth's surface for every pixel in an image. The acquisition of large training datasets is challenging, especially in remote sensing, but necessary for a FCN to perform well. One way to circumvent manual labelling is the usage of existing databases, which usually contain a certain amount of label noise when combined with another data source. As a first part of this work, we investigate the impact of training data on a FCN. We experiment with different amounts of training data, varying w.r.t. the covered area, the available acquisition dates and the amount of label noise. We conclude that the more data is used for training, the better the generalization performance of the model, and the FCN is able to mitigate the effect of label noise to a high degree. Another challenge is the imbalanced class distribution in most real-world datasets, which can cause the classifier to focus on the majority classes, leading to poor classification performance for minority classes. To tackle this problem, in this paper, we use the cosine similarity loss to force feature vectors of the same class to be close to each other in feature space. Our experiments show that the cosine loss helps to obtain more similar feature vectors, but the similarity of the cluster centers also increases.
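The cosine similarity loss mentioned above can be written as one minus the cosine between a pixel's feature vector and a reference vector for its class; when they point in the same direction the loss is zero. A minimal sketch under that common formulation (variable names are assumptions):

```python
import math

def cosine_loss(feature, class_center):
    """1 - cos(feature, class_center): zero when the feature is aligned
    with its class's reference direction, up to 2 when opposite."""
    dot = sum(f * c for f, c in zip(feature, class_center))
    nf = math.sqrt(sum(f * f for f in feature))
    nc = math.sqrt(sum(c * c for c in class_center))
    return 1.0 - dot / (nf * nc)

print(cosine_loss([2.0, 0.0], [1.0, 0.0]))  # -> 0.0 (aligned with class center)
print(cosine_loss([0.0, 1.0], [1.0, 0.0]))  # -> 1.0 (orthogonal to class center)
```

Because the loss depends only on direction, not magnitude, it pulls same-class features toward a shared direction, which matches the abstract's observation that features become more similar while class centers can also drift closer together.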
A Generalizable and Accessible Approach to Machine Learning with Global Satellite Imagery
Social Science Research Network, 2020
Combining satellite imagery with machine learning (SIML) has the potential to address global challenges by remotely estimating socioeconomic and environmental conditions in data-poor regions, yet the resource requirements of SIML limit its accessibility and use. We show that a single encoding of satellite imagery can generalize across diverse prediction tasks (e.g. forest cover, house price, road length). Our method achieves accuracy competitive with deep neural networks at orders of magnitude lower computational cost, scales globally, delivers label superresolution predictions, and facilitates characterizations of uncertainty. Since image encodings are shared across tasks, they can be centrally computed and distributed to unlimited researchers, who need only fit a linear regression to their own ground truth data in order to achieve state-of-the-art SIML performance.
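The workflow this abstract describes, centrally computed image encodings with each researcher fitting only a linear model to their own labels, can be sketched with ordinary least squares on a toy one-dimensional encoding. The real method uses high-dimensional featurizations and regularized regression; everything below is an illustrative stand-in:

```python
def fit_linear(x, y):
    """Ordinary least squares for a single shared feature:
    returns (slope, intercept) minimizing squared error."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    slope = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
             / sum((xi - mx) ** 2 for xi in x))
    return slope, my - slope * mx

# Toy precomputed encodings (one scalar per image) and one task's
# ground truth, e.g. forest cover fraction -- illustrative values only.
encodings = [0.0, 1.0, 2.0, 3.0]
forest_cover = [0.1, 0.3, 0.5, 0.7]
slope, intercept = fit_linear(encodings, forest_cover)
print(round(slope, 3), round(intercept, 3))  # -> 0.2 0.1
```

The key design point is that the expensive step (encoding imagery) is shared across all tasks, so each new task costs only a linear fit like the one above.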
Benchmarking Representation Learning for Natural World Image Collections
2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021
Recent progress in self-supervised learning has resulted in models that are capable of extracting rich representations from image collections without requiring any explicit label supervision. However, to date the vast majority of these approaches have restricted themselves to training on standard benchmark datasets such as ImageNet. We argue that fine-grained visual categorization problems, such as plant and animal species classification, provide an informative testbed for self-supervised learning. In order to facilitate progress in this area we present two new natural world visual classification datasets, iNat2021 and NeWT. The former consists of 2.7M images from 10k different species uploaded by users of the citizen science application iNaturalist. We designed the latter, NeWT, in collaboration with domain experts with the aim of benchmarking the performance of representation learning algorithms on a suite of challenging natural world binary classification tasks that go beyond standard species classification. These two new datasets allow us to explore questions related to large-scale representation and transfer learning in the context of finegrained categories. We provide a comprehensive analysis of feature extractors trained with and without supervision on ImageNet and iNat2021, shedding light on the strengths and weaknesses of different learned features across a diverse set of tasks. We find that features produced by standard supervised methods still outperform those produced by self-supervised approaches such as SimCLR. However, improved self-supervised learning methods are constantly being released and the iNat2021 and NeWT datasets are a valuable resource for tracking their progress.
A Multiscale Deeply Described Correlatons-Based Model for Land-Use Scene Classification
Remote Sensing, 2017
Research efforts in land-use scene classification are growing alongside the popular use of High-Resolution Satellite (HRS) images. The complex background and multiple land-cover classes or objects, however, make the classification tasks difficult and challenging. This article presents a Multiscale Deeply Described Correlatons (MDDC)-based algorithm which incorporates appearance and spatial information jointly at multiple scales for land-use scene classification to tackle these problems. Specifically, we introduce a convolutional neural network to learn and characterize the dense convolutional descriptors at different scales. The resulting multiscale descriptors are used to generate visual words by a general mapping strategy and produce multiscale correlograms of visual words. Then, an adaptive vector quantization of multiscale correlograms, termed multiscale correlatons, is applied to encode the spatial arrangement of visual words at different scales. Experiments with two publicly available land-use scene datasets demonstrate that our MDDC model is discriminative for efficient representation of land-use scene images, and achieves competitive classification results with state-of-the-art methods.
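A correlogram of visual words counts how often pairs of words co-occur at a given spatial offset, which is what encodes spatial arrangement beyond a plain bag of words. As a rough sketch only (a simplified 4-neighbour variant on a toy grid of quantized descriptors, not the MDDC construction), with hypothetical names throughout:

```python
def word_correlogram(grid, distance):
    """Count co-occurrences of visual-word pairs whose grid cells are
    exactly `distance` apart horizontally or vertically.
    Returns {(word_a, word_b): count} with word_a <= word_b."""
    counts = {}
    rows, cols = len(grid), len(grid[0])
    for r in range(rows):
        for c in range(cols):
            for dr, dc in ((0, distance), (distance, 0)):
                rr, cc = r + dr, c + dc
                if rr < rows and cc < cols:
                    pair = tuple(sorted((grid[r][c], grid[rr][cc])))
                    counts[pair] = counts.get(pair, 0) + 1
    return counts

# 2x2 grid of visual-word indices (toy example).
grid = [[0, 1],
        [1, 0]]
print(word_correlogram(grid, 1))  # -> {(0, 1): 4}
```

Computing such histograms at several distances and descriptor scales, then vector-quantizing them, is the multiscale-correlaton step the abstract summarizes.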
Unsupervised Deep Features for Remote Sensing Image Matching via Discriminator Network
2018
The advent of deep perceptual networks brought about a paradigm shift in machine vision and image perception. Hand-crafted features in the latent space have lately been replaced by deep features acquired from supervised networks for improved understanding. However, such deep networks require strict supervision with a substantial amount of labeled data for proper training. These methods perform poorly in domains lacking labeled data, especially in the case of remote sensing image retrieval. To resolve this, we propose an unsupervised encoder-decoder feature for remote sensing image matching (RSIM). Moreover, we replace the conventional distance metrics with a deep discriminator network to identify the similarity of the image pairs. To the best of our knowledge, a discriminator network has never been used before for solving the RSIM problem. Results have been validated with two publicly available benchmark remote sensing image datasets. The techniq...
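Replacing a fixed distance metric with a learned discriminator means a small network scores each feature pair as match or no-match instead of computing, say, Euclidean distance. A minimal sketch using a single logistic unit over the element-wise absolute difference (the weights here are fixed for illustration, not learned, and nothing below is the paper's architecture):

```python
import math

def discriminator_score(feat_a, feat_b, weights, bias):
    """Logistic 'discriminator' over |feat_a - feat_b|: outputs a
    match probability in (0, 1) in place of a handcrafted distance."""
    diff = [abs(a - b) for a, b in zip(feat_a, feat_b)]
    z = sum(w * d for w, d in zip(weights, diff)) + bias
    return 1.0 / (1.0 + math.exp(-z))

# Illustrative fixed weights; in the paper's setting these are trained.
w, b = [-4.0, -4.0], 2.0
same_score = discriminator_score([0.5, 0.5], [0.5, 0.5], w, b)
diff_score = discriminator_score([0.0, 0.0], [1.0, 1.0], w, b)
print(same_score > 0.5, diff_score < 0.5)  # -> True True
```

The advantage over a fixed metric is that the similarity function itself is fit to the data, so it can weight feature dimensions unequally or capture non-linear notions of similarity when deeper layers are used.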