Leveraging Metadata in Representation Learning With Georeferenced Seafloor Imagery

Learning features from georeferenced seafloor imagery with location guided autoencoders

Journal of Field Robotics, 2020

Although modern machine learning has the potential to greatly speed up the interpretation of imagery, the varied nature of the seabed and the limited availability of expert annotations form barriers to its widespread use in seafloor mapping applications. This motivates research into unsupervised methods that function without large databases of human annotations. This paper develops an unsupervised feature learning method for georeferenced seafloor visual imagery that considers patterns both within the footprint of a single image frame and broader-scale spatial characteristics. Features within images are learnt using an autoencoder based on the AlexNet deep convolutional neural network. Features larger than each image frame are learnt using a novel loss function that regularises autoencoder training with a Kullback-Leibler divergence term encoding the loose assumption that images captured within a close distance of each other look more similar than those that are far apart. The method is used to semantically interpret images taken by an autonomous underwater vehicle at the Southern Hydrates Ridge, an active gas hydrate field and site of a seafloor cabled observatory at a depth of 780 m. The method's performance when applied to clustering and content-based image retrieval is assessed against a ground truth consisting of more than 18,000 human annotations. The study shows that the location-based loss function increases the rate of information retrieval by a factor of two for seafloor mapping applications. The effects of physics-based colour correction and image rescaling are also investigated, showing that the improved consistency of spatial information achieved by rescaling is beneficial for recognising artificial objects such as cables and infrastructure, but is less effective for natural objects that have greater dimensional variability.
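A minimal sketch of the idea described above, assuming a much smaller autoencoder than the paper's AlexNet-based network: a reconstruction loss plus a Kullback-Leibler divergence term that pulls together the latent codes of images captured within a few metres of each other. The architecture, 64-dimensional latent size, 10 m threshold, and 0.1 weighting are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch: autoencoder + location-based KL regulariser.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SmallAutoencoder(nn.Module):
    def __init__(self, latent_dim: int = 64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, 4, stride=2, padding=1), nn.ReLU(),   # 64 -> 32
            nn.Conv2d(16, 32, 4, stride=2, padding=1), nn.ReLU(),  # 32 -> 16
            nn.Flatten(),
            nn.Linear(32 * 16 * 16, latent_dim),
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 32 * 16 * 16), nn.ReLU(),
            nn.Unflatten(1, (32, 16, 16)),
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, 3, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, x):
        z = self.encoder(x)
        return self.decoder(z), z

def location_kl_loss(z_a, z_b, dist_m, close_thresh=10.0):
    """KL divergence between softmax-normalised latent codes, applied
    only to image pairs captured within `close_thresh` metres."""
    p = F.log_softmax(z_a, dim=1)   # log-probabilities of anchor codes
    q = F.softmax(z_b, dim=1)       # probabilities of neighbour codes
    kl = F.kl_div(p, q, reduction="none").sum(dim=1)
    close = (dist_m < close_thresh).float()
    return (kl * close).sum() / close.sum().clamp(min=1.0)

# Combined objective on a batch of image pairs with known distances.
model = SmallAutoencoder()
x_a, x_b = torch.rand(8, 3, 64, 64), torch.rand(8, 3, 64, 64)
dist_m = torch.rand(8) * 50.0                 # metres between capture points
recon_a, z_a = model(x_a)
recon_b, z_b = model(x_b)
loss = F.mse_loss(recon_a, x_a) + F.mse_loss(recon_b, x_b) \
       + 0.1 * location_kl_loss(z_a, z_b, dist_m)
loss.backward()
```

The regulariser only fires for nearby pairs, so distant pairs are trained on reconstruction alone, which mirrors the loose spatial assumption described in the abstract.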

FathomNet: An underwater image training database for ocean exploration and discovery

ArXiv, 2020

Thousands of hours of marine video data are collected annually from remotely operated vehicles (ROVs) and other underwater assets. However, current manual methods of analysis impede the full utilization of collected data for real-time ROV algorithms and large-scale biodiversity analyses. FathomNet is a novel baseline image training set, optimized to accelerate development of modern, intelligent, and automated analysis of underwater imagery. Our seed data set consists of an expertly annotated and continuously maintained database with more than 26,000 hours of videotape, 6.8 million annotations, and 4,349 terms in the knowledge base. FathomNet leverages this data set by providing imagery, localizations, and class labels of underwater concepts in order to enable machine learning algorithm development. To date, there are more than 80,000 images and 106,000 localizations for 233 different classes, including midwater and benthic organisms. Our experiments consisted of training various deep ...

BenthicNet: A global compilation of seafloor images for deep learning applications

arXiv, 2024

Advances in underwater imaging enable the collection of extensive seafloor image datasets that are necessary for monitoring important benthic ecosystems. The ability to collect seafloor imagery has outpaced our capacity to analyze it, hindering expedient mobilization of this crucial environmental information. Recent machine learning approaches provide opportunities to increase the efficiency with which seafloor image datasets are analyzed, yet large and consistent datasets necessary to support development of such approaches are scarce. Here we present BenthicNet: a global compilation of seafloor imagery designed to support the training and evaluation of large-scale image recognition models. An initial set of over 11.4 million images was collected and curated to represent a diversity of seafloor environments using a representative subset of 1.3 million images. These are accompanied by 2.6 million annotations translated to the CATAMI scheme, which span 190,000 of the images. A large deep learning model was trained on this compilation and preliminary results suggest it has utility for automating large- and small-scale image analysis tasks. The compilation and model are made openly available for use by the scientific community at https://doi.org/10.20383/103.0614.

Large-scale recognition models are commonly pretrained on general-purpose collections of images scraped from image hosting websites. However, such datasets comprise terrestrial and anthropocentric objects and scenes, and do not represent subaqueous environments. The difference in the domain of the input data may limit the capacity for transfer learning. Development of large-scale models using compilations of benthic imagery that are suitable for transfer learning purposes would be ideal, yet this is made difficult by a lack of universal labels for seabed features. One of the primary difficulties associated with developing deep learning models in this context is that, unlike terrestrial and anthropocentric images, there is no objective label for many seabed habitats, biological communities, substrate types, or organisms. Indeed, a number of different classification schemes are used to label benthic features [33-35]. Because no single vocabulary is universally applied to describe these features, we currently lack the large sets of consistently labelled images that are necessary for training deep learning models for benthic environments. We note an outstanding need to develop standardized protocols for the translation of common marine image labelling schemes.

Self-supervised learning (SSL) is a recent technique in which models can learn to understand their stimuli without the use of manually annotated data [36-44]. Instead of using labelled data, self-supervised models learn to solve a pretext task that can be automatically constructed from the input data itself. SSL enables the training of large-scale models on unlabelled imagery, which can be collected at scale more easily than annotated imagery. Models trained with SSL have already learnt to see and understand the stimuli of interest, and can subsequently be used for transfer learning onto specific tasks, even if only a limited amount of annotated data is available for the new task. SSL may therefore enable the training of deep learning models on large-scale benthic image datasets for the purposes of transfer learning on smaller novel tasks (e.g. site-specific habitat labelling), despite the lack of large, consistently labelled image datasets.
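To make the pretext-task idea concrete, the following is a minimal sketch of one classic SSL pretext task, rotation prediction, in which the supervisory signal is manufactured from unlabelled images alone. The ResNet-18 backbone, the four-way rotation head, and the choice of pretext task are illustrative assumptions; this is not the specific SSL method used to train the BenthicNet model.

```python
# Illustrative SSL pretext task: predict how an unlabelled image was rotated.
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision

backbone = torchvision.models.resnet18(weights=None)
backbone.fc = nn.Linear(backbone.fc.in_features, 4)  # 0/90/180/270 degrees

def rotation_pretext_batch(images):
    """Rotate each unlabelled image by a random multiple of 90 degrees;
    the rotation index becomes a free training label."""
    labels = torch.randint(0, 4, (images.size(0),))
    rotated = torch.stack([torch.rot90(img, k=int(k), dims=(1, 2))
                           for img, k in zip(images, labels)])
    return rotated, labels

images = torch.rand(16, 3, 224, 224)        # stand-in for seafloor imagery
x, y = rotation_pretext_batch(images)
loss = F.cross_entropy(backbone(x), y)      # no human annotations required
loss.backward()
```

After pretraining on a pretext task like this, the backbone's weights can be transferred to a downstream classifier and fine-tuned on a small annotated set.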
Cumulatively, adequate volumes of benthic image data currently exist to support the development of SSL models, but they are spread globally among various research groups, government data portals, and open data repositories. There is a need to compile and curate datasets for the development of large-scale image recognition models. Such compilations must include images from a range of biomes, depths, and physical oceanographic conditions in order to adequately represent the global heterogeneity of benthic environments. Additionally, data should be included from an array of acquisition platforms and camera configurations to represent the variability in image characteristics (e.g. lighting, resolution, quality, perspective) that arises from non-standardized image data collection methods.

The intended applications and scope for a benthic habitat machine learning image dataset dictate the qualities that images should possess to be useful for automating tasks in this context. Unlike imagery that is focused solely on specific biota, benthic habitat images often depict a broader area (e.g. on the order of m²), which necessarily includes the seafloor. The goal of analyzing such data is often to broadly categorize the benthic environment, potentially including both biotic and abiotic elements [45]. Biotic characterization may include descriptions of individual organisms [46] or community composition [47], while abiotic components include descriptions of substrate, sediment bed forms, heterogeneity, rugosity, and relief [48-50]. For these reasons, benthic habitat information is often summarized at the whole-image level: for example, by assigning one or several "habitat" labels to an entire image using a pre-defined scheme [34, 35, 51], or by aggregating individual labels indicating presence or absence, abundance, or percentage cover of individual habitat components, which may be labelled using a more detailed vocabulary [33] (a small aggregation sketch follows below). It is therefore useful for benthic habitat images to depict a broad enough area that both abiotic and biotic habitat components may be recognized. This may differ from other forms of marine image labelling that focus on locating specific objects, semantic labelling, bounding boxes, and masking. These forms of labelling are well suited to applications focusing on single taxa, pelagic biota, and object detection or tracking. Efforts to establish extensive image datasets for those applications are also underway [11, 52-61], and several data portals and software packages support the labelling and centralization of data for that work (e.g. CoralNet [11], FathomNet [53], SQUIDLE+ [62], BIIGLE [63], VIAME).

Here we describe BenthicNet: a global compilation of seafloor images that is designed to support development of automated image processing tools for benthic habitat data. With this compilation, we strive to obtain thematic diversity by (i) compiling benthic habitat images from locations around the world, and (ii) representing habitats from a broad range of marine environments. The compiled dataset is assessed for these qualities. Additionally, we aim to achieve diversity of non-thematic image characteristics (e.g. image quality, lighting, perspective) by obtaining data from a range of acquisition platforms and camera configurations.
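As flagged above, here is a hedged sketch of the whole-image summarisation idea: point annotations (one row per annotated point) are aggregated into per-image percentage-cover labels. The column names and toy labels are illustrative, not the actual BenthicNet or CATAMI schema.

```python
# Illustrative aggregation of point annotations into percentage cover.
import pandas as pd

points = pd.DataFrame({
    "image": ["img_001.jpg"] * 5 + ["img_002.jpg"] * 4,
    "label": ["sand", "sand", "sponge", "sand", "coral",
              "rock", "rock", "kelp", "rock"],
})

# Percentage cover per label = points of that label / all points on the
# image; result has one row per image and one column per label.
cover = (points.groupby(["image", "label"]).size()
               .groupby(level="image").transform(lambda s: 100 * s / s.sum())
               .unstack(fill_value=0))
print(cover)   # e.g. img_001: sand 60%, sponge 20%, coral 20%
```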
The dataset is presented in three parts: a diverse collection of over 11 million seafloor images from around the world, provided without labels (BenthicNet-11M); a rarefied subset of 1.3 million images, selected to maintain diversity in the imagery while reducing redundancy and volume (BenthicNet-1M); and a collection of 188,688 labelled images bearing 2.6 million annotations (BenthicNet-Labelled). We provide a large SSL model pretrained on BenthicNet-1M, and demonstrate its application using examples from BenthicNet-Labelled. The compilation and SSL model are made openly available to foster further development and assessment of benthic image automation tools.

2 Methods

In order to achieve a diverse collection of benthic habitat images for training deep learning models, data spanning a range of environments and geographies were obtained from a variety of sources. These initially included project partners and research contacts, which were leveraged to establish additional data partnerships with individuals, academic and not-for-profit research groups, and government organizations. The largest data volumes were eventually obtained from several academic, government, and third-party public data repositories. The acquisition of labelled data was prioritized in all cases, but extensive high-quality unlabelled data collections were also included where feasible. The desired format for each dataset was a single folder containing unique images, accompanied by a single comma-separated value (CSV) file indicating, at a minimum, the dataset, file name, latitude, longitude, date and time of acquisition, URL (if hosted online) and label(s) (if provided) for each image.

2.1 Data compilation and quality control

Labelled benthic image data were initially obtained from project collaborators, data partners, and opportunistic sources such as academic journal supplementary materials. The formats and varieties of data were diverse, including collections of images with spreadsheet metadata, images with metadata contained in file names, GIS files containing images from which metadata was extracted, lists of URL image links, and raw video with text file annotations. Datasets that were not formatted as a single folder of images or list of URL links with CSV metadata were re-formatted upon receipt. Metadata contained in image file names was parsed and used to construct a metadata CSV file where necessary. Image data contained within GIS files was extracted using ArcGIS Pro and the ArcPy Python package, along with geographic information and other metadata contained within the files. All geographic coordinates were converted to decimal degrees using the WGS 84 datum. Data obtained as video files were subsampled by extracting still frames according to their metadata using FFmpeg. After formatting, all datasets were subjected to quality control checks for missing entries, duplicates, label consistency, image quality, and matches between images and metadata (a small sketch of such checks follows below). Data columns were renamed to match a standardized format for the BenthicNet dataset. All quality control and formatting was completed using R and Python. The dataset sources are summarized in Table 1. Additional detail on the individual datasets is provided in Appendix C.

2.1.1 Individual contributions

A number of datasets...
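The methods above describe a minimum metadata format and quality-control checks; below is a small, hedged sketch of such checks (missing entries, duplicates, coordinate sanity) in Python. The check_metadata function and its column names are illustrative assumptions, not the project's actual tooling.

```python
# Illustrative quality-control pass over a per-dataset metadata CSV.
import pandas as pd

REQUIRED = ["dataset", "file_name", "latitude", "longitude", "datetime"]

def check_metadata(csv_path: str) -> pd.DataFrame:
    df = pd.read_csv(csv_path)
    missing_cols = [c for c in REQUIRED if c not in df.columns]
    if missing_cols:
        raise ValueError(f"missing required columns: {missing_cols}")
    # Drop rows with missing entries in required fields, then duplicates.
    df = df.dropna(subset=REQUIRED)
    df = df.drop_duplicates(subset=["dataset", "file_name"])
    # Coordinates must be decimal degrees (WGS 84) within valid ranges.
    ok = df["latitude"].between(-90, 90) & df["longitude"].between(-180, 180)
    return df[ok].reset_index(drop=True)
```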

GeoCLR: Georeference Contrastive Learning for Efficient Seafloor Image Interpretation

Field Robotics

This paper describes georeference contrastive learning of visual representation (GeoCLR) for efficient training of deep-learning convolutional neural networks (CNNs). The method leverages georeference information by generating a similar image pair from images taken at nearby locations and contrasting these with an image pair that is far apart. The underlying assumption is that images gathered within a close distance are more likely to have similar visual appearance. This is reasonably satisfied in seafloor robotic imaging applications, where image footprints are limited to edge lengths of a few meters and overlap along the vehicle’s trajectory, whereas seafloor substrates and habitats have far larger patch sizes. A key advantage of this method is that it is self-supervised and does not require any human input for CNN training. The method is computationally efficient, and results can be generated between dives during multi-day autonomous und...
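A hedged sketch of the pairing rule GeoCLR describes: images captured within a small georeferenced distance form a "similar" pair, while the rest of the batch serves as negatives in a standard contrastive (NT-Xent-style) loss. The distance approximation, 2 m threshold, temperature, and function names are illustrative assumptions, not the authors' implementation.

```python
# Illustrative georeferenced pair selection + contrastive loss.
import math
import torch
import torch.nn.functional as F

def metres_apart(lat1, lon1, lat2, lon2):
    """Equirectangular approximation; adequate over tens of metres."""
    r = 6_371_000.0
    x = math.radians(lon2 - lon1) * math.cos(math.radians((lat1 + lat2) / 2))
    y = math.radians(lat2 - lat1)
    return r * math.hypot(x, y)

def nearby_pairs(coords, close_m=2.0):
    """Indices (i, j) of image pairs captured within `close_m` metres."""
    pairs = []
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            if metres_apart(*coords[i], *coords[j]) < close_m:
                pairs.append((i, j))
    return pairs

def ntxent(z_i, z_j, temperature=0.1):
    """Contrastive loss over embeddings of paired images; the other
    batch members act as negatives."""
    z = F.normalize(torch.cat([z_i, z_j]), dim=1)
    sim = z @ z.t() / temperature
    sim.fill_diagonal_(float("-inf"))      # exclude self-similarity
    n = z_i.size(0)
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)])
    return F.cross_entropy(sim, targets)

# Usage idea: encode the two images of each nearby pair into z_i and z_j
# with the CNN, then minimise ntxent(z_i, z_j) -- no labels required.
```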

FathomNet: A global underwater image training set for enabling artificial intelligence in the ocean

ArXiv, 2021

Ocean-going platforms are integrating high-resolution camera feeds for observation and navigation, producing a deluge of visual data. The volume and rate of this data collection can rapidly outpace researchers’ abilities to process and analyze them. Recent advances in machine learning enable fast, sophisticated analysis of visual data, but have had limited success in the ocean due to a lack of data set standardization, insufficient formatting, and limited aggregation of existing, expertly curated imagery for use by data scientists. To address this need, we have built FathomNet, a public platform that makes use of existing, expertly curated data. Initial efforts have leveraged MBARI’s Video Annotation and Reference System and annotated deep-sea video database, which has more than 7M annotations, 1M frame grabs, and 5k terms in the knowledgebase, with additional contributions by the National Geographic Society (NGS) and NOAA’s Office of Ocean Exploration and Research. FathomNet has over 160k locali...

FathomNet: A global image database for enabling artificial intelligence in the ocean

Scientific Reports

The ocean is experiencing unprecedented rapid change, and visually monitoring marine biota at the spatiotemporal scales needed for responsible stewardship is a formidable task. As baselines are sought by the research community, the volume and rate of this required data collection rapidly outpace our abilities to process and analyze them. Recent advances in machine learning enable fast, sophisticated analysis of visual data, but have had limited success in the ocean due to lack of data standardization, insufficient formatting, and demand for large, labeled datasets. To address this need, we built FathomNet, an open-source image database that standardizes and aggregates expertly curated labeled data. FathomNet has been seeded with existing iconic and non-iconic imagery of marine animals, underwater equipment, debris, and other concepts, and allows for future contributions from distributed data sources. We demonstrate how FathomNet data can be used to train and deploy models on other...

A Comparison of the Performance of 2D and 3D Convolutional Neural Networks for Subsea Survey Video Classification

2021

Utilising deep learning image classification to automatically annotate subsea pipeline video surveys can ease this tedious and labour-intensive process, resulting in significant time and cost savings. However, the classification of events in subsea survey videos (frame sequences) by models trained on individual frames has been shown to vary, leading to inaccuracies. The paper extends previous work on the automatic annotation of individual subsea survey frames by comparing the performance of 2D and 3D Convolutional Neural Networks (CNNs) in classifying frame sequences. The study explores the classification of burial, exposure, free span, field joint, and anode events. Sampling and regularization techniques are designed to address the challenges that the underwater environment poses for an inspection video dataset. Results show that a 2D CNN with a rolling average can outperform a 3D CNN, achieving an Exact Match Ratio of 85% and an F1-Score of 90%, whilst being more computationally ef...
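A brief sketch of the rolling-average idea the abstract mentions: per-frame class probabilities from a 2D CNN are smoothed over a temporal window before the event class is chosen, suppressing frame-to-frame flicker. The window size and NumPy implementation are illustrative assumptions.

```python
# Illustrative temporal smoothing of per-frame 2D CNN predictions.
import numpy as np

def rolling_average_predictions(frame_probs: np.ndarray, window: int = 5):
    """frame_probs: (num_frames, num_classes) softmax outputs.
    Returns one smoothed class index per frame."""
    kernel = np.ones(window) / window
    smoothed = np.stack(
        [np.convolve(frame_probs[:, c], kernel, mode="same")
         for c in range(frame_probs.shape[1])], axis=1)
    return smoothed.argmax(axis=1)

# Example: five event classes (burial, exposure, free span, field joint,
# anode) over a short frame sequence.
probs = np.random.dirichlet(np.ones(5), size=200)   # stand-in CNN outputs
events = rolling_average_predictions(probs, window=7)
```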

Transferring Deep Knowledge for Object Recognition in Low-quality Underwater Videos

Neurocomputing

In recent years, underwater video technologies have allowed us to explore the ocean in scientific and noninvasive ways, supporting environmental monitoring, marine ecology studies, and fisheries management. However, low-light and high-noise scenarios pose great challenges for underwater image and video analysis. We propose a CNN knowledge-transfer framework for underwater object recognition and tackle the problem of extracting discriminative features from relatively low-contrast images. Even with an insufficient training set, the transfer framework, together with data augmentation, can learn an effective recognition model for the specialised underwater object recognition task. To better identify objects in underwater video, a weighted-probability decision mechanism is introduced that identifies an object from a series of frames. The proposed framework can be implemented for real-time underwater object recognition on autonomous underwater vehicles and video monitoring systems. To verify the effectiveness of our method, experiments are carried out on a public dataset. The results show that the proposed method achieves promising results for underwater object recognition on both test image datasets and underwater videos.
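A hedged sketch of one plausible weighted-probability decision over a series of frames: per-frame class probabilities are combined using per-frame confidence weights before the object identity is decided. The weighting scheme shown (each frame's maximum probability) is an assumption, not necessarily the paper's exact mechanism.

```python
# Illustrative weighted-probability decision over video frames.
import numpy as np

def weighted_video_decision(frame_probs: np.ndarray, weights=None) -> int:
    """frame_probs: (num_frames, num_classes) softmax outputs.
    weights: (num_frames,); defaults to each frame's maximum probability,
    so confident frames count more toward the final decision."""
    if weights is None:
        weights = frame_probs.max(axis=1)
    weights = np.asarray(weights, dtype=float)
    combined = (frame_probs * weights[:, None]).sum(axis=0) / weights.sum()
    return int(combined.argmax())

probs = np.random.dirichlet(np.ones(4), size=30)    # 30 frames, 4 classes
label = weighted_video_decision(probs)
```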

Workflows for Automated Detection and Classification of Unlabeled Deep Sea Imagery

2017

Over the past 28 years, professional video annotators at the Monterey Bay Aquarium Research Institute (MBARI) have recorded over 5.5 million observations throughout a collection of over 23,000 hours of video footage. MBARI researchers and scientists query these observations through the Video Annotation and Reference System (VARS) to conduct oceanographic research. However, recording these observations requires substantial time, effort, and expertise from MBARI’s professional video annotators. In addition, given the ever-increasing rate of incoming imagery, an efficient automated detection and classification system would greatly assist the upkeep of the VARS database. Because MBARI’s 5.5 million observations cannot currently be used directly to train deep learning object class detectors, we explore various workflows for creating these systems from unlabeled data. We find that combining deep learning algorithms and various annotation methods in a bootstrapping approach can produce...