FathomNet: An underwater image training database for ocean exploration and discovery

FathomNet: A global underwater image training set for enabling artificial intelligence in the ocean

ArXiv, 2021

Ocean-going platforms are integrating high-resolution camera feeds for observation and navigation, producing a deluge of visual data. The volume and rate of this data collection can rapidly outpace researchers’ abilities to process and analyze the data. Recent advances in machine learning enable fast, sophisticated analysis of visual data, but have had limited success in the ocean due to lack of data set standardization and insufficient formatting and aggregation of existing, expertly curated imagery for use by data scientists. To address this need, we have built FathomNet, a public platform that makes use of existing, expertly curated data. Initial efforts have leveraged MBARI’s Video Annotation and Reference System and annotated deep sea video database, which has more than 7M annotations, 1M frame grabs, and 5k terms in the knowledgebase, with additional contributions by National Geographic Society (NGS) and NOAA’s Office of Ocean Exploration and Research. FathomNet has over 160k locali...

FathomNet: A global image database for enabling artificial intelligence in the ocean

Scientific Reports

The ocean is experiencing unprecedented rapid change, and visually monitoring marine biota at the spatiotemporal scales needed for responsible stewardship is a formidable task. As baselines are sought by the research community, the volume and rate of this required data collection rapidly outpace our abilities to process and analyze the data. Recent advances in machine learning enable fast, sophisticated analysis of visual data, but have had limited success in the ocean due to lack of data standardization, insufficient formatting, and demand for large, labeled datasets. To address this need, we built FathomNet, an open-source image database that standardizes and aggregates expertly curated labeled data. FathomNet has been seeded with existing iconic and non-iconic imagery of marine animals, underwater equipment, debris, and other concepts, and allows for future contributions from distributed data sources. We demonstrate how FathomNet data can be used to train and deploy models on other...

FathomNet: An Open, Underwater Image Repository for Automated Detection and Classification of Midwater and Benthic Objects

Marine Technology Society Journal, 2021

Ocean-going platforms and instruments are integrating cameras for observation and navigation, producing a deluge of visual data. The volume of this data collection can rapidly outpace researchers' abilities to process and analyze them. Recent advances in artificial intelligence enable fast, sophisticated analysis of visual data, but have had limited success in the oceanographic world due to lack of dataset standardization, sparse annotation tools, and insufficient formatting and aggregation of existing, expertly curated imagery for use by data scientists. To address this need, we are building FathomNet, a public platform that makes use of existing (and future), expertly curated data to know what is in the ocean and where it is for effective and responsible marine stewardship. This platform is modeled after popular terrestrial datasets (e.g., ImageNet, COCO) that enabled rapid advances in automated visual analysis. FathomNet seeks to engage a wide audience, from the general publi...

BenthicNet: A global compilation of seafloor images for deep learning applications

arXiv (Cornell University), 2024

Advances in underwater imaging enable the collection of extensive seafloor image datasets that are necessary for monitoring important benthic ecosystems. The ability to collect seafloor imagery has outpaced our capacity to analyze it, hindering expedient mobilization of this crucial environmental information. Recent machine learning approaches provide opportunities to increase the efficiency with which seafloor image datasets are analyzed, yet large and consistent datasets necessary to support development of such approaches are scarce. Here we present BenthicNet: a global compilation of seafloor imagery designed to support the training and evaluation of large-scale image recognition models. An initial set of over 11.4 million images was collected and curated to represent a diversity of seafloor environments using a representative subset of 1.3 million images. These are accompanied by 2.6 million annotations translated to the CATAMI scheme, which span 190,000 of the images. A large deep learning model was trained on this compilation and preliminary results suggest it has utility for automating large- and small-scale image analysis tasks. The compilation and model are made openly available for use by the scientific community at https://doi.org/10.20383/103.0614.
... from image hosting websites. However, this dataset comprises terrestrial and anthropocentric objects and scenes, and does not represent subaqueous environments. The difference in the domain of the input data may limit the capacity for transfer learning. Development of large-scale models using compilations of benthic imagery that are suitable for transfer learning purposes would be ideal, yet this is made difficult by a lack of universal labels for seabed features.
One of the primary difficulties associated with developing deep learning models in this context is that, unlike terrestrial and anthropocentric images, there is no objective label for many seabed habitats, biological communities, substrate types, or organisms. Indeed, a number of different classification schemes are used to label benthic features 33-35. Because no single vocabulary is universally applied to describe these features, we currently lack large sets of consistently labelled images that are necessary for training deep learning models for benthic environments. We note an outstanding need to develop standardized protocols for the translation of common marine image labelling schemes. Self-supervised learning (SSL) is a recent technique in which models can learn to understand their stimuli without the use of manually annotated data 36-44. Instead of using labelled data, self-supervised models learn to solve a pretext task that can be automatically constructed from the input data itself. SSL enables the training of large-scale models on unlabelled imagery, which can be collected at scale more easily than annotated imagery. Models trained with SSL have already learnt to see and understand the stimuli of interest, and can subsequently be used for transfer learning onto specific tasks, even if there is only a limited amount of annotated data available for the new task. SSL may enable the training of deep learning models on large-scale benthic image datasets for the purposes of transfer learning on smaller novel tasks (e.g. site-specific habitat labelling), despite the lack of large consistently labelled image datasets. Cumulatively, adequate volumes of benthic image data currently exist to support the development of SSL models, but they are spread globally among various research groups, government data portals, and open data repositories. There is a need to compile and curate datasets for the development of large-scale image recognition models. 
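To make the idea of a pretext task concrete, here is a minimal sketch of how training pairs can be constructed from unlabelled images alone. It uses rotation prediction, a well-known pretext task chosen purely for illustration; the text above does not say which pretext task BenthicNet uses, and the function names `rotate90` and `make_rotation_pretext` are hypothetical. Images are represented as plain nested lists so the example has no dependencies.

```python
import random


def rotate90(image, k):
    """Return a copy of a 2D image (list of rows) rotated clockwise by k quarter turns."""
    rotated = [list(row) for row in image]
    for _ in range(k % 4):
        # Reversing the row order and transposing gives one clockwise quarter turn.
        rotated = [list(row) for row in zip(*rotated[::-1])]
    return rotated


def make_rotation_pretext(images, seed=0):
    """Build (rotated_image, rotation_label) pairs from unlabelled images.

    The label is generated automatically from the transformation itself,
    so no manual annotation is needed. Illustrative assumption only.
    """
    rng = random.Random(seed)
    samples = []
    for image in images:
        k = rng.randrange(4)  # 0, 1, 2 or 3 quarter turns
        samples.append((rotate90(image, k), k))
    return samples


unlabelled = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
pairs = make_rotation_pretext(unlabelled)
```

A model trained to predict the rotation label from the rotated image has to pick up on image structure, and its learned features can then be reused for a downstream task with few labels.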
Such compilations must include images from a range of biomes, depths, and physical oceanographic conditions in order to adequately represent the global heterogeneity of benthic environments. Additionally, data should be included from an array of acquisition platforms and camera configurations to represent the variability in image characteristics (e.g. lighting, resolution, quality, perspective) that arise from non-standardized image data collection methods. The intended applications and scope for a benthic habitat machine learning image dataset dictate qualities that images should possess to be useful for automating tasks in this context. Unlike imagery that is focused solely on specific biota, benthic habitat images often depict a broader area (e.g. on the order of m²), which necessarily includes the seafloor. The goal of analyzing such data is often to broadly categorize the benthic environment, potentially including both biotic and abiotic elements 45. Biotic characterization may include descriptions of individual organisms 46 or community composition 47, while abiotic components include description of substrate, sediment bed forms, heterogeneity, rugosity, and relief 48-50. For these reasons, benthic habitat information is often summarized at the whole-image level, for example by assigning one or several "habitat" labels to an entire image using a pre-defined scheme 34, 35, 51, or by aggregating individual labels indicating presence or absence, abundance, or percentage cover of individual habitat components, which may be labelled using a more detailed vocabulary 33. It is therefore useful for benthic habitat images to depict a broad enough area so that both abiotic and biotic habitat components may be recognized. This may differ from other forms of marine image labelling that focus on locating specific objects, semantic labelling, bounding boxes, and masking.
These forms of labelling are well suited to applications focusing on single taxa, pelagic biota, and object detection or tracking. Efforts to establish extensive image datasets for those applications are also underway 11, 52-61, and several data portals and software packages support the labelling and centralization of data to support that work (e.g. CoralNet 11, FathomNet 53, SQUIDLE+ 62, BIIGLE 63, VIAME). Here we describe BenthicNet: a global compilation of seafloor images that is designed to support development of automated image processing tools for benthic habitat data. With this compilation, we strive to obtain thematic diversity by (i) compiling benthic habitat images from locations around the world, and (ii) representing habitats from a broad range of marine environments. The compiled dataset is assessed for these qualities. Additionally, we aim to achieve diversity of non-thematic image characteristics (e.g. image quality, lighting, perspective) by obtaining data from a range of acquisition platforms and camera configurations. The dataset is presented in three parts: a diverse collection of over 11 million seafloor images from around the world, provided without labels (BenthicNet-11M); a rarefied subset of 1.3 million images, selected to maintain diversity in the imagery while reducing redundancy and volume (BenthicNet-1M); and a collection of 188 688 labelled images bearing 2.6 million annotations (BenthicNet-Labelled). We provide a large SSL model pretrained on BenthicNet-1M, and demonstrate its application using examples from BenthicNet-Labelled. The compilation and SSL model are made openly available to foster further development and assessment of benthic image automation tools.
2 Methods
In order to achieve a diverse collection of benthic habitat images for training deep learning models, data spanning a range of environments and geographies were obtained from a variety of sources.
These initially included project partners and research contacts, which were leveraged to establish additional data partnerships with individuals, academic and not-for-profit research groups, and government organizations. The largest data volumes were eventually obtained from several academic, government, and third-party public data repositories. The acquisition of labelled data was prioritized in all cases, but extensive high-quality unlabelled data collections were also included where feasible. The desired format for each dataset was a single folder containing unique images, accompanied by a single comma-separated values (CSV) file indicating, at a minimum, the dataset, file name, latitude, longitude, date and time of acquisition, URL (if hosted online) and label(s) (if provided) for each image.
2.1 Data compilation and quality control
Labelled benthic image data was initially obtained from project collaborators, data partners, and opportunistic sources such as academic journal supplementary materials. The formats and varieties of data were diverse, including collections of images with spreadsheet metadata, images with metadata contained in file names, GIS files containing images from which metadata was extracted, lists of URL image links, and raw video with text file annotations. Datasets that were not formatted as a single folder of images or list of URL links with CSV metadata were re-formatted upon receipt. Metadata contained in image file names was parsed and used to construct a metadata CSV file where necessary. Image data contained within GIS files was extracted using ArcGIS Pro and the ArcPy Python package, along with geographic information and other metadata contained within the files. All geographic coordinates were converted to decimal degrees using the WGS 84 datum. Data obtained as video files were subsampled by extracting still frames according to their metadata using FFmpeg.
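As a rough illustration of the target format described above, the sketch below converts a coordinate given in degrees, minutes and seconds to decimal degrees and writes one metadata row to CSV. The column names, the example record, and the function names are assumptions made for illustration; the authors' actual schema and scripts are not given in this excerpt. The conversion only changes the angular notation; it does not reproject between datums.

```python
import csv
import io


def dms_to_decimal(degrees, minutes, seconds, hemisphere):
    """Convert degrees/minutes/seconds plus a hemisphere letter to decimal degrees.

    South and west are returned as negative values.
    """
    value = abs(degrees) + minutes / 60 + seconds / 3600
    return -value if hemisphere.upper() in ("S", "W") else value


# Hypothetical column names covering the minimum fields listed in the text.
COLUMNS = ["dataset", "file_name", "latitude", "longitude", "datetime", "url", "label"]


def write_metadata_csv(records):
    """Serialise a list of per-image dicts to CSV text with a fixed header.

    Fields missing from a record (e.g. url or label) are written as empty cells.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# Invented example record, not taken from any BenthicNet dataset.
record = {
    "dataset": "example_survey",
    "file_name": "img_0001.jpg",
    "latitude": dms_to_decimal(44, 30, 0, "N"),
    "longitude": dms_to_decimal(63, 15, 0, "W"),
    "datetime": "2020-07-01T12:00:00Z",
}
csv_text = write_metadata_csv([record])
```

Keeping a fixed column list means every contributed dataset ends up with the same header, even when optional fields such as the URL or labels are absent.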
After formatting, all datasets were subjected to quality control checks for missing entries, duplicates, label consistency, image quality, and matches between images and metadata. Data columns were renamed to match a standardized format for the BenthicNet dataset. All quality control and formatting was completed using R and Python. The dataset sources are summarized in Table 1. Additional detail on the individual datasets is provided in Appendix C.
2.1.1 Individual contributions
A number of datasets...
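Some of the quality control checks named in Section 2.1 (missing entries, duplicates, and matches between images and metadata) can be sketched in a few lines of Python. This is an illustrative outline under assumed column names, not the authors' code; `REQUIRED_COLUMNS` and `quality_report` are hypothetical names, and label consistency and image quality checks are left out.

```python
from collections import Counter

# Hypothetical set of columns that every metadata row must fill in.
REQUIRED_COLUMNS = ("dataset", "file_name", "latitude", "longitude")


def _is_blank(value):
    """Treat None and empty or whitespace-only strings as missing."""
    return value is None or str(value).strip() == ""


def quality_report(rows, image_files):
    """Summarise basic metadata problems for one dataset.

    rows: list of dicts, one per metadata row.
    image_files: iterable of image file names actually present in the folder.
    """
    missing = [
        index
        for index, row in enumerate(rows)
        if any(_is_blank(row.get(column)) for column in REQUIRED_COLUMNS)
    ]
    counts = Counter(row.get("file_name") for row in rows)
    duplicates = sorted(name for name, n in counts.items() if name and n > 1)
    listed = {row.get("file_name") for row in rows if row.get("file_name")}
    present = set(image_files)
    return {
        "rows_with_missing_fields": missing,
        "duplicate_file_names": duplicates,
        "metadata_without_image": sorted(listed - present),
        "images_without_metadata": sorted(present - listed),
    }


# Invented example data for demonstration only.
rows = [
    {"dataset": "a", "file_name": "1.jpg", "latitude": "44.5", "longitude": "-63.2"},
    {"dataset": "a", "file_name": "1.jpg", "latitude": "44.5", "longitude": "-63.2"},
    {"dataset": "a", "file_name": "2.jpg", "latitude": "", "longitude": "-63.2"},
]
report = quality_report(rows, ["1.jpg", "3.jpg"])
```

Returning a report rather than silently dropping rows leaves the decision about how to resolve each problem to the person curating the dataset.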

Video Image Enhancement and Machine Learning Pipeline for Underwater Animal Detection and Classification at Cabled Observatories

Sensors, 2020

An understanding of marine ecosystems and their biodiversity is relevant to sustainable use of the goods and services they offer. Since marine areas host complex ecosystems, it is important to develop spatially widespread monitoring networks capable of providing large amounts of multiparametric information, encompassing both biotic and abiotic variables, and describing the ecological dynamics of the observed species. In this context, imaging devices are valuable tools that complement other biological and oceanographic monitoring devices. Nevertheless, large amounts of images or movies cannot all be manually processed, and autonomous routines for recognizing the relevant content, classification, and tagging are urgently needed. In this work, we propose a pipeline for the analysis of visual data that integrates video/image annotation tools for defining, training, and validation of datasets with video/image enhancement and machine and deep learning approaches. Such a pipeline is requir...

Automating Deep-Sea Video Annotation Using Machine Learning

2020

As the world explores opportunities to develop offshore renewable energy capacity, there will be a growing need for pre-construction biological surveys and post-construction monitoring in the challenging marine environment. Underwater video is a powerful tool to facilitate such surveys, but the interpretation of the imagery is costly and time-consuming. Emerging technologies have improved automated analysis of underwater video, but these technologies are not yet accurate or accessible enough for widespread adoption in the scientific community or industries that might benefit from these tools. To address these challenges, we developed a website that allows us to: (1) Quickly play and annotate underwater videos, (2) Create a short tracking video for each annotation that shows how an annotated concept moves in time, (3) Verify the accuracy of existing annotations and tracking videos, (4) Create a neural network model from existing annotations, and (5) Automatically annotate unwatched v...

Leveraging Metadata in Representation Learning With Georeferenced Seafloor Imagery

IEEE Robotics and Automation Letters, 2021

Camera-equipped Autonomous Underwater Vehicles (AUVs) are now routinely used in seafloor surveys. Obtaining effective representations from the images they collect can enable perception-aware robotic exploration such as information-gain-guided path planning and target-driven visual navigation. This paper develops a novel self-supervised representation learning method for seafloor images collected by AUVs. The method allows deep-learning convolutional autoencoders to leverage multiple sources of metadata to regularise their learning, prioritising features observed in images that can be correlated with patterns in their metadata. The impact of the proposed regularisation is examined on a dataset consisting of more than 30k colour seafloor images gathered by an AUV off the coast of Tasmania. The metadata used to regularise learning in this dataset consists of the horizontal location and depth of the observed seafloor. The results show that including metadata in self-supervised representation learning can increase image classification accuracy by up to 15% and never degrades learning performance. We show how effective representation learning can be applied to achieve class-balanced representative image identification for summarised understanding of imbalanced class distributions in an unsupervised way.

Accelerating Species Recognition and Labelling of Fish From Underwater Video With Machine-Assisted Deep Learning

Frontiers in Marine Science

Machine-assisted object detection and classification of fish species from Baited Remote Underwater Video Station (BRUVS) surveys using deep learning algorithms presents an opportunity for optimising analysis time and rapid reporting of marine ecosystem statuses. Training object detection algorithms for BRUVS analysis presents significant challenges: the model requires training datasets with bounding boxes already applied identifying the location of all fish individuals in a scene, and it requires training datasets identifying species with labels. In both cases, substantial volumes of data are required and this is currently a manual, labour-intensive process, resulting in a paucity of the labelled data currently required for training object detection models for species detection. Here, we present a “machine-assisted” approach for i) a generalised model to automate the application of bounding boxes to any underwater environment containing fish and ii) fish detection and classification...

Context-Driven Detection of Invertebrate Species in Deep-Sea Video

International Journal of Computer Vision

Each year, underwater remotely operated vehicles (ROVs) collect thousands of hours of video of unexplored ocean habitats revealing a plethora of information regarding biodiversity on Earth. However, fully utilizing this information remains a challenge as proper annotations and analysis require trained scientists’ time, which is both limited and costly. To this end, we present a Dataset for Underwater Substrate and Invertebrate Analysis (DUSIA), a benchmark suite and growing large-scale dataset to train, validate, and test methods for temporally localizing four underwater substrates as well as temporally and spatially localizing 59 underwater invertebrate species. DUSIA currently includes over ten hours of footage across 25 videos captured in 1080p at 30 fps by an ROV following pre-planned transects across the ocean floor near the Channel Islands of California. Each video includes annotations indicating the start and end times of substrates across the video in addition to counts of s...

Workflows for Automated Detection and Classification of Unlabeled Deep Sea Imagery

2017

Over the past 28 years, professional video annotators at the Monterey Bay Aquarium Research Institute (MBARI) have recorded over 5.5 million observations throughout a collection of over 23,000 hours of video footage. MBARI researchers and scientists query these observations through the Video Annotation and Reference System (VARS) to conduct oceanographic research. However, recording these observations requires a lot of time, energy, and knowledge from MBARI’s professional video annotators. In addition, due to the ever increasing rate of incoming imagery, an efficient automated detection and classification system would be of great assistance to the upkeep of the VARS database. Because MBARI’s 5.5 million observations are currently unable to be used to train deep learning object class detectors, we explore various workflows to create these systems from unlabeled data. We find that combining deep learning algorithms and various annotation methods in a bootstrapping approach can produce...