A review of the heterogeneous landscape of biodiversity databases: opportunities and challenges for a synthesized biodiversity knowledge base
Related papers
Data integration enables global biodiversity synthesis
Proceedings of the National Academy of Sciences, 2021
Significance: As anthropogenic impacts to Earth systems accelerate, biodiversity knowledge integration is urgently required to support responses to underpin a sustainable future. Consolidating information from disparate sources (e.g., community science programs, museums) and data types (e.g., environmental, biological) can connect the biological sciences across taxonomic, disciplinary, geographical, and socioeconomic boundaries. In an analysis of the research uses of the world’s largest cross-taxon biodiversity data network, we report the emerging roles of open-access data aggregation in the development of increasingly diverse, global research. These results indicate a new biodiversity science landscape centered on big data integration, informing ongoing initiatives and the strategic prioritization of biodiversity data aggregation across diverse knowledge domains, including environmental sciences and policy, evolutionary biology, conservation, and human health.
Connecting data and expertise: a new alliance for biodiversity knowledge
Biodiversity Data Journal
There has been major progress over the last two decades in digitising historical knowledge of biodiversity and in making biodiversity data freely and openly accessible. Interlocking efforts bring together international partnerships and networks, national, regional and institutional projects and investments and countless individual contributors, spanning diverse biological and environmental research domains, government agencies and non-governmental organisations, citizen science and commercial enterprise. However, current efforts remain inefficient and inadequate to address the global need for accurate data on the world's species and on changing patterns and trends in biodiversity. Significant challenges include imbalances in regional engagement in biodiversity informatics activity, uneven progress in data mobilisation and sharing, the lack of stable persistent identifiers for data records, redundant and incompatible processes for cleaning and interpreting data and the absence of...
Research applications of primary biodiversity databases in the digital age
PLOS ONE, 2019
Our world is in the midst of unprecedented change: climate shifts and sustained, widespread habitat degradation have led to dramatic declines in biodiversity rivaling historical extinction events. At the same time, new approaches to publishing and integrating previously disconnected data resources promise to help provide the evidence needed for more efficient and effective conservation and management. Stakeholders have invested considerable resources to contribute to online databases of species occurrences. However, estimates suggest that only 10% of biocollections are available in digital form. The biocollections community must therefore continue to promote digitization efforts, which in part requires demonstrating compelling applications of the data. Our overarching goal is therefore to determine trends in use of mobilized species occurrence data since 2010, as online systems have grown and now provide over one billion records. To do this, we characterized 501 papers that use openly accessible biodiversity databases. Our standardized tagging protocol was based on key topics of interest, including: database(s) used, taxa addressed, general uses of data, other data types linked to species occurrence data, and data quality issues addressed. We found that the most common uses of online biodiversity databases have been to estimate species distribution and richness, to outline data compilation and publication, and to assist in developing species checklists or describing new species. Only 69% of papers in our dataset addressed one or more aspects of data quality, which is low considering common errors and biases known to exist in opportunistic datasets. Globally, we find that biodiversity databases are still in the initial stages of data compilation. Novel and integrative applications are restricted to certain taxonomic groups and regions with higher numbers of quality records.
Continued data digitization, publication, enhancement, and quality control efforts are necessary to make biodiversity science more efficient and relevant in our fast-changing environment.
The Biodiversity Informatics Landscape: Elements, Connections and Opportunities
Research Ideas and Outcomes
There are a multitude of biodiversity informatics projects, datasets, databases and initiatives at the global level, and many more at regional, national, and sometimes local levels. In such a complex landscape, it can be unclear how different elements relate to each other. Based on a high-level review of global and European-level elements, we present a map of the biodiversity informatics landscape. This is a first attempt at identifying key datasets/databases and data services, and mapping them in a way that can be used to identify the links, gaps and redundancies in the landscape. While the map is predominantly focused on elements with a global scope, the sub-global focus at the European-level was incorporated in the map in order to demonstrate how a regional network such as the European Biodiversity Observation Network (EU BON) can usefully contribute to connecting some of the nodes within the landscape. We identify 74 elements, and find that the informatics landscape is complex in terms of the characteristics and diversity of these elements, and that there is high variability in their level of connectedness. Overall, the landscape is highly connected, with one element boasting 28 connections. The average "degrees of separation" between elements is low, and the landscape is deemed relatively robust to failures since there is no single point that information flows through. Examples of possible effort duplication are presented, and the inclusion of five policy-level elements in the map helps illustrate how informatics products can contribute to global processes that define and direct political targets. 
Beyond simply describing the existing landscape, this map will support a better understanding of the landscape's current structure and functioning, enabling responsible institutions to establish or strengthen collaborations, work towards avoiding effort duplication, and facilitate access to the biodiversity data, information and knowledge required to support effective decision-making, in the context of comparatively limited funding for biodiversity knowledge and conservation. To support this, we provide the input matrix and code that created this map as supplementary materials, so that readers can more closely examine the links in the landscape, and edit the map to suit their own purposes.
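The connectivity measures mentioned in this abstract (connections per element and average "degrees of separation") can be illustrated with a minimal Python sketch. This is not the authors' supplementary code: the element names `A` to `D` and the link list are made up for illustration, and the landscape is assumed to be an undirected, connected graph.

```python
from collections import deque

def degrees(adjacency):
    """Number of direct connections for each element."""
    return {node: len(neighbours) for node, neighbours in adjacency.items()}

def mean_separation(adjacency):
    """Average shortest-path length over all reachable ordered pairs (BFS from each node)."""
    total, pairs = 0, 0
    for source in adjacency:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nxt in adjacency[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        for target, d in dist.items():
            if target != source:
                total += d
                pairs += 1
    return total / pairs if pairs else 0.0

# Hypothetical links between four placeholder elements (not taken from the paper).
links = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C")]
adjacency = {}
for a, b in links:
    adjacency.setdefault(a, set()).add(b)
    adjacency.setdefault(b, set()).add(a)

element_degrees = degrees(adjacency)
average_separation = mean_separation(adjacency)
```

In this toy graph, element `A` has the most connections, and the average separation stays low because most elements are at most two steps apart.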
Bridging the biodiversity data gaps: Recommendations to meet users’ data needs
Biodiversity Informatics, 2013
Freely available, high-quality data on species occurrence and associated variables are needed in order to track changes in biodiversity. One of the main issues surrounding the provision of such data is that sources vary in quality, scope, and accuracy. Publishers of such data must face the challenge of maximizing quality, utility and breadth of data coverage, in order to make such data useful to users. With the Global Biodiversity Information Facility (GBIF), we recently conducted a content needs assessment survey to consolidate and synthesize major user needs regarding biodiversity data. We find a broad range of recommendations from the survey respondents, principally concerning issues such as data quality, bias, and coverage, and ease of access. We recommend a candidate set of actions for the GBIF that fall into three classes: 1) addressing data gaps, data volume, and data quality, 2) aggregating data types that are relatively new to GBIF, to support emerging new applications, and 3) promoting ease-of-use and providing incentives for wider use. Addressing the challenge of providing high-quality primary biodiversity data can potentially serve the needs of national and international biodiversity initiatives. These include the "flexible framework" for addressing the new 2020 biodiversity targets of the Convention on Biological Diversity, the global biodiversity observation network (GEO BON) and the new Intergovernmental Science-Policy Platform on Biodiversity and Ecosystem Services (IPBES). Each of these presents opportunities for countries to define appropriate actions and corresponding data needs, with links from local to global scales.
FinBIF: An all-embracing, integrated, cross-sectoral biodiversity data infrastructure
Biodiversity Information Science and Standards
The service model of the Global Biodiversity Information Facility (GBIF) is being implemented in an increasing number of national biodiversity (BD) data services. While GBIF already shares >10^9 data points, national initiatives are an essential component: increase in GBIF-mediated data relies on national data mobilisation and GBIF is not optimised to support local use. The Finnish Biodiversity Information Facility (FinBIF), initiated in 2012 and operational since late 2016, is one of the more recent examples of national BD research infrastructures (RIs) – and arguably among the most comprehensive. Here, we describe FinBIF’s development and service integration, and provide a model approach for the construction of all-inclusive national BD RIs. FinBIF integrates a wide array of BD RI approaches under the same umbrella. These include large-scale and multi-technology digitisation of natural history collections; building a national DNA barcode reference library and linking it to speci...
The Global Registry of Biodiversity Repositories: A Call for Community Curation
Biodiversity Data Journal, 2016
The Global Registry of Biodiversity Repositories is an online metadata resource for biodiversity collections, the institutions that contain them, and associated staff members. The registry provides contact and address information, characteristics of the institutions and collections using controlled vocabularies and free-text descriptions, links to related websites, unique identifiers for each institution and collection record, text fields for loan and use policies, and a variety of other descriptors. Each institution record includes an institutionCode that must be unique, and each collection record must have a collectionCode that is unique within that institution. The registry is populated with records imported from the largest similar registries and more can be harmonized and added. Doing so will require community input and curation and would produce a truly comprehensive and unifying information resource.
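The two uniqueness rules stated in this abstract (institutionCode unique across the registry, collectionCode unique within its institution) can be checked with a short Python sketch. The record layout, the function name `find_code_conflicts`, and the sample codes are assumptions made for illustration; the registry's actual schema and API are not described here.

```python
from collections import Counter

def find_code_conflicts(institutions, collections):
    """Return (duplicated institution codes, duplicated (institution, collection) pairs)."""
    # Rule 1: institutionCode must be unique across the whole registry.
    inst_counts = Counter(rec["institutionCode"] for rec in institutions)
    dup_institutions = sorted(code for code, n in inst_counts.items() if n > 1)
    # Rule 2: collectionCode must be unique only within its own institution,
    # so the pair (institutionCode, collectionCode) is the key that is counted.
    pair_counts = Counter(
        (rec["institutionCode"], rec["collectionCode"]) for rec in collections
    )
    dup_collections = sorted(pair for pair, n in pair_counts.items() if n > 1)
    return dup_institutions, dup_collections

# Made-up records for illustration only.
institutions = [
    {"institutionCode": "AAA"},
    {"institutionCode": "BBB"},
    {"institutionCode": "AAA"},  # violates rule 1
]
collections = [
    {"institutionCode": "AAA", "collectionCode": "Herps"},
    {"institutionCode": "BBB", "collectionCode": "Herps"},  # allowed: different institution
    {"institutionCode": "BBB", "collectionCode": "Herps"},  # violates rule 2
]

dup_institutions, dup_collections = find_code_conflicts(institutions, collections)
```

Counting the (institutionCode, collectionCode) pair rather than collectionCode alone is what lets the same collection code recur under different institutions without being flagged.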
A survey of biodiversity informatics: Concepts, practices, and challenges
WIREs Data Mining and Knowledge Discovery, 2020
The unprecedented size of the human population, along with its associated economic activities, has an ever-increasing impact on global environments. Across the world, countries are concerned about growing resource consumption and the capacity of ecosystems to provide those resources. To effectively conserve biodiversity, it is essential to make indicators and knowledge openly available to decision-makers in forms they can use effectively. The development and deployment of mechanisms to produce these indicators depend on having access to trustworthy data from field surveys and automated sensors, biological collections, molecular data, and historic academic literature. The transformation of this raw data into synthesized information that is fit for use requires going through many refinement steps. The methodologies and techniques used to manage and analyze this data comprise an area often called biodiversity informatics (or e-Biodiversity). Biodiversity data follows a life cycle consisting of planning, collection, certification, description, preservation, discovery, integration, and analysis. Researchers, whether producers or consumers of biodiversity data, will likely perform activities related to at least one of these steps. This article explores each stage of the life cycle of biodiversity data, discussing its methodologies, tools, and challenges.