Knowledge Discovery for Biodiversity: from Data Mining to Sign Management

A survey of biodiversity informatics: Concepts, practices, and challenges

WIREs Data Mining and Knowledge Discovery, 2020

The unprecedented size of the human population, along with its associated economic activities, has an ever-increasing impact on global environments. Across the world, countries are concerned about growing resource consumption and the capacity of ecosystems to supply those resources. To effectively conserve biodiversity, it is essential to make indicators and knowledge openly available to decision-makers in ways that they can effectively use. The development and deployment of mechanisms to produce these indicators depend on having access to trustworthy data from field surveys and automated sensors, biological collections, molecular data, and historic academic literature. The transformation of this raw data into synthesized information that is fit for use requires going through many refinement steps. The methodologies and techniques used to manage and analyze this data comprise an area often called biodiversity informatics (or e-Biodiversity). Biodiversity data follows a life cycle consisting of planning, collection, certification, description, preservation, discovery, integration, and analysis. Researchers, whether producers or consumers of biodiversity data, will likely perform activities related to at least one of these steps. This article explores each stage of the life cycle of biodiversity data, discussing its methodologies, tools, and challenges.

A decadal view of biodiversity informatics: challenges and priorities

BMC Ecology, 2013

Biodiversity informatics plays a central enabling role in the research community's efforts to address scientific conservation and sustainability issues. Great strides have been made in the past decade in establishing a framework for sharing data, where taxonomy and systematics have been perceived as the most prominent disciplines involved. To some extent this is inevitable, given the use of species names as the pivot around which information is organised. To address the urgent questions around conservation, land use, environmental change, sustainability, food security and ecosystem services that are facing governments worldwide, we need to understand how the ecosystem works. So, we need a systems approach to understanding biodiversity that moves significantly beyond taxonomy and species observations. Such an approach needs to look at the whole system to address species interactions, both with their environment and with other species. It is clear that some barriers to progress are sociological: essentially, persuading people to use the technological solutions that are already available. This is best addressed by developing more effective systems that deliver immediate benefit to the user, hiding the majority of the technology behind simple user interfaces. An infrastructure should be a space in which activities take place and, as such, should be effectively invisible. This community consultation paper positions the role of biodiversity informatics for the next decade, presenting the actions needed to link the various biodiversity infrastructures invisibly and to facilitate understanding that can support both business and policy-makers. The community considers the goal in biodiversity informatics to be full integration of the biodiversity research community, including citizen science, through a commonly shared, sustainable e-infrastructure across all sub-disciplines that reliably serves science and society alike.

A Survey of e-Biodiversity: Concepts, Practices, and Challenges

ArXiv, 2018

The unprecedented size of the human population, along with its associated economic activities, has an ever-increasing impact on global environments. Across the world, countries are concerned about growing resource consumption and the capacity of ecosystems to supply those resources. To effectively conserve biodiversity, it is essential to make indicators and knowledge openly available to decision-makers in ways that they can effectively use. The development and deployment of mechanisms to produce these indicators depend on having access to trustworthy data from field surveys and automated sensors, biological collections, molecular data, and historic academic literature. The transformation of this raw data into synthesized information that is fit for use requires going through many refinement steps. The methodologies and techniques used to manage and analyze this data comprise an area often called biodiversity informatics (or e-Biodiversity). Biodiversity data follows a life cycle co...

Analysis on the Graph Techniques for Data-mining and Visualization of Heterogeneous Biodiversity Data Sets

Proceedings of the 2nd International Conference on Complexity, Future Information Systems and Risk, 2017

Existing biodiversity databases contain an abundance of information. To turn such information into knowledge, it is necessary to address several information-model issues. Biodiversity data are collected for various scientific objectives, often even without clear preliminary objectives, may follow different taxonomy standards and organization logic, and may be held in multiple file formats using a variety of database technologies. This paper presents a graph catalogue model for the metadata management of biodiversity databases. It explores how data mining and visualization might be used to guide the analysis of heterogeneous biodiversity data. In particular, we propose contributions to the problems of (1) the analysis of heterogeneous distributed data found across different databases, (2) the identification of matches and approximations between data sets, and (3) the identification of relationships between various databases. This paper describes a proof of concept of an infrastructure testbed and its basic operations, presenting an evaluation of the resulting system in comparison with the ideal expectations of the ecologist.

The big questions for biodiversity informatics

Systematics and Biodiversity, 2010

Science is a sequence of generating new ideas, detailed explorations, incorporation of the results into a toolbox for understanding data, and turning them into useful knowledge. One recent development has been large-scale, computer-aided management of biodiversity information. This emerging field of biodiversity informatics has been growing quickly, but without overarching scientific questions to guide its development; the result has been developments that have no connection to genuine insight and forward progress. We outline what biodiversity informatics should be, a link between diverse dimensions of organismal biology (genomics, phylogenetics, taxonomy, distributional biology, ecology, interactions, and conservation status), and describe the science progress that would result. These steps will enable a transition from 'gee-whiz' to fundamental science infrastructure.

Process of Ontology Construction for the Development of an Intelligent System for the Organization and Retrieval of Knowledge in Biodiversity — SISBIO

IFIP International Federation for Information Processing, 2006

This work describes the ontology construction process for the development of an Intelligent System for the Organization and Retrieval of Knowledge in Biodiversity (SISBIO). The system aims at the production of strategic information for the biofuel chain. Two main methodologies are used for the construction of the ontologies: knowledge engineering and ontology engineering. The first consists of extracting and organizing the biofuel specialists' knowledge, and ontology engineering is used to represent the knowledge through indicative expressions and their relations, developing a semantic network of relationships.

A knowledge environment for the biodiversity and ecological sciences

2007

The Science Environment for Ecological Knowledge (SEEK) is a knowledge environment that is being developed to address many of the current challenges associated with data accessibility and integration in the biodiversity and ecological sciences. The SEEK information technology infrastructure encompasses three integrated systems: (1) EcoGrid, an open architecture for data access; (2) a Semantic Mediation System based on domain-specific ontologies; and (3) an Analysis and Modeling System that supports semantically integrated analytical workflows. Multidisciplinary scientists and programmers from multiple institutions comprise the core development team. SEEK design and development are informed by three multidisciplinary teams of scientists organized in Working Groups. The Biodiversity and Ecological Analysis and Modeling Working Group informs development through evaluation of SEEK efficacy in addressing biodiversity and ecological questions. The Knowledge Representation Working Group provides knowledge representation requirements from the domain sciences and develops the corresponding knowledge representations (ontologies) to support the assembly of analytical workflows in the Analysis and Modeling System, and the intelligent data and service discovery in the EcoGrid. A Biological Classification and Nomenclature Working Group investigates solutions to mediating among multiple taxonomies for naming organisms. A multifaceted education, outreach and training program ensures that the SEEK research products, software, and information technology infrastructure optimally benefit the target communities.