BioHackathon series in 2011 and 2012: penetration of ontology and linked data in life science domains

BioHackathon series in 2013 and 2014: improvements of semantic interoperability in life science data and services

F1000Research

Publishing databases in the Resource Description Framework (RDF) model is becoming widely accepted as a way to maximize the syntactic and semantic interoperability of open data in the life sciences. Here we report advancements made in the 6th and 7th annual BioHackathons, which were held in Tokyo and Miyagi, respectively. This review consists of two major sections covering: 1) improvement and utilization of RDF data in various domains of the life sciences and 2) metadata about these RDF data, the resources that store them, and the service quality of SPARQL Protocol and RDF Query Language (SPARQL) endpoints. The first section describes how we developed RDF data, ontologies and tools in genomics, proteomics, metabolomics and glycomics, and by literature text mining. The second section describes how we defined descriptions of datasets, the provenance of data, and quality assessment of services and service discovery. By enhancing the harmonization of these two layers of machine-readable data and knowledg...

Implementation of linked data in the life sciences at BioHackathon 2011

Journal of biomedical semantics, 2015

Linked Data has gained some attention recently in the life sciences as an effective way to provide and share data. As a part of the Semantic Web, data are linked so that a person or machine can explore the web of data. Resource Description Framework (RDF) is the standard means of implementing Linked Data. When RDF data are generated, the data are not simply linked to one another; the links themselves are characterized by ontologies, so that the types of links can be distinguished. Although defining an ontology carries a high labor cost for data providers, the payoff is a higher level of interoperability with data analysis and visualization software. This increase in interoperability facilitates the multi-faceted retrieval of data, and the appropriate data can be quickly extracted and visualized. Such retrieval is usually performed using SPARQL (SPARQL Protocol and RDF Query Language), which is used to query RDF data stores. For the datab...
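The idea of typed links and pattern-based retrieval described in this abstract can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the `ex:` identifiers, the predicate names, and the helper functions `match` and `genes_in_pathway` are all hypothetical, and a real deployment would use an RDF store queried with SPARQL rather than an in-memory list.

```python
from typing import Iterator, Optional

# A triple is (subject, predicate, object). The predicate is the "typed link".
Triple = tuple[str, str, str]

# Hypothetical example data; none of these identifiers come from a real dataset.
TRIPLES: list[Triple] = [
    ("ex:geneA", "ex:encodes", "ex:proteinA"),
    ("ex:proteinA", "ex:participatesIn", "ex:pathwayX"),
    ("ex:geneB", "ex:encodes", "ex:proteinB"),
    ("ex:proteinB", "ex:interactsWith", "ex:proteinA"),
]


def match(
    triples: list[Triple],
    s: Optional[str] = None,
    p: Optional[str] = None,
    o: Optional[str] = None,
) -> Iterator[Triple]:
    """Yield triples that fit the pattern; None is a wildcard, like a query variable."""
    for t in triples:
        if (s is None or t[0] == s) and (p is None or t[1] == p) and (o is None or t[2] == o):
            yield t


def genes_in_pathway(triples: list[Triple], pathway: str) -> list[tuple[str, str]]:
    """Join two patterns: ?gene ex:encodes ?protein . ?protein ex:participatesIn pathway."""
    results = []
    for gene, _, protein in match(triples, p="ex:encodes"):
        # Keep the gene only if its protein is linked to the pathway by the right link type.
        if any(match(triples, s=protein, p="ex:participatesIn", o=pathway)):
            results.append((gene, protein))
    return results


if __name__ == "__main__":
    print(genes_in_pathway(TRIPLES, "ex:pathwayX"))
```

Because each link carries a type, the query can tell `ex:participatesIn` apart from `ex:interactsWith`, so `ex:proteinB` is not returned even though it is connected to `ex:proteinA`.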

Semantic systems biology: Enabling integrative biology via semantic web technologies

2011

The vast amounts of knowledge in the biomedical domain have paved the way for a new paradigm in biological research called Systems Biology, essentially an approach that relies on the integration of all available knowledge of a biological system in a single model. This approach promotes a comprehensive understanding of biological systems, driven by data integration and mathematical modelling. However, the sheer volume, variation and complexity of the current biological data pose a number of hurdles in knowledge management that need to be overcome. The Semantic Web offers various solutions to these challenges. With our initiative, named Semantic Systems Biology (SSB), we augment the systems biology approach with semantic web technologies to enable smooth data integration, rigorous knowledge representation, efficient querying, and hypothesis generation. Here we present an overview of the projects associated with the SSB initiative. Access to our resources developed within the SSB frame is provided on our website: http://www.semantic-systems-biology.org.

Structuring the life sciences resourceome for Semantic Systems Biology: lessons from the BioGateway project

2008

The application of Semantic Web technologies in the life sciences for data integration is still nascent. We have recently built BioGateway, an RDF store that integrates all the candidate OBO Foundry ontologies with other resources such as SWISS-PROT. In the course of developing BioGateway, we faced challenges that are common to other projects that involve large datasets in diverse formats. We present a detailed analysis of the obstacles that had to be overcome in creating BioGateway. In doing so, we demonstrate the potential of a comprehensive application of Semantic Web technologies to global biomedical data. The time is ripe for launching a community effort aiming at a wider acceptance and application of Semantic Web technologies in the life sciences domain. We make a public call for the creation of a forum that strives to implement a truly semantic life science foundation for a type of Systems Biology that we have named Semantic Systems Biology.

The 3rd DBCLS BioHackathon: improving life science data integration with Semantic Web technologies

2013

Background: BioHackathon 2010 was the third in a series of meetings hosted by the Database Center for Life Sciences (DBCLS) in Tokyo, Japan. The overall goal of the BioHackathon series is to improve the quality and accessibility of life science research data on the Web by bringing together representatives from public databases, analytical tool providers, and cyber-infrastructure researchers to jointly tackle important challenges in the area of in silico biological research.

BioGateway: a semantic systems biology tool for the life sciences

2009

Background: Life scientists need help in coping with the plethora of fast growing and scattered knowledge resources. Ideally, this knowledge should be integrated in a form that allows them to pose complex questions that address the properties of biological systems, independently from the origin of the knowledge. Semantic Web technologies prove to be well suited for knowledge integration, knowledge production (hypothesis formulation), knowledge querying and knowledge maintenance.

A little semantic web goes a long way in biology

2005

We show how state-of-the-art Semantic Web technology can be used in e-Science, in particular, to automate the classification of proteins in biology. We show that the resulting classification was of comparable quality to that performed by a human expert, and how investigations using the classified data even resulted in the discovery of significant information that had previously been overlooked, leading to the identification of a possible drug-target.

SEMANTIC WEB: REVOLUTIONIZING KNOWLEDGE DISCOVERY IN THE LIFE SCIENCES

There are now more than a thousand Web Services offering access to disparate biological resources, namely data and computational tools. It is extremely difficult for biological researchers to search a Web Services (WS) registry for a relevant WS using the standard (primarily computational) descriptions attached to it. Semantic Biological Web Services Registry (SemBOWSER) is an ontology-based implementation of the UDDI specification, which enables, at present, glycoproteomics researchers to publish, search and discover WS using semantic, service-level, descriptive domain keywords. SemBOWSER classifies a WS along two dimensions: the task it implements and the domain it is associated with. Each published WS is associated with the relevant ProPreO (comprehensive process ontology for glycoproteomics experimental lifecycle) ontology-based keywords (implemented as part of the registry). A researcher, in turn, can search for relevant WS using only descriptive keywords that are part of their everyday working lexicon. This intuitive search is underpinned by the ProPreO ontology and thereby benefits from the inherent advantages of a semantic search over a purely syntactic search, namely disambiguation and the use of named relationships between concepts. SemBOWSER is part of the glycoproteomics web portal 'Stargate'.
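The two-dimensional classification and ontology-backed keyword search described above can be illustrated with a small sketch. Everything here is an assumption made for illustration: the terms in `PARENT` are invented and are not actual ProPreO terms, and `Service`, `ancestors_or_self`, and `search` are hypothetical names, not SemBOWSER's or UDDI's API.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical mini-ontology mapping a term to its parent term (not real ProPreO terms).
PARENT = {
    "peak_detection": "data_analysis",
    "database_search": "data_analysis",
    "n_glycan": "glycan",
}


@dataclass(frozen=True)
class Service:
    name: str
    task: str    # dimension 1: the task the service implements
    domain: str  # dimension 2: the domain the service is associated with


def ancestors_or_self(term: str) -> set[str]:
    """Return the term together with all of its ancestors in the mini-ontology."""
    terms = {term}
    while term in PARENT:
        term = PARENT[term]
        terms.add(term)
    return terms


def search(
    services: list[Service],
    task: Optional[str] = None,
    domain: Optional[str] = None,
) -> list[str]:
    """Return names of services whose task and domain match the keywords or a subterm of them."""
    hits = []
    for svc in services:
        task_ok = task is None or task in ancestors_or_self(svc.task)
        domain_ok = domain is None or domain in ancestors_or_self(svc.domain)
        if task_ok and domain_ok:
            hits.append(svc.name)
    return hits


# Hypothetical registry entries, for illustration only.
REGISTRY = [
    Service("PeakFinder", "peak_detection", "n_glycan"),
    Service("SeqSearch", "database_search", "protein"),
]

if __name__ == "__main__":
    print(search(REGISTRY, task="data_analysis", domain="glycan"))
```

A purely syntactic search for the keyword `data_analysis` would find neither service, since neither is annotated with that exact string; following the named parent relationship is what lets the broader keyword retrieve both.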