An integration platform for heterogeneous bioinformatics software components

ISYS: a decentralized, component-based approach to the integration of heterogeneous bioinformatics resources

2001

Motivation: Heterogeneity of databases and software resources continues to hamper the integration of biological information. Top-down solutions are not feasible for the full-scale problem of integration across biological species and data types. Bottom-up solutions so far have not integrated, in a maximally flexible way, dynamic and interactive graphical-user-interface components with data repositories and analysis tools.

Architecture for interoperable software in biology

Understanding biological complexity demands a combination of high-throughput data and interdisciplinary skills. One way to bring together the necessary data types and expertise is to encapsulate domain knowledge in software and compose that software into a customized data analysis environment. To this end, simple, flexible strategies are needed for interconnecting heterogeneous software tools and enabling data exchange between them. Drawing on our own work and that of others, we present several strategies for interoperability and their consequences, in particular a set of simple data structures (list, matrix, network, table and tuple) that have proven sufficient to achieve a high degree of interoperability. We provide a few guidelines for developing future software that will function as part of an interoperable community of tools for biological data analysis and visualization.
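To make the idea of a small shared vocabulary of data structures concrete, here is a minimal sketch of the five types named above as plain Python dataclasses, plus two converters between them. It is only an illustration under stated assumptions: the class names, fields and the `network_to_list` / `matrix_to_table` helpers are hypothetical and do not reproduce the authors' actual formats or API.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any

# Hypothetical containers for the five shared structures; the class and
# field names are illustrative assumptions, not a published API.

@dataclass
class NamedList:
    """An ordered list of identifiers, e.g. gene names."""
    name: str
    items: list[str]

@dataclass
class Matrix:
    """Numeric values with labelled rows and columns."""
    row_names: list[str]
    col_names: list[str]
    values: list[list[float]]

    def __post_init__(self) -> None:
        # Reject matrices whose shape disagrees with their labels.
        if len(self.values) != len(self.row_names) or any(
            len(row) != len(self.col_names) for row in self.values
        ):
            raise ValueError("values do not match row/column names")

@dataclass
class Network:
    """Typed edges of the form (source, interaction, target)."""
    edges: list[tuple[str, str, str]] = field(default_factory=list)

    def nodes(self) -> list[str]:
        # Sorted, de-duplicated names of every node that appears in an edge.
        return sorted({n for s, _, t in self.edges for n in (s, t)})

@dataclass
class Table:
    """Rows keyed by column name."""
    columns: list[str]
    rows: list[dict[str, Any]]

@dataclass
class DataTuple:
    """A single named record of key/value pairs."""
    name: str
    pairs: list[tuple[str, Any]]

def network_to_list(net: Network, name: str) -> NamedList:
    """Hypothetical converter: hand a network's nodes to a list-only tool."""
    return NamedList(name=name, items=net.nodes())

def matrix_to_table(m: Matrix) -> Table:
    """Hypothetical converter: flatten a matrix into a table with an 'id' column."""
    rows = [
        {"id": r, **dict(zip(m.col_names, vals))}
        for r, vals in zip(m.row_names, m.values)
    ]
    return Table(columns=["id", *m.col_names], rows=rows)
```

For example, `network_to_list(Network([("geneA", "activates", "geneB")]), "hits").items` yields `["geneA", "geneB"]`. The point of the design is that each tool only needs to read and write a handful of generic shapes, rather than one bespoke format per partner tool.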

A framework for molecular biology data integration

Procs. Workshop on Information …, 2001

Molecular biology data are stored in different databases, repositories and flat files, usually distributed over the web. These heterogeneous data sources are implemented with distinct data models whose schemas change frequently. It is therefore important to gather information about these data sources, including their schemas and ontologies. The usual approach to this information integration problem is to use a single model that captures all the needed data and related methods. Instead, this work proposes a domain-specific framework for molecular biology data access and applications. This approach captures multiple schemas and preexisting data sources, and also provides a tool for maintaining schema evolution and instantiating databases.

Data Integration in Bioinformatics: Current Efforts and Challenges

intechopen.com

With rapid advances in next-generation sequencing (NGS) technologies and the resulting fast growth in the volume of biological data, a wide variety of data sources (databases and web servers) has been created to facilitate data management, accessibility and analysis. A prerequisite of bioinformatics research has been the ability to find, navigate and access data deposited in these sources. For a given bioinformatics task, researchers often need to be skilled both in querying these data sources and in using the extracted information for further data analysis or information searches.

Database integration and querying in the bioinformatics domain

2005

Given the exponential growth in the amount of genetic data being produced, it is more important than ever for researchers to have effective tools to help them manage this data. This paper describes a system that enables users, generally biologists, to construct components to answer specific questions in their field. The system allows the creation of modules and submodules via top-down decomposition. Concepts and terms can be defined through conversation. These are then used when composing base-level functions to produce code for modules and for interfacing modules.

GeNS: a biological data integration platform

Proceedings of the …, 2009

The scientific achievements of molecular biology depend greatly on the ability of computational applications to analyze laboratory results. A comprehensive analysis of an experiment typically requires studying the obtained dataset together with data available in several distinct public databases. However, providing centralized access to these distributed databases raises a set of challenges: what is the best integration strategy, how to resolve nomenclature clashes, how to handle overlapping data across databases and how to deal with huge datasets. In this paper we present GeNS, a system that uses a simple yet innovative approach to address several biological data integration issues. Compared with existing systems, the main advantages of GeNS are its ease of maintenance and its coverage and scalability, in terms of the number of supported databases and data types. To support our claims, we present the current use of GeNS in two concrete applications. GeNS currently contains more than 140 million biological relations, and it can be publicly downloaded or accessed remotely through SOAP web services.

MOWSERV 2: Friendly and extensible web platform for bioinformatics tools integration

In recent years, bioinformatics tools have grown markedly in diversity and heterogeneity. This variety is one of the keys to flexible data analysis. However, it also brings disadvantages, such as non-standardized service interfaces and differing input and output data formats. As a direct result, some of the applications that integrate these tools have become outdated for today's needs. We present a new integration platform that solves these issues by providing a flexible and user-friendly environment.