Mediation of heterogeneous information resources in the gene expression regulation domain
Motivation: Semantic integration of heterogeneous information and procedural sources in bioinformatics is a necessary prerequisite for efficient research.

Results: Application of the subject mediation approach in bioinformatics is demonstrated for the gene expression regulation domain.

Availability: The results are available on request from the authors.

Introduction

A distinctive feature of molecular-genetic systems is their complex hierarchical and/or network organization. For instance, an organ consists of tissues, a tissue of various cell types, and a cell of compartments (e.g., cytoplasm, nucleus, vacuoles) that contain the macromolecules DNA, RNA, and proteins. These macromolecules interact intensively with each other (they form complexes, take part in various reactions, move through cell compartments, cells, tissues, and organs, etc.), thus forming a composite net of interactions, namely, the gene network.

Solving concrete problems of practical importance requires the use of a large number of heterogeneous, weakly structured molecular-genetic databases that accumulate numerous, complementary, intersecting, and possibly contradictory experimental results. Such databases store sequences, structures, 3D descriptions, and attributive information, along with software tools for data analysis, search for regularities, prediction of various properties of objects, data reorganization, visualization, etc.

Efficient organization of research in bioinformatics requires that the relevant information in specific research areas be properly organized. One important outcome of such organization would be access to, and querying of, a large number of distributed information sources, including various data on the primary and spatial structure of DNA and RNA macromolecules, proteins, and their complexes, as well as data on the peculiarities of their interactions with each other.
Such data are usually semistructured. Processing them may require a significant amount of additional metainformation and complex semantic analysis combining various methods. The problem is further complicated because data stored in different sources are obtained for different research entities and describe real processes in the living organism with different precision.

To semantically integrate an unsystematic population of autonomous information sources, maintained by different information providers, into a well-structured information collection, a global unified representation of the existing information sources and services must be created. To this end, it is proposed to form a special middleware consisting of subject mediators. For each subject mediator, the