Modular network construction using eQTL data: an analysis of computational costs and benefits
Identifying disease candidate genes via large-scale gene network analysis
International Journal of Data Mining and Bioinformatics, 2014
Since gene regulatory networks provide a systematic view of a complex living system, it is important to develop tools that can not only build reliable, large-scale gene regulatory networks but also identify disease candidate genes using the estimated networks. In this work, we introduce a reverse engineering technique, Bayesian model averaging based networks (BMAnet), which ensembles all appropriate linear models to tackle the uncertainty of model selection and integrates heterogeneous biological datasets. Various network evaluation measures are then used to compare the estimated networks, and one of these measures, random walk with restart (RWR), is used to search for disease candidate genes.
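The abstract above names random walk with restart as the measure used to rank disease candidate genes, without giving details. The following is a minimal sketch of the general technique on an unweighted, undirected network, not the authors' implementation. The function name, the restart probability of 0.3, the convergence tolerance and the toy graph are all assumptions made for illustration.

```python
def random_walk_with_restart(adj, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Return the steady-state visiting probability of every node.

    adj     -- dict mapping each node to a list of its neighbours (assumed undirected)
    seeds   -- nodes the walker restarts from (e.g. known disease genes)
    restart -- probability of jumping back to a seed at each step (assumed value)
    """
    nodes = sorted(adj)
    seed_set = set(seeds)
    # Restart distribution: uniform over the seed nodes.
    p0 = {n: (1.0 / len(seed_set) if n in seed_set else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {n: restart * p0[n] for n in nodes}
        for u in nodes:
            if adj[u]:
                # Spread the remaining mass evenly over the neighbours of u.
                share = (1.0 - restart) * p[u] / len(adj[u])
                for v in adj[u]:
                    nxt[v] += share
            else:
                # Isolated node: the walker stays where it is.
                nxt[u] += (1.0 - restart) * p[u]
        delta = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if delta < tol:
            break
    return p


# Hypothetical toy network: a simple path a - b - c - d, seeded at "a".
toy = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
scores = random_walk_with_restart(toy, seeds=["a"])
ranking = sorted(scores, key=scores.get, reverse=True)
```

Nodes closer to the seed set receive higher scores, so sorting by score gives a candidate ranking. A weighted network would need the neighbour shares scaled by edge weight.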
Discovery of gene network variability across samples representing multiple classes
International Journal of Bioinformatics Research and Applications, 2010
Gene expression profiles from microarray experiments that include samples or biological replicates representing various classes, groups or states (e.g. treatments, developmental stages, health status) have been used to predict gene networks. To further mine the information from samples within and across classes, a framework that integrates Bayesian networks, a mixture of gene co-expression models and clustering based on all the genes in the network is proposed. The approach was evaluated on two independent pathways using data from two microarray experiments. The postulated algorithm succeeded in reconstructing the topology of the gene pathways when benchmarked against empirical reports and randomized data sets. Most or all of the samples within a class shared the same co-expression model and were classified within the corresponding class. Our approach uncovered both gene relationships and profiles that are unique to a particular class or shared across classes.
Searching for limited connectivity in genetic network models
2001
The inference of regulatory interactions between genes from time-course micro-array data is one of the most challenging tasks in the field of functional genomics. The multitude of genes that can now be measured using micro-array technology requires analysis tools that can easily scale up with respect to the number of genes. This scalability is especially important when inferring genetic interactions, because this task is complicated by the combinatorial nature of gene interaction and because the high cost of micro-array measurements still severely limits the number of measured timepoints. Because of this limitation of the data, it is essential to incorporate as much additional information as possible. This can be achieved by applying constraints based on general biological knowledge and by including specific knowledge about known interactions. In this paper we exploit the fact that genetic networks are believed to exhibit limited connectivity. We propose a general approach in which we separate the task of finding the structure of the networks from the task of finding the best parameters of the model, given the structure. The second task can be solved efficiently for most models, but the first task amounts to a search problem which requires the choice of a suitable evaluation function and search strategy. Experimental investigations determined that the best evaluation function is simply the mean squared error on the training data. Through further extensive experimental investigation of several search strategies, it was found that the best search strategy is based on an approach of greedily increasing the number of connections. The strength of the proposed approach lies in the fact that it can be applied to all genetic network models and allows genetic network models to scale up to a large number of genes.
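The search strategy described above (greedily adding connections, scored by mean squared error on the training data) can be sketched as follows. This is an illustration under stated assumptions rather than the paper's method: it assumes a linear model without an intercept for a single target gene, fits parameters through the normal equations, and adds a tiny ridge term only to keep the solve numerically safe. All function names and the toy data are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_mse(columns, y):
    """Fit y as a linear combination of the given columns; return the training MSE."""
    k = len(columns)
    gram = [[sum(u * v for u, v in zip(columns[i], columns[j])) for j in range(k)]
            for i in range(k)]
    for i in range(k):
        gram[i][i] += 1e-9  # assumed tiny ridge term, guards against singular systems
    rhs = [sum(u * t for u, t in zip(columns[i], y)) for i in range(k)]
    w = solve(gram, rhs)
    pred = [sum(w[i] * columns[i][t] for i in range(k)) for t in range(len(y))]
    return sum((p - t) ** 2 for p, t in zip(pred, y)) / len(y)


def greedy_regulators(candidates, y, max_inputs):
    """Add one regulator at a time, always the one that lowers the training MSE most."""
    chosen = []
    best_mse = sum(t * t for t in y) / len(y)  # the empty model predicts zero
    while len(chosen) < max_inputs:
        trials = [(fit_mse([candidates[n] for n in chosen + [name]], y), name)
                  for name in candidates if name not in chosen]
        if not trials:
            break
        mse, name = min(trials)
        if mse >= best_mse:  # no candidate improves the fit
            break
        chosen.append(name)
        best_mse = mse
    return chosen, best_mse


# Hypothetical expression profiles over five timepoints; the target is 2*g1 - g2.
genes = {"g1": [1, 0, 2, 1, 3], "g2": [0, 1, 1, 2, 0], "g3": [1, 1, 0, 0, 1]}
target = [2, -1, 3, 0, 6]
chosen, mse = greedy_regulators(genes, target, max_inputs=2)
```

The cap `max_inputs` plays the role of the limited-connectivity constraint: each target gene may receive at most that many incoming connections, which keeps the search tractable as the number of genes grows.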
How to infer gene networks from expression profiles
Molecular Systems Biology, 2007
Inferring, or 'reverse-engineering', gene networks can be defined as the process of identifying gene interactions from experimental data through computational analysis. Gene expression data from microarrays are typically used for this purpose. Here we compared different reverse-engineering algorithms for which ready-to-use software was available and that had been tested on experimental data sets. We show that reverse-engineering algorithms are indeed able to correctly infer regulatory interactions among genes, at least when one performs perturbation experiments complying with the algorithm requirements. These algorithms are superior to classic clustering algorithms for the purpose of finding regulatory interactions among genes, and, although further improvements are needed, have reached a discreet performance for being practically useful.
Inferring Gene Networks: Dream or Nightmare?
Annals of the New York Academy of Sciences, 2009
We describe several algorithms with winning performance in the Dialogue for Reverse Engineering Assessments and Methods (DREAM2) Reverse Engineering Competition 2007. After the gold standards for the challenges were released, the performance of the algorithms could be thoroughly evaluated under different parameters or alternative ways of solving systems of equations. For the analysis of Challenge 4, the "In-silico" challenges, we employed methods to explicitly deal with perturbation data and time-series data. We show that original methods used to produce winning submissions could easily be altered to substantially improve performance. For Challenge 5, the genome-scale Escherichia coli network, we evaluated a variety of measures of association. These data are troublesome, and no good solutions could be produced, either by us or by any other teams. Our best results were obtained when analyzing subdatasets instead of considering the dataset as a whole.
Nucleic Acids Research, 2012
Genomic experiments (e.g. differential gene expression, single-nucleotide polymorphism association) typically produce ranked lists of genes. We present a simple but powerful approach which uses protein-protein interaction data to detect sub-networks within such ranked lists of genes or proteins. We performed an exhaustive study of network parameters, which allowed us to conclude that the average number of components and the average number of nodes per component are the parameters that best discriminate between real and random networks. A novel aspect that increases the efficiency of this strategy in finding sub-networks is that, in addition to direct connections, connections mediated by intermediate nodes are also considered when building the sub-networks. The possibility of using such intermediate nodes makes this approach more robust to noise. It also overcomes some limitations intrinsic to experimental designs based on differential expression, in which some nodes are invariant across conditions. The proposed approach can also be used for candidate disease-gene prioritization. Here, we demonstrate the usefulness of the approach by means of several case examples that include a differential expression analysis in Fanconi Anemia, a genome-wide association study of bipolar disorder and a genome-scale study of essentiality in cancer genes. An efficient and easy-to-use web interface (available at http://www.babelomics.org) based on HTML5 technologies is also provided to run the algorithm and represent the network.
Network Modeling and Analysis of Normal and Cancer Gene Expression Data
Computational Intelligence Methods for Bioinformatics and Biostatistics, 2020
Network modelling is an important approach to understanding cell behaviour. It has proven its effectiveness in understanding biological processes and finding novel biomarkers for severe diseases. In this study, using gene expression data and complex network techniques, we propose a computational framework for inferring relationships between RNA molecules. We focus on gene expression data of kidney renal clear cell carcinoma (KIRC) from the TCGA project, and we build RNA relationship networks for both the normal and the cancer condition using three different similarity measures (Pearson's correlation, Euclidean distance and inverse covariance matrix). We analyze the networks individually and in comparison to each other, highlighting their differences. The analysis identified known cancer genes/miRNAs and other RNAs with interesting features in the networks, which may play an important role in kidney renal clear cell carcinoma.
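Of the three similarity measures listed above, Pearson's correlation is the simplest to illustrate. The sketch below builds a relationship network by linking every pair of profiles whose absolute correlation reaches a cutoff. The cutoff of 0.8, the function names and the toy profiles are assumptions for illustration; the abstract does not state how the authors thresholded their networks.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def correlation_network(expr, threshold=0.8):
    """Return edges (a, b, r) for all pairs with |r| >= threshold (assumed cutoff)."""
    edges = []
    for a, b in combinations(sorted(expr), 2):
        r = pearson(expr[a], expr[b])
        if abs(r) >= threshold:
            edges.append((a, b, r))
    return edges


# Hypothetical expression profiles across four samples.
profiles = {
    "A": [1, 2, 3, 4],
    "B": [2, 4, 6, 8],    # perfectly correlated with A
    "C": [4, 3, 2, 1],    # perfectly anti-correlated with A
    "D": [1, -1, 1, -1],  # only weakly related to the others
}
edges = correlation_network(profiles)
```

Running the same construction separately on normal and tumour samples yields two edge lists that can then be compared, which is the kind of side-by-side analysis the abstract describes.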
Biomedical Genetics and Genomics
Cancer is a group of diseases that involves abnormal cell growth, resulting from genetic perturbations in signaling mechanisms. High-resolution RNA-seq and microarray assays enable the evaluation of the transcriptional activity of a large number of signaling molecules. Furthermore, many signaling pathways are described in publicly available databases. Today's challenge lies in connecting signaling pathways and signaling data to produce predictive models that have the power to validate and identify targets in disease treatment. Curating networks manually can be exhausting work. We designed an ensemble approach of gene set enrichment on seven pathway databases. It generates a basic gene set mapping of the complex input data on comprehensive pathways. Using two publicly available protein-protein interaction databases, the novel algorithm automatically reconstructs a comprehensive biological system representation from these mappings. The reconstruction was based on a new shortest-path algorithm. Using a microarray data set from hepatocellular cancer cells as input, a network with well-known cancer signaling mechanisms was derived. Furthermore, nodes accounting for hormone signaling were found to be modified in liver cancer and can be used as future research targets. Two recent publicly available networks were adequately inferred when testing the method to reconstruct manually curated signaling networks. Finally, our method shows that integration of raw data and publicly available knowledge expeditiously generates convenient and analyzable network views.
Bioinformatics, 2007
Motivation: Gene expression profiling is an important tool for gaining insight into biology. Novel strategies are required to analyze the growing archives of microarray data and extract useful information from them. One area of interest is in the construction of gene association networks from collections of profiling data. Various approaches have been proposed to construct gene networks using profiling data, and these networks have been used in functional inference as well as in data visualization. Here, we investigated a non-parametric approach to translate profiling data into a gene network. We explored the characteristics and utility of the resulting network and investigated the use of network information in analysis of variance models and hypothesis testing. Results: Our work is composed of two parts: gene network construction and partitioning, and hypothesis testing using subnetworks as groups. In the first part, multiple independently collected microarray datasets from the Gene Expression Omnibus data repository were analyzed to identify probe pairs that are positively co-regulated across the samples. A co-expression network was constructed based on a reciprocal ranking criterion and a false discovery rate analysis. We named this network the Reference Gene Association (RGA) network. Then, the network was partitioned into densely connected sub-networks of probes using a multilevel graph partitioning algorithm. In the second part, we proposed a new, MANOVA-based approach that can take individual probe expression values as input and perform hypothesis testing at the sub-network level. We applied this MANOVA methodology to two published studies and our analysis indicated that the methodology is both effective and sensitive for identifying transcriptional sub-networks or pathways that are perturbed across treatments.
Gene expression complex networks: synthesis, identification and analysis
Journal of Computational Biology, 2011
Thanks to recent advances in molecular biology, allied to an ever increasing amount of experimental data, the functional state of thousands of genes can now be extracted simultaneously by using methods such as cDNA microarrays and RNA-Seq. Particularly important related investigations are the modeling and identification of gene regulatory networks from expression data sets. Such knowledge is fundamental for many applications, such as disease treatment, therapeutic intervention strategies and drug design, as well as for planning new high-throughput experiments. Methods have been developed for gene network modeling and identification from expression profiles. However, an important open problem is how to validate such approaches and their results. This work presents an objective approach for validation of gene network modeling and identification which comprises the following three main aspects: (1) Artificial Gene Networks (AGNs) model generation through theoretical models of complex networks, which is used to simulate temporal expression data; (2) a computational method for gene network identification from the simulated data, which is founded on a feature selection approach where a target gene is fixed and the expression profile is observed for all other genes in order to identify a relevant subset of predictors; and (3) validation of the identified AGN-based network through comparison with the original network. The proposed framework allows several types of AGNs to be generated and used in order to simulate temporal expression data. The results of the network identification method can then be compared to the original network in order to estimate its properties and accuracy. Some of the most important theoretical models of complex networks have been assessed: the uniformly-random Erdős-Rényi (ER), the small-world Watts-Strogatz (WS), the scale-free Barabási-Albert (BA), and geographical networks (GG).
The experimental results indicate that the inference method was sensitive to variation in the average degree ⟨k⟩, with its network recovery rate decreasing as ⟨k⟩ increased. The signal size was important for the inference method to get better accuracy in the network identification rate, presenting very good results with small expression profiles. However, the adopted inference method was not sensitive enough to recognize distinct structures of interaction among genes, presenting a similar behavior when applied to different network topologies. In summary, the proposed framework, though simple, was adequate for the validation of the inferred networks by identifying some properties of the evaluated method, which can be extended to other inference methods.
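As an illustration of the first step in the framework above (generating an artificial network with a chosen average degree), the sketch below draws an Erdős-Rényi graph in which every pair of nodes is linked independently with probability ⟨k⟩/(n - 1). It covers only the ER model, and the function names, network size and seed are assumptions; the authors' own generator and the simulation of temporal expression data are not reproduced here.

```python
import random


def erdos_renyi(n, mean_degree, seed=None):
    """Return the edge set of a G(n, p) random graph with p = mean_degree / (n - 1)."""
    rng = random.Random(seed)
    p = mean_degree / (n - 1)
    edges = set()
    for i in range(n):
        for j in range(i + 1, n):
            # Each unordered pair is linked independently with probability p.
            if rng.random() < p:
                edges.add((i, j))
    return edges


def average_degree(n, edges):
    """Average degree of an undirected graph: every edge contributes to two nodes."""
    return 2 * len(edges) / n


# Hypothetical artificial gene network: 200 genes, target average degree of 4.
n_genes = 200
network = erdos_renyi(n_genes, mean_degree=4, seed=1)
observed_k = average_degree(n_genes, network)
```

The observed average degree fluctuates around the target value, so sweeping `mean_degree` gives a family of reference networks against which an inference method's recovery rate can be measured.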