Modular network construction using eQTL data: an analysis of computational costs and benefits
Related papers
Gene Network Inference via Structural Equation Modeling in Genetical Genomics Experiments
Genetics, 2008
Our goal is gene network inference in Genetical Genomics or Systems Genetics experiments. For species where sequence information is available, we first perform expression QTL mapping by jointly utilizing cis, cis-trans and trans regulation. After using local structural models to identify regulator-target pairs for each eQTL, we construct an encompassing directed network (EDN) by assembling all retained regulator-target relationships. The EDN has nodes corresponding to expressed genes and eQTLs, and directed edges from eQTLs to cis-regulated target genes, from cis-regulated genes to cis-trans regulated target genes, from trans-regulator genes to target genes and from trans-eQTLs to target genes. For network inference within the strongly constrained search space defined by the EDN, we propose Structural Equation Modeling (SEM), because it can model cyclic networks and the EDN indeed contains feedback relationships. Based on a factorization of the likelihood and the constrained search space, our SEM algorithm infers networks involving several hundred genes and eQTL. Structure inference is based on a penalized likelihood ratio and an adaptation of Occam's Window model selection. The SEM algorithm was evaluated using data simulated with nonlinear ordinary differential equations and known cyclic network topologies and was applied to a real yeast data set.
Identification of Genetic Networks
Genetics, 2004
In this report, we propose the use of structural equations as a tool for identifying and modeling genetic networks and genetic algorithms for searching for the most likely genetic networks that best fit the data. After genetic networks are identified, it is fundamental to identify those networks influencing cell phenotypes. To accomplish this task we extend the concept of differential expression of the genes, widely used in gene expression data analysis, to genetic networks. We propose a definition for the differential expression of a genetic network and use the generalized T² statistic to measure the ability of genetic networks to distinguish different phenotypes. However, describing the differential expression of genetic networks is not enough for understanding biological systems because differences in the expression of genetic networks do not directly reflect regulatory strength between gene activities. Therefore, in this report we also introduce the concept of differentially regulated genetic networks, which has the potential to assess changes of gene regulation in response to perturbation in the environment and may provide new insights into the mechanism of diseases and biological processes. We propose five novel statistics to measure the differences in regulation of genetic networks. To illustrate the concepts and methods for reconstruction of genetic networks and identification of association of genetic networks with function, we applied the proposed models and algorithms to three data sets.
Pacific Symposium on Biocomputing, 2006
We propose a computational strategy for discovering gene networks affected by a chemical compound. Two kinds of DNA microarray data are assumed to be used: one dataset is short time-course data that measure responses of genes following an experimental treatment; the other is obtained from several hundred single gene knock-downs. These two datasets provide three kinds of information: (i) a gene network is estimated from time-course data by the dynamic Bayesian network model, (ii) relationships between the knocked-down genes and their regulatees are estimated directly from knock-down microarrays, and (iii) a gene network can be estimated from gene knock-down data alone using the Bayesian network model. We propose a method that combines these three kinds of information to provide an accurate gene network that most strongly relates to the mode-of-action of the chemical compound in cells. This information plays an essential role in pharmacogenomics. We illustrate this method with an actual example where human endothelial cell gene networks were generated from a novel time course of gene expression following treatment with the drug fenofibrate, and from 270 novel gene knock-downs. Finally, we succeeded in inferring the gene network related to PPAR-α, which is a known target of fenofibrate.
A two-stage approach of gene network analysis for high-dimensional heterogeneous data
Biostatistics, 2017
Gaussian graphical models have been widely used to construct gene regulatory networks from gene expression data. Most existing methods for Gaussian graphical models are designed to model homogeneous data, assuming a single Gaussian distribution. In practice, however, data may consist of gene expression studies with unknown confounding factors, such as study cohort, microarray platform, or experimental batch. These factors produce heterogeneous data and hence lead to false positive edges or low detection power in the resulting network. To overcome this problem and improve the performance in constructing gene networks, we propose a two-stage approach to construct a gene network from heterogeneous data. The first stage is to perform a clustering analysis in order to assign samples to a few clusters where the samples in each cluster are approximately homogeneous, and the second stage is to conduct an integrative analysis of networks from each cluster. In particular, we first apply a model-based clustering method using the singular value decomposition for high-dimensional data, and then integrate the networks from each cluster using the integrative ψ-learning method. The proposed method is based on an equivalent measure of partial correlation coefficients in Gaussian graphical models, which is computed with a reduced conditional set and is thus useful for high-dimensional data. We compare the proposed two-stage learning approach with some existing methods in various simulation settings, and demonstrate the robustness of the proposed method. Finally, it is applied to integrate multiple gene expression studies of lung adenocarcinoma to identify potential therapeutic targets and treatment biomarkers.
Gene regulatory networks are graph models representing cellular transcription events. Networks are far from complete due to time and resource consumption for experimental validation and curation of the interactions. Previous assessments have shown the modest performance of the available network inference methods based on gene expression data. Here, we study several caveats on the inference of regulatory networks and methods assessment through the quality of the input data and gold standard, and the assessment approach with a focus on the global structure of the network. We used synthetic and biological data for the predictions and experimentally-validated biological networks as the gold standard (ground truth). Standard performance metrics and graph structural properties suggest that methods inferring co-expression networks should no longer be assessed equally with those inferring regulatory interactions. While methods inferring regulatory interactions perform better in global regul...
Identifying disease candidate genes via large-scale gene network analysis
International Journal of Data Mining and Bioinformatics, 2014
Since gene regulatory networks provide a systematic view of a complex living system, it is important to develop tools which are not only able to build reliable and large-scale gene regulatory networks but also able to identify disease candidate genes using the estimated networks. In this work, we introduce a reverse engineering technique, Bayesian model averaging based networks (BMAnet), which ensembles all appropriate linear models to tackle the uncertainty of model selection and integrates heterogeneous biological datasets. Then various network evaluation measures are used for the comparison of estimated networks and one of the measures called random walk with restart (Rwr) is utilized to search for disease candidate genes.
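The random walk with restart measure mentioned in this abstract can be illustrated with a minimal sketch. This is a generic textbook formulation and not the BMAnet implementation; the function name, the toy network, the gene labels and the restart probability of 0.3 are all assumptions made for illustration.

```python
def random_walk_with_restart(adj, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Score every node by its proximity to a set of seed nodes.

    adj   : dict mapping node -> dict of neighbour -> edge weight
            (hypothetical input format; every neighbour must also be a key)
    seeds : set of known disease genes (hypothetical)
    """
    nodes = sorted(adj)
    # Restart distribution: uniform over the seed genes.
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    # Total outgoing weight of each node, used to normalise transitions.
    out = {n: sum(adj[n].values()) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        # With probability `restart` the walker jumps back to a seed.
        nxt = {n: restart * p0[n] for n in nodes}
        for u in nodes:
            if out[u] == 0:
                # Isolated node: send its mass back to the seeds.
                for n in nodes:
                    nxt[n] += (1 - restart) * p[u] * p0[n]
                continue
            for v, w in adj[u].items():
                nxt[v] += (1 - restart) * p[u] * w / out[u]
        diff = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if diff < tol:
            break
    return p


# Toy path network A - B - C - D with a single seed gene A.
toy = {
    "A": {"B": 1.0},
    "B": {"A": 1.0, "C": 1.0},
    "C": {"B": 1.0, "D": 1.0},
    "D": {"C": 1.0},
}
scores = random_walk_with_restart(toy, {"A"})
ranking = sorted(scores, key=scores.get, reverse=True)
```

Genes that are not seeds but receive high scores would be the candidates in such a scheme; in the toy network the scores fall off with distance from the seed.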
Discovery of gene network variability across samples representing multiple classes
International Journal of Bioinformatics Research and Applications, 2010
Gene expression profiles from microarray experiments that include samples or biological replicates representing various classes, groups or states (e.g. treatments, developmental stages, health status) have been used to predict gene networks. To further mine the information from samples within and across classes, a framework that integrates Bayesian networks, mixtures of gene co-expression models and clustering using all the genes in the network is proposed. The approach was evaluated on two independent pathways using data from two microarray experiments. The postulated algorithm succeeded in reconstructing the topology of the gene pathways when benchmarked against empirical reports and randomized data sets. Most or all of the samples within a class shared the same co-expression model and were classified within the corresponding class. Our approach uncovered both gene relationships and profiles that are unique to a particular class or shared across classes.
Searching for limited connectivity in genetic network models
2001
The inference of regulatory interactions between genes from time-course micro-array data is one of the most challenging tasks in the field of functional genomics. The multitude of genes that can now be measured using micro-array technology requires analysis tools that can easily scale up with respect to the number of genes. This scalability is especially important when inferring genetic interactions, because this task is complicated by the combinatorial nature of gene interaction and because the high cost of micro-array measurements still severely limits the number of measured timepoints. Because of this limitation of the data, it is essential to incorporate as much additional information as possible. This can be achieved by applying constraints based on general biological knowledge and by including specific knowledge about known interactions. In this paper we employ the fact that genetic networks are believed to exhibit limited connectivity. We propose a general approach in which we separate the task of finding the structure of the networks from the task of finding the best parameters of the model, given the structure. The second task can be solved efficiently for most models, but the first task amounts to a search problem which requires the choice of a suitable evaluation function and search strategy. Experimental investigations determined that the best evaluation function is simply the mean squared error on the training data. Through further extensive experimental investigation of several search strategies, it was found that the best search strategy is based on an approach of greedily increasing the number of connections. The strength of the proposed approach lies in the fact that it can be applied to all genetic network models and allows genetic network models to scale up to a large number of genes.
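The idea of greedily increasing the number of connections while scoring by training mean squared error can be sketched as follows. This is only an illustration under strong assumptions: a static linear model without intercept stands in for whatever network model is used, the structure is searched for one target gene at a time, and all function names and the toy expression values are hypothetical rather than taken from the paper.

```python
def solve_least_squares(cols, y):
    """Fit weights by solving the normal equations with Gauss-Jordan elimination."""
    k = len(cols)
    # Augmented matrix [X^T X | X^T y].
    a = [[sum(ci * cj for ci, cj in zip(cols[i], cols[j])) for j in range(k)]
         + [sum(ci * yi for ci, yi in zip(cols[i], y))] for i in range(k)]
    for i in range(k):
        pivot = max(range(i, k), key=lambda r: abs(a[r][i]))
        if abs(a[pivot][i]) < 1e-12:
            return None  # singular system: this candidate set is skipped
        a[i], a[pivot] = a[pivot], a[i]
        for r in range(k):
            if r != i:
                f = a[r][i] / a[i][i]
                a[r] = [x - f * z for x, z in zip(a[r], a[i])]
    return [a[i][k] / a[i][i] for i in range(k)]


def mse(cols, weights, y):
    """Mean squared error of the linear prediction on the training data."""
    preds = [sum(w * c[t] for w, c in zip(weights, cols)) for t in range(len(y))]
    return sum((p - yi) ** 2 for p, yi in zip(preds, y)) / len(y)


def greedy_regulators(expr, target, max_inputs):
    """Add one regulator at a time, keeping the one that lowers training MSE most."""
    y = expr[target]
    chosen, best_err = [], mse([], [], y)
    while len(chosen) < max_inputs:
        best_gene = None
        for g in expr:
            if g == target or g in chosen:
                continue
            cols = [expr[c] for c in chosen + [g]]
            w = solve_least_squares(cols, y)  # parameter fitting, given structure
            if w is None:
                continue
            err = mse(cols, w, y)
            if err < best_err - 1e-12:
                best_gene, best_err = g, err
        if best_gene is None:
            break  # no remaining connection improves the fit
        chosen.append(best_gene)
    return chosen, best_err


# Toy data in which the target equals 2 * g1 - g3 (values are made up).
expr = {
    "g1": [1, 2, 3, 4, 5],
    "g2": [1, 0, 1, 0, 1],
    "g3": [0, 1, 0, 2, 1],
    "target": [2, 3, 6, 6, 9],
}
regulators, err = greedy_regulators(expr, "target", max_inputs=2)
```

The cap `max_inputs` plays the role of the limited-connectivity constraint: the search stops once each gene has the allowed number of inputs, or earlier if no further connection reduces the error.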
How to infer gene networks from expression profiles
Molecular Systems Biology, 2007
Inferring, or 'reverse-engineering', gene networks can be defined as the process of identifying gene interactions from experimental data through computational analysis. Gene expression data from microarrays are typically used for this purpose. Here we compared different reverse-engineering algorithms for which ready-to-use software was available and that had been tested on experimental data sets. We show that reverse-engineering algorithms are indeed able to correctly infer regulatory interactions among genes, at least when one performs perturbation experiments complying with the algorithm requirements. These algorithms are superior to classic clustering algorithms for the purpose of finding regulatory interactions among genes, and, although further improvements are needed, have reached a discreet performance for being practically useful.
Inferring Gene Networks: Dream or Nightmare?
Annals of the New York Academy of Sciences, 2009
We describe several algorithms with winning performance in the Dialogue for Reverse Engineering Assessments and Methods (DREAM2) Reverse Engineering Competition 2007. After the gold standards for the challenges were released, the performance of the algorithms could be thoroughly evaluated under different parameters or alternative ways of solving systems of equations. For the analysis of Challenge 4, the "In-silico" challenges, we employed methods to explicitly deal with perturbation data and time-series data. We show that original methods used to produce winning submissions could easily be altered to substantially improve performance. For Challenge 5, the genome-scale Escherichia coli network, we evaluated a variety of measures of association. These data are troublesome, and no good solutions could be produced, either by us or by any other teams. Our best results were obtained when analyzing subdatasets instead of considering the dataset as a whole.