Advances to Bayesian network inference for generating causal networks from observational biological data
Related papers
Current Bioinformatics, 2014
In the post-genome era, designing and conducting novel experiments have become increasingly common for modern researchers. However, the major challenge faced by researchers is, surprisingly, not the complexity of designing new experiments or obtaining the data they generate, but the huge amount of data to be processed and analyzed in the quest to produce meaningful information and knowledge. Gene regulatory network (GRN) inference from gene expression data is a common example of such a challenge. Over the years, GRN inference has witnessed a number of transitions, and an increasing number of new computational and statistical methods have been applied to automate the procedure. One of the widely used approaches for GRN inference is the dynamic Bayesian network (DBN). In this review paper, we first discuss the evolution of molecular biology research from reductionism to holism. This is followed by a brief overview of various computational and statistical methods used in GRN inference, before focusing on the current development and applications of DBN-based methods.
Categories of inference models (Chai et al.): logical models (Boolean networks; probabilistic Boolean networks [30, 31]; Bayesian networks); continuous models (continuous linear models [32]; dynamic Bayesian networks; ordinary differential equations; regulated flux balance analysis [33]); single-molecule level (stochastic simulation algorithm [34]).
Using Bayesian network inference algorithms to recover molecular genetic regulatory networks
… Conference on Systems …, 2002
Recent advances in high-throughput molecular biology have motivated the use, in bioinformatics, of network inference algorithms to predict causal models of molecular networks from correlational data. However, it is extremely difficult to evaluate the effectiveness of these algorithms because we possess neither knowledge of the correct biological networks nor the ability to experimentally validate the hundreds of predicted gene interactions within a reasonable amount of time. Here, we apply a new approach developed by Smith et al. (2002) that tests the ability of network inference algorithms to accurately and efficiently recover network structures based on gene expression data taken from a simulated biological pathway in which the structure is known a priori. We simulated a genetic regulatory network and used the resultant sampled data to test variations in the design of a Bayesian network inference algorithm, as well as variations in the total quantity of available data, the length of the sampling interval, the method of data discretization, and the presence of interpolated data between observed data points. We also advanced the inference algorithm by developing a heuristic influence score that infers the strength and sign of regulation (up or down) between genes. In these experiments, we found that our inference algorithm worked best when presented with data discretized into three categories, when using a greedy search algorithm with random restarts, and when evaluating networks using the BDe scoring metric. Under these conditions, the algorithm was both accurate and efficient in recovering the simulated molecular network when the sampled data sets were large. Under more biologically reasonable small amounts of sampled data, the algorithm worked best only when interpolated data was included, but had difficulty recovering relationships describing genes with more than one regulatory influence.
These results suggest that network inference algorithms and sampling methods must be carefully designed and tested before they can be used to recover biological genetic pathways, especially in the context of highly limited quantities of data.
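The three-category discretization mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not say how the category boundaries were chosen, so the equal-frequency (tercile) cut points below are an assumption, and the function name `discretize_three` is hypothetical.

```python
def discretize_three(values):
    """Map expression values to levels 0 (low), 1 (medium), 2 (high).

    Assumption: cut points are taken at the terciles of the observed values,
    so each level receives roughly one third of the samples.
    """
    ranked = sorted(values)
    n = len(ranked)
    low_cut = ranked[n // 3]          # values below this are "low"
    high_cut = ranked[(2 * n) // 3]   # values at or above this are "high"
    return [0 if v < low_cut else 1 if v < high_cut else 2 for v in values]


# Hypothetical expression readings for one gene across six samples.
levels = discretize_three([0.1, 0.5, 0.9, 0.2, 0.7, 0.4])
```

Discrete levels of this kind are what a multinomial score such as BDe operates on; the choice of cut points is one of the design decisions the abstract reports as affecting recovery.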
2012
Enabled by recent advances in bioinformatics, the inference of gene regulatory networks (GRNs) from gene expression data has garnered much interest from researchers. This is due to the need of researchers to understand the dynamic behavior of the networks and uncover the vast information lying hidden within them. In this regard, the dynamic Bayesian network (DBN) is extensively used to infer GRNs due to its ability to handle time-series microarray data and to model feedback loops. However, the efficiency of DBN in inferring GRNs is often hampered by missing values in expression data, and by excessive computation time due to the large search space, since DBN treats all genes as potential regulators for a target gene. In this paper, we propose a DBN-based model with missing-value imputation to improve inference efficiency, and potential-regulator detection, which aims to lessen computation time by limiting potential regulators based on expression changes. The performance of the proposed model is assessed using time-series expression data of the yeast cell cycle. The experimental results ...
Causal inference in biology networks with integrated belief propagation
Inferring causal relationships among molecular and higher order phenotypes is a critical step in elucidating the complexity of living systems. Here we propose a novel method for inferring causality that is no longer constrained by the conditional dependency arguments that limit the ability of statistical causal inference methods to resolve causal relationships within sets of graphical models that are Markov equivalent. Our method utilizes Bayesian belief propagation to infer the responses of molecular traits to perturbation events given a hypothesized graph structure. A distance measure between the inferred response distribution and the observed data is defined to assess the 'fitness' of the hypothesized causal relationships. To test our algorithm, we infer causal relationships within equivalence classes of gene networks in which the functional interactions are assumed to be nonlinear, given synthetic microarray and RNA sequencing data. We also apply our method to infer causality in a real metabolic network with a v-structure and a feedback loop. We show that our method can recapitulate the causal structure and recover the feedback loop from steady-state data alone, which conventional methods cannot.
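The notion of Markov equivalence that this abstract refers to can be made concrete with a small sketch. It uses the standard graphical criterion (two DAGs are Markov equivalent exactly when they share the same skeleton and the same v-structures); it is not the belief-propagation method of the paper, and the function names are hypothetical.

```python
from itertools import combinations


def skeleton(edges):
    """Undirected version of a DAG given as a set of (parent, child) pairs."""
    return {frozenset(edge) for edge in edges}


def v_structures(edges):
    """Colliders a -> child <- b in which a and b are not adjacent."""
    skel = skeleton(edges)
    parents = {}
    for parent, child in edges:
        parents.setdefault(child, set()).add(parent)
    found = set()
    for child, pars in parents.items():
        for a, b in combinations(sorted(pars), 2):
            if frozenset((a, b)) not in skel:
                found.add((a, child, b))
    return found


def markov_equivalent(edges_1, edges_2):
    """Same skeleton and same v-structures imply the same independencies."""
    return (skeleton(edges_1) == skeleton(edges_2)
            and v_structures(edges_1) == v_structures(edges_2))


chain = {("A", "B"), ("B", "C")}      # A -> B -> C
fork = {("B", "A"), ("B", "C")}       # A <- B -> C
collider = {("A", "B"), ("C", "B")}   # A -> B <- C
```

Here `chain` and `fork` are equivalent, so conditional-independence arguments on observational data cannot tell them apart, whereas `collider` forms its own class. This is the limitation the abstract sets out to overcome.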
Bioinformatics, 2011
Motivation: Reverse engineering gene regulatory networks, especially large networks, from time series gene expression data remains a challenge to the systems biology community. In this article, a new hybrid algorithm integrating ordinary differential equation models with dynamic Bayesian network analysis, called Differential Equation-based Local Dynamic Bayesian Network (DELDBN), was proposed and implemented for gene regulatory network inference. Results: The performance of DELDBN was benchmarked with an in vivo dataset from yeast. DELDBN significantly improved the accuracy and sensitivity of network inference compared with other approaches. The local causal discovery algorithm implemented in DELDBN also reduced the complexity of the network inference algorithm and improved its scalability to infer larger networks. We have demonstrated the applicability of the approach to a network containing thousands of genes with a dataset from human HeLa cell time series experiments. The loc...
Causal Computational Models for Gene Regulatory Networks
2016
Gene Regulatory Networks (GRNs) hold the key to understanding and solving many problems in biological sciences, with critical applications in medicine and therapeutics. However, discovering GRNs in the laboratory is a cumbersome and tricky affair, since the numbers of genes and interactions, say in a mammalian cell, are very large. We aim to discover these GRNs computationally, by using gene expression levels as a “time-series” dataset. We research and employ techniques from probability and information theory, the theory of dynamical systems, and graph structure estimation to establish causal relations between genes on synthetic datasets. Furthermore, we suggest methods for global estimation of gene networks, thereby narrowing the space of genetic interactions to be examined when discovering these GRNs in the lab.
2007
In this work, we address both the computational and modeling aspects of Bayesian network structure learning. Several recent algorithms can handle large networks by operating on the space of variable orderings, but for technical reasons they cannot compute many interesting structural features and require the use of a restrictive prior. We introduce a novel MCMC method that utilizes the deterministic output of the exact structure learning algorithm of Koivisto and Sood to construct a fast-mixing proposal on the space of DAGs. We show that in addition to fixing the order-space algorithms’ shortcomings, our method outperforms other existing samplers on real datasets by delivering more accurate structure and higher predictive likelihoods in less compute time. Next, we discuss current models of intervention and propose a novel approach named the uncertain intervention model, whereby the targets of an intervention can be learned in parallel to the graph’s causal structure. We validate our ...
Reconstructing Causal Biological Networks through Active Learning
PLOS ONE, 2016
Reverse-engineering of biological networks is a central problem in systems biology. Intervention data, such as gene knockouts or knockdowns, are typically used to tease apart causal relationships among genes. Under time or resource constraints, one needs to carefully choose which intervention experiments to carry out. Previous approaches for selecting the most informative interventions have largely focused on discrete Bayesian networks. However, continuous Bayesian networks are of great practical interest, especially in the study of complex biological systems and their quantitative properties. In this work, we present an efficient, information-theoretic active learning algorithm for Gaussian Bayesian networks (GBNs), which serve as important models for gene regulatory networks. In addition to providing linear-algebraic insights unique to GBNs, leading to significant runtime improvements, we demonstrate the effectiveness of our method on data simulated with GBNs and on the DREAM4 network inference challenge data sets. Compared with random selection of intervention experiments, our method generally leads to faster recovery of the underlying network structure and faster convergence to the final distribution of confidence scores over candidate graph structures obtained using the full data.
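Why intervention data help in a Gaussian Bayesian network can be shown with a two-gene sketch. This is not the active learning algorithm of the paper; it only assumes a linear Gaussian model in which each node is a weighted sum of its parents plus independent noise, and it models an intervention by cutting the edges into the target. All function names and numbers are hypothetical.

```python
def implied_covariance(order, weights, noise_var):
    """Covariances implied by a linear Gaussian Bayesian network.

    order:     nodes listed so that every parent precedes its children
    weights:   {(parent, child): edge weight}
    noise_var: {node: variance of that node's independent noise term}
    """
    cov = {}
    for idx, child in enumerate(order):
        parents = {p: w for (p, c), w in weights.items() if c == child}
        for earlier in order[:idx]:
            value = sum(w * cov[(p, earlier)] for p, w in parents.items())
            cov[(child, earlier)] = cov[(earlier, child)] = value
        cov[(child, child)] = noise_var[child] + sum(
            w * cov[(p, child)] for p, w in parents.items())
    return cov


def intervene(weights, noise_var, target, variance=1.0):
    """Assumed intervention model: drop edges into the target and set its variance."""
    cut = {edge: w for edge, w in weights.items() if edge[1] != target}
    return cut, {**noise_var, target: variance}


# Two models with identical observational covariance: A -> B and B -> A.
forward = ({("A", "B"): 2.0}, {"A": 1.0, "B": 1.0})
reverse = ({("B", "A"): 0.4}, {"B": 5.0, "A": 0.2})

obs_forward = implied_covariance(["A", "B"], *forward)
obs_reverse = implied_covariance(["B", "A"], *reverse)

# Intervening on B separates them: the covariance vanishes only if A -> B.
do_forward = implied_covariance(["A", "B"], *intervene(*forward, target="B"))
do_reverse = implied_covariance(["B", "A"], *intervene(*reverse, target="B"))
```

Both models reproduce the same observational covariance between A and B, but after intervening on B the covariance is zero under A -> B and nonzero under B -> A. Choosing which gene to perturb so that such differences are as informative as possible is the problem the abstract addresses.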
SIMULATION, 2003
Learning regulatory interactions between genes from microarray measurements presents one of the major challenges in functional genomics. This article studies the suitability of learning dynamic Bayesian networks under realistic experimental settings. Through extensive artificial-data experiments, it is investigated how the performance of discovering the true interactions depends on varying data conditions. These experiments show that performance deteriorates most strongly when the connectivity of the original network increases, and that a more than proportional increase in the number of samples is needed to compensate for this. Furthermore, it was found that performance is lower when the original network is larger, but this decrease can be greatly reduced with increased computational effort. Finally, it is shown that the performance of the search algorithm benefits more from a larger number of restarts than from the use of more sophisticated search strategies.
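The kind of search strategy compared in this abstract, greedy hill climbing with random restarts, can be sketched as follows. The scoring function is passed in as a parameter, and the toy score used here (negative edge disagreement with a known target graph) is an assumption for illustration only; it stands in for a data-driven score such as BDe. All names are hypothetical.

```python
import random
from itertools import permutations


def is_acyclic(nodes, edges):
    """Kahn-style check: repeatedly remove nodes with no incoming edges."""
    indegree = {n: 0 for n in nodes}
    for _, child in edges:
        indegree[child] += 1
    frontier = [n for n in nodes if indegree[n] == 0]
    removed = 0
    while frontier:
        node = frontier.pop()
        removed += 1
        for parent, child in edges:
            if parent == node:
                indegree[child] -= 1
                if indegree[child] == 0:
                    frontier.append(child)
    return removed == len(nodes)


def neighbours(nodes, edges):
    """All DAGs reachable by adding or removing a single directed edge."""
    for pair in permutations(nodes, 2):
        candidate = edges ^ {pair}   # toggle one edge
        if is_acyclic(nodes, candidate):
            yield frozenset(candidate)


def hill_climb(nodes, score, start):
    """Move to the best-scoring neighbour until no neighbour improves."""
    current, best = start, score(start)
    while True:
        top = max(neighbours(nodes, current), key=score)
        if score(top) <= best:
            return current, best
        current, best = top, score(top)


def random_dag(nodes, rng, density=0.3):
    """Random starting DAG: shuffle the nodes, keep some forward edges."""
    order = list(nodes)
    rng.shuffle(order)
    return frozenset((a, b) for i, a in enumerate(order)
                     for b in order[i + 1:] if rng.random() < density)


def search_with_restarts(nodes, score, restarts=5, seed=0):
    """Run hill climbing from several random DAGs and keep the best result."""
    rng = random.Random(seed)
    runs = [hill_climb(nodes, score, random_dag(nodes, rng))
            for _ in range(restarts)]
    return max(runs, key=lambda run: run[1])


# Toy score (assumption): reward agreement with a known target structure.
target = frozenset({("A", "B"), ("B", "C")})
best_graph, best_score = search_with_restarts(
    ["A", "B", "C"], lambda g: -len(g ^ target))
```

With a real, data-driven score the landscape has local optima, which is where restarts matter. The toy score has none, so every run reaches the target and the example only demonstrates the mechanics.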