Constructing Bayesian Network Models of Gene Expression Networks from Microarray Data

Modeling Perturbations using Gene Networks

2010

External factors such as radiation, drugs or chemotherapy can alter the expression of a subset of genes. We call these the primarily affected genes. Primarily affected genes can eventually change the expression of other genes by activating or suppressing them through interactions. Measuring gene expression before and after applying an external factor (i.e., a perturbation) in microarray experiments can reveal how the expression of each gene changes. It cannot, however, identify the cause of the change.

Detecting the Presence and Absence of Causal Relationships between Expression of Yeast Genes with Very Few Samples

Journal of Computational Biology, 2010

Inference of biological networks from high-throughput data is a central problem in bioinformatics. Particularly powerful for network reconstruction is data collected by recent studies that contain both genetic variation information and gene expression profiles from genetically distinct strains of an organism. Various statistical approaches have been applied to these data to tease out the underlying biological networks that govern how individual genetic variation mediates gene expression and how genes regulate and interact with each other. Extracting meaningful causal relationships from these networks remains a challenging but important problem. In this article, we use causal inference techniques to infer the presence or absence of causal relationships between yeast gene expressions in the framework of graphical causal models. We evaluate our method using a well-studied dataset consisting of both genetic variations and gene expressions collected over randomly segregated yeast strains. Our predictions of causal regulators, genes that control the expression of a large number of target genes, are consistent with previously known experimental evidence. In addition, our method can detect the absence of causal relationships and can distinguish between direct and indirect effects of variation on a gene expression level. Extracting meaningful causal relationships from these networks has been a challenging but important area of genetical genomics. What distinguishes genetical genomics studies from traditional microarray analysis, and what makes causal inference possible, is the idea of modeling genetic variations as random perturbations to the underlying regulatory network. A principled way of representing the causal relationships in a biological network is to use graphical causal models (Pearl, 1988).
Such models represent causal relationships between random variables by means of a directed acyclic graph called a causal graph, where a directed edge between two variables represents direct causal influence. The data-generating process represented by a causal graph imposes a variety of constraints, such as conditional independence constraints, on the observed data. A rich theory of causal inference has been developed which attempts to reconstruct aspects of the graph from the pattern of constraints in the observations. Causal relationships can then be read off directly from the reconstructed graph.
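The conditional independence constraints mentioned above can be illustrated with a minimal sketch. It assumes a hypothetical three-gene chain X → Y → Z with linear Gaussian noise; the coefficients, sample size and function names are illustrative choices, not taken from the article. In such a chain, X and Z are correlated marginally, but the partial correlation of X and Z given Y should be close to zero, which is the kind of constraint a causal graph imposes on the observed data.

```python
import math
import random


def pearson(a, b):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def partial_corr(a, b, c):
    """First-order partial correlation of a and b, controlling for c."""
    r_ab, r_ac, r_bc = pearson(a, b), pearson(a, c), pearson(b, c)
    return (r_ab - r_ac * r_bc) / math.sqrt((1 - r_ac ** 2) * (1 - r_bc ** 2))


def simulate_chain(n=5000, seed=0):
    """Simulate the assumed causal chain X -> Y -> Z (hypothetical coefficients)."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    y = [0.8 * xi + rng.gauss(0, 1) for xi in x]
    z = [0.8 * yi + rng.gauss(0, 1) for yi in y]
    return x, y, z


x, y, z = simulate_chain()
marginal = pearson(x, z)          # X and Z are dependent marginally
conditional = partial_corr(x, z, y)  # but nearly independent once Y is known

print(f"corr(X, Z)     = {marginal:.3f}")
print(f"corr(X, Z | Y) = {conditional:.3f}")
```

Constraint-based causal discovery methods rely on many such tests; with very few samples, as in the setting the title describes, these estimates become noisy, so this sketch deliberately uses a large simulated sample.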

Influence of the experimental design of gene expression studies on the inference of gene regulatory networks: environmental factors

PeerJ, 2013

The inference of gene regulatory networks has gained considerable interest in the biological and biomedical community in recent years. The purpose of this paper is to investigate the influence that environmental conditions can exert on the performance of network inference algorithms. Specifically, we study five network inference methods, Aracne, BC3NET, CLR, C3NET and MRNET, and compare the results for three different conditions: (I) observational gene expression data: normal environmental condition, (II) interventional gene expression data: growth in rich media, (III) interventional gene expression data: normal environmental condition interrupted by a positive spike-in stimulation. Overall, we find that different statistical inference methods lead to comparable, but condition-specific results. Further, our results suggest that non-steady-state data enhance the inferability of regulatory networks.

Causal Computational Models for Gene Regulatory Networks

2016

Gene Regulatory Networks (GRNs) hold the key to understanding and solving many problems in biological sciences, with critical applications in medicine and therapeutics. However, discovering GRNs in the laboratory is a cumbersome and tricky affair, since the number of genes and interactions, say in a mammalian cell, is very large. We aim to discover these GRNs computationally, by using gene expression levels as a “time-series” dataset. We research and employ techniques from probability and information theory, theory of dynamical systems, and graph structure estimation, to establish causal relations between genes, on synthetic datasets. Furthermore, we suggest methods for global estimation of gene networks, thereby narrowing the space of genetic interactions to be examined when discovering these GRNs in the lab.

Using Graphical Models and Genomic Expression Data to Statistically Validate Models of Genetic Regulatory Networks

Biocomputing 2001, 2000

We propose a model-driven approach for analyzing genomic expression data that permits genetic regulatory networks to be represented in a biologically interpretable computational form. Our models permit latent variables capturing unobserved factors, describe arbitrarily complex (more than pair-wise) relationships at varying levels of refinement, and can be scored rigorously against observational data. The models that we use are based on Bayesian networks and their extensions. As a demonstration of this approach, we utilize 52 genomes' worth of Affymetrix GeneChip expression data to correctly differentiate between alternative hypotheses of the galactose regulatory network in S. cerevisiae. When we extend the graph semantics to permit annotated edges, we are able to score models describing relationships at a finer degree of specification.

Inferring the regulatory network behind a gene expression experiment

Nucleic Acids Research, 2012

Transcription factors (TFs) and miRNAs are the most important dynamic regulators in the control of gene expression in multicellular organisms. These regulatory elements play crucial roles in development, cell cycling and cell signaling, and they have also been associated with many diseases. The Regulatory Network Analysis Tool (RENATO) web server makes the exploration of regulatory networks easy, enabling a better understanding of functional modularity and network integrity under specific perturbations. RENATO is suitable for analyzing the results of expression profiling experiments. The program analyzes lists of genes and searches for the regulators compatible with their activation or deactivation. Tests of single enrichment or gene set enrichment allow the selection of the subset of TFs or miRNAs significantly involved in the regulation of the query genes. RENATO also offers an interactive advanced graphical interface for exploring the regulatory network found. RENATO is available at: http://renato.bioinfo.cipf.es/.

A causal inference approach for constructing transcriptional regulatory networks

Bioinformatics, 2005

Motivation: Transcriptional regulatory networks specify the interactions among regulatory genes and between regulatory genes and their target genes. Discovering transcriptional regulatory networks helps us to understand the underlying mechanism of complex cellular processes and responses. Method: This paper describes a causal inference approach for constructing transcriptional regulatory networks using gene expression data, promoter sequences and information on transcription factor (TF) binding sites. The method first identifies active TFs in each individual experiment using a feature selection approach. TFs are viewed as 'treatments' and gene expression levels as 'responses'. For every TF and gene pair, a marginal structural model is built to estimate the causal effect of the TF on the expression level of the gene. The model parameters can be estimated using the G-computation procedure or the IPTW estimator. The P-value associated with the causal parameter in each of these models is used to measure how strongly a TF regulates a gene. These results are further used to infer the overall regulatory network structures. Results: Our analysis of yeast data suggests that the method is capable of identifying significant transcriptional regulatory interactions and the corresponding regulatory networks. Availability: The software is under development.
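The IPTW (inverse probability of treatment weighting) estimator named in the abstract above can be sketched on simulated data. This is not the authors' software: it assumes a single binary confounder, a binary "TF active" treatment and a hypothetical true effect of 2.0, all chosen only for illustration. The sketch shows why weighting matters: the naive difference in mean expression is biased by the confounder, while the weighted difference is expected to land near the true effect.

```python
import random


def simulate(n=20000, seed=1):
    """Simulate (confounder, treatment, outcome) rows under assumed parameters."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        c = 1 if rng.random() < 0.5 else 0           # hypothetical confounder
        p_active = 0.8 if c else 0.2                 # confounder drives TF activity
        t = 1 if rng.random() < p_active else 0      # treatment: TF active or not
        y = 2.0 * t + 3.0 * c + rng.gauss(0, 1)      # response: expression level
        rows.append((c, t, y))
    return rows


def naive_effect(rows):
    """Unadjusted difference in mean outcome between treated and untreated."""
    treated = [y for _, t, y in rows if t == 1]
    control = [y for _, t, y in rows if t == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)


def iptw_effect(rows):
    """IPTW estimate with normalized weights and frequency-based propensities."""
    propensity = {}
    for level in (0, 1):
        stratum = [t for c, t, _ in rows if c == level]
        propensity[level] = sum(stratum) / len(stratum)  # estimate of P(T=1 | C)

    num1 = den1 = num0 = den0 = 0.0
    for c, t, y in rows:
        if t == 1:
            w = 1.0 / propensity[c]
            num1 += w * y
            den1 += w
        else:
            w = 1.0 / (1.0 - propensity[c])
            num0 += w * y
            den0 += w
    return num1 / den1 - num0 / den0


rows = simulate()
naive = naive_effect(rows)
iptw = iptw_effect(rows)
print(f"naive difference: {naive:.2f}")
print(f"IPTW estimate:    {iptw:.2f}  (simulated true effect is 2.0)")
```

In a real analysis the propensity model would condition on many covariates rather than one binary variable, and the P-value the abstract refers to would require a variance estimate that this sketch omits.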

Experiments on the Accuracy of Algorithms for Inferring the Structure of Genetic Regulatory Networks from Microarray Expression Levels

After reviewing theoretical reasons for doubting that machine learning methods can accurately infer gene regulatory networks from microarray data, we test 10 algorithms on simulated data from the sea urchin network, and on microarray data for yeast compared with recent experimental determinations of the regulatory network in the same yeast species. Our results agree with the theoretical arguments: most algorithms are at chance for determining the existence of a regulatory connection between gene pairs, and the algorithms that perform better than chance are nonetheless so error-prone as to be of little practical use in these applications.

Applying dynamic Bayesian networks to perturbed gene expression data

BMC Bioinformatics, 2006

Background: A central goal of molecular biology is to understand the regulatory mechanisms of gene transcription and protein synthesis. Because of their solid basis in statistics, which allows them to deal with the stochastic aspects of gene expression and with noisy measurements in a natural way, Bayesian networks appear attractive for inferring the structure of gene interactions from microarray experiment data. However, the basic formalism has some disadvantages, e.g. it is sometimes hard to distinguish between the origin and the target of an interaction. Two kinds of microarray experiments yield data particularly rich in information regarding the direction of interactions: time series and perturbation experiments. In order to handle them correctly, the basic formalism must be modified. For example, dynamic Bayesian networks (DBN) apply to time series microarray data. To our knowledge the DBN technique has not been applied in the context of perturbation experiments. Results: We extend the framework of dynamic Bayesian networks in order to incorporate perturbations. Moreover, an exact algorithm for inferring an optimal network is proposed and a discretization method specialized for time series data from perturbation experiments is introduced. We apply our procedure to realistic simulation data. The results are compared with those obtained by standard DBN learning techniques. Moreover, the advantages of using an exact learning algorithm instead of heuristic methods are analyzed. Conclusion: We show that the quality of inferred networks dramatically improves when using data from perturbation experiments. We also conclude that the exact algorithm should be used when it is possible, i.e. when the considered set of genes is small enough.

Using Bayesian networks to analyze expression data

2000

DNA hybridization arrays simultaneously measure the expression level for thousands of genes. These measurements provide a "snapshot" of transcription levels within the cell. A major challenge in computational biology is to uncover, from such measurements, gene/protein interactions and key biological features of cellular systems. In this paper, we propose a new framework for discovering interactions between genes based on multiple expression measurements. This framework builds on the use of Bayesian networks for representing statistical dependencies. A Bayesian network is a graph-based model of joint multivariate probability distributions that captures properties of conditional independence between variables. Such models are attractive for their ability to describe complex stochastic processes, and since they provide clear methodologies for learning from (noisy) observations. We start by showing how Bayesian networks can describe interactions between genes. We then describe a method for recovering gene interactions from microarray data using tools for learning Bayesian networks. Finally, we demonstrate this method on the S. cerevisiae cell-cycle measurements of Spellman et al. (1998).
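The statement that a Bayesian network is a graph-based model of a joint distribution that captures conditional independence can be made concrete with a small sketch. The three-gene network A → B, A → C and all probability values below are hypothetical, chosen only for illustration and not taken from the paper. The joint distribution is the product of each variable's conditional probability given its parents, and B and C come out dependent marginally but independent once A is known.

```python
from itertools import product

# Hypothetical conditional probability tables for binary expression states
# (0 = low, 1 = high) in the assumed network A -> B, A -> C.
p_a = {0: 0.7, 1: 0.3}
p_b_given_a = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}
p_c_given_a = {0: {0: 0.6, 1: 0.4}, 1: {0: 0.1, 1: 0.9}}


def joint(a, b, c):
    """Joint probability as the product of each node given its parents."""
    return p_a[a] * p_b_given_a[a][b] * p_c_given_a[a][c]


def prob(**fixed):
    """Probability of a partial assignment, e.g. prob(b=1) or prob(a=1, c=0)."""
    total = 0.0
    for a, b, c in product((0, 1), repeat=3):
        values = {"a": a, "b": b, "c": c}
        if all(values[name] == value for name, value in fixed.items()):
            total += joint(a, b, c)
    return total


# The factorization defines a proper distribution.
total = prob()

# Marginally, B and C are dependent because they share the parent A.
p_b1, p_c1, p_b1c1 = prob(b=1), prob(c=1), prob(b=1, c=1)

# Conditioned on A = 1, they are independent: P(B,C|A) = P(B|A) * P(C|A).
lhs = prob(a=1, b=1, c=1) / prob(a=1)
rhs = (prob(a=1, b=1) / prob(a=1)) * (prob(a=1, c=1) / prob(a=1))

print("sum of joint:", total)
print("P(B=1, C=1) vs P(B=1) * P(C=1):", p_b1c1, p_b1 * p_c1)
print("P(B=1, C=1 | A=1) vs product of conditionals:", lhs, rhs)
```

Learning a network from expression data runs in the opposite direction: the structure and tables are unknown and must be scored against the measurements, which this sketch does not attempt.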