Biomathematics and Statistics Scotland, Edinburgh, UK

Comparative evaluation of reverse engineering gene regulatory networks with relevance networks, graphical gaussian models and bayesian networks

Bioinformatics/computer Applications in The Biosciences, 2006

Motivation: An important problem in systems biology is the inference of biochemical pathways and regulatory networks from postgenomic data. Various reverse engineering methods have been proposed in the literature, and it is important to understand their relative merits and shortcomings. In the present paper, we compare the accuracy of reconstructing gene regulatory networks with three different modelling and inference paradigms: (1) Relevance networks (RNs): pairwise association scores independent of the remaining network; (2) graphical Gaussian models (GGMs): undirected graphical models with constraint-based inference, and (3) Bayesian networks (BNs): directed graphical models with score-based inference. The evaluation is carried out on the Raf pathway, a cellular signalling network describing the interaction of 11 phosphorylated proteins and phospholipids in human immune system cells. We use both laboratory data from cytometry experiments as well as data simulated from the gold-standard network. We also compare passive observations with active interventions. Results: On Gaussian observational data, BNs and GGMs were found to outperform RNs. The difference in performance was not significant for the non-linear simulated data and the cytoflow data, though. Also, we did not observe a significant difference between BNs and GGMs on observational data in general. However, for interventional data, BNs outperform GGMs and RNs, especially when taking the edge directions rather than just the skeletons of the graphs into account. This suggests that the higher computational costs of inference with BNs over GGMs and RNs are not justified when using only passive observations, but that active interventions in the form of gene knockouts and over-expressions are required to exploit the full potential of BNs. Availability: Data, software and supplementary material are available from http://www.

Reverse engineering gene and protein regulatory networks using graphical models: A comparative evaluation study

Graphical-model machine learning methods, namely relevance networks, Gaussian graphical models, and Bayesian networks, are cross-compared on real cytometric protein data and simulated data from the RAF signalling pathway. Relevance networks are based on pairwise association scores and are straightforward to implement. However, the inference is not done in the context of the whole system, and there is no possibility to distinguish between direct and indirect associations. Both shortcomings are addressed by Gaussian graphical models, where the partial correlation between two variables, conditional on all the other domain variables, is employed as the association score. Bayesian networks are more flexible probabilistic graphical models for conditional dependence and independence relations. Bayesian networks are based on directed acyclic graphs and can be exploited to analyse interventional data for identifying putative causal interactions. The empirical results were obtained by applying...
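The contrast drawn above between pairwise association scores and partial correlations can be illustrated with a minimal sketch. It assumes plain NumPy, a hypothetical three-variable chain X -> Y -> Z as toy data, and a directly inverted sample covariance matrix, which is only sensible when there are many more samples than variables. The function names are illustrative and are not taken from the study.

```python
import numpy as np

def relevance_scores(data):
    """Relevance-network scores: absolute pairwise Pearson correlations."""
    corr = np.corrcoef(data, rowvar=False)
    scores = np.abs(corr)
    np.fill_diagonal(scores, 0.0)  # ignore self-associations
    return scores

def ggm_scores(data):
    """GGM scores: absolute partial correlations, each conditional on all
    other variables, read off the inverse covariance (precision) matrix."""
    cov = np.cov(data, rowvar=False)
    precision = np.linalg.inv(cov)  # assumption: n_samples >> n_variables
    d = np.sqrt(np.diag(precision))
    partial = -precision / np.outer(d, d)
    scores = np.abs(partial)
    np.fill_diagonal(scores, 0.0)
    return scores

def simulate_chain(n_samples=2000, seed=0):
    """Hypothetical toy data: a linear Gaussian chain X -> Y -> Z."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n_samples)
    y = x + 0.5 * rng.normal(size=n_samples)
    z = y + 0.5 * rng.normal(size=n_samples)
    return np.column_stack([x, y, z])

if __name__ == "__main__":
    data = simulate_chain()
    np.set_printoptions(precision=2, suppress=True)
    print("pairwise correlation scores:\n", relevance_scores(data))
    print("partial correlation scores:\n", ggm_scores(data))
```

In this toy chain, X and Z are strongly correlated only through Y, so the pairwise score for the X-Z pair is large while the corresponding partial correlation is close to zero. With few samples relative to the number of variables, the sample covariance matrix becomes ill-conditioned and a regularised estimator would be needed instead of direct inversion.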

Reverse engineering gene regulatory networks

IEEE Signal Processing Magazine, 2000

Gene regulatory networks are collections of genes that interact with one another and with other substances in the cell. By measuring gene expression over time using high-throughput technologies, it may be possible to reverse engineer, or infer, the structure of the gene network involved in a particular cellular process. These gene expression data typically have a high dimensionality and a limited number of biological replicates and time points. Due to these issues and the complexity of biological systems, the problem of reverse engineering networks from gene expression data demands a specialized suite of statistical tools and methodologies. We propose a non-standard adaptation of a simulation-based approach known as Approximate Bayesian Computing based on Markov chain Monte Carlo sampling. This approach is particularly well suited for the inference of gene regulatory networks from longitudinal data. The performance of this approach is investigated via simulations and using longitudinal expression data from a genetic repair system in Escherichia coli.
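As a rough illustration of the simulation-based idea mentioned above, the following sketch shows a generic textbook form of Approximate Bayesian Computation with Markov chain Monte Carlo sampling. It is not the non-standard adaptation proposed in the paper. The one-gene autoregressive model, the uniform prior on [-1, 1], the tolerance `eps`, and all function names are assumptions made purely for the example.

```python
import numpy as np

def simulate_expression(a, n_steps, rng, x0=1.0, noise_sd=0.05):
    """Hypothetical toy model: x[t] = a * x[t-1] + Gaussian noise."""
    x = np.empty(n_steps)
    x[0] = x0
    for t in range(1, n_steps):
        x[t] = a * x[t - 1] + noise_sd * rng.normal()
    return x

def abc_mcmc(observed, n_iter=5000, eps=0.15, step=0.1, start=0.5, seed=1):
    """Generic ABC-MCMC for the coefficient `a`.

    With a uniform prior on [-1, 1] and a symmetric random-walk proposal,
    the Metropolis-Hastings ratio is 1 inside the prior support, so a
    proposal is accepted whenever data simulated from it lie within
    distance `eps` of the observed series.
    """
    rng = np.random.default_rng(seed)
    current = start
    samples = np.empty(n_iter)
    for i in range(n_iter):
        proposal = current + step * rng.normal()
        if -1.0 <= proposal <= 1.0:  # zero prior density outside [-1, 1]
            simulated = simulate_expression(proposal, len(observed), rng)
            distance = np.sqrt(np.mean((simulated - observed) ** 2))
            if distance < eps:
                current = proposal
        samples[i] = current  # a rejected proposal repeats the current state
    return samples

if __name__ == "__main__":
    data_rng = np.random.default_rng(0)
    observed = simulate_expression(0.8, 20, data_rng)  # "true" a = 0.8
    samples = abc_mcmc(observed)
    print("approximate posterior mean of a:", samples[1000:].mean())
```

The likelihood is never evaluated; only forward simulation from the model is required, which is what makes this family of methods attractive when the likelihood is intractable. The tolerance `eps` trades accuracy against acceptance rate, and its value here is arbitrary.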

Using Qualitative Probability in Reverse-Engineering Gene Regulatory Networks

IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2000

This paper demonstrates the use of qualitative probabilistic networks (QPNs) to aid Dynamic Bayesian Networks (DBNs) in the process of learning the structure of gene regulatory networks from microarray gene expression data. We present a study which shows that QPNs define monotonic relations that are capable of identifying regulatory interactions in a manner that is less susceptible to the many sources of uncertainty that surround gene expression data. Moreover, we construct a model that maps the regulatory interactions of genetic networks to QPN constructs and show its capability in providing a set of candidate regulators for target genes, which is subsequently used to establish a prior structure that the DBN learning algorithm can use and which 1) distinguishes spurious correlations from true regulations, 2) enables the discovery of sets of coregulators of target genes, and 3) results in a more efficient construction of gene regulatory networks. The model is compared to the existing literature using the known gene regulatory interactions of Drosophila melanogaster.

Reverse engineering of regulatory networks: simulation studies on a genetic algorithm approach for ranking hypotheses

Biosystems, 2002

Reverse engineering algorithms (REAs) aim at using gene expression data to reconstruct interactions in regulatory genetic networks. This may help to understand the basis of gene regulation, the core task of functional genomics. Collecting data for a number of environmental conditions is necessary to reengineer even the smallest regulatory networks with reasonable confidence. We systematically tested the requirements for the experimental design necessary for ranking alternative hypotheses about the structure of a given regulatory network. A genetic algorithm (GA) was used to explore the parameter space of a multistage discrete genetic network model with fixed connectivity and number of states per node. Our results show that it is not necessary to determine all parameters of the genetic network in order to rank hypotheses. The ranking process is easier the more experimental environmental conditions are used for the data set. During the ranking, the number of fixed parameters increases with the number of environmental conditions, while some errors in the hypothetical network structure may pass undetected, due to a maintained dynamical behaviour.

A survey of models for inference of gene regulatory networks

Nonlinear Analysis: Modelling and Control, 2013

In this article, I present the biological background of microarray, ChIP-chip and ChIP-Seq technologies and the application of computational methods in reverse engineering of gene regulatory networks (GRNs). The most commonly used GRN models, based on Boolean networks, Bayesian networks, relevance networks, and differential and difference equations, are described. A novel model for the integration of prior biological knowledge in GRN inference is also presented. The advantages and disadvantages of the described models are compared. The GRN validation criteria are described. Current trends and further directions for GRN inference using prior knowledge are given at the end of the paper.

Evolutionary approaches for the reverse-engineering of gene regulatory networks: A study on a biologically realistic dataset

BMC Bioinformatics, 2008

Background: Inferring gene regulatory networks from data requires the development of algorithms devoted to structure extraction. When only static data are available, gene interactions may be modelled by a Bayesian Network (BN) that represents the presence of direct interactions from regulators to regulees by conditional probability distributions. We used enhanced evolutionary algorithms to stochastically evolve a set of candidate BN structures and found the model that best fits data without prior knowledge.

Reverse Engineering Gene Regulatory Networks with Various Machine Learning Methods

Wiley-VCH Verlag GmbH & Co. KGaA eBooks, 2008

Background: To infer gene regulatory networks from time series gene profiles, two important tasks that are related to biological systems must be undertaken. One task is to determine a valid network structure that has topological properties that can influence the network dynamics profoundly. The other task is to optimize the network parameters to minimize the accumulated discrepancy between the gene expression data and the values produced by the inferred network model. Though the above two tasks must be conducted simultaneously, most existing work addresses only one of the tasks. Results: We propose an iterative approach that couples parameter identification and parameter optimization techniques, to address the two tasks simultaneously during network inference. This approach first identifies the most influential parameters against internal perturbations; this identification is based on sensitivity measurements. Then, a hybrid GA-PSO optimization method infers parameters in accordance with their criticalities. The proposed approach has been applied to several datasets, including subsets of the SOS DNA repair system in E. coli, the rat central nervous system (CNS), and the protein glycosylation system of yeast S. cerevisiae. The results and analysis show that our approach can infer solutions that satisfy the requirements of both network structure and network behavior. Conclusions: Network structure is an important though challenging issue to address in inferring sophisticated networks with biological details. Lacking prior structural knowledge, we instead measure parameter sensitivity to account for the network structure in an indirect way. By developing an integrated approach for considering both the network structure and behavior in the inference process, we can successfully infer critical gene interactions as well as valid time expression profiles.