Transcriptional regulatory network refinement and quantification through kinetic modeling, gene expression microarray data and information theory

Microarray Analysis through Transcription Kinetic Modeling and Information Theory

cDNA microarray and other multiplex data hold promise for addressing the challenges of cellular complexity, disease progression and drug discovery. We believe that combining transcription kinetic modeling with microarray time series data through information theory will yield more information about gene regulatory networks than has been obtained previously. A novel analysis of gene regulatory networks is presented based on the integration of microarray data and cell modeling through information theory. Given a partial network and time series data, a probability density is constructed that is a functional of the time course of intra-nuclear transcription factor (TF) thermodynamic activities, and a function of the RNA degradation and transcription rate constants and the equilibrium constants for TF/gene binding. The most probable TF time courses and the values of the aforementioned parameters are computed. The accuracy and robustness of the method are evaluated and an application to Escherichia coli is demonstrated. A kinetic (rather than steady-state) formulation allows us to analyze phenomena with a strongly dynamical character (e.g. the cell cycle, metabolic oscillations, viral infection or response to changes in the extra-cellular medium).
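As a rough illustration of what a kinetic (as opposed to steady-state) formulation involves, the sketch below integrates a single-gene mRNA balance. The abstract does not give the model equations, so everything here is an assumption: the rate law dm/dt = k_tx * a / (K_d + a) - k_deg * m, the simple one-site binding occupancy, and the names `simulate_mrna`, `k_tx`, `k_deg` and `K_d` are all hypothetical and are not taken from the paper.

```python
def simulate_mrna(tf_activity, k_tx, k_deg, K_d, m0=0.0, dt=0.01):
    """Forward-Euler integration of dm/dt = k_tx * a / (K_d + a) - k_deg * m.

    tf_activity: sequence of TF activities a(t), one value per time step.
    k_tx:  maximal transcription rate (assumed parameter).
    k_deg: first-order RNA degradation rate constant (assumed parameter).
    K_d:   equilibrium dissociation constant for TF/gene binding (assumed).
    Returns the mRNA level at every step, including the initial value.
    """
    m = m0
    trajectory = [m]
    for a in tf_activity:
        occupancy = a / (K_d + a)  # fraction of time the gene is TF-bound
        m += dt * (k_tx * occupancy - k_deg * m)
        trajectory.append(m)
    return trajectory


# Constant TF activity: mRNA relaxes toward k_tx * occupancy / k_deg.
steps = 5000
traj = simulate_mrna([1.0] * steps, k_tx=2.0, k_deg=0.5, K_d=1.0)
expected_steady_state = 2.0 * (1.0 / (1.0 + 1.0)) / 0.5
print(traj[-1], expected_steady_state)
```

With a time-varying `tf_activity` the same function yields a transient response, which is the regime a steady-state treatment cannot describe.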

Integrative analysis of time course microarray data and DNA sequence data via log-linear models for identifying dynamic transcriptional regulatory networks

International journal of data mining and bioinformatics, 2013

Since eukaryotic transcription is regulated by sets of Transcription Factors (TFs) having various transcriptional time delays, identification of temporal combinations of activated TFs is important to reconstruct Transcriptional Regulatory Networks (TRNs). Our methods combine time course microarray data, information on physical binding between the TFs and their targets and the regulatory sequences of genes using a log-linear model to reconstruct dynamic functional TRNs of the yeast cell cycle and human apoptosis. In conclusion, our results suggest that the proposed dynamic motif search method is more effective in reconstructing TRNs than the static motif search method.

Inferring quantitative models of regulatory networks from expression data

Bioinformatics, 2004

Motivation: Genetic networks regulate key processes in living cells. Various methods have been suggested to reconstruct network architecture from gene expression data. However, most approaches are based on qualitative models that provide only rough approximations of the underlying events, and lack the quantitative aspects that are critical for understanding the proper function of biomolecular systems. Results: We present fine-grained dynamical models of gene transcription and develop methods for reconstructing them from gene expression data within the framework of a generative probabilistic model. Unlike previous works, we employ quantitative transcription rates, and simultaneously estimate both the kinetic parameters that govern these rates, and the activity levels of unobserved regulators that control them. We apply our approach to expression datasets from yeast and show that we can learn the unknown regulator activity profiles, as well as the binding affinity parameters. We also introduce a novel structure learning algorithm, and demonstrate its power to accurately reconstruct the regulatory network from those datasets. Contact: nir@cs.huji.ac.il. Bioinformatics 20(Suppl. 1), i248.

Unraveling gene regulatory networks from time-resolved gene expression data -- a measures comparison study

BMC Bioinformatics, 2011

Background: Inferring regulatory interactions between genes from transcriptomics time-resolved data, yielding reverse engineered gene regulatory networks, is of paramount importance to systems biology and bioinformatics studies. Accurate methods to address this problem can ultimately provide a deeper insight into the complexity, behavior, and functions of the underlying biological systems. However, the large number of interacting genes coupled with short and often noisy time-resolved read-outs of the system renders the reverse engineering a challenging task. Therefore, the development and assessment of methods which are computationally efficient, robust against noise, applicable to short time series data, and preferably capable of reconstructing the directionality of the regulatory interactions remains a pressing research problem with valuable applications. Results: Here we perform the largest systematic analysis of a set of similarity measures and scoring schemes within the scope of the relevance network approach which are commonly used for gene regulatory network reconstruction from time series data. In addition, we define and analyze several novel measures and schemes which are particularly suitable for short transcriptomics time series. We also compare the considered 21 measures and 6 scoring schemes according to their ability to correctly reconstruct such networks from short time series data by calculating summary statistics based on the corresponding specificity and sensitivity. Our results demonstrate that rank-based and symbol-based measures have the highest performance in inferring regulatory interactions. In addition, the proposed scoring scheme by asymmetric weighting has been shown to be valuable in reducing the number of false positive interactions. On the other hand, Granger causality as well as information-theoretic measures, frequently used in inference of regulatory networks, show low performance on the short time series analyzed in this study.
Conclusions: Our study is intended to serve as a guide for choosing a particular combination of similarity measures and scoring schemes suitable for reconstruction of gene regulatory networks from short time series data. We show that further improvement of algorithms for reverse engineering can be obtained if one considers measures that are rooted in the study of symbolic dynamics or ranks, in contrast to the application of common similarity measures which do not consider the temporal character of the employed data. Moreover, we establish that the asymmetric weighting scoring scheme together with symbol-based measures (for low noise level) and rank-based measures (for high noise level) are the most suitable choices.
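The relevance network idea described above can be sketched in a few lines: score every gene pair with a similarity measure and keep the pairs whose score passes a threshold. The example below uses Spearman's rank correlation as one familiar rank-based measure. Whether this matches any of the 21 measures as defined in the study is an assumption, and the function names, the toy expression values and the threshold of 0.8 are hypothetical. The asymmetric weighting scoring scheme is not reproduced here.

```python
from math import sqrt


def ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) for constant input."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(ranks(x), ranks(y))


def relevance_network(series, threshold):
    """Return undirected edges (gene, gene, score) with |score| >= threshold."""
    names = sorted(series)
    edges = []
    for i, g in enumerate(names):
        for h in names[i + 1:]:
            score = spearman(series[g], series[h])
            if abs(score) >= threshold:
                edges.append((g, h, score))
    return edges


# Toy short time series (made-up values, five time points per gene).
expression = {
    "geneA": [0.1, 0.4, 0.9, 1.6, 2.5],
    "geneB": [1.0, 1.2, 1.9, 3.5, 8.0],
    "geneC": [2.0, 0.5, 1.8, 0.4, 1.9],
}
edges = relevance_network(expression, threshold=0.8)
print(edges)
```

Because the score is symmetric, this sketch produces undirected edges only; recovering the direction of regulation needs an asymmetric measure or scoring scheme of the kind the study evaluates.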

Gene Regulatory Networks: A Primer in Biological Processes and Statistical Modelling

Methods in Molecular Biology, 2018

Modelling gene regulatory networks not only requires a thorough understanding of the biological system depicted but also the ability to accurately represent this system from a mathematical perspective. Throughout this chapter, we aim to familiarise the reader with the biological processes and molecular factors at play in the process of gene expression regulation. We first describe the different interactions controlling each step of the expression process, from transcription to mRNA and protein decay. In the second section, we provide statistical tools to accurately represent this biological complexity in the form of mathematical models. Amongst other considerations, we discuss the topological properties of biological networks, the application of deterministic and stochastic frameworks and the quantitative modelling of regulation. We particularly focus on the use of such models for the simulation of expression data that can serve as a benchmark for the testing of network inference algorithms.

Dynamical Analysis of Gene Networks Requires Both mRNA and Protein Expression Information

Metabolic Engineering, 1999

One of the important goals of biology is to understand the relationship between DNA sequence information and nonlinear cellular responses. This relationship is central to the ability to effectively engineer cellular phenotypes, pathways, and characteristics. Expression arrays for monitoring total gene expression based on mRNA can provide quantitative insight into which gene or genes are on or off; but this information is insufficient to fully predict dynamic biological phenomena. Using nonlinear stability analysis we show that a combination of gene expression information at the message level and at the protein level is required to describe even simple models of gene networks. To help illustrate the need for such information we consider a mechanistic model for circadian rhythmicity which shows agreement with experimental observations when protein and mRNA information are included and we propose a framework for acquiring and analyzing experimental and mathematically derived information about gene networks.

Gene network reconstruction from transcriptional dynamics under kinetic model uncertainty: a case for the second derivative

Bioinformatics, 2009

Motivation: Measurements of gene expression over time enable the reconstruction of transcriptional networks. However, Bayesian networks and many other current reconstruction methods rely on assumptions that conflict with the differential equations that describe transcriptional kinetics. Practical approximations of kinetic models would enable inferring causal relationships between genes from expression data of microarray, tag-based, and conventional platforms, but conclusions are sensitive to the assumptions made. Results: The representation of a sufficiently large portion of the genome enables computation of an upper bound on how much confidence one may place in influences between genes on the basis of expression data. Information about which genes encode transcription factors is not necessary but may be incorporated if available. The methodology is generalized to cover cases in which expression measurements are missing for many of the genes that might control the transcription of the genes of interest. The assumption that the gene expression level is roughly proportional to the rate of translation led to better empirical performance than did either the assumption that the gene expression level is roughly proportional to the protein level or the Bayesian model average of both assumptions. Contact: dbickel@uottawa.ca Availability: http://www.oisb.ca points to R code implementing the methods (R Development Core Team 2004).

Inferring transcriptional regulatory networks from high-throughput data

Bioinformatics, 2007

Motivation: Inferring the relationships between transcription factors (TFs) and their targets is of utmost importance for understanding the complex regulatory mechanisms in cellular systems. However, the transcription factor activities (TFAs) cannot be measured directly by standard microarray experiments owing to various posttranslational modifications. In particular, cooperative mechanisms and combinatorial control are common in gene regulation, e.g. TFs usually recruit other proteins cooperatively to facilitate transcriptional reaction processes. Results: In this article, we propose a novel method for inferring transcriptional regulatory networks (TRN) from gene expression data based on protein transcription complexes and the mass action law. With gene expression data and TFAs estimated from transcription complex information, the inference of TRN is formulated as a linear programming (LP) problem which has a globally optimal solution in terms of L1 norm error. The proposed method not only can easily incorporate ChIP-Chip data as prior knowledge, but also can integrate multiple gene expression datasets from different experiments simultaneously. A unique feature of our method is that it takes into account protein cooperation in the transcription process. We tested our method by using both synthetic data and several experimental datasets in yeast. The extensive results illustrate the effectiveness of the proposed method for predicting transcription regulatory relationships between TFs with co-regulators and target genes. Availability: The software TRNinfer is available from