Model-based Inference of Gene Expression Dynamics from Sequence Information

Dynamic modeling of gene expression in prokaryotes: application to glucose-lactose diauxie in Escherichia coli

Systems and synthetic biology, 2011

Coexpression of genes or, more generally, similarity in the expression profiles poses an insurmountable obstacle to inferring the gene regulatory network (GRN) based solely on data from DNA microarray time series. Clustering of genes with similar expression profiles allows for a coarse-grained view of the GRN and a probabilistic determination of the connectivity among the clusters. We present a model for the temporal evolution of a gene cluster network which takes into account interactions of gene products with genes and, through a non-constant degradation rate, with other gene products. The number of model parameters is reduced by using polynomial functions to interpolate temporal data points. In this manner, the task of parameter estimation is reduced to a system of linear algebraic equations, thus making the computation time shorter by orders of magnitude. To eliminate irrelevant networks, we test each GRN for stability with respect to parameter variations, and impose restriction...

A Continuous Model of Gene Expression

2005

Gene expression is the process by which a gene makes its effect on a cell or organism. Linear differential equations have been explored as a model for gene expression. We discuss the shortcomings of this model, and we propose a system of nonlinear differential equations to mathematically model gene expression in prokaryotes, specifically bacteria. We investigate this biological system using explicit functions that describe the processes of protein synthesis, including transcription, translation, degradation, and feedback, in the hope of shedding light on their associated rates. We analyze the transient and steady-state solutions of the model and give a biological interpretation of these results.

[Modeling evolution of regulatory signals for gene expression in bacteria]

Molekuliarnaia biologiia

A model of evolution of a regulatory signal along the phylogenetic tree of species taking into account the secondary RNA structure is suggested. Based on this model, an algorithm is presented. It inputs the extant primary structure of a signal for the leaves of the phylogenetic tree and computes the primary and secondary structures of all the nodes. Another result of the algorithm is a multiple alignment of extant sites of a regulatory signal taking into account the secondary structure of the signal. The algorithm has been implemented and successfully tested on biological data representing three types of regulation in bacteria.

Dynamic models of gene expression and classification

Functional & Integrative Genomics, 2001

Powerful new methods, like expression profiles using cDNA arrays, have been used to monitor changes in gene expression levels as a result of a variety of metabolic, xenobiotic or pathogenic challenges. This potentially vast quantity of data enables, in principle, the dissection of the complex genetic networks that control the patterns and rhythms of gene expression in the cell. Here we present a general approach to developing dynamic models for analyzing time series of whole genome expression. In this approach, a self-consistent calculation is performed that involves both linear and non-linear response terms for interrelating gene expression levels. This calculation uses singular value decomposition (SVD) not as a statistical tool but as a means of inverting noisy and near-singular matrices. The linear transition matrix that is determined from this calculation can be used to calculate the underlying network reflected in the data. This suggests a direct method of classifying genes according to their place in the resulting network. In addition to providing a means to model such a large multivariate system this approach can be used to reduce the dimensionality of the problem in a rational and consistent way, and suppress the strong noise amplification effects often encountered with expression profile data. Non-linear and higher-order Markov behavior of the network are also determined in this self-consistent method. In data sets from yeast, we calculate the Markov matrix and the gene classes based on the linear-Markov network. These results compare favorably with previously used methods like cluster analysis. Our dynamic method appears to give a broad and general framework for data analysis and modeling of gene expression arrays.
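
The use of SVD to invert noisy, near-singular matrices can be illustrated with a minimal sketch. It assumes a purely linear update x(t+1) = M x(t) and synthetic data; the function names, the tolerance `rel_tol`, and the example matrix are hypothetical and are not taken from the paper, which also includes non-linear and higher-order terms not shown here.

```python
import numpy as np


def truncated_pinv(a, rel_tol=1e-8):
    """Pseudo-inverse of `a` that discards singular values below rel_tol * largest.

    Dropping tiny singular values avoids dividing by them, which is what
    amplifies noise when a near-singular matrix is inverted directly.
    """
    u, s, vt = np.linalg.svd(a, full_matrices=False)
    keep = s > rel_tol * s[0]
    # pinv = V * diag(1/s) * U^T, restricted to the retained singular values
    return (vt[keep].T / s[keep]) @ u[:, keep].T


def estimate_transition_matrix(x_now, x_next, rel_tol=1e-8):
    """Least-squares estimate of M in x_next = M @ x_now (columns are samples)."""
    return x_next @ truncated_pinv(x_now, rel_tol)


# Synthetic, noise-free example (assumed values, for illustration only).
rng = np.random.default_rng(0)
m_true = np.array([[0.9, 0.1, 0.0],
                   [0.0, 0.8, 0.2],
                   [-0.1, 0.0, 0.7]])
x_now = rng.normal(size=(3, 20))   # 3 genes, 20 sampled states
x_next = m_true @ x_now            # states one time step later
m_est = estimate_transition_matrix(x_now, x_next)
```

With real expression data, `rel_tol` would have to be chosen relative to the noise level; a larger value discards more directions and yields a lower-dimensional, smoother estimate.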

A simple framework to describe the regulation of gene expression in prokaryotes

Comptes Rendus Biologies, 2005

Based on the bimolecular mass action law and the derived mass conservation laws, we propose a mathematical framework in order to describe the regulation of gene expression in prokaryotes. It is shown that the derived models have all the qualitative properties of the activation and inhibition regulatory mechanisms observed in experiments. The basic construction considers genes as templates for protein production, where regulation processes result from activators or repressors connecting to DNA binding sites. All the parameters in the models have a straightforward biological meaning. After describing the general properties of the basic mechanisms of positive and negative gene regulation, we apply this framework to the self-regulation of the trp operon and to the genetic switch involved in the regulation of the lac operon. One of the consequences of this approach is the existence of conserved quantities depending on the initial conditions that tune bifurcations of fixed points. This leads naturally to a simple explanation of threshold effects as observed in some experiments. To cite this article: F. Alves, R. Dilão, C. R. Biologies 328 (2005). © 2005 Académie des sciences. Published by Elsevier SAS. All rights reserved.

Inference of Quantitative Models of Bacterial Promoters from Time-Series Reporter Gene Data

PLOS Computational Biology, 2015

The inference of regulatory interactions and quantitative models of gene regulation from time-series transcriptomics data has been extensively studied and applied to a range of problems in drug discovery, cancer research, and biotechnology. The application of existing methods is commonly based on implicit assumptions on the biological processes under study. First, the measurements of mRNA abundance obtained in transcriptomics experiments are taken to be representative of protein concentrations. Second, the observed changes in gene expression are assumed to be solely due to transcription factors and other specific regulators, while changes in the activity of the gene expression machinery and other global physiological effects are neglected. While convenient in practice, these assumptions are often not valid and bias the reverse engineering process. Here we systematically investigate, using a combination of models and experiments, the importance of this bias and possible corrections. We measure in real time and in vivo the activity of genes involved in the FliA-FlgM module of the E. coli motility network. From these data, we estimate protein concentrations and global physiological effects by means of kinetic models of gene expression. Our results indicate that correcting for the bias of commonly-made assumptions improves the quality of the models inferred from the data. Moreover, we show by simulation that these improvements are expected to be even stronger for systems in which protein concentrations have longer half-lives and the activity of the gene expression machinery varies more strongly across conditions than in the FliA-FlgM module. The approach proposed in this study is broadly applicable when using time-series transcriptome data to learn about the structure and dynamics of regulatory networks. 
In the case of the FliA-FlgM module, our results demonstrate the importance of global physiological effects and the active regulation of FliA and FlgM half-lives for the dynamics of FliA-dependent promoters.

Analysis of a simple gene expression model

2012

Gene expression is random owing to the low copy numbers of molecules in a living cell and the best way to study it is by use of a stochastic method, specifically the chemical master equation. The method is used here to derive analytically the invariant probability distributions, and expressions for the moments and noise strength for a simple gene model without feedback. Sensitivity analysis, emphasizing particularly the dependence of the probability distributions, the moments, and noise strength is carried out using Metabolic Control Analysis, which uses control coefficients that measure the response of observables when parameters change. Bifurcation analysis is also carried out. The results show that the number of mRNA molecules follows a hypergeometric probability distribution, and that noise decreases as the number of these molecules increases. Metabolic Control Analysis was successfully extended to genetic control mechanisms, with the obtained control coefficients satisfying a s...

Modeling Gene Expression with Differential Equations

1999

We propose a differential equation model for gene expression and provide two methods to construct the model from a set of temporal data. We model both transcription and translation by kinetic equations with feedback loops from translation products to transcription. Degradation of proteins and mRNAs is also incorporated. We study two methods to construct the model from experimental data: Minimum Weight Solutions to Linear Equations (MWSLE), which determines the regulation by solving under-determined linear equations, and Fourier Transform for Stable Systems (FTSS), which refines the model with cell cycle constraints. The results suggest that a minor set of temporal data may be sufficient to construct the model at the genome level. We also give a comprehensive discussion of other extended models: the RNA Model, the Protein Model, and the Time Delay Model.
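
The kind of kinetic model described above (transcription, translation, degradation of both species, and feedback from the protein to transcription) can be sketched for a single gene as follows. This is an illustration under stated assumptions, not the paper's model: the repressive feedback term k / (1 + p / K), all parameter values, and the function names are hypothetical.

```python
def derivatives(r, p, k=2.0, K=1.0, V=1.0, L=1.0, U=0.5):
    """Rates of change of mRNA (r) and protein (p) for one self-repressing gene.

    Assumed form (not from the paper):
      dr/dt = k / (1 + p / K) - V * r   transcription with feedback, mRNA decay
      dp/dt = L * r - U * p             translation, protein decay
    """
    dr = k / (1.0 + p / K) - V * r
    dp = L * r - U * p
    return dr, dp


def rk4_step(r, p, dt):
    """Advance (r, p) by one classical fourth-order Runge-Kutta step."""
    k1r, k1p = derivatives(r, p)
    k2r, k2p = derivatives(r + 0.5 * dt * k1r, p + 0.5 * dt * k1p)
    k3r, k3p = derivatives(r + 0.5 * dt * k2r, p + 0.5 * dt * k2p)
    k4r, k4p = derivatives(r + dt * k3r, p + dt * k3p)
    r_new = r + dt / 6.0 * (k1r + 2.0 * k2r + 2.0 * k3r + k4r)
    p_new = p + dt / 6.0 * (k1p + 2.0 * k2p + 2.0 * k3p + k4p)
    return r_new, p_new


def simulate(r0, p0, t_end, dt=0.01):
    """Integrate from (r0, p0) up to time t_end and return the final state."""
    r, p = r0, p0
    for _ in range(int(round(t_end / dt))):
        r, p = rk4_step(r, p, dt)
    return r, p


r_final, p_final = simulate(0.0, 0.0, t_end=50.0)
```

For these assumed parameters the steady state solves 2r² + r − 2 = 0 with p = 2r, so the simulated values can be checked against the closed-form root.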

Continuous-Time Identification of Gene Expression Models

OMICS: A Journal of Integrative Biology, 2003

One objective of systems biology is to create predictive, quantitative models of the transcriptional regulation networks that govern numerous cellular processes. Gene expression measurements, as provided by microarrays, are commonly used in studies that attempt to infer the regulation underlying these processes. At present, most gene expression models that have been derived from microarray data are formulated in discrete time, which limits their applicability to common biological data sets and may impede their integration with other models of biological processes that are formulated as ordinary differential equations (ODEs). To overcome these difficulties, a continuous-time process identification approach was developed for identifying ODE-based gene expression models. The approach utilizes the modulating functions method of parameter identification. The method was applied to three simulated systems: (1) a linear gene expression model, (2) an autoregulatory gene expression model, and (3) simulated microarray data from a nonlinear transcriptional network. In general, the approach was well suited for identifying models of gene expression dynamics, and was capable of accurately identifying parameters from small numbers of data samples in the presence of modest experimental noise. Additionally, numerous insights about gene expression modeling were revealed by the case studies.

Statistical modelling of transcript profiles of differentially regulated genes

BMC molecular …, 2008

The vast quantities of gene expression profiling data produced in microarray studies, and the more precise quantitative PCR, are often not statistically analysed to their full potential. Previous studies have summarised gene expression profiles using simple descriptive statistics, basic analysis of variance (ANOVA) and the clustering of genes based on simple models fitted to their expression profiles over time. We report the novel application of statistical non-linear regression modelling techniques to describe the shapes of expression profiles for the fungus Agaricus bisporus, quantified by PCR, and for E. coli and Rattus norvegicus, using microarray technology. The use of parametric non-linear regression models provides a more precise description of expression profiles, reducing the "noise" of the raw data to produce a clear "signal" given by the fitted curve, and describing each profile with a small number of biologically interpretable parameters. This approach then allows the direct comparison and clustering of the shapes of response patterns between genes and potentially enables a greater exploration and interpretation of the biological processes driving gene expression. Results: Quantitative reverse transcriptase PCR-derived time-course data of genes were modelled. "Split-line" or "broken-stick" regression identified the initial time of gene up-regulation, enabling the classification of genes into those with primary and secondary responses. Five-day profiles were modelled using the biologically-oriented, critical exponential curve, y(t) = A + (B + Ct)R^t + ε. This non-linear regression approach allowed the expression patterns for different genes to be compared in terms of curve shape, time of maximal transcript level and the decline and asymptotic response levels. Three distinct regulatory patterns were identified for the five genes studied.
Applying the regression modelling approach to microarray-derived time course data allowed 11% of the Escherichia coli features to be fitted by an exponential function, and 25% of the Rattus norvegicus features could be described by the critical exponential model, all with statistical significance of p < 0.05. Conclusion: The statistical non-linear regression approaches presented in this study provide detailed biologically oriented descriptions of individual gene expression profiles, using biologically variable data to generate a set of defining parameters. These approaches have application to the modelling and greater interpretation of profiles obtained across a wide range of platforms, such as microarrays. Through careful choice of appropriate model forms, such statistical regression approaches allow an improved comparison of gene expression profiles, and may provide an approach for the greater understanding of common regulatory mechanisms between genes.
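
The interpretable quantities mentioned in this abstract (time of maximal transcript level, asymptotic level) follow directly from the critical exponential curve, read here as y(t) = A + (B + Ct)R^t without the error term. The sketch below is an illustration only: the function names and parameter values are hypothetical, and the peak formula assumes C > 0 and 0 < R < 1, in which case the curve has a single maximum and decays towards A.

```python
import math


def critical_exponential(t, A, B, C, R):
    """Evaluate y(t) = A + (B + C*t) * R**t (error term omitted)."""
    return A + (B + C * t) * R ** t


def time_of_peak(B, C, R):
    """Stationary point of the curve, from dy/dt = R**t * (C + (B + C*t) * ln R) = 0.

    Solving gives t* = -1/ln(R) - B/C. It is a maximum when C > 0 and 0 < R < 1.
    """
    return -1.0 / math.log(R) - B / C


# Hypothetical parameter values, chosen only to illustrate the shape.
A, B, C, R = 1.0, 0.0, 2.0, 0.5
t_star = time_of_peak(B, C, R)
y_peak = critical_exponential(t_star, A, B, C, R)
y_late = critical_exponential(200.0, A, B, C, R)  # approaches the asymptote A
```

Comparing genes by `t_star`, `y_peak` and the asymptote A is one way to express curve shape in a few parameters, in the spirit of the comparison described above.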