Model-based Inference of Gene Expression Dynamics from Sequence Information

Inference of Quantitative Models of Bacterial Promoters from Time-Series Reporter Gene Data

PLOS Computational Biology, 2015

The inference of regulatory interactions and quantitative models of gene regulation from time-series transcriptomics data has been extensively studied and applied to a range of problems in drug discovery, cancer research, and biotechnology. The application of existing methods is commonly based on implicit assumptions on the biological processes under study. First, the measurements of mRNA abundance obtained in transcriptomics experiments are taken to be representative of protein concentrations. Second, the observed changes in gene expression are assumed to be solely due to transcription factors and other specific regulators, while changes in the activity of the gene expression machinery and other global physiological effects are neglected. While convenient in practice, these assumptions are often not valid and bias the reverse engineering process. Here we systematically investigate, using a combination of models and experiments, the importance of this bias and possible corrections. We measure in real time and in vivo the activity of genes involved in the FliA-FlgM module of the E. coli motility network. From these data, we estimate protein concentrations and global physiological effects by means of kinetic models of gene expression. Our results indicate that correcting for the bias of commonly-made assumptions improves the quality of the models inferred from the data. Moreover, we show by simulation that these improvements are expected to be even stronger for systems in which protein concentrations have longer half-lives and the activity of the gene expression machinery varies more strongly across conditions than in the FliA-FlgM module. The approach proposed in this study is broadly applicable when using time-series transcriptome data to learn about the structure and dynamics of regulatory networks. 
In the case of the FliA-FlgM module, our results demonstrate the importance of global physiological effects and the active regulation of FliA and FlgM half-lives for the dynamics of FliA-dependent promoters.

Analysis of a simple gene expression model

2012

Gene expression is random owing to the low copy numbers of molecules in a living cell and the best way to study it is by use of a stochastic method, specifically the chemical master equation. The method is used here to derive analytically the invariant probability distributions, and expressions for the moments and noise strength for a simple gene model without feedback. Sensitivity analysis, emphasizing particularly the dependence of the probability distributions, the moments, and noise strength is carried out using Metabolic Control Analysis, which uses control coefficients that measure the response of observables when parameters change. Bifurcation analysis is also carried out. The results show that the number of mRNA molecules follows a hypergeometric probability distribution, and that noise decreases as the number of these molecules increases. Metabolic Control Analysis was successfully extended to genetic control mechanisms, with the obtained control coefficients satisfying a s...

Modeling Gene Expression with Differential Equations

1999

We propose a differential equation model for gene expression and provide two methods to construct the model from a set of temporal data. We model both transcription and translation by kinetic equations with feedback loops from translation products to transcription. Degradation of proteins and mRNAs is also incorporated. We study two methods to construct the model from experimental data: Minimum Weight Solutions to Linear Equations (MWSLE), which determines the regulation by solving under-determined linear equations, and Fourier Transform for Stable Systems (FTSS), which refines the model with cell cycle constraints. The results suggest that a minor set of temporal data may be sufficient to construct the model at the genome level. We also give a comprehensive discussion of other extended models: the RNA Model, the Protein Model, and the Time Delay Model.
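As a hedged illustration of the kind of model this abstract describes (transcription and translation kinetics, feedback from proteins to transcription, and degradation of both species), one possible formulation is sketched below. The symbols r, p, f, V, L and U are notation assumed here for illustration; the abstract itself does not state them.

```latex
\begin{aligned}
\frac{dr}{dt} &= f(p) - V r, \\
\frac{dp}{dt} &= L r - U p
\end{aligned}
```

In this sketch r and p are vectors of mRNA and protein concentrations, f(p) is a transcription function that carries the feedback from translation products, L holds translation rate constants, and V and U hold the mRNA and protein degradation rates. Assuming a linear form such as f(p) = Cp would reduce the model to a linear system whose coefficients can be estimated from time-series data.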

Continuous-Time Identification of Gene Expression Models

OMICS: A Journal of Integrative Biology, 2003

One objective of systems biology is to create predictive, quantitative models of the transcriptional regulation networks that govern numerous cellular processes. Gene expression measurements, as provided by microarrays, are commonly used in studies that attempt to infer the regulation underlying these processes. At present, most gene expression models that have been derived from microarray data are based in discrete-time, which have limited applicability to common biological data sets, and may impede the integration of gene expression models with other models of biological processes that are formulated as ordinary differential equations (ODEs). To overcome these difficulties, a continuous-time approach for process identification to identify gene expression models based in ODEs was developed. The approach utilizes the modulating functions method of parameter identification. The method was applied to three simulated systems: (1) a linear gene expression model, (2) an autoregulatory gene expression model, and (3) simulated microarray data from a nonlinear transcriptional network. In general, the approach was well suited for identifying models of gene expression dynamics, capable of accurately identifying parameters for small numbers of data samples in the presence of modest experimental noise. Additionally, numerous insights about gene expression modeling were revealed by the case studies.

Statistical modelling of transcript profiles of differentially regulated genes

BMC molecular …, 2008

The vast quantities of gene expression profiling data produced in microarray studies, and the more precise quantitative PCR, are often not statistically analysed to their full potential. Previous studies have summarised gene expression profiles using simple descriptive statistics, basic analysis of variance (ANOVA) and the clustering of genes based on simple models fitted to their expression profiles over time. We report the novel application of statistical non-linear regression modelling techniques to describe the shapes of expression profiles for the fungus Agaricus bisporus, quantified by PCR, and for E. coli and Rattus norvegicus, using microarray technology. The use of parametric non-linear regression models provides a more precise description of expression profiles, reducing the "noise" of the raw data to produce a clear "signal" given by the fitted curve, and describing each profile with a small number of biologically interpretable parameters. This approach then allows the direct comparison and clustering of the shapes of response patterns between genes and potentially enables a greater exploration and interpretation of the biological processes driving gene expression. Results: Quantitative reverse transcriptase PCR-derived time-course data of genes were modelled. "Split-line" or "broken-stick" regression identified the initial time of gene up-regulation, enabling the classification of genes into those with primary and secondary responses. Five-day profiles were modelled using the biologically-oriented, critical exponential curve, y(t) = A + (B + Ct)R^t + ε. This non-linear regression approach allowed the expression patterns for different genes to be compared in terms of curve shape, time of maximal transcript level and the decline and asymptotic response levels. Three distinct regulatory patterns were identified for the five genes studied. 
Applying the regression modelling approach to microarray-derived time course data allowed 11% of the Escherichia coli features to be fitted by an exponential function, and 25% of the Rattus norvegicus features could be described by the critical exponential model, all with statistical significance of p < 0.05. Conclusion: The statistical non-linear regression approaches presented in this study provide detailed biologically oriented descriptions of individual gene expression profiles, using biologically variable data to generate a set of defining parameters. These approaches have application to the modelling and greater interpretation of profiles obtained across a wide range of platforms, such as microarrays. Through careful choice of appropriate model forms, such statistical regression approaches allow an improved comparison of gene expression profiles, and may provide an approach for the greater understanding of common regulatory mechanisms between genes.
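The critical exponential curve quoted in this abstract, y(t) = A + (B + Ct)R^t, can be explored with a short sketch. The function names and the parameter values below are illustrative assumptions, not values from the study; the turning point follows from setting dy/dt = R^t (C + (B + Ct) ln R) to zero.

```python
import math


def critical_exponential(t, A, B, C, R):
    """Evaluate y(t) = A + (B + C*t) * R**t (noise term omitted)."""
    return A + (B + C * t) * R ** t


def time_of_extremum(B, C, R):
    """Return t* where dy/dt = 0, i.e. C + (B + C*t*) * ln(R) = 0.

    For 0 < R < 1 and C > 0 this turning point is a maximum;
    for C < 0 it is a minimum.
    """
    if C == 0 or R <= 0 or R == 1:
        raise ValueError("need C != 0 and R > 0 with R != 1")
    return -1.0 / math.log(R) - B / C


# Illustrative (hypothetical) parameter values, not fitted to any data.
A, B, C, R = 1.0, 0.0, 2.0, 0.5
t_peak = time_of_extremum(B, C, R)
y_peak = critical_exponential(t_peak, A, B, C, R)
print(f"peak at t = {t_peak:.3f}, y = {y_peak:.3f}")
```

With 0 < R < 1 the curve decays towards the asymptote A at large t, which corresponds to the "asymptotic response level" mentioned in the abstract.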

A Predictive Model for Transcriptional Control of Physiology in a Free Living Cell

Cell, 2007

The environment significantly influences the dynamic expression and assembly of all components encoded in the genome of an organism into functional biological networks. We have constructed a model for this process in Halobacterium salinarum NRC-1 through the data-driven discovery of regulatory and functional interrelationships among ~80% of its genes and key abiotic factors in its hypersaline environment. Using relative changes in 72 transcription factors and 9 environmental factors (EFs) this model accurately predicts dynamic transcriptional responses of all these genes in 147 newly collected experiments representing completely novel genetic backgrounds and environments, suggesting a remarkable degree of network completeness. Using this model we have constructed and tested hypotheses critical to this organism's interaction with its changing hypersaline environment. This study supports the claim that the high degree of connectivity within biological and EF networks will enable the construction of similar models for any organism from relatively modest numbers of experiments.

Prediction of temporal gene expression

European Journal of Biochemistry, 2002

A computational approach is used to analyse temporal gene expression in the context of metabolic regulation. It is based on the assumption that cells developed optimal adaptation strategies to changing environmental conditions. Time-dependent enzyme profiles are calculated which optimize the function of a metabolic pathway under the constraint of limited total enzyme amount. For linear model pathways it is shown that wave-like enzyme profiles are optimal for a rapid substrate turnover. For the central metabolism of yeast cells enzyme profiles are calculated which ensure long-term homeostasis of key metabolites under conditions of a diauxic shift. These enzyme profiles are in close correlation with observed gene expression data. Our results demonstrate that optimality principles help to rationalize observed gene expression profiles.

General properties of transcriptional time series in Escherichia coli

Nature Genetics, 2011

A gene's activity can be described by the discrete time series of mRNA production events [1,2]. This transcriptional time series is stochastic rather than deterministic [2-4]. Furthermore, it generally cannot be described as a simple Poisson process. In other words, mRNA molecules are not produced with a constant probability per unit time; instead, mRNA production is often bursty (pulsatile) in both bacteria [2] and higher organisms [4-8]. A suitable mathematical framework for describing gene activity data is the two-state model [8-10], where a gene stochastically fluctuates between 'off' and 'on' states, and mRNA is produced stochastically only in the on state. This scenario can lead to the occurrence of transcription 'bursts', periods of intense activity separated by periods of quiescence. Measured mRNA kinetics [2,5] and copy-number statistics have been shown to be consistent with the two-state picture in a variety of model systems. However, despite considerable theoretical attention [2,13-17], we do not have a biophysical understanding of the nature of the on and off states and what governs the transitions between them.
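The two-state model described in this abstract can be sketched as a small stochastic (Gillespie-type) simulation. The rate names k_on, k_off, k_tx and k_deg, the addition of first-order mRNA degradation, and all numerical values are assumptions made here for illustration; they are not taken from the paper.

```python
import random


def simulate_two_state(k_on, k_off, k_tx, k_deg, t_end, seed=0):
    """Simulate a gene switching between 'off' and 'on' states.

    mRNA is produced only in the on state (rate k_tx) and degraded
    at rate k_deg per molecule. Returns the time-averaged mean mRNA
    count and the Fano factor (variance / mean).
    """
    rng = random.Random(seed)
    t, gene_on, m = 0.0, False, 0
    sum_m = sum_m2 = 0.0  # time-weighted sums of m and m^2
    while True:
        rates = (
            k_off if gene_on else k_on,  # promoter state switch
            k_tx if gene_on else 0.0,    # mRNA production (on state only)
            k_deg * m,                   # mRNA degradation
        )
        total = sum(rates)
        dt = rng.expovariate(total)      # waiting time to the next event
        if t + dt >= t_end:
            # Count the remaining time in the current state, then stop.
            sum_m += m * (t_end - t)
            sum_m2 += m * m * (t_end - t)
            break
        sum_m += m * dt
        sum_m2 += m * m * dt
        t += dt
        u = rng.random() * total         # choose which event fires
        if u < rates[0]:
            gene_on = not gene_on
        elif u < rates[0] + rates[1]:
            m += 1
        else:
            m -= 1
    mean = sum_m / t_end
    fano = (sum_m2 / t_end - mean ** 2) / mean
    return mean, fano


# Hypothetical rates chosen so the gene is mostly off and bursts when on.
mean, fano = simulate_two_state(k_on=0.1, k_off=0.9, k_tx=20.0,
                                k_deg=1.0, t_end=20000.0)
print(f"mean mRNA = {mean:.2f}, Fano factor = {fano:.2f}")
```

A Poisson process would give a Fano factor of 1; values well above 1 in this sketch reflect the bursty production that the abstract contrasts with constant-probability production.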

Modeling classic attenuation regulation of gene expression in bacteria

Journal of bioinformatics and computational biology, 2007

A model is proposed primarily for the classical RNA attenuation regulation of gene expression through premature transcription termination. The model is based on the concept of the RNA secondary structure macrostate within the regulatory region between the ribosome and RNA-polymerase, on a hypothetical equation describing deceleration of RNA-polymerase by a macrostate, and on views of transcription and translation initiation and elongation, under different values of the four basic model parameters, which were varied. A special effort was made to select adequate model parameters. We first discuss the kinetics of RNA folding and define the concept of the macrostate as a specific parentheses structure used to construct a conventional set of hairpins. The originally developed software that realizes the proposed model offers functionality to fully model RNA secondary folding kinetics. Its performance is compared to that of a public server described in Ref. 1. We then describe the delay in RNA-pol...

Dynamic Transcriptional Control of Gene Expressions

2013

Background. Cellular signaling involves a sequence of events from ligand binding to membrane receptors through transcription factor activation and the induction of mRNA expression. The transcriptional-regulatory system plays a pivotal role in the control of gene expression. A novel computational approach to the study of gene regulation circuits is presented here. Methodology. Based on the concept of the finite state machine, which provides a discrete view of gene regulation, a novel sequential logic model (SLM) is developed to decipher control mechanisms of dynamic transcriptional regulation of gene expression. The SLM technique is also used to systematically analyze the dynamic function of transcriptional inputs and the dependency and cooperativity, such as synergy effects, among the binding sites with respect to when, how much and how fast the gene of interest is expressed. Principal Findings. SLM is verified by a set of well-studied expression data on endo16 of Strongylocentrotus purpuratus (sea urchin) during embryonic midgut development. A dynamic regulatory mechanism for endo16 expression controlled by three binding sites, UI, R and Otx, is identified and demonstrated to be consistent with experimental findings. Furthermore, we show that during the transition from specification to differentiation in the wild-type endo16 expression profile, SLM reveals that three binary activities are not sufficient to explain the transcriptional regulation of endo16 expression and that additional activities of binding sites are required. Further analyses suggest a detailed mechanism of R switch activity in which an indirect dependency occurs between UI activity and the R switch during the specification-to-differentiation stage. Conclusions/Significance. The sequential logic formalism allows for a simplification of regulation network dynamics going from a continuous to a discrete representation of gene activation in time. 
In effect, our SLM is non-parametric and model-independent, yet provides rich biological insight. The demonstration of the efficacy of this approach in endo16 is a promising step for further application of the proposed method.