From DNA sequence to transcriptional behaviour: a quantitative approach

Review

Eran Segal et al. Nat Rev Genet. 2009 Jul.

Abstract

Complex transcriptional behaviours are encoded in the DNA sequences of gene regulatory regions. Advances in our understanding of these behaviours have been recently gained through quantitative models that describe how molecules such as transcription factors and nucleosomes interact with genomic sequences. An emerging view is that every regulatory sequence is associated with a unique binding affinity landscape for each molecule and, consequently, with a unique set of molecule-binding configurations and transcriptional outputs. We present a quantitative framework based on existing methods that unifies these ideas. This framework explains many experimental observations regarding the binding patterns of factors and nucleosomes and the dynamics of transcriptional activation. It can also be used to model more complex phenomena such as transcriptional noise and the evolution of transcriptional regulation.

Figures

Figure 1. Overview of quantitative models for computing expression from DNA sequences

Flow diagram of the computational approach, for a simplified regulatory sequence, with nucleosomes and one transcription factor as the input binding molecules. Each of the input molecules has intrinsic binding affinities for every possible sequence of length k (top panels, left and right), where k is the number of basepairs recognized by the binding molecule. These intrinsic molecule affinities dictate how every DNA sequence is ‘translated’ into a unique binding affinity landscape for each molecule along the sequence (top panel, centre). For each factor concentration (bottom panel, left), the model uses these binding affinity landscapes to compute a probability distribution over configurations of bound molecules (see Box 1 for details); a small subset of these configurations is illustrated (bottom panel, centre). Configurations in which two bound molecules overlap are not allowed because of steric hindrance, which models binding competition between molecules (see the bottom-most configuration, with probability 0). Finally, each configuration results in a particular transcriptional output (bottom panel, right); the final expression is then the sum of the expression contributions of all configurations, each weighted by its probability.
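The flow in Figure 1 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes that the statistical weight of a configuration is the product of concentration times affinity over its bound molecules (Box 1 of the article, not reproduced here, gives the actual formulation), it uses toy footprints (4 bp for the nucleosome, 2 bp for the factor) so that every configuration can be enumerated explicitly, and it uses a hypothetical output rule in which expression is 1 whenever the factor is bound. All function and variable names are illustrative.

```python
def enumerate_configs(length, molecules):
    """Yield (config, weight) for every non-overlapping configuration.

    molecules maps a name to (footprint, concentration, affinities), where
    affinities[i] is the assumed affinity for binding with its left edge at i.
    A config is a tuple of (name, start) pairs; the empty config has weight 1.
    """
    def extend(pos):
        if pos >= length:
            yield (), 1.0
            return
        # Option 1: leave position `pos` unbound.
        yield from extend(pos + 1)
        # Option 2: start a molecule at `pos`; the next molecule can only
        # start after its footprint, which enforces steric exclusion.
        for name, (footprint, conc, affinities) in molecules.items():
            if pos + footprint <= length:
                for rest, weight in extend(pos + footprint):
                    yield ((name, pos),) + rest, conc * affinities[pos] * weight

    yield from extend(0)


def config_distribution(length, molecules):
    """Normalize configuration weights into probabilities."""
    configs = list(enumerate_configs(length, molecules))
    z = sum(weight for _, weight in configs)  # partition function
    return [(config, weight / z) for config, weight in configs]


def expected_expression(dist, output):
    """Probability-weighted sum of per-configuration outputs."""
    return sum(prob * output(config) for config, prob in dist)


# Toy 6 bp sequence: uniform nucleosome landscape, one factor site at position 2.
molecules = {
    "nucleosome": (4, 1.0, [1.0, 1.0, 1.0]),
    "factor": (2, 2.0, [0.0, 0.0, 5.0, 0.0, 0.0]),
}
dist = config_distribution(6, molecules)

# Hypothetical output rule: a configuration expresses (1.0) only if the factor is bound.
factor_bound = lambda config: 1.0 if any(n == "factor" for n, _ in config) else 0.0
expression = expected_expression(dist, factor_bound)
print(f"expected expression = {expression:.3f}")
```

Overlapping configurations never appear in the enumeration, which corresponds to assigning them probability 0. Explicit enumeration grows exponentially with sequence length, so it is only practical for toy examples like this one.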

Figure 2. Main determinants of in vivo nucleosome organization

(a) Shown are the nucleosome occupancy in vivo in yeast (blue) and the nucleosome affinity landscape as measured in vitro by assembling purified histones on purified yeast genomic DNA (green), averaged across all genes. Occupancy around gene transcription start sites is shown on the left, and around gene translation end sites on the right. Also shown below each graph is a schematic illustration of the key components that contribute to the in vivo nucleosome occupancy. Nucleosome depletion around gene ends is largely encoded by the nucleosome affinity landscape, while nucleosome depletion around gene starts results both from the encoded nucleosome affinity landscape and from the binding action of transcription factors. (b) For one genomic region from worm with well-positioned nucleosomes, shown are the average nucleosome occupancy for that region in vivo (blue) and the average nucleosome affinity landscape for that region as predicted by a model constructed from in vitro data in yeast (green). (c) Same as (b), for a genomic region from worm with less well-defined nucleosome locations (“fuzzy nucleosomes”). The agreement between predictions of a model based on nucleosome sequence preferences and the experimental measurements, both at regions with well-positioned nucleosomes (b) and at regions with fuzzy nucleosomes (c), suggests that both types of regions may be encoded by the genomic sequence, through peaked nucleosome affinity landscapes (b) or relatively flat landscapes (c). (d) Nucleosome-disfavoring sequences can have a long-range effect on the nucleosome organization. This example sequence contains a strong nucleosome-disfavoring sequence (yellow diamond); such sequences are highly abundant in eukaryotic genomes. When such a nucleosome affinity landscape is combined with a high nucleosome concentration, as is the case in vivo, the bound nucleosomes automatically organize into ordered arrays, whose order decays with the distance from the original disfavoring sequence (bottom graph and schematic bottom sequence). This phenomenon is termed ‘statistical positioning’. (e) Illustration of how a single sequence may encode different nucleosome organizations in different cell types or biological conditions, by encoding different outcomes of nucleosome-factor competition at different factor concentrations. Shown is a sequence having a uniform landscape for nucleosomes and a landscape for one factor that includes a single strong binding site. In condition 1, where the hypothetical factor is expressed at low levels, the most likely configurations have nucleosomes covering the factor binding site, whereas in condition 2, where the factor is expressed at high levels, the most likely configurations have the factor bound to its site, displacing nucleosomes from their cognate sites.
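The statistical positioning described in panel (d) can be explored with a small dynamic-programming sketch. This is an illustration under assumptions, not the model used for the figure: nucleosomes are treated as non-overlapping rods with a toy 20 bp footprint, each allowed start position carries a weight equal to concentration times affinity, and the disfavoring sequence is modelled as a 5 bp stretch that no nucleosome may cover. The function name and all parameter values are hypothetical.

```python
def nucleosome_occupancy(weights, footprint):
    """Per-basepair occupancy for non-overlapping nucleosomes.

    weights[i] is the assumed statistical weight (concentration x affinity)
    of a nucleosome whose left edge sits at position i.
    """
    n_starts = len(weights)
    length = n_starts + footprint - 1

    # fwd[i]: partition function of the prefix covering positions 0 .. i-1.
    fwd = [1.0] * (length + 1)
    for i in range(1, length + 1):
        fwd[i] = fwd[i - 1]
        if i >= footprint:
            fwd[i] += weights[i - footprint] * fwd[i - footprint]

    # bwd[i]: partition function of the suffix covering positions i .. length-1.
    bwd = [1.0] * (length + 1)
    for i in range(length - 1, -1, -1):
        bwd[i] = bwd[i + 1]
        if i + footprint <= length:
            bwd[i] += weights[i] * bwd[i + footprint]

    z = fwd[length]
    # Probability that a nucleosome starts at i, then spread over its footprint.
    occupancy = [0.0] * length
    for i in range(n_starts):
        p_start = fwd[i] * weights[i] * bwd[i + footprint] / z
        for j in range(i, i + footprint):
            occupancy[j] += p_start
    return occupancy


# Toy landscape: uniform high weight, except that no nucleosome may cover
# the disfavoring stretch at positions 200-204.
length, footprint = 400, 20
barrier = range(200, 205)
weights = []
for start in range(length - footprint + 1):
    covers_barrier = start < barrier.stop and start + footprint > barrier.start
    weights.append(0.0 if covers_barrier else 50.0)

occupancy = nucleosome_occupancy(weights, footprint)
for pos in range(205, 305, 10):
    print(pos, round(occupancy[pos], 3))
```

With a high weight per start position, one would expect the printed occupancy to oscillate near the barrier and to flatten with distance from it, which is the qualitative behaviour the caption describes. Note that the two ends of the toy sequence also act as barriers in this sketch.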

Figure 3. Reading gene expression dynamics from DNA sequence

(a) Nucleosomes act as general repressors. Shown are two example sequences with a transcription factor landscape containing a single binding site, and with either a uniform but moderate-affinity landscape for nucleosomes (sequence ‘1’) or a uniform but low-affinity landscape for nucleosomes (sequence ‘2’). (b) For the two sequences from (a), shown is the probability of transcription factor binding at different factor concentrations, computed by applying the framework presented here to the binding landscapes of those two sequences. (c) Nucleosome-disfavoring sequences determine the threshold of activation. Shown are three example sequences with differing nucleosome and factor landscapes: (‘1’) a uniform nucleosome landscape; (‘2’) a landscape with a sequence that strongly disfavors nucleosome formation, located 10bp from the single transcription factor site; (‘3’) same as ‘2’, but where the disfavoring sequence is located 135bp from the factor site. (d) The probability of transcription factor binding at each of the three sequences from (c). (e) For each of the three sequences from (c), shown is the most likely molecule binding configuration at three different factor concentrations (c). (f) Proximal factor sites exhibit cooperative or destructive binding. Shown are three example sequences with a uniform nucleosome affinity landscape and differing factor landscapes: (‘1’) a single factor site; (‘2’) two factor sites separated by 10bp; (‘3’) two factor sites separated by 135bp. (g) The probability of transcription factor binding to the left (red) site at each of the three sequences from (f). (h) Shown are the cooperative and destructive binding effects in sequences ‘2’ and ‘3’, respectively, displayed as the ratio of the factor binding probability at sequence ‘2’ or ‘3’ to that at sequence ‘1’.
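The comparison in panels (a) and (b) can be sketched as follows. This is a hedged illustration, not the computation behind the figure: it assumes a single factor site, a uniform nucleosome weight (concentration times affinity) at every start position, a 147 bp nucleosome footprint, a 10 bp factor footprint on a 300 bp sequence, and arbitrary weight values for the ‘moderate’ and ‘low’ nucleosome landscapes. The function names are hypothetical.

```python
def z_nucleosomes(length, footprint, weight):
    """Partition functions z[n] of non-overlapping nucleosomes on a bare
    segment of n basepairs, for n = 0 .. length, with a uniform weight."""
    z = [1.0] * (length + 1)
    for n in range(1, length + 1):
        z[n] = z[n - 1]
        if n >= footprint:
            z[n] += weight * z[n - footprint]
    return z


def p_factor_bound(length, site, k_tf, w_tf, k_nuc, w_nuc):
    """Probability that the factor occupies its single site.

    w_tf is the factor's weight at the site (concentration x affinity).
    When the factor is bound, nucleosomes can only occupy the segments to
    the left and right of the site, which are independent of each other.
    """
    z = z_nucleosomes(length, k_nuc, w_nuc)
    bound = z[site] * w_tf * z[length - site - k_tf]
    unbound = z[length]
    return bound / (unbound + bound)


# Sweep the factor weight for a moderate and a low nucleosome landscape.
factor_weights = [10 ** e for e in range(-2, 5)]
moderate = [p_factor_bound(300, 145, 10, w, 147, 1.0) for w in factor_weights]
low = [p_factor_bound(300, 145, 10, w, 147, 0.01) for w in factor_weights]
for w, pm, pl in zip(factor_weights, moderate, low):
    print(f"factor weight {w:>8}: moderate nucleosomes {pm:.4f}, low nucleosomes {pl:.4f}")
```

In this sketch the sequence with the higher nucleosome weight needs a higher factor concentration to reach the same binding probability, which is one way to read the statement that nucleosomes act as general repressors.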

Figure 4. Distinct modes of transcriptional regulation encoded by DNA sequence

(a) Two sets of yeast genes were defined based on their DNA sequence: one set by the absence of strong nucleosome-disfavoring sequences and the presence of TATA sequences (left), and one by the presence of strong nucleosome-disfavoring sequences and the absence of TATA sequences (right). Shown are the nucleosome occupancy in vivo (blue) and the nucleosome affinity landscape as measured in vitro by assembling purified histones on purified yeast genomic DNA (green), averaged across all genes of each gene set. Also shown is the approximate affinity landscape for all transcription factors across all genes of each of the two gene sets, using the spatial distribution of factor binding site occurrences as a proxy for the spatial distribution of affinity. (b) Schematic illustration of the most likely configurations of each gene set. In gene set one (left), the nucleosome landscape exhibits high nucleosome occupancy and the transcription factor landscape has a relatively large number of binding sites spread across the regulatory region, suggesting that nucleosomes and factors are in competition for access to the DNA. This suggestion is supported by the high transcriptional noise, high rate of histone turnover, and enrichment for chromatin remodeler activity that were found for this gene set. In contrast, in gene set two (right), the nucleosome landscape shows strong nucleosome depletion around the transcription start site, and the factor landscape has fewer binding sites, which are preferentially located in the nucleosome-depleted region. These landscapes suggest little competition between factors and nucleosomes, which is supported by the low noise, low histone turnover, and absence of enrichment for chromatin remodeler targets that were found for this gene set.

Figure 5. Explaining transcriptional noise from DNA sequence

(a) Nucleosome-disfavoring sequences determine the range of factor concentrations at which high transcriptional noise occurs. Shown are two example sequences, one with a uniform nucleosome landscape (‘1’), and one with a nucleosome landscape containing a sequence that strongly disfavors nucleosome formation, located 10bp from the single transcription factor site (‘2’). (b) For the two sequences from (a), shown is the probability of transcription factor binding at different factor concentrations, computed by applying the framework presented here to the binding landscapes of those two sequences. Under this equilibrium framework, the regime of high transcriptional noise is where the probability of transcription factor binding is ~0.5 (highlighted by the brown rectangle), since in this regime the variance of factor binding is maximal. Note, however, that while the variance of factor binding is one of the determinants of noise levels, other determinants exist as well. (c) For four different factor concentrations (c), shown are the most likely molecule binding configurations at each of the two sequences from (a). Note that at each of the two intermediate concentrations, one of the two sequences is noisy, i.e., the configurations in which the factor is bound and the configurations in which the factor is not bound have near-equal probability. (d) Cooperative binding reduces the range of factor concentrations at which there is high transcriptional noise. Shown are two example sequences with a uniform nucleosome landscape, where one sequence has a single factor site (‘1’), and the other has two factor sites separated by 10bp (‘2’). (e) The probability of transcription factor binding to the left (red) site at each of the two sequences from (d). The regime of high noise is highlighted (brown rectangle). The range of factor concentrations at which each sequence exhibits high noise is depicted. The range of factor concentrations in which sequence ‘2’ (the sequence with cooperative binding) is noisy is smaller than the corresponding range for sequence ‘1’.
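The statement in panel (b) that binding variance is maximal at a binding probability of ~0.5 follows from treating site occupancy as a bound/unbound (Bernoulli) variable, whose variance is p(1 − p). The sketch below checks this numerically for the simplest possible case, which is an assumption made here for illustration: a single isolated site with no nucleosome competition, so that the binding probability is w / (1 + w) with w equal to concentration times affinity. Names and parameter values are hypothetical.

```python
def binding_probability(concentration, affinity):
    """Equilibrium occupancy of a single isolated site (no competitors)."""
    w = concentration * affinity
    return w / (1.0 + w)


def binding_variance(p):
    """Variance of a bound/unbound (Bernoulli) variable with P(bound) = p."""
    return p * (1.0 - p)


# Log-spaced concentrations from 1e-3 to 1e3, with the affinity fixed at 1.
concentrations = [10 ** (e / 10) for e in range(-30, 31)]
probabilities = [binding_probability(c, 1.0) for c in concentrations]
variances = [binding_variance(p) for p in probabilities]

# Locate the concentration at which binding is most variable.
noisiest = max(range(len(concentrations)), key=lambda i: variances[i])
print("noisiest concentration:", concentrations[noisiest])
print("binding probability there:", probabilities[noisiest])
print("variance there:", variances[noisiest])
```

As the caption notes, the variance of factor binding is only one determinant of noise, so this sketch identifies where binding itself is most variable rather than predicting measured noise levels.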
