Position specific variation in the rate of evolution in transcription factor binding sites

Alan M Moses et al. BMC Evol Biol. 2003.

Abstract

Background: The binding sites of sequence-specific transcription factors are an important and relatively well-understood class of functional non-coding DNAs. Although a wide variety of experimental and computational methods have been developed to characterize transcription factor binding sites, they remain difficult to identify. Comparison of non-coding DNA from related species has shown considerable promise in identifying these functional non-coding sequences, even though relatively little is known about their evolution.

Results: Here we analyse the genome sequences of the budding yeasts Saccharomyces cerevisiae, S. bayanus, S. paradoxus and S. mikatae to study the evolution of transcription factor binding sites. As expected, we find that both experimentally characterized and computationally predicted binding sites evolve more slowly than the surrounding sequence, consistent with the hypothesis that they are under purifying selection. We also observe position-specific variation in the rate of evolution within binding sites. We find that the position-specific rate of evolution is positively correlated with degeneracy among binding sites within S. cerevisiae. We test theoretical predictions for the rate of evolution at positions where the base frequencies deviate from background due to purifying selection and find reasonable agreement with the observed rates of evolution. Finally, we show how the evolutionary characteristics of real binding motifs can be used to distinguish them from artefacts of computational motif finding algorithms.

Conclusion: As has been observed for protein sequences, the rate of evolution in transcription factor binding sites varies with position, suggesting that some regions are under stronger functional constraint than others. This variation likely reflects the varying importance of different positions in the formation of the protein-DNA complex. The characterization of the pattern of evolution in known binding sites will likely contribute to the effective use of comparative sequence data in the identification of transcription factor binding sites and is an important step toward understanding the evolution of functional non-coding DNA.

Figures

Figure 1

Characterized binding sites evolve more slowly than the promoters in which they are found. A. Histogram of the rate of evolution (estimated by maximum parsimony) in characterized Gal4p binding sites and in randomly chosen sequences of the same length (17 base pairs) from the same promoters. B. Differences between the mean rate of evolution in motifs and the mean rate in the promoters in which they are found. Grey boxes represent the average in binding sites; unfilled boxes represent the average over the promoters in which the motifs are found (see Methods). Error bars represent exact 95% confidence intervals for a Poisson distribution.
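The caption refers to rates "estimated by maximum parsimony". As a rough illustration of what such a count involves, the sketch below applies Fitch's small-parsimony algorithm to each column of an ungapped alignment. The tree topology, the species labels (`Scer`, `Spar`, `Smik`, `Sbay`), and the toy alignment are assumptions made for this example and are not taken from the paper; gaps and ambiguous bases are not handled.

```python
def fitch_substitutions(tree, column):
    """Minimum number of substitutions for one alignment column (Fitch).

    tree   : nested 2-tuples whose leaves are species names (rooted, binary)
    column : dict mapping species name -> base at this column
    """
    def visit(node):
        if isinstance(node, str):               # leaf: its state set is its base
            return {column[node]}, 0
        left_set, left_cost = visit(node[0])
        right_set, right_cost = visit(node[1])
        shared = left_set & right_set
        if shared:                              # children agree: no extra change
            return shared, left_cost + right_cost
        # children disagree: take the union and count one substitution
        return left_set | right_set, left_cost + right_cost + 1

    return visit(tree)[1]


def substitutions_per_column(tree, alignment):
    """Fitch count for every column of an ungapped alignment (dict of strings)."""
    length = len(next(iter(alignment.values())))
    return [
        fitch_substitutions(tree, {sp: seq[i] for sp, seq in alignment.items()})
        for i in range(length)
    ]


# Assumed topology for the four species (illustrative only).
TREE = ((("Scer", "Spar"), "Smik"), "Sbay")

# Hypothetical aligned binding site, invented for this example.
toy_alignment = {
    "Scer": "CGGAC",
    "Spar": "CGGAC",
    "Smik": "CGGTC",
    "Sbay": "CGCTC",
}

counts = substitutions_per_column(TREE, toy_alignment)
rate = sum(counts) / len(counts)    # substitutions per site for this one site
```

Summing such counts over all aligned instances of a motif, position by position, gives the kind of per-position tally that the later figures plot.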

Figure 2

Comparison of rates of evolution to structures of protein-DNA complexes implies a model for the variation in the rate of evolution across binding motifs. The DNA backbone appears as a red helix; proteins appear as linked coloured cylinders. We propose that the formation of the protein-DNA complex is the functional constraint that leads to purifying selection, and therefore fewer substitutions at certain positions in the binding motif. Images of protein-DNA complex structures are from the Protein Data Bank [47]. Rate of evolution is in substitutions per site (estimated by maximum parsimony) and error bars represent exact 95% confidence intervals for a Poisson distribution.
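The error bars are described as exact confidence intervals for a Poisson distribution. One common way to obtain such an interval is to invert the Poisson tail probabilities for the observed count; whether the authors used exactly this construction is an assumption here. The sketch below does the inversion by bisection using only the standard library. The function names are made up for the example.

```python
import math


def poisson_cdf(k, mu):
    """Return P(X <= k) for X ~ Poisson(mu)."""
    term = math.exp(-mu)
    total = term
    for i in range(1, k + 1):
        term *= mu / i
        total += term
    return total


def _find_root(f, lo, hi, iterations=200):
    """Bisection for a monotonic function f that changes sign on [lo, hi]."""
    f_lo = f(lo)
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def exact_poisson_ci(k, alpha=0.05):
    """Exact two-sided (1 - alpha) interval for the mean, given a count k."""
    if k == 0:
        lower = 0.0
    else:
        # Lower limit: the mean at which P(X >= k) equals alpha / 2.
        lower = _find_root(
            lambda mu: (1.0 - poisson_cdf(k - 1, mu)) - alpha / 2.0, 0.0, float(k)
        )
    # Upper limit: the mean at which P(X <= k) equals alpha / 2.
    bracket = k + 10.0 * math.sqrt(k + 1.0) + 20.0
    upper = _find_root(lambda mu: poisson_cdf(k, mu) - alpha / 2.0, float(k), bracket)
    return lower, upper


def rate_with_ci(substitutions, sites, alpha=0.05):
    """Substitutions per site with the interval scaled by the number of sites."""
    lower, upper = exact_poisson_ci(substitutions, alpha)
    return substitutions / sites, lower / sites, upper / sites
```

For example, `rate_with_ci(10, 40)` treats 10 substitutions observed across 40 aligned positions as a single Poisson count and reports the rate with its interval in substitutions per site.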

Figure 3

Association between information profile and rate of evolution in characterized binding sites from SCPD. A–D. Representative plots of information content and substitutions per site reveal a correspondence between positions of high information content and slower rates of evolution. Open symbols represent information content and filled symbols the number of substitutions per site (estimated by maximum parsimony). Consensus letters are included below the appropriate positions in the motif.
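For readers unfamiliar with information profiles, the sketch below computes per-position information content, in bits, from a set of aligned binding sites. It assumes a uniform background (0.25 per base) unless another is supplied, which may differ from the background used in the paper. The example sites and the helper names are hypothetical.

```python
import math
from collections import Counter

BASES = "ACGT"


def column_frequencies(sites, pseudocount=0.0):
    """Base frequencies at each position of equal-length aligned sites."""
    length = len(sites[0])
    total = len(sites) + 4 * pseudocount
    profile = []
    for i in range(length):
        counts = Counter(site[i] for site in sites)
        profile.append({b: (counts[b] + pseudocount) / total for b in BASES})
    return profile


def information_content(freqs, background=None):
    """Relative entropy (bits) of one column against the background.

    With a uniform background this equals 2 minus the Shannon entropy.
    """
    if background is None:
        background = {b: 0.25 for b in BASES}    # assumption: uniform background
    return sum(
        f * math.log2(f / background[b]) for b, f in freqs.items() if f > 0
    )


# Hypothetical aligned sites, invented for illustration.
example_sites = ["ACGT", "ACGA", "ACCT", "AGGT"]
info_profile = [information_content(col) for col in column_frequencies(example_sites)]
```

A fully conserved column scores 2 bits and a column with equal base frequencies scores 0, so positions of high information content are those with little degeneracy among the known sites.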

Figure 4

Test of the Halpern-Bruno proportionality. Observed rate of evolution versus the predictions based on the nucleotide frequencies in the binding motif in S. cerevisiae. Each point represents the predicted and observed rates at a given position in a motif. For each factor the proportionality has been normalized by the total number of substitutions observed in the corresponding binding sites. See text for details.
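As a rough guide to how such a prediction can be computed, the sketch below uses the Halpern-Bruno form of the substitution rate, in which the rate from base a to base b at a position is proportional to Q_ab · ln(x) / (1 − 1/x), with x = f_b·Q_ba / (f_a·Q_ab), f the position's base frequencies and Q the background mutation rates. The Jukes-Cantor background (all Q_ab equal) and the function names are assumptions made for this example; the paper's own background model and normalization are described in its text and may differ. All frequencies must be strictly positive, so counts from a small set of sites would need pseudocounts.

```python
import math

BASES = "ACGT"

# Assumption: Jukes-Cantor background, scaled so the neutral rate is 1.
UNIFORM_Q = {a: {b: 1.0 / 3.0 for b in BASES if b != a} for a in BASES}


def hb_pair_rate(f_a, f_b, q_ab, q_ba):
    """Halpern-Bruno rate for a -> b, up to a constant of proportionality."""
    x = (f_b * q_ba) / (f_a * q_ab)
    if math.isclose(x, 1.0):
        return q_ab                     # limit of the expression as x -> 1
    return q_ab * math.log(x) / (1.0 - 1.0 / x)


def predicted_position_rate(freqs, q=UNIFORM_Q):
    """Expected rate at one position: sum over a of f_a * sum over b of R_ab."""
    rate = 0.0
    for a in BASES:
        for b in BASES:
            if a != b:
                rate += freqs[a] * hb_pair_rate(freqs[a], freqs[b], q[a][b], q[b][a])
    return rate


neutral = predicted_position_rate({b: 0.25 for b in BASES})
constrained = predicted_position_rate({"A": 0.85, "C": 0.05, "G": 0.05, "T": 0.05})
```

Under these assumptions a position whose frequencies match the background evolves at the neutral rate, while a position dominated by one base is predicted to evolve more slowly. Rescaling the predictions for each factor so that they sum to the observed number of substitutions mirrors the normalization mentioned in the caption.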

Figure 5

Information and rate of evolution for the recently reported Crz1p motif. This motif shows the characteristic pattern of evolution observed for real motifs. Open symbols represent information content and filled symbols the number of substitutions per site (estimated by maximum parsimony). Consensus letters are included below the appropriate positions in the motif.

References

1. Stormo GD. DNA binding sites: representation and discovery. Bioinformatics. 2000;16:16–23. doi: 10.1093/bioinformatics/16.1.16.
2. Lawrence CE, Altschul SF, Boguski MS, Liu JS, Neuwald AF, Wootton JC. Detecting subtle sequence signals: a Gibbs sampling strategy for multiple alignment. Science. 1993;262:208–214.
3. Bailey TL, Elkan C. Fitting a mixture model by expectation maximization to discover motifs in biopolymers. In: Proceedings of the Second International Conference on Intelligent Systems for Molecular Biology. AAAI Press, Menlo Park, California; 1994. pp. 28–36.
4. Eskin E, Pevzner PA. Finding composite regulatory patterns in DNA sequences. Bioinformatics. 2002;18:S354–363.
5. Liu XS, Brutlag DL, Liu JS. An algorithm for finding protein-DNA binding sites with applications to chromatin-immunoprecipitation microarray experiments. Nat Biotechnol. 2002;20:835–839.
