Extensive low-affinity transcriptional interactions in the yeast genome

Amos Tanay. Genome Res. 2006 Aug.

Abstract

Major experimental and computational efforts are targeted at the characterization of transcriptional networks on a genomic scale. The ultimate goal of many of these studies is to construct networks associating transcription factors with genes via well-defined binding sites. Weaker regulatory interactions other than those occurring at high-affinity binding sites are largely ignored and are not well understood. Here I show that low-affinity interactions are abundant in vivo and quantifiable from current high-throughput ChIP experiments. I develop algorithms that predict DNA-binding energies from sequences and ChIP data across a wide dynamic range of affinities and use them to reveal widespread functionality of low-affinity transcription factor binding. Evolutionary analysis suggests that binding energies of many transcription factors are conserved even in promoters lacking classical binding sites. Gene expression analysis shows that such promoters can generate significant expression. I estimate that while only a small percentage of the genome is strongly regulated by a typical transcription factor, up to an order of magnitude more may be involved in weaker interactions. Low-affinity transcription factor-DNA interaction may therefore be important both evolutionarily and functionally.

Figures

Figure 1.

The transcriptional program in yeast: digital or analog? According to the prevalent “digital” hypothesis for transcriptional regulation (A), complex regulatory programs are described using wiring diagrams that associate TFs with genes deterministically. In the alternative “analog” model (B), many TFs may affect each gene at drastically different levels of specificity. Two-way clustering of 200 ChIP binding profiles and 6000 yeast genes (C) reveals groups of genes with remarkably similar binding ratios in all 200 ChIP experiments. Few of the entries in the homogeneous submatrices represent high-specificity TF–gene associations. The clusters and their association with biological functions (Supplemental Table 1) suggest that ChIP experiments may reflect complex and functionally meaningful organization of low-affinity TF–gene interactions.

Figure 2.

Quantitative ChIP to sequence correlation. (A) ChIP and PWM correlation above and below a _P_-value threshold. Shown are log _P_-values of the Spearman correlation between ChIP binding ratios and PWM energy predictions (_y_-axis). Using a range of possible thresholds (_x_-axis), correlations were computed separately for genes with ChIP values below (red) and above (black) the threshold. In all cases, a significant correlation is observed in both sets of genes, and for all selections of thresholds. (B) Sequence–ChIP correlation reveals in vivo low-specificity binding. Shown are averages and cumulative probability distributions (CPDs) of PWM binding energies for groups of genes with ChIP values within certain intervals. Remarkable monotonicity is observed in all cases, with predicted energies of groups with higher-significance _P_-values (left) consistently higher than those of groups with less-significant _P_-values (right). The monotonicity holds even at very low specificity ranges, suggesting that ChIP profiles are informative over a wide dynamic range of specificities.
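
The split-correlation analysis of panel A can be illustrated with a minimal sketch. It assumes two equal-length lists, `chip` (ChIP values) and `energy` (PWM energy predictions); these names and the helper functions are hypothetical and not taken from the paper. The sketch reports only the Spearman coefficient for each subset and does not compute the _P_-values plotted in the figure.

```python
from statistics import mean


def ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = avg
        i = j + 1
    return out


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return float("nan")
    return cov / (vx * vy) ** 0.5


def split_correlations(chip, energy, threshold):
    """Correlate ChIP and energy separately below and above a ChIP threshold."""
    below = [(c, e) for c, e in zip(chip, energy) if c < threshold]
    above = [(c, e) for c, e in zip(chip, energy) if c >= threshold]

    def rho(pairs):
        if len(pairs) < 3:  # too few genes for a meaningful coefficient
            return float("nan")
        cs, es = zip(*pairs)
        return spearman(cs, es)

    return rho(below), rho(above)
```

Sweeping `threshold` over a range of values and recording both coefficients would give a curve per subset, analogous in spirit to the red and black series described in the caption.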

Figure 3.

Motif regression reveals known and novel binding sites. (A) The PREGO algorithm. The PREGO algorithm was developed to fit PWM models to raw ChIP-on-chip profiles. The algorithm combines ChIP and sequence data and builds PWM models with optimal prediction accuracy over the entire affinity spectrum. (B) Robustness of PWM energy predictions. Applying the PREGO algorithm independently to individual experiments demonstrates the robustness of the derived energy models. Shown here is the correlation between two Aft2 experiments (left), the two PWM models derived from them (middle), and the correlation of the energy predictions for these two PWMs (right). The remarkable reproducibility suggests that PREGO-derived PWMs may be used quantitatively. (C) Using low-affinity promoters improves motif-finding sensitivity. Shown are examples of PWMs inferred by the PREGO algorithm from ChIP profiles in which the motif-finding approach failed to find motifs. All the cases shown are confirmed by additional evidence from the literature. See Methods for the definition of the PWM score. “Models _r_s” represents the Spearman correlation of energy predictions from PWMs generated using two different arrays. “Data _r_s” represents the Spearman correlation of the two raw ChIP profiles used to construct the two PWMs.
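
To make the idea of a sequence-derived binding energy concrete, here is a minimal sketch of scoring a promoter with a PWM. It is not the PREGO algorithm, whose details are given in the paper's Methods and not on this page. The three-position `PWM`, its log-weights, and the choice to aggregate all windows with a log-sum-exp are assumptions made purely for illustration; sequences are assumed to contain only uppercase A, C, G, and T.

```python
import math

# Hypothetical 3-position PWM: log-weights per base, consensus "ACG".
PWM = [
    {"A": 0.0, "C": -2.0, "G": -2.0, "T": -2.0},
    {"A": -2.0, "C": 0.0, "G": -2.0, "T": -2.0},
    {"A": -2.0, "C": -2.0, "G": 0.0, "T": -2.0},
]


def window_score(pwm, window):
    """Sum the log-weights of one window the same length as the PWM."""
    return sum(column[base] for column, base in zip(pwm, window))


def promoter_energy(pwm, seq):
    """Aggregate every window in seq into one score via log-sum-exp.

    Weak matches contribute a little and strong matches dominate, so the
    result varies continuously instead of reporting only 'site / no site'.
    """
    k = len(pwm)
    if len(seq) < k:
        raise ValueError("sequence is shorter than the PWM")
    scores = [window_score(pwm, seq[i:i + k]) for i in range(len(seq) - k + 1)]
    m = max(scores)  # subtract the maximum for numerical stability
    return m + math.log(sum(math.exp(s - m) for s in scores))
```

A design note: scoring by the single best window would discard the contribution of weaker matches, whereas summing over all windows keeps a graded signal even for promoters that lack a consensus site.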

Figure 4.

Testing the digital model. (A) Normalizing ChIP data. PREGO performs internal normalization of the ChIP data to eliminate any correlation of the binding ratios to single or dinucleotide composition or to low complexity sequences [typically poly(A) or poly(T) tracts]. Shown are the scatter and trend of the raw Mbp1 ChIP binding ratio versus the inferred correction, involving contribution from several dinucleotides and an AAAA/TTTT motif. The Spearman correlation of each of the sequence features used in the normalization and the ChIP data is also shown (right). (B,C) Discrete versus analog models. If TF–gene interactions can be reasonably approximated as either occurring or not occurring (hits or non-hits), then the joint distribution of ChIP and PWM predictions should reflect zero covariance within each of these two ideal subsets of the genome (left). If ChIP and PWM provide quantitative estimates of in vivo binding affinity, then no partition of the genome can eliminate their correlation (right). It is therefore possible to test the validity of the digital assumption by fitting two distributions to the data and analyzing their parameters. (D) ChIP-sequence correlation reflects an analog behavior. Analysis of the ChIP/PWM joint distributions for three TFs reveals that their quantitative correlation cannot be explained as a consequence of the mixture of two distributions (Methods). Shown are inferred maximum likelihood distributions for hits (darker) and non-hits (brighter). The mixture coefficients (ρ) and correlation coefficients (r) are indicated. The analysis suggests that about one-fifth of the genome is influenced by each of the TFs, and that for at least one-fifth of the genome, ChIP- and sequence-based estimations of affinity are correlated in a quantitative fashion.

Figure 5.

Evolutionary conservation of predicted binding energies. Plotted are the conservation scores of genes with low (left) to high (right) TF-binding energies. (_x_-axis) S. cerevisiae binding energy percentile. (_y_-axis) Conservation score (Methods). In all cases, the binding energies of higher-affinity promoters are conserved. For several of the TFs, conservation is observed on a significant fraction of the genome (10%–20%), reflecting widespread selection on the binding energy of promoters lacking high-affinity binding sites.

Figure 6.

Low-affinity promoters generate gene expression. Shown is the gene expression generated by promoters with low (left) to high (right) predicted TF-binding energies. (_x_-axis) Percentile of predicted TF-binding energy. (_y_-axis) Median of log fold expression changes in bins of 5 affinity percentiles. The experimental condition is different for each plot and is noted on the graph. Bins that represent significant up- or down-regulation (Methods) are labeled in circles. The plots suggest that some TFs (e.g., Gcn4, Mbp1) may weakly affect the expression of a substantial number of genes even when clear binding sites are lacking.
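
The binning described in this caption can be sketched as follows. The function name `binned_medians` and its inputs (`energy`, one predicted binding energy per gene, and `log_fold`, the matching log fold expression changes) are hypothetical. The sketch covers only the percentile binning and the per-bin median; the significance test that marks bins with circles is described in the paper's Methods and is not reproduced here.

```python
from statistics import median


def binned_medians(energy, log_fold, bin_width=5):
    """Median log fold change per bin of predicted-energy percentiles.

    Genes are ranked by energy, assigned to percentile bins of
    `bin_width` points (20 bins for the default of 5), and the median
    expression change is taken within each bin. Empty bins yield None.
    """
    if len(energy) != len(log_fold) or not energy:
        raise ValueError("inputs must be non-empty and of equal length")
    if 100 % bin_width != 0:
        raise ValueError("bin_width must divide 100")
    n = len(energy)
    n_bins = 100 // bin_width
    order = sorted(range(n), key=lambda i: energy[i])
    bins = [[] for _ in range(n_bins)]
    for rank, idx in enumerate(order):
        # Integer arithmetic maps rank 0..n-1 onto bins 0..n_bins-1.
        b = (100 * rank) // (n * bin_width)
        bins[b].append(log_fold[idx])
    return [median(b) if b else None for b in bins]
```

Plotting the returned list against bin index would give one curve per TF and condition, with low-energy bins on the left and high-energy bins on the right.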
