High-resolution DNA-binding specificity analysis of yeast transcription factors

doi: 10.1101/gr.090233.108. Epub 2009 Jan 21.

Cong Zhu, Kelsey J R P Byers, Rachel Patton McCord, Zhenwei Shi, Michael F Berger, Daniel E Newburger, Katrina Saulrieta, Zachary Smith, Mita V Shah, Mathangi Radhakrishnan, Anthony A Philippakis, Yanhui Hu, Federico De Masi, Marcin Pacek, Andreas Rolfs, Tal Murthy, Joshua Labaer, Martha L Bulyk

Cong Zhu et al. Genome Res. 2009 Apr.

Abstract

Transcription factors (TFs) regulate the expression of genes through sequence-specific interactions with DNA-binding sites. However, despite recent progress in identifying in vivo TF binding sites by microarray readout of chromatin immunoprecipitation (ChIP-chip), nearly half of all known yeast TFs have unknown DNA-binding specificities, and many additional predicted TFs remain uncharacterized. To address these gaps in our knowledge of yeast TFs and their cis-regulatory sequences, we have determined high-resolution binding profiles for 89 known and predicted yeast TFs across more than 2.3 million gapped and ungapped 8-bp sequences ("k-mers"). We report 50 new or significantly different direct DNA-binding site motifs for yeast DNA-binding proteins and motifs for eight proteins for which only a consensus sequence was previously known; in total, this corresponds to an increase of more than 50% in the number of yeast DNA-binding proteins with experimentally determined DNA-binding specificities. Among other novel regulators, we discovered proteins that bind the PAC (Polymerase A and C) motif (GATGAG) and regulate ribosomal RNA (rRNA) transcription and processing, core cellular processes that are constituents of ribosome biogenesis. In contrast to earlier data types, these comprehensive k-mer binding data permit us to consider the regulatory potential of genomic sequence at the individual word level. These k-mer data allowed us to reannotate in vivo TF binding targets as direct or indirect and to examine TFs' potential effects on gene expression in approximately 1,700 environmental and cellular conditions. These approaches could be adapted to identify TFs and cis-regulatory elements in higher eukaryotes.
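To make "word-level" scoring concrete, the sketch below scans a sequence window by window and looks each ungapped 8-mer up in an E-score table, merging each k-mer with its reverse complement since PBM data are double-stranded. It is a hypothetical illustration, not the authors' code: the table and its single E-score are invented, the names (`scan`, `max_escore`, `EXAMPLE_ESCORES`) are placeholders, and gapped k-mers, which make up most of the 2.3 million sequences mentioned above, are ignored. The helper `count_canonical_kmers` shows the combinatorial fact that there are 32,896 distinct ungapped 8-mers once reverse complements are merged.

```python
from itertools import product

# Hypothetical sketch, not the authors' pipeline: score a sequence "word by word"
# against a table of ungapped 8-mer E-scores. Gapped k-mers are omitted here.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T string."""
    return seq.translate(COMPLEMENT)[::-1]


def canonical(kmer: str) -> str:
    """Merge a k-mer with its reverse complement (keep the lexicographically smaller)."""
    return min(kmer, reverse_complement(kmer))


def count_canonical_kmers(k: int) -> int:
    """Number of distinct ungapped k-mers once reverse complements are merged."""
    return len({canonical("".join(p)) for p in product("ACGT", repeat=k)})


def scan(sequence: str, escores: dict, k: int = 8) -> list:
    """Return (position, word, E-score) for every k-long window present in the table."""
    sequence = sequence.upper()
    hits = []
    for i in range(len(sequence) - k + 1):
        word = sequence[i:i + k]
        key = canonical(word)
        if key in escores:
            hits.append((i, word, escores[key]))
    return hits


def max_escore(sequence: str, escores: dict, k: int = 8) -> float:
    """Best E-score over all windows; -inf if no window is in the table."""
    return max((e for _, _, e in scan(sequence, escores, k)), default=float("-inf"))


# Hypothetical table: one 8-mer containing the PAC core GATGAG, with an invented E-score.
EXAMPLE_ESCORES = {canonical("GATGAGAT"): 0.49}

if __name__ == "__main__":
    print(count_canonical_kmers(8))               # 32896
    print(scan("TTGATGAGATCC", EXAMPLE_ESCORES))  # [(2, 'GATGAGAT', 0.49)]
```

Because lookups go through `canonical`, the same entry is found whether the site appears as GATGAGAT on the given strand or as ATCTCATC on the opposite one.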


Figures

Figure 1.

PBM characterization of S. cerevisiae TF DNA-binding specificities. (A) Hierarchical clustering of PBM data over ungapped 8-mer _E_-scores determined for 89 yeast TFs. (B) Sequence logos for selected examples of newly discovered yeast TF DNA-binding site motifs.
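Sequence logos like those in panel B conventionally scale each column by its information content. The sketch below shows that standard calculation under stated assumptions: a uniform 25% background, no small-sample correction, and an invented three-column position frequency matrix (`EXAMPLE_PFM`). It does not show how the motifs themselves were derived from the PBM data.

```python
import math

# Sketch of the standard sequence-logo calculation (assumes a uniform 25% background
# and no small-sample correction); the PFM below is invented for illustration.


def information_content(pfm):
    """Per-position information content in bits: 2 - Shannon entropy of the column."""
    ic = []
    for column in pfm:
        entropy = -sum(p * math.log2(p) for p in column.values() if p > 0)
        ic.append(2.0 - entropy)
    return ic


def letter_heights(pfm):
    """Height of each letter in a logo stack: its frequency times the column's IC."""
    return [
        {base: p * ic for base, p in column.items()}
        for column, ic in zip(pfm, information_content(pfm))
    ]


# Hypothetical three-column PFM: a fixed G, an even A/T split, and an uninformative column.
EXAMPLE_PFM = [
    {"A": 0.0, "C": 0.0, "G": 1.0, "T": 0.0},
    {"A": 0.5, "C": 0.0, "G": 0.0, "T": 0.5},
    {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
]

if __name__ == "__main__":
    print(information_content(EXAMPLE_PFM))  # [2.0, 1.0, 0.0]
```

A fully specified base contributes the maximum 2 bits, while a column with equal frequencies for all four bases contributes nothing and is drawn with zero height.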

Figure 2.

PBM _k_-mer binding profiles in most cases correspond well with ChIP-chip binding data. (A) For 33 of the 40 TFs for which we had both PBM- and ChIP-chip-derived motifs (Harbison et al. 2004), the PBM _k_-mer-derived potential targets were significantly enriched (AUC > 0.5, P < 0.05) among the ChIP-chip “bound” regions, showing good agreement between the in vivo ChIP-chip data and our scoring of genes based on the in vitro PBM _k_-mer data. (B) For 11 of the 40 TFs, intergenic regions scored by the PBM 8-mer data are more highly enriched (>5% improvement in AUC; all PBM AUC _P_-values are <0.05) among the ChIP-chip “bound” regions than those scored by the ChIP-chip-derived motif.
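An AUC above 0.5 means that ChIP-chip "bound" regions tend to score higher than unbound ones. The sketch below computes AUC in its Mann-Whitney form, as the probability that a randomly chosen bound region outscores a randomly chosen unbound one, with ties counted as one half. It assumes, hypothetically, that each intergenic region has already been reduced to a single score (for example its best PBM 8-mer E-score). The scores are invented, and the caption does not say how the P-values were computed, so none are calculated here.

```python
# Sketch, assuming each intergenic region is scored by a single number (for example its
# best PBM 8-mer E-score): AUC as the Mann-Whitney probability that a ChIP-chip "bound"
# region outscores an "unbound" one, with ties counted as one half.


def auc(bound_scores, unbound_scores):
    """Area under the ROC curve for separating bound from unbound regions."""
    if not bound_scores or not unbound_scores:
        raise ValueError("need at least one bound and one unbound region")
    wins = 0.0
    for b in bound_scores:
        for u in unbound_scores:
            if b > u:
                wins += 1.0
            elif b == u:
                wins += 0.5
    return wins / (len(bound_scores) * len(unbound_scores))


# Invented example scores for illustration only.
bound = [0.48, 0.46, 0.30]
unbound = [0.35, 0.20, 0.30]

if __name__ == "__main__":
    print(round(auc(bound, unbound), 3))  # 0.833
```

The pairwise loop is quadratic and is written for clarity. For genome-scale region counts, a rank-based formula, or an existing routine such as scikit-learn's `roc_auc_score`, gives the same value much faster.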

Figure 3.

Reclassification of TF occupancy at ChIP-chip “bound” (P < 0.001) intergenic regions as likely being due to direct DNA-binding sites versus indirect association of the TF with the DNA. Blue bars _above_ the horizontal axis for each TF indicate the number of ChIP-chip bound intergenic regions that were previously called “indirect” (i.e., the regions do not contain a good match to the ChIP-chip motif as determined by MacIsaac et al. 2006) but are reclassified as potential “direct” TF targets by the PBM data (i.e., the regions contain a PBM _k_-mer with an _E_-score > 0.45). Red bars below the axis indicate the number of intergenic regions previously annotated as “direct” targets by MacIsaac et al. (2006) that are reclassified as potential sites of indirect TF association according to the PBM data (i.e., the regions do not contain any _k_-mers with _E_-score > 0.45).
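The reclassification rule in this caption reduces to a threshold test per region. The sketch below applies it and tallies the two kinds of label change (blue and red bars). The input format, pairs of (prior label, best PBM E-score in the region), and the example regions are hypothetical. The cutoff is applied as strictly greater than 0.45, as written here; Figure 4 uses E ≥ 0.45, so a region scoring exactly 0.45 would be called differently under the two conventions.

```python
# Sketch of the reclassification rule described in the caption: a ChIP-chip-bound region
# counts as a potential direct target if it contains any PBM k-mer with E-score > 0.45.
# The input format (prior label, best E-score in the region) is an assumption.
E_SCORE_CUTOFF = 0.45


def pbm_call(best_escore, cutoff=E_SCORE_CUTOFF):
    """'direct' if the region holds a k-mer above the cutoff, otherwise 'indirect'."""
    return "direct" if best_escore > cutoff else "indirect"


def reclassification_counts(regions):
    """Count label changes: blue bars (indirect -> direct) and red bars (direct -> indirect)."""
    counts = {"indirect_to_direct": 0, "direct_to_indirect": 0}
    for prior_label, best_escore in regions:
        new_label = pbm_call(best_escore)
        if prior_label == "indirect" and new_label == "direct":
            counts["indirect_to_direct"] += 1
        elif prior_label == "direct" and new_label == "indirect":
            counts["direct_to_indirect"] += 1
    return counts


# Invented regions: (label from the ChIP-chip motif analysis, best PBM E-score).
EXAMPLE_REGIONS = [("indirect", 0.47), ("indirect", 0.30), ("direct", 0.44), ("direct", 0.49)]

if __name__ == "__main__":
    print(reclassification_counts(EXAMPLE_REGIONS))
    # {'indirect_to_direct': 1, 'direct_to_indirect': 1}
```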

Figure 4.

Pbf1 and Pbf2 regulate rRNA processing genes. (A) Predicted target genes of Pbf1 and Pbf2 are significantly repressed (CRACR P < 10⁻¹²) after a 20-min heat shock (shift from 25°C to 37°C) in wild-type, Δpbf1, and Δpbf2 strains, but not in the Δpbf1Δpbf2 double deletion strain, in Affymetrix gene expression profiling of triplicate biological replicate cultures. (B) Box plots indicating expression changes of rRNA processing genes containing at least one _k_-mer at _E_ ≥ 0.45 after a 20-min heat shock in wild-type, Δpbf1, Δpbf2, and Δpbf1Δpbf2 strains, in the expression data from A. (C) Pbf1 and Pbf2 associate in vivo with the promoter regions of the rRNA processing genes SAS10, NOP2, MTR4, KRR1, and ERB1. ChIP-qPCR was performed on cells subjected to a 5-min heat shock, at predicted target sites in the genes' upstream regions and at a negative control region upstream of ENO2. Binding fold-enrichment was defined as the ratio of PCR product in “IP” versus “INPUT,” using an open reading frame-free region on chromosome V as an internal normalization control. Error bars indicate 1 SD from triplicate biological replicate cultures (*P < 0.05; **P < 0.01; two-sided Student's _t_-test). (D) Expression ratio of rRNA processing genes after heat shock. RT-qPCR data were generated for untreated yeast and for yeast subjected to a 20-min heat shock. Gene expression was normalized to ACT1 as an internal control. Error bars indicate 1 SD from triplicate biological replicate cultures (*P < 0.05; **P < 0.01; two-sided Student's _t_-test compared with wild type).
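The normalizations in panels C and D can be written as ratios of ratios. The sketch below assumes, as an illustration only, that PCR product was quantified from threshold cycles (Ct) with perfect doubling per cycle, so that the template amount is proportional to 2^−Ct. The caption does not state the quantification method. All Ct values are invented, and the function names are hypothetical.

```python
import statistics

# Sketch assuming Ct-based quantification with perfect doubling per cycle
# (amount proportional to 2**-Ct); the caption does not say how PCR product was quantified.


def amount(ct, base=2.0):
    """Relative template amount implied by a threshold cycle."""
    return base ** -ct


def chip_fold_enrichment(ct_ip, ct_input, ct_ip_ctrl, ct_input_ctrl):
    """(IP/INPUT at the target site) / (IP/INPUT at the ORF-free control region)."""
    target = amount(ct_ip) / amount(ct_input)
    control = amount(ct_ip_ctrl) / amount(ct_input_ctrl)
    return target / control


def expression_ratio(ct_gene_hs, ct_act1_hs, ct_gene_untr, ct_act1_untr):
    """Heat-shock / untreated expression, each first normalized to ACT1 (2**-ddCt)."""
    treated = amount(ct_gene_hs) / amount(ct_act1_hs)
    untreated = amount(ct_gene_untr) / amount(ct_act1_untr)
    return treated / untreated


def mean_sd(replicates):
    """Mean and sample SD across biological replicates (error bars = 1 SD)."""
    return statistics.mean(replicates), statistics.stdev(replicates)


if __name__ == "__main__":
    # Invented Ct values for illustration only.
    print(chip_fold_enrichment(24, 22, 27, 22))  # 8.0
    print(expression_ratio(25, 18, 23, 18))      # 0.25
    print(mean_sd([8.0, 6.0, 10.0]))             # (8.0, 2.0)
```

If the amplification efficiency is below 100%, `base` would be set from a standard curve instead of 2. The two-sided Student's t-test in the caption could be run on the replicate values with, for example, SciPy's `scipy.stats.ttest_ind`, which is left out here to keep the sketch dependency-free.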

Figure 5.

Analysis of TFs' regulatory associations and coregulatory factors. (A) Two-dimensional hierarchical clustering of 89 TFs (rows) according to their CRACR statistics across 1,693 expression conditions (columns). (B) Examples of predicted coregulatory TFs from A with distinct motifs, and their 8-mer binding profile correlations. Cluster annotations are derived from the literature and from functional predictions made in this study. A high-resolution heatmap with full labeling is available in Supplemental Figs. S11 and S12.
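A binding profile correlation such as those in panel B can be computed by correlating two TFs' 8-mer E-score tables over the k-mers they share. The sketch below uses Pearson correlation and the 1 − r distance that is commonly used for hierarchical clustering. Whether the study used exactly these measures is an assumption, and the two tables (`TF_A`, `TF_B`) are invented, with TF_B set to exactly half of TF_A.

```python
import math

# Sketch: Pearson correlation between two TFs' 8-mer E-score profiles over the k-mers
# both profiles contain, plus the 1 - r distance often used for hierarchical clustering.
# Whether the study used exactly this measure is an assumption; tables are invented.


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return sxy / math.sqrt(sxx * syy)


def profile_correlation(escores_a, escores_b):
    """Correlate two {k-mer: E-score} tables on their shared k-mers."""
    shared = sorted(escores_a.keys() & escores_b.keys())
    return pearson([escores_a[k] for k in shared], [escores_b[k] for k in shared])


def correlation_distance(escores_a, escores_b):
    """0 for perfectly correlated profiles, 2 for perfectly anticorrelated ones."""
    return 1.0 - profile_correlation(escores_a, escores_b)


# Hypothetical E-score tables for two TFs.
TF_A = {"AAAAAAAA": 0.10, "CACGTGAC": 0.20, "GATGAGAT": 0.50}
TF_B = {"AAAAAAAA": 0.05, "CACGTGAC": 0.10, "GATGAGAT": 0.25}

if __name__ == "__main__":
    print(round(profile_correlation(TF_A, TF_B), 6))  # 1.0
```

Pearson correlation ignores overall scale, so two TFs that rank k-mers the same way correlate perfectly even if their E-scores differ in magnitude. A rank-based (Spearman) correlation would be a reasonable alternative if only the ordering of k-mers should matter.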

References

1. Angus-Hill M.L., Schlichter A., Roberts D., Erdjument-Bromage H., Tempst P., Cairns B.R. A Rsc3/Rsc30 zinc cluster dimer reveals novel roles for the chromatin remodeler RSC in gene expression and cell cycle control. Mol. Cell. 2001;7:741–751.
2. Aparicio O., Geisberg J., Sekinger E., Yang A., Moqtaderi Z., Struhl K. Chromatin immunoprecipitation for determining the association of proteins with specific genomic sequences in vivo. Curr. Protoc. Mol. Biol. 2005. doi: 10.1002/0471142727.mb2103s69.
3. Beer M.A., Tavazoie S. Predicting gene expression from sequence. Cell. 2004;117:185–198.
4. Berger M.F., Bulyk M.L. Universal protein binding microarrays for the comprehensive characterization of the DNA binding specificities of transcription factors. Nat. Protocols. 2009 (in press).
5. Berger M.F., Philippakis A.A., Qureshi A.M., He F.S., Estep P.W., III, Bulyk M.L. Compact, universal DNA microarrays to comprehensively determine transcription-factor binding site specificities. Nat. Biotechnol. 2006;24:1429–1435.
