Universal protein-binding microarrays for the comprehensive characterization of the DNA-binding specificities of transcription factors

Michael F Berger et al. Nat Protoc. 2009.

Abstract

Protein-binding microarray (PBM) technology provides a rapid, high-throughput means of characterizing the in vitro DNA-binding specificities of transcription factors (TFs). Using high-density, custom-designed microarrays containing all 10-mer sequence variants, one can obtain comprehensive binding-site measurements for any TF, regardless of its structural class or species of origin. Here, we present a protocol for the examination and analysis of TF-binding specificities at high resolution using such 'all 10-mer' universal PBMs. This procedure involves double-stranding a commercially synthesized DNA oligonucleotide array, binding a TF directly to the double-stranded DNA microarray and labeling the protein-bound microarray with a fluorophore-conjugated antibody. We describe how to computationally extract the relative binding preferences of the examined TF for all possible contiguous and gapped 8-mers over the full range of affinities, from highest affinity sites to nonspecific sites. Multiple proteins can be tested in parallel in separate chambers on a single microarray, enabling the processing of a dozen or more TFs in a single day.

Conflict of interest statement

COMPETING INTERESTS STATEMENTS

The authors declare competing financial interests (see the HTML version of this article for details).

Figures

Figure 1

Schematic of universal PBM experiments. A commercially synthesized single-stranded DNA microarray (a) is double-stranded by solid phase primer extension (b) using a small amount of spiked-in fluorescently labeled dUTP. An epitope-tagged transcription factor is bound directly to the DNA on the microarray (c), and the protein-bound array is labeled with a fluorophore-conjugated antibody (d).

Figure 2

Sequence coverage and redundancy in the ‘all 10-mer’ universal PBM design. (a) Each microarray contains four identical subgrids consisting of approximately 44,000 probes. Every possible 8-mer occurs on at least 16 probes distributed across the subgrid, each time embedded in a different flanking sequence. (For every non-palindromic 8-mer, its reverse complement occurs on a separate set of 16 probes.) Probes containing the 8-mer CATGGAAA are shown as an example. The common primer sequence at the 3′ end is not shown. (b) All possible gapped 8-mers spanning up to 12 total positions are also covered at least 16 times, as shown for the gapped 8-mer CAnTnGnGAAnA.
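The coverage described in this caption can be checked with a short sketch like the one below. It counts which probes contain a contiguous or gapped 8-mer on either strand, treating `n` as "any base". The function names and the toy probe list are hypothetical and are not taken from the protocol; real probe sequences would come from the array design file.

```python
import re

# Complement table; 'n' (any base) maps to itself so gapped patterns survive.
COMPLEMENT = str.maketrans("ACGTn", "TGCAn")


def reverse_complement(pattern):
    """Reverse complement of a sequence or gapped pattern such as CAnTnGnGAAnA."""
    return pattern.translate(COMPLEMENT)[::-1]


def pattern_to_regex(pattern):
    """Compile a pattern into a regex in which 'n' matches any single base."""
    return re.compile("".join("[ACGT]" if c == "n" else c for c in pattern))


def probes_matching(pattern, probes):
    """Return indices of probes containing the pattern or its reverse complement."""
    forward = pattern_to_regex(pattern)
    reverse = pattern_to_regex(reverse_complement(pattern))
    return [i for i, probe in enumerate(probes)
            if forward.search(probe) or reverse.search(probe)]


# Toy probes (hypothetical), far shorter than real array probes.
toy_probes = ["GGCATGGAAATT", "TTTCCATGAAAA", "AATTTCCATGCC", "CAGTAGCGAATA"]
contiguous_hits = probes_matching("CATGGAAA", toy_probes)
gapped_hits = probes_matching("CAnTnGnGAAnA", toy_probes)
```

On a full subgrid, the caption implies that `len(probes_matching(...))` should be at least 32 for a non-palindromic 8-mer once both strands are counted.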

Figure 3

Zoom-in of a universal PBM scan. (a) Region of a single subgrid, consisting of just over 1% of the total slide area, scanned to detect relative DNA amounts, as indicated by Cy3-labeled dUTP. (b) The same region of the same microarray, scanned with a different laser to detect protein binding, as indicated by Alexa 488-labeled anti-GST antibody. Intensities are shown in false-color, with white indicating saturated signal intensity, yellow indicating high signal intensity, green indicating moderate signal intensity, and blue indicating low signal intensity.

Figure 4

Correlation between observed and expected Cy3 probe intensities. Expected intensities were determined from sequence, based on the calculated regression coefficients for all trinucleotides.
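As a rough illustration of how an expected Cy3 intensity might be computed from sequence, the sketch below assumes a linear model: an intercept plus one coefficient per trinucleotide, multiplied by that trinucleotide's count in the probe. The linear form, the function names and the coefficient values are assumptions made for this example; the caption states only that regression coefficients for all trinucleotides were used.

```python
from collections import Counter


def trinucleotide_counts(seq):
    """Count every overlapping trinucleotide in a probe sequence."""
    return Counter(seq[i:i + 3] for i in range(len(seq) - 2))


def expected_intensity(seq, coefficients, intercept=0.0):
    """Expected Cy3 intensity under the assumed linear trinucleotide model."""
    counts = trinucleotide_counts(seq)
    return intercept + sum(coefficients.get(tri, 0.0) * n
                           for tri, n in counts.items())


def observed_to_expected(observed, seq, coefficients, intercept=0.0):
    """Ratio of the observed Cy3 intensity to the sequence-based expectation."""
    return observed / expected_intensity(seq, coefficients, intercept)


# Hypothetical coefficients; real values would be fitted to the Cy3 scan.
toy_coefficients = {"ACG": 10.0, "CGT": 5.0}
expected = expected_intensity("ACGTACG", toy_coefficients, intercept=100.0)
ratio = observed_to_expected(250.0, "ACGTACG", toy_coefficients, intercept=100.0)
```

A ratio far from 1 would flag a probe whose measured DNA content deviates from what its sequence predicts.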

Figure 5

Word-by-word and PWM representations of binding specificity. (a) Scores for individual _k_-mers. The top-scoring 8-mers for a PBM experiment using the mouse TF Six6 (ref. 17) are shown with their corresponding median signal intensities and enrichment scores. The ‘median normalized signal intensity’ is the median over the set of ~32 probes containing a match to each 8-mer. ‘E-score’ refers to the enrichment score described in the text. (b) Overview of our Seed-and-Wobble method for motif construction. The top-scoring 8-mer is used as a seed, and the relative preference of each nucleotide variant is systematically tested at each position within and outside the seed. These nucleotide E-scores are converted to probabilities using a Boltzmann distribution and displayed as a sequence logo.
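The Boltzmann conversion mentioned in panel (b) can be sketched as follows for a single motif position. The scaling constant `beta` is a hypothetical parameter chosen for illustration; the caption does not give the value used by Seed-and-Wobble, and the example E-scores are invented.

```python
import math


def boltzmann_probabilities(escores, beta=10.0):
    """Convert per-nucleotide E-scores at one position into probabilities.

    Each nucleotide receives weight exp(beta * E); the weights are then
    normalized to sum to 1. `beta` is an assumed, tunable constant.
    """
    weights = {base: math.exp(beta * e) for base, e in escores.items()}
    total = sum(weights.values())
    return {base: w / total for base, w in weights.items()}


# Invented E-scores for the four nucleotide variants at one position.
position_scores = {"A": 0.48, "C": 0.10, "G": 0.05, "T": 0.30}
position_probs = boltzmann_probabilities(position_scores)
uniform_probs = boltzmann_probabilities({"A": 0.2, "C": 0.2, "G": 0.2, "T": 0.2})
```

Applying the function to every position of the seed and its flanks would give one PWM column per position; a larger `beta` sharpens each column toward the top-scoring nucleotide.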

Figure 6

Schematic of Agilent SureHyb hybridization chamber for protein binding reactions. The gasket cover slide, protein binding mixture, and microarray are sandwiched between both halves of the steel hybridization chamber. A four-chambered cover slide is used for the protein binding and antibody labeling incubations, whereas a single-chambered cover slide is used for primer extension. This figure is not drawn to scale.

Figure 7

Replicate scans at multiple laser power settings for integration by masliner. The same portion of the same microarray is displayed for three scans at varying laser power settings. (The color scheme is the same as for Figure 3.) The dimmest scan (right) can be used to resolve relative differences in signal intensity for spots with saturated intensities in the brightest scan (left), while the brightest scan provides above-background signal intensities for spots with low signal intensity.
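The idea of combining scans can be illustrated with a simplified sketch. It is not masliner's actual algorithm: it fits a single straight line relating the dim scan to the bright scan over spots that are unsaturated in the bright scan, then uses that line to extrapolate the saturated spots. The saturation threshold, function names and intensity values are assumptions made for the example.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of ys = slope * xs + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def merge_scans(bright, dim, saturation=65535):
    """Keep bright-scan values; extrapolate saturated spots from the dim scan."""
    unsaturated = [(d, b) for d, b in zip(dim, bright) if b < saturation]
    if len(unsaturated) < 2:
        raise ValueError("need at least two unsaturated spots to fit a line")
    slope, intercept = fit_line([d for d, _ in unsaturated],
                                [b for _, b in unsaturated])
    return [b if b < saturation else slope * d + intercept
            for b, d in zip(bright, dim)]


# Invented intensities: the last spot saturates the bright scan.
dim_scan = [100, 200, 400, 8000, 9000]
bright_scan = [800, 1600, 3200, 64000, 65535]
merged = merge_scans(bright_scan, dim_scan)
```

With three scans, the same step could be applied pairwise from the brightest scan to the dimmest. A production tool would also restrict the fit to the range in which both scans respond linearly.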

Figure 8

Correlation in 8-mer enrichment scores obtained from replicate experiments. (a) Scatter plot comparing 8-mer scores from two PBM experiments using the mouse TF Tcf1 (ref. 17) performed on microarrays of the same ‘all 10-mer’ design. (b) Scatter plot comparing 8-mer scores from two PBM experiments using Tcf1 performed on microarrays of complementary ‘all 10-mer’ designs. E-scores for significantly-bound 8-mers are consistent among all replicate experiments. (c) Sequence logos representing PWMs derived for each data set.

Figure 9

Differences in _k_-mer binding profiles for highly similar TFs. (a) Sequence logos representing nearly identical PWMs derived from PBM experiments for the two mouse homeodomain TFs, Lhx2 and Lhx4. (b) Scatter plot comparing 8-mer scores for these same TFs. 8-mers containing each 6-mer sequence (inset) are highlighted, revealing clear, systematic differences in the sequence preferences of these TFs for lower affinity 8-mers despite identical preferences for the same highest affinity 8-mers (containing TAATTA). This figure has been adapted with permission from ref. .

References

    1. Ho SW, Jona G, Chen CT, Johnston M, Snyder M. Linking DNA-binding proteins to their recognition sequences by using protein microarrays. Proc Natl Acad Sci U S A. 2006;103:9940–9945.
    2. Reece-Hoyes JS, et al. A compendium of Caenorhabditis elegans regulatory transcription factors: a resource for mapping transcription regulatory networks. Genome Biol. 2005;6:R110.
    3. Adryan B, Teichmann SA. FlyTF: a systematic review of site-specific transcription factors in the fruit fly Drosophila melanogaster. Bioinformatics. 2006;22:1532–1533.
    4. Gray PA, et al. Mouse brain organization revealed through direct genome-scale TF expression analysis. Science. 2004;306:2255–2257.
    5. Messina DN, Glasscock J, Gish W, Lovett M. An ORFeome-based analysis of human transcription factor genes and the construction of a microarray to interrogate their expression. Genome Res. 2004;14:2041–2047.
