Genome-wide analyses reveal properties of redundant and specific promoter occupancy within the ETS gene family

Genes Dev. 2007 Aug 1; 21(15): 1882–1894.

Peter C. Hollenhorst

1 Department of Oncological Sciences, Huntsman Cancer Institute, University of Utah, Salt Lake City, Utah 84112, USA;

Atul A. Shah

1 Department of Oncological Sciences, Huntsman Cancer Institute, University of Utah, Salt Lake City, Utah 84112, USA;

Christopher Hopkins

2 Agilent Technologies, Santa Clara, California 95051, USA

Barbara J. Graves

1 Department of Oncological Sciences, Huntsman Cancer Institute, University of Utah, Salt Lake City, Utah 84112, USA;

Received 2007 Apr 16; Accepted 2007 Jun 14.

Copyright © 2007, Cold Spring Harbor Laboratory Press

Freely available online through the Genes & Development Open Access option.

Abstract

The conservation of in vitro DNA-binding properties within families of transcription factors presents a challenge for achieving in vivo specificity. To uncover the mechanisms regulating specificity within the ETS gene family, we have used chromatin immunoprecipitation coupled with genome-wide promoter microarrays to query the occupancy of three ETS proteins in a human T-cell line. Unexpectedly, redundant occupancy was frequently detected, while specific occupancy was less likely. Redundant binding correlated with housekeeping classes of genes, whereas specific binding examples represented more specialized genes. Bioinformatics approaches demonstrated that redundant binding correlated with consensus ETS-binding sequences near transcription start sites. In contrast, specific binding sites diverged dramatically from the consensus and were found further from transcription start sites. One route to specificity was found—a highly divergent binding site that facilitates ETS1 and RUNX1 cooperative DNA binding. The specific and redundant DNA-binding modes suggest two distinct roles for members of the ETS transcription factor family.

Keywords: ETS, transcription, gene families, cooperative binding, promoter specificity, ChIP–chip

Transcriptional activators and repressors bind regulatory sequences within promoters and enhancers by engaging sequence-specific DNA-binding sites. Many DNA-binding proteins with well-characterized DNA sequence preferences are now known. Furthermore, sequence motifs that match these preferences are found by bioinformatics-based queries of genomes. However, predicting the in vivo pattern of DNA–protein interactions based on in vitro-determined consensus and bioinformatics-based databases has proven difficult (Wasserman and Sandelin 2004). Genome-wide transcription factor occupancy techniques, such as chromatin immunoprecipitation (ChIP) coupled with DNA microarrays (termed ChIP–chip), can define transcription factor in vivo utilization of genomic sequences and inform bioinformatics approaches. However, most genome-wide occupancy studies in mammalian systems have failed to identify matches to in vitro consensus binding sequences in the majority of occupied regions (Weinmann et al. 2002; Li et al. 2003; Martone et al. 2003; Euskirchen et al. 2004; Krig et al. 2007). In addition, recognizable consensus sites can display no evidence of binding. Several speculations have been presented to explain this lack of correlation. Protein partnerships could facilitate the use of nonconsensus binding sites by DNA-binding cooperativity. Chromatin structure could occlude some sequences that are predicted to be binding sites. A combination of experimental and bioinformatics approaches is necessary to test these hypotheses at a genomic level and answer the central question: What is the genetic basis for the control of gene expression?

Large gene families in mammalian genomes that encode transcription factors with highly related DNA-binding properties (e.g., ETS, GATA, HOX, or FOX proteins, and nuclear hormone receptors) (Messina et al. 2004) present a further challenge. The dilemma is how transcription factors with overlapping DNA sequence preferences direct distinct transcriptional responses in vivo. The problem itself is poorly characterized because many binding sites in promoters and enhancers have been assayed only for an arbitrary subset of family members. Furthermore, no mammalian ChIP–chip experiments have directly addressed this family conundrum. Thus, it is unresolved whether extensive genetic redundancy is a characteristic of these families or whether robust mechanisms operate that drive specificity.

The ETS family of transcription factors provides an excellent system to pursue these questions due to the extensive knowledge of the biological roles of ETS genes and the biochemical properties of ETS proteins (Sharrocks 2001; Oikawa and Yamada 2003). The family is defined by the conserved DNA-binding domain, termed the ETS domain, which bears a winged helix–turn–helix protein fold. Phylogenetic analysis of the 27 human ETS domains identifies subfamilies of more highly related members, termed clades (Fig. 1A). The DNA-binding properties of ETS proteins from all clades are remarkably similar due to the high conservation of amino acids within the ETS domain that are critical for DNA interaction. For example, in vitro site-selection studies performed on 10 ETS proteins each report preference for an invariant GGA core. In addition, five flanking positions also show conservation among these family members (Fig. 1B).

Figure 1.

Conservation of mammalian ETS domains. (A) Phylogram tree of human ETS domain sequences. The amino acid sequences of all 27 human ETS domains were aligned by Clustal W (Thompson et al. 1994). The horizontal branch lengths relate to predicted evolutionary distance. (Longer branches are more divergent.) Nine clades with multiple highly similar domains and three additional singlet domains are indicated as numbers 1–12. ETS genes expressed in Jurkat T cells with mRNA levels above one copy per cell are highlighted in yellow (Hollenhorst et al. 2004). (B) ETS domain consensus binding sites. Illustrated sites were selected in vitro by the SELEX method for the indicated mouse ETS proteins (Brown and McKnight 1992; Nye et al. 1992; Mao et al. 1994; Ray-Gallet et al. 1995; Shore and Sharrocks 1995; John et al. 1996). Sequence logos were created from position weight matrices (PWMs) by enoLOGOS (Workman et al. 2005) with the height of each base related to reported frequency at that position. The selected ETV1, ETV2, and GABPα consensus sites were reported (Brown and McKnight 1992) without the necessary frequency distribution data to build a PWM.

In spite of this high degree of conservation, experimental studies suggest that ETS proteins have unique biological functions. All 14 mouse ETS gene disruption strains, including seven targeting ubiquitously expressed ETS genes, show unique phenotypes (Hollenhorst et al. 2004; Zhou et al. 2005). In vivo transcription assays demonstrate functional differences with some of the ETS proteins being activators, whereas others are repressors (Kopp et al. 2004). Thus, the remarkable conservation of DNA binding is contrasted with expected diversity of biological function. Interestingly, this predicted specificity must exist in an environment in which multiple ETS proteins are present, because more than half of the 27 human ETS genes are expressed in any particular cell type (Galang et al. 2004; Hollenhorst et al. 2004). As with other gene families, the role of potential redundancy versus predicted specificity of the ETS family has not been rigorously tested.

For individual ETS proteins, distinct functional domains that lie outside of the ETS domain could facilitate specificity. For example, one mechanism to enhance DNA-binding specificity is protein–protein interactions that mediate cooperative binding at distinct DNA sequences. The ETS family has a few examples of this phenomenon. The TCF clade (ELK1, SAP1, NET) functions with the DNA-binding factor, SRF, via a protein interaction domain (Price et al. 1995; Buchwalter et al. 2004). GABPα partners with GABPβ, which mediates dimerization and formation of a GABPα/β hetero-tetramer that binds two ETS sites (de la Brousse et al. 1994). High-resolution molecular models of these complexes are available (Batchelor et al. 1998; Hassler and Richmond 2001; Mo et al. 2001); however, other partnerships are less well understood. For example, ETS1 could function with as many as nine different transcription factors (Li et al. 2000). Only RUNX1 (also known as AML1, CBFα2, PEBP2) has been demonstrated to mediate DNA-binding cooperativity with ETS1 and, thus, potentially enhance specificity (Goetz et al. 2000; Gu et al. 2000). None of the potential ETS protein partnerships have been assayed by ChIP or shown to limit in vivo occupancy of other ETS proteins. Thus, the in vivo use of protein partnerships or any other specificity mechanism remains poorly characterized.

The unique biological function of ETS proteins predicts the selection of specific transcriptional targets. However, few target genes are linked definitively to individual ETS family members. Most of the >200 putative target genes for ETS proteins have been queried only by transcription effects that required overexpression in cell lines or by in vitro DNA binding, techniques that fail to identify the ETS protein(s) utilized in vivo (Sementchenko and Watson 2000). Furthermore, no genome-wide occupancy of an ETS protein has been reported.

Determining the genomic occupancy of ETS proteins by ChIP will provide an unprecedented view of in vivo DNA-binding specificity within a transcription factor family and allow us to test mechanisms regulating ETS protein targeting. By investigating the endogenous ETS proteins ETS1, ELF1, and GABPα in the Jurkat human T-cell line, we discovered that these divergent family members frequently occupied the same genomic regions. This redundant occupancy correlated with a match to a strong consensus DNA-binding site and proximity to the transcriptional start site (TSS). Specific binding of ETS1 was also detected, but did not correlate with a strong match to a consensus site. A subset of ETS1-binding events correlated with an ETS–RUNX composite site that differed dramatically from ETS1 or RUNX1 consensus sites. The finding of two classes of ETS1 targets suggests a versatility of the ETS family: overlapping and specific DNA-binding modes that are mediated through distinct sequence motifs.

Results

ETS proteins display both specific and redundant occupancy

Three genes reported to be regulated by ETS1 illustrate the need for more robust in vivo approaches and attention to family issues. The T-cell receptor (TCR) α and β enhancers have been characterized in vitro as sites of cooperative binding between ETS1 and RUNX1. However, in vivo specificity is not clear, as transient expression assays indicate that multiple ETS proteins can activate via this binding site (Sun et al. 1995), and no ChIP has been reported. In contrast, the promoter of the protein kinase encoding gene, CDC2L2, is implicated as an ETS1 target by ChIP, but no tests for specificity were performed (Feng et al. 2004). To test ETS protein in vivo specificity, we investigated occupancy of the CDC2L2 promoter and TCRα and TCRβ enhancers by four distantly related ETS proteins (ETS1, GABPα, ELF1, and ELK1) in Jurkat T cells (Fig. 1A). (Based on steady-state mRNA levels, these ETS genes rank first, second, ninth, and 11th, respectively, of 17 ETS genes that are expressed in Jurkat T cells [Hollenhorst et al. 2004].) ETS1, ELF1, and GABPα, but not ELK1, redundantly occupied the CDC2L2 promoter, whereas ETS1 specifically occupied the TCRα and TCRβ enhancers (Fig. 2A). RUNX1 also occupied the TCRα and TCRβ enhancers, supporting a role for RUNX1 in ETS1 specificity. These initial ChIP experiments detected the anticipated specific mode of binding for ETS proteins, but also found a surprising redundant mode.

Figure 2.

Specific and redundant promoter occupancy of ETS proteins. ChIP from the Jurkat human T-cell line with antibodies specific to the indicated ETS proteins. (A) Gene-specific region analysis. ChIP DNA was PCR-amplified with gene-specific primers. The ChIP enrichment is the ratio of the quantitative PCR signal of specific genomic regions over background genomic DNA (mean of two negative control genomic regions). Bars indicate the mean and standard error of the mean from three independent ChIP experiments. (B,C) Genome-wide occupancy analysis. ChIP DNA was amplified, labeled, and hybridized to a promoter microarray for 17,000 human genes representing sequences between −5 kb and +2 kb relative to the TSS. A bound promoter includes one or more peaks as defined by at least one probe with a P(X) value of <0.001. Data represent the average of two biologically independent replicates. Diagrams illustrate the number of promoter regions bound by ETS1, ELF1, GABPα, and RUNX1, and combinations thereof.

Redundant occupancy by ETS proteins is widespread

To ascertain the biological significance of redundant occupancy and the relative importance of RUNX1 in specificity, we performed genome-wide promoter ChIP. The relative levels of specific and redundant binding of ETS proteins were assessed by a promoter microarray hybridized with ChIP DNA from the Jurkat human T-cell line. The promoter microarray represented the region from 5000 base pairs (bp) upstream to 2000 bp downstream of the TSS of ∼17,000 human genes with 60-mer oligonucleotides at an average spacing of 200 bp. Promoters were scored as “bound” by statistical methods that considered the enrichment of multiple neighboring probes and consistent occupancy in experimental repetitions. Promoters occupied by ETS1, ELF1, or GABPα were frequently bound by one or more of the other ETS proteins (Fig. 2B). A second, independent set of ChIP–chip experiments, which was performed with a second promoter microarray that covered only regions within 1000 bp of the TSS, also indicated a very strong correlation between ETS1 and ELF1 occupancy (Supplementary Fig. S1).
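The bound/unbound call and the overlap between factors can be illustrated with a minimal sketch. It uses only the single-probe criterion given in the Figure 2 legend (at least one probe with P(X) < 0.001) and ignores the neighboring-probe and replicate statistics mentioned above. The function name `bound_promoters`, the gene names, and all P(X) values are hypothetical.

```python
from typing import Dict, List, Set

P_THRESHOLD = 0.001  # P(X) cutoff taken from the Figure 2 legend


def bound_promoters(probe_pvalues: Dict[str, List[float]],
                    threshold: float = P_THRESHOLD) -> Set[str]:
    """Return promoters with at least one probe whose P(X) is below threshold."""
    return {promoter for promoter, pvals in probe_pvalues.items()
            if any(p < threshold for p in pvals)}


# Hypothetical probe-level P(X) values, keyed by promoter name.
ets1 = {"geneA": [0.2, 0.0004], "geneB": [0.5, 0.3], "geneC": [0.0001]}
elf1 = {"geneA": [0.0002, 0.9], "geneB": [0.0008], "geneC": [0.4, 0.05]}

ets1_bound = bound_promoters(ets1)
elf1_bound = bound_promoters(elf1)

shared = ets1_bound & elf1_bound      # promoters occupied by both factors
ets1_only = ets1_bound - elf1_bound   # candidates for ETS1-specific occupancy
elf1_only = elf1_bound - ets1_bound   # candidates for ELF1-specific occupancy
```

Set intersection and difference correspond to the overlapping and non-overlapping regions of the diagrams in Figure 2B.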

This extensive overlap in potential targets was unexpected, and therefore we considered several possible nonbiological explanations. The overlap was not due to cross-reactivity of antibodies because the epitopes had no sequence similarity, and immunoprecipitation controls (Supplementary Fig. S2) as well as ChIP experiments (Fig. 2A) showed specificity. We considered a possible bias toward these genomic regions in the microarray design. ChIP–chip of E2F4, a transcription factor that does not belong to the ETS family, served as a negative control; a set of targets distinct from those bound by ETS1, ELF1, and GABPα, but similar to E2F targets in other cell types, was identified (Table 1; Supplementary Fig. S1; Boyer et al. 2005). Another concern was the sensitivity necessary to detect specific sites. Quantitative PCR detected ETS1-specific occupancy at the TCRα and TCRβ enhancers (Fig. 2A), but these sites were not near the TSS and, thus, were not on the promoter microarrays. To use this positive control, we designed a third microarray that covered 20-kb regions surrounding these enhancers. ETS1-specific binding regions were detected and correlated with the known enhancers (data not shown). These controls indicated that the overlapping ChIP enrichments at ETS1, ELF1, and GABPα target promoters represent an accurate picture of genome-wide occupancy.

Table 1.

Overrepresented ontologies of genes near bound promoters

Redundant ETS binding correlates with a consensus ETS-binding site

We postulated that the redundant and specific classes of target genes may have different biological functions and that distinct mechanisms would dictate ETS protein recruitment to each class. A more in-depth comparison of redundant and specific binding regions required a data set of each binding class that minimized false-positive results (albeit at the cost of increasing false-negative results). Therefore, data sets of segments bound by ETS1 and ELF1, ETS1 but not ELF1, or ELF1 but not ETS1 were created (Fig. 3A). ChIP and quantitative PCR with primers specific for each candidate segment showed 88% or greater concurrence with ChIP–chip data, thus validating ETS1 and ELF1 dual-bound as well as ETS1-specific data sets (Fig. 3B,C). Tests for occupancy by the ETS protein ELK1 yielded negative results. The ELF1-specific data set was less reliable (Fig. 3D). Thus, the ETS1 and ELF1 dual-bound and ETS1-specific data sets were used for further analyses.

Figure 3.

Validation of redundant and specific bound segments from genomic occupancy data sets. (A) Stringent classification of redundant versus specific data sets. Bound segments were identified regardless of promoter annotation. To identify specific binding events, the stringency of an “unbound” score was reduced [minimum P(X) value >0.01] to minimize false negatives. The probe with the lowest P(X) value from each bound segment was taken to represent that segment for that ETS protein. To compare with occupancy, the value of this representative probe was plotted versus the highest −log P(X) value for the other ETS protein within 1 kb of this probe. Shaded areas define the three data sets used for the comparisons in the remainder of the figure (size indicated). ETS1-specific and dual-bound segments were identified by the ETS1 segment report. (Dual-bound segments identified by the ELF1 segment report were essentially the same.) ELF1-specific segments were identified by the ELF1 segment report. (B–D) Quantitative ChIP confirmed specific and redundant occupancy. ChIP DNA obtained with specific antibodies, as indicated, was analyzed by quantitative PCR, as in Figure 2A. The scale on the Y-axis was interrupted to show a broad range of values. Segments were assigned to a gene by the closest TSS. In B, the EGR1 promoter, a well-characterized ELK1 target, served as an antibody positive control. The other 19 tested segments were randomly selected from the “dual-bound” data set from A. In C, 10 segments were randomly selected from the ETS1-specific data set from A. In D, 12 segments were selected from the ELF1-specific data set with preference for those with the least evidence of ETS1 occupancy [highest ETS1 P(X) values].
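The classification in panel A can be sketched as follows, under stated assumptions. The 1-kb window and the "unbound" cutoff (minimum P(X) > 0.01) come from the legend. Treating the second factor as bound at P(X) < 0.001 is an assumption borrowed from the Figure 2 legend, since the exact boundaries of the shaded areas are not given in the text. The function name and all positions and P(X) values are hypothetical.

```python
from typing import List, Tuple

BOUND_P = 0.001    # "bound" cutoff (Figure 2 legend); assumed here for the second factor
UNBOUND_P = 0.01   # "unbound" requires the minimum P(X) to exceed this value
WINDOW_BP = 1000   # search window around the representative probe

Probe = Tuple[int, float]  # (genomic position in bp, P(X))


def classify_segment(rep_position: int, other_probes: List[Probe]) -> str:
    """Classify a bound segment by the other factor's best probe within 1 kb."""
    nearby = [p for pos, p in other_probes
              if abs(pos - rep_position) <= WINDOW_BP]
    if not nearby:
        return "ambiguous"  # no nearby probe data for the other factor
    best_other = min(nearby)  # lowest P(X) is the highest -log P(X)
    if best_other < BOUND_P:
        return "dual-bound"
    if best_other > UNBOUND_P:
        return "specific"
    return "ambiguous"  # between the two cutoffs: excluded from both sets


# Hypothetical examples for a representative ETS1 probe at position 5000.
dual = classify_segment(5000, [(5200, 0.0001), (9000, 0.5)])
specific = classify_segment(5000, [(5200, 0.3), (4500, 0.05)])
excluded = classify_segment(5000, [(5200, 0.005)])
```

Leaving a gap between the two cutoffs discards borderline segments, which matches the stated goal of minimizing false positives in each class at the cost of more false negatives.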

We hypothesized that sequence elements would direct redundant versus specific binding. To search for such sequences we used the MEME algorithm, which identifies significantly overrepresented DNA sequences. The data output was position weight matrices (PWMs), which give frequency distributions of each base at each position (Bailey and Elkan 1994). The PWM for the most overrepresented sequence in the dual-bound data set (consensus: CCGGAAGT) (Fig. 4A) was strikingly similar to the derived in vitro-selected consensus sites for ETS1, ELF1, and GABPα (Fig. 1B). Greater than 70% of dual-bound segments had a sequence represented within this PWM. MEME did not identify any significantly overrepresented sequences in either the ETS1-specific data set or in any of 10 data sets randomly selected from the list of interrogated promoter regions (data not shown). To ensure that the distinction between dual and specific data sets was not due to the size of the data sets, randomly selected subsets of the dual-bound data set, similar in size to the specific data set, were tested; these smaller subsets also returned an ETS-like consensus sequence (data not shown). In conclusion, redundant binding by ETS proteins correlated with the presence of a strong consensus ETS-binding site.
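To make the PWM concept concrete, the sketch below builds a per-position frequency matrix from sites that are already aligned. This is not the MEME algorithm, which discovers motifs in unaligned sequences; it only shows the kind of frequency distribution that MEME reports. The four example sites are hypothetical, chosen to resemble the CCGGAAGT consensus.

```python
from collections import Counter
from typing import Dict, List

BASES = "ACGT"


def frequency_matrix(sites: List[str]) -> List[Dict[str, float]]:
    """Per-position base frequencies for equal-length, pre-aligned sites."""
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all sites must have the same length")
    matrix = []
    for column in zip(*sites):  # iterate over alignment columns
        counts = Counter(column)
        matrix.append({base: counts[base] / len(sites) for base in BASES})
    return matrix


def consensus(matrix: List[Dict[str, float]]) -> str:
    """Most frequent base at each position (ties resolved in ACGT order)."""
    return "".join(max(BASES, key=lambda base: col[base]) for col in matrix)


# Hypothetical aligned binding sites.
sites = ["CCGGAAGT", "CCGGAAGT", "ACGGAAGT", "CCGGAAGC"]
pwm = frequency_matrix(sites)
motif = consensus(pwm)
```

The invariant GGA core appears as columns with frequency 1.0 for a single base, whereas flanking positions carry mixed frequencies.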

Figure 4.

Sequence characteristics of redundantly and specifically bound segments. (A) ETS1 and ELF1 dual-bound segments correlate with a strong match to a consensus ETS-binding site. The most overrepresented complex PWM identified by MEME (E = 1 × 10^−30) in dual-bound segments is illustrated by a sequence logo, as in Figure 1B. (B) Distances between TSS and bound segments (from Fig. 3) showed proximal bias of dual occupancy. The distance from the bound segment to the nearest TSS in the Ensembl database was measured by using the location of the oligonucleotide probe with the lowest P(X) in each segment in the dual-bound or ETS1-specific data sets. Distances were binned and frequencies were plotted. The number of segments analyzed is indicated in parentheses. (No nearby gene was identified for nine dual-bound segments and seven ETS1-specific segments.) The mean distances from start were significantly different (362 bp for dual-bound; 532 bp for specific segments; t-test, P = 0.0002). (C) Distances between TSS and the best match to ETS-binding consensus showed proximal bias of dual occupancy. PATSER was used with the PWM (shown in A) to search for the best consensus match within 1500 bp of sequence upstream of and downstream from the TSS in the dual-bound or ETS1-specific data sets (from Fig. 3) as well as a set of 937 randomly selected genes. The position of the highest-scoring match to the PWM in each 3000-bp region surrounding the TSS was binned and plotted. The mean distance from the TSS to the best PWM match was significantly shorter for dual regions (386 bp) versus ETS1-specific (608 bp; t-test, P < 0.0001) or randomly selected regions (604 bp; t-test, P < 0.0001).

To further investigate the presence of ETS consensus sites in dually occupied promoters, we searched in a biased manner for ETS-binding motifs. The PWM identified by MEME (Fig. 4A) was used as a query sequence for the pattern-recognition program PATSER (Hertz and Stormo 1999). The search was performed on the 3000 bp surrounding the TSS of genes closest to ETS1 specifically bound segments, as well as those bound dually by ETS1 and ELF1. A set of randomly selected genes also was searched. The matches near dual-bound genes scored significantly higher (mean PATSER score 9.5 for dual bound vs. 8.7 for both specific and random) than those near the ETS1-specific (t-test; P < 0.0001) or randomly selected genes (P < 0.0001). Scores for random genes and specific genes showed no significant difference (P = 0.68). In conclusion, two independent bioinformatics approaches indicated that a consensus ETS-binding site correlates with redundant ETS protein occupancy.
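A PWM scan of this kind can be sketched as a sliding-window log-odds score. This is a generic illustration, not PATSER's implementation: the uniform background (0.25 per base), the pseudocount, the use of natural logarithms, and the forward-strand-only search are all assumptions, so the scores are not comparable to the PATSER scores reported above. The 4-bp matrix and test sequence are hypothetical.

```python
import math
from typing import Dict, List, Tuple

BASES = "ACGT"


def to_log_odds(freqs: List[Dict[str, float]],
                background: float = 0.25,
                pseudocount: float = 0.01) -> List[Dict[str, float]]:
    """Convert per-position base frequencies to natural-log odds scores."""
    norm = 1.0 + 4 * pseudocount  # renormalize after adding pseudocounts
    return [{b: math.log((col[b] + pseudocount) / norm / background)
             for b in BASES}
            for col in freqs]


def best_match(sequence: str,
               log_odds: List[Dict[str, float]]) -> Tuple[float, int]:
    """Return (score, start) of the highest-scoring forward-strand window."""
    width = len(log_odds)
    best = (float("-inf"), -1)
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width].upper()
        if any(base not in BASES for base in window):
            continue  # skip windows containing N or other ambiguity codes
        score = sum(col[base] for col, base in zip(log_odds, window))
        if score > best[0]:
            best = (score, start)
    return best


# Hypothetical 4-bp matrix that strongly prefers GGAA.
ggaa = [{"A": 0.0, "C": 0.0, "G": 1.0, "T": 0.0},
        {"A": 0.0, "C": 0.0, "G": 1.0, "T": 0.0},
        {"A": 1.0, "C": 0.0, "G": 0.0, "T": 0.0},
        {"A": 1.0, "C": 0.0, "G": 0.0, "T": 0.0}]
score, start = best_match("TTTGGAATT", to_log_odds(ggaa))
```

Applied to the 3000 bp surrounding each TSS, the returned start position gives the location of the best match and the score gives its strength, the two quantities compared between data sets in this section.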

The finding of a nondiscriminating consensus sequence led us to investigate other sequence features that might accompany redundant occupancy. To test whether there was a bias in the location of dual versus specific bound segments, the distance of each segment (measured from the highest-scoring oligonucleotide probe) to the TSS was determined for the ETS1 and ELF1 dual-bound and ETS1-specific data sets. Dual-bound segments clustered very strongly to a region within 200 bp of the TSS (Fig. 4B). (More detailed spacing conclusions are challenged by the limits of ChIP–chip resolution and TSS annotation.) Segments bound specifically by ETS1 showed significantly less constraint on their location and frequently appeared more distally (Fig. 4B) (t-test of the mean distance; P = 0.0002). In an independent approach to detect potential location bias, we measured the distance between the TSS and the best PWM matches from the PATSER analysis. A subset of the randomly selected genes had their strongest PWM matches in regions proximal to the TSS (Fig. 4C). However, the dual-bound genes were significantly enriched for this type of promoter, as indicated by the significant difference between the mean distances (t-test; P < 0.0001). Notably, the mean distance of the ETS1-specific genes was not significantly different from that of the random genes (Fig. 4C). Only five ETS1-specific promoters (4%) had a perfect match to the PWM consensus within 200 bp of the TSS compared with 87 (14%) of the dual-bound promoters. In conclusion, two sequence properties correlated with redundant occupancy by ETS transcription factors: the presence of a consensus ETS-binding site and the tendency of this site to be located proximal to the TSS.
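The distance measurement and binning can be sketched as below. The sketch assumes a single chromosome with a sorted list of TSS coordinates and a 200-bp bin width; the coordinates are hypothetical, and the actual analysis used Ensembl annotations and its own binning.

```python
from bisect import bisect_left
from collections import Counter
from typing import Dict, List


def distance_to_nearest_tss(position: int, sorted_tss: List[int]) -> int:
    """Absolute distance (bp) from a probe position to the closest TSS."""
    if not sorted_tss:
        raise ValueError("at least one TSS is required")
    i = bisect_left(sorted_tss, position)
    # Only the TSS just before and just after the position can be nearest.
    candidates = sorted_tss[max(i - 1, 0):i + 1]
    return min(abs(position - tss) for tss in candidates)


def bin_distances(distances: List[int], bin_width: int = 200) -> Dict[int, int]:
    """Count distances per bin, keyed by the lower edge of each bin."""
    counts = Counter((d // bin_width) * bin_width for d in distances)
    return dict(sorted(counts.items()))


# Hypothetical TSS coordinates and representative probe positions.
tss_positions = [1000, 5000, 9000]
probe_positions = [5150, 900, 9500, 1000]

distances = [distance_to_nearest_tss(p, tss_positions) for p in probe_positions]
histogram = bin_distances(distances)
```

Comparing such histograms between the dual-bound and ETS1-specific sets is one way to visualize the proximal bias described above.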

Housekeeping genes have redundantly occupied promoters

To investigate the biological role of the nonselective ETS binding at strong proximal ETS-binding sites, we asked whether the dual-bound gene set represented a specific biological pathway. Overrepresented ontologies of the genes near the ETS1 and ELF1 dual-bound promoters were queried by GOstat (Beissbarth and Speed 2004). Housekeeping categories (e.g., RNA processing, ribosomal proteins, and cellular metabolism) had significant enrichment scores (Table 1). Random gene lists of similar size did not return any significantly overrepresented categories. Additional informatics analyses supported this correlation between dual occupancy and housekeeping function. Eighty-five percent of the ETS1 and ELF1 dual-bound regions overlapped with CpG islands, a sequence feature consistent with the promoters of housekeeping genes (Bird 1986). In contrast, data sets built from promoter regions with matched GC content that were randomly selected from the extended promoter array regions only displayed an average of 44% overlap with CpG islands (P < 0.01). Next, we queried our data set against a human gene set annotated for sequence characteristics of housekeeping genes (De Ferrari and Aitken 2006). The promoters of 16% of all genes surveyed had evidence of ETS1 and ELF1 dual occupancy (see Materials and Methods), whereas this proportion was 52% and 4% among genes classified as “housekeeping” and “nonhousekeeping,” respectively. Thus, three methods of classifying housekeeping genes each indicated an enrichment of redundant ETS occupancy at housekeeping promoters.
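Over-representation of a category in a gene list is commonly assessed with a hypergeometric tail probability, sketched below. This is a generic test and is not claimed to be the exact statistic computed by GOstat, which the text does not specify. The function name and the counts in the example are hypothetical.

```python
from math import comb


def hypergeom_tail(k: int, annotated: int, drawn: int, total: int) -> float:
    """P(X >= k) when `drawn` genes are sampled without replacement from
    `total` genes, of which `annotated` belong to the category."""
    upper = min(annotated, drawn)
    ways_total = comb(total, drawn)
    ways_k_or_more = sum(
        comb(annotated, i) * comb(total - annotated, drawn - i)
        for i in range(k, upper + 1)
    )
    return ways_k_or_more / ways_total


# Hypothetical counts: 10 genes on the array, 5 in the category,
# 5 genes in the bound set, all 5 of which fall in the category.
p_value = hypergeom_tail(k=5, annotated=5, drawn=5, total=10)
```

A small tail probability indicates that the category appears in the bound set more often than expected by chance; in practice a correction for testing many categories would also be needed.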

Co-occupancy of housekeeping promoters indicated a possible redundant function of ETS proteins at these promoters. We predicted that co-occupancy would not be cell-type specific, although different ETS protein combinations may be present in different cell types. To test this hypothesis, HT29 colon adenocarcinoma cell lines were used for ETS1 ChIP–chip. Promoters occupied by ETS1 were again overrepresented for housekeeping categories (Table 1). Therefore, ETS transcription factors appear to have a redundant role at the promoters of housekeeping genes, possibly in multiple cell types.

ETS1 and RUNX1 occupy promoters with a composite ETS1–RUNX-binding site

Specific occupancy of ETS1 in Jurkat T cells could be mediated through cooperative interactions only with RUNX1 or through a variety of cooperative partners. To differentiate between these two possibilities, a ChIP–chip experiment was performed with an antibody specific for RUNX1; 576 RUNX1-bound promoters were identified (Fig. 2C). However, only 36 of the 641 promoters bound by ETS1, but not ELF1, were also occupied by RUNX1. Therefore, cooperative interactions with RUNX1 likely represent one of a number of mechanisms that can mediate ETS1 specificity in Jurkat T cells.

Although RUNX1 occupancy could not explain the majority of the ETS1-specific binding, eight of the promoters with the strongest ETS1-binding signals also had strong RUNX1 signals (Fig. 5A, circled). In ChIP coupled with quantitative PCR, some of these segments showed weak binding signals for ELF1 and GABPα, but in every case the ETS1 antibody gave a strikingly higher signal (cf. Figs. 3B and 5B), indicating that these segments strongly favored ETS1 binding.

Figure 5.

RUNX1 and ETS1 co-occupancy correlates with a sequence similar to ETS- and RUNX1-binding sites. (A) Dual occupancy criteria. The maximum −log P(X) value of ETS1 and RUNX1 for each promoter region represented on the array is plotted. Eight promoter regions had a −log P(X) value of >5 for RUNX1 and >4 for ETS1 (circled). (B) Quantitative ChIP validated ETS1 and RUNX1 co-occupancy. ChIP DNA was analyzed by quantitative PCR and gene-specific primers, as in Figure 3B, for each of these eight promoter regions. (C) An ETS1/RUNX1 composite site. (Top) The most overrepresented sequence identified by MEME analysis of 87 ETS1- and RUNX1-bound segments (E = 4 × 10^−33). (Bottom) The in vitro-derived binding sites for RUNX1 (Meyers et al. 1993) and ETS1 (Nye et al. 1992).

Because RUNX1 and ETS1 co-occupancy at these eight segments correlated with a strong ETS1-binding signal in ChIP–chip, but not a strong ETS1 consensus binding site (mean PATSER score 8.3), we speculated that novel sequence determinants could be present in these regions. A MEME analysis provided a PWM for the most overrepresented sequences in the 87 DNA segments bound by both ETS1 and RUNX1 (consensus: CTGGGAATTG TAGTT) (Fig. 5C). Sequences in 40 segments were represented by the PWM, including all eight surveyed by quantitative PCR (Fig. 5B). This site was well conserved across multiple mammalian genomes in the majority of identified segments, suggesting functional importance (Supplementary Fig. S3). This PWM had some similarity to ETS1- and RUNX1-binding sites. However, the consensus deviated substantially from the in vitro selected consensus for ETS1 and RUNX1 (Fig. 5C; Nye et al. 1992; Meyers et al. 1993) and from the ETS1–RUNX1 cooperative binding sites in the TCRα and TCRβ enhancers (Gottschalk and Leiden 1990). All 40 PWM matches displayed the canonical GGA. However, only 26 had the conserved (A/T) wobble 3′ of the GGA, and 13 had an extremely divergent GGAG sequence. In addition, the other flanking nucleotides were dissimilar to those in the in vitro-derived ETS1 PWM, thus predicting extremely low affinity.

Based on the poor fit of this ETS1–RUNX1 composite site to consensus sites, we hypothesized that in vivo occupancy might require cooperative DNA binding between ETS1 and RUNX1. DNA-binding assays were performed to determine the relative affinity of ETS1 in the presence and absence of RUNX1 (Fig. 6). ETS1 bound weakly alone with an affinity 10- to 100-fold lower than that of a consensus site (Goetz et al. 2000). The affinity increased more than fivefold in the presence of RUNX1. Detection of ETS1 binding to GGAG sites required RUNX1. In conclusion, the ETS1–RUNX1 composite sites were marked by poor matches to ETS consensus sites and displayed low affinity that was improved by cooperative DNA binding. Remarkably, these extremely low-affinity sites displayed strong ChIP–chip signals (Fig. 5A) and extremely strong quantitative ChIP signals with gene-specific primers (Fig. 5B). We speculate that cooperative interactions can mediate extremely stable in vivo occupancy comparable to that of consensus ETS-binding sites.

Figure 6.

ETS1 and RUNX1 bind DNA cooperatively to the MEME-derived composite site. Equilibrium binding curves for ETS1 and a 35-bp region of the MDS025 promoter (Fig. 5B) that displays a GGAA core (left) or a single nucleotide change in the core to GGAG (right), which is also represented in the MEME-derived PWM. Experiments were performed in the presence (gray squares) or absence (black circles) of 30 nM RUNX1 (1–302 fragment) at the indicated ETS1 concentrations. The KD was derived by curve fitting by nonlinear least-squares analysis with fraction of DNA bound = 1/(1 + KD/[ETS1]). ETS1-only binding to the GGAG site was below the level necessary for quantification by this assay.
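The binding equation in the legend, fraction of DNA bound = 1/(1 + KD/[ETS1]), can be fitted by least squares as in the sketch below. The legend does not name the fitting software, so the one-parameter golden-section search over log10(KD) used here is a swapped-in stand-in, chosen to keep the example dependency-free. The concentrations, search bounds, and the "true" KD used to generate the noise-free data are hypothetical.

```python
import math
from typing import Sequence


def fraction_bound(ets1_conc: float, kd: float) -> float:
    """Fraction of DNA bound = 1 / (1 + KD / [ETS1])."""
    return 1.0 / (1.0 + kd / ets1_conc)


def sum_squared_error(kd: float, concs: Sequence[float],
                      observed: Sequence[float]) -> float:
    """Sum of squared residuals between the model and observed fractions."""
    return sum((fraction_bound(c, kd) - f) ** 2 for c, f in zip(concs, observed))


def fit_kd(concs: Sequence[float], observed: Sequence[float],
           low: float = 1e-12, high: float = 1e-3,
           iterations: int = 200) -> float:
    """Least-squares KD found by golden-section search over log10(KD)."""
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = math.log10(low), math.log10(high)
    for _ in range(iterations):
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if (sum_squared_error(10 ** c, concs, observed)
                < sum_squared_error(10 ** d, concs, observed)):
            b = d  # minimum lies in [a, d]
        else:
            a = c  # minimum lies in [c, b]
    return 10 ** ((a + b) / 2)


# Hypothetical titration (molar) with noise-free data for KD = 50 nM.
true_kd = 5e-8
concentrations = [1e-9, 1e-8, 1e-7, 1e-6]
fractions = [fraction_bound(c, true_kd) for c in concentrations]
fitted_kd = fit_kd(concentrations, fractions)
```

Under this model the KD equals the ETS1 concentration at which half of the DNA is bound, so a leftward shift of the curve in the presence of RUNX1 corresponds to a lower apparent KD, that is, higher affinity.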

The genes whose promoters were bound by ETS1 and RUNX1 did not fall into housekeeping categories. One alternative is that these genes are more specialized, representing differentiation-specific or tissue-specific targets. To consider candidates for ETS1 targets, we note that an ETS1 gene disruption in the mouse causes defects in T and B cells (Bories et al. 1995; Muthusamy et al. 1995). Furthermore, ETS1 and RUNX1 are most abundant in hematopoietic cell types. Therefore, we queried the literature for possible hematopoietic functions of the 40 genes that had the ETS1–RUNX1 composite binding sites. Indeed, four genes had known roles in hematopoietic cells. One example was the transcription factor LEF1, a hematopoietic-specific transcription factor that regulates the TCRα enhancer (Travis et al. 1991). In conclusion, in contrast to the redundant role of ETS proteins at housekeeping genes, we predict tissue-specific gene expression will require weaker ETS sites, and thus cooperative partnerships, to use specific ETS proteins.

Discussion

This genome-wide promoter occupancy study implicated the ETS family in a redundant role at the proximal promoters of housekeeping genes. We also detected specific binding, as defined by occupancy of ETS1, but not three other family members. Sequence motifs and spatial biases that correlated with the two modes of binding were identified by bioinformatics analyses. Most dramatically, unbiased searches for consensus sites correlated strong binding sites with redundant occupancy and weak binding sites with specific occupancy.

A redundant role for ETS proteins

Our survey of the in vivo occupancy of four ETS proteins from four different clades found that three of these proteins often occupy the same promoter regions. There are two possible models to explain the detection of multiple ETS proteins. This co-occupancy could represent a separate binding site for each factor or alternate occupancy of the same site in different cells, on different alleles, or at different times. Because the regions occupied redundantly by ETS family members correlated with a strong match to the consensus sites of multiple ETS proteins, we propose that the same binding site is bound alternatively by different ETS transcription factors.

Bioinformatics studies have identified hundreds of sequence motifs that are overrepresented in human promoters and ETS-like binding sites are always present on these lists (Bina et al. 2004; FitzGerald et al. 2004; Xie et al. 2005). Our results indicate that some of these sequence motifs are likely to be occupied in vivo by multiple ETS proteins. The ETS protein ELK1 did not co-occupy promoters with ETS1, ELF1, and GABPα. This is consistent with in vitro data suggesting that ELK1 has low affinity for a monomeric ETS site, but requires SRF and an adjoining SRF site for high-affinity binding (Price et al. 1995). Additional ChIP data will be required to uncover how many of the 23 remaining ETS transcription factors participate in redundant occupancy of strong ETS-binding sites.

The observation that 5%–15% of the 17,000 promoters are occupied by multiple ETS transcription factors suggests biological importance. In considering the potential significance we noted that many of the redundantly bound regions were associated with housekeeping genes. These findings are consistent with a bioinformatics report of a collection of ETS-type sequences as one of three sequence motifs found in proximal promoters of housekeeping genes (FitzGerald et al. 2004). Our discovery of redundant binding of ETS transcription factors at these genes suggests that this mode of binding could facilitate consistent regulation of ubiquitously expressed housekeeping genes. In this model, sustained expression would be independent of the varying levels or identity of individual ETS proteins in distinct cell types.

More intriguingly, we speculate that the redundantly occupied regions identified in our study are targets for oncogenic ETS proteins. Preliminary support for this hypothesis comes from recent observations in human prostate cancer. It is now proposed that more than half of all cases correlate with a chromosomal rearrangement that leads to overexpression of one of three ETS proteins (Tomlins et al. 2005, 2006). A gene set expressed at higher levels in prostatic intraepithelial neoplasia (PIN) than in normal prostate tissue has two features that are similar to our dually occupied data set. First, an ETS-binding site is the most enriched site from the TRANSFAC database in the promoters of PIN-specific genes. Second, by ontology analysis, genes involved in protein biosynthesis, including those encoding ribosomal proteins, are up-regulated in PINs (Tomlins et al. 2007). We speculate that some housekeeping genes, specifically those redundantly regulated by ETS proteins, could be a class of misregulated targets relevant to tumor progression.

Specific binding and protein partnerships

Specific ETS1 occupancy was uncovered by the ChIP–chip experiment, although this class of targets was less frequent than the dual-bound targets. Furthermore, unbiased searching for enriched sequences did not find a PWM closely related to the in vitro-derived PWM for ETS1 or the PWM derived from dual-bound targets. We propose that a high-affinity ETS1-binding site would preclude specific occupancy due to its concurrent high affinity for other ETS proteins. We speculated that ETS1 achieves sufficient affinity for specific sites only by DNA-binding cooperativity with additional transcription factor(s). Indeed, intersecting ETS1 and RUNX1 ChIP–chip data sets facilitated an informed, yet unbiased, search that discovered a composite site resembling both ETS- and RUNX1-binding sites, but not matching either consensus. As additional ChIP–chip data become available, a similar strategy may identify sequences important for the remaining ETS1-specific sites and for specificity of other ETS proteins. ELK1-specific occupancy of the EGR1 promoter provides an exception to the trend of weak ETS sites for specific binding. This promoter has a strong ETS consensus juxtaposed to an SRF-binding site. This exception may have evolved because constitutive ELK1/SRF occupancy can occlude occupancy by other ETS proteins. Whether ELK1-specific binding generally occurs at weak or strong ETS sites awaits genome-wide occupancy data. In summary, the characterization of an ETS1/RUNX1 cooperative partnership by genome-wide occupancy data provides global evidence for combinatorial control of gene expression within the ETS family.

Limitations of undirected bioinformatics approaches

Mapping genome-wide transcription factor binding is envisioned to enable prediction of gene regulatory pathways. Bioinformatics approaches have attempted to use factor-binding predictions that are based on experimentally derived PWMs that favor the strongest binding sites in vitro (Aerts et al. 2003; Blanchette et al. 2006). Our observation that strong binding sites do not correlate with specificity challenges the accuracy of these in vitro-based approaches, especially for gene families. Other ChIP–chip assays have revealed that many transcription factors bind to regions of the genome that contain no strong matches to the consensus. Our data show how this phenomenon is related to specificity requirements for that particular transcription factor. Thus, bioinformatics approaches that can successfully identify transcription factor-binding sites in silico will likely require the integration of rules for cooperative DNA-binding partnerships.

General rules for gene families

An interesting question is whether these features of the ETS family are predictive of other transcription factor families. There are >20 identified families whose members have conserved DNA-binding domains and common binding properties (Messina et al. 2004). The only mammalian transcription factor family with more than one member assayed by ChIP–chip is the six-member E2F family (Ren et al. 2002; Weinmann et al. 2002; Oberley et al. 2003; Wells et al. 2003). The microarrays available at the time of these studies only assayed a small subset of promoters, and none of these studies identified differences between redundantly and specifically occupied regions. Thus, our study expanded the use of genome-wide occupancy techniques to survey a larger mammalian transcription factor family and revealed characteristics of specifically and redundantly occupied loci. We expect our discovery of extremely weak sites, as a feature of the ETS1–RUNX1 interaction, will be generally applicable to many other DNA-binding partnerships. On the other hand, the use of the ETS family in a redundant manner at proximal promoters may represent a biologically important feature unique to this family.

In conclusion, an in vivo genomic occupancy approach demonstrated both redundant and specific roles for ETS transcription factors. Interchangeable occupancy of diverse members of the family at proximal promoters suggests an unexpected overlapping function for proteins that share no similarity outside of their DNA-binding domains. Furthermore, this occupancy uncovers a strategy that could mediate stable expression of housekeeping genes by making them relatively resistant to changes in transcription factor concentration and regulatory modifications. In contrast, weaker binding sites provide opportunities to enhance affinity and add specificity for biological regulation. Overall, the distinct promoter occupancy patterns of ETS proteins demonstrate the versatile use of a transcription factor family.

Materials and methods

Genomic tiling microarray designs

Two promoter microarrays were used in our studies. The promoter microarray (Agilent Technologies, G4481A), which was used only for data in Supplementary Figure S1, consisted of two slides with a combined 88,000 60-mer oligonucleotide probes representing sequences from −1 kb to +0.3 kb relative to the TSS of ∼17,000 best-defined human transcripts from University of California at Santa Cruz hg17/NCBI release 35 (May 2004). An average promoter region was represented by four to five probes. A second promoter microarray (Agilent Technologies, G4489A) consisted of two slides with a combined 488,000 60-mer oligonucleotides representing sequences from −5 kb to +2 kb relative to the same TSS as in the proximal microarray. A custom microarray was manufactured by Agilent Technologies to represent the region from 10 kb upstream to 10 kb downstream of the previously identified _TCR_α and _TCR_β enhancers using probes from the Agilent genomic tiling database. The average spacing between probes within these regions was ∼100 bp.

Cell culture

Cell lines were grown using standard tissue culture techniques. Jurkat cells were maintained in RPMI medium (GIBCO) plus 10% fetal bovine serum (FBS), 2 mM L-glutamine, 10 mM sodium pyruvate, and 10 mM HEPES. HT29 cells were maintained in McCoy's 5A medium (GIBCO) plus 10% FBS and 2 mM L-glutamine.

ChIP

Dynabeads (50 μL) conjugated to sheep anti-rabbit IgG (Dynal Biotechnology) were mixed with 1 mL of Dilution Buffer (20 mM Tris at pH 7.9, 2 mM EDTA, 150 mM NaCl, 1% Triton X-100, 4 mg/mL bovine serum albumin [BSA], mammalian protease inhibitors [Sigma, #P8340]) and rotated for 10 min at 4°C. Next, 5 μL of polyclonal rabbit antibody (ETS1, sc-350; ELF1, sc-631; GABPα, sc-22810; ELK1, sc-355; and E2F4, sc-1082; Santa Cruz Biotechnology) or monoclonal mouse antibody (RUNX1, α3.2.3.1; gift of Dr. Nancy Speck [Dartmouth Medical School, Hanover, NH]) was added and the slurry was rotated overnight at 4°C. Cross-linked and sheared chromatin extracts were prepared as described previously (Hollenhorst et al. 2004). The extract (100 μL) was added to Dynabead/antibody slurry and rotated for 4 h at 4°C. Beads were washed four times for 5 min with immunoprecipitation wash buffer (20 mM Tris at pH 7.9, 2 mM EDTA, 250 mM NaCl, 0.25% NP-40, 0.05% SDS). Beads were resuspended in 100 μL of 10 mM Tris (pH 8.0) and 100 μg/mL RNase A and incubated for 30 min at 37°C. SDS concentration was brought to 1% and Proteinase K was added to 200 μg/mL. Bead slurries were then incubated for 3 h at 55°C and 6 h at 65°C. ChIP DNA was purified from slurries by phenol/chloroform extraction and QiaQuick PCR Purification Kit (Qiagen).

ChIP DNA amplification, labeling, hybridization, and scanning

ChIP DNA was amplified by a whole-genome amplification kit (WGA2, Sigma) according to the manufacturer’s instructions, except that the fragmentation step was skipped and the number of PCR cycles was increased to 20. Alternate random primed and linker-mediated PCR amplification protocols were compared with the WGA2 kit on the proximal promoter microarray and gave similar results for ETS1 occupancy (data not shown). Amplified DNA was treated with a QiaQuick PCR purification kit (Qiagen), then labeled, hybridized to the Agilent microarrays, and washed as previously described (Boyer et al. 2005). Hybridized microarrays were scanned using an Agilent G2565BA microarray scanner and raw image files were processed with Agilent Feature Extraction software (version 8.5). Two replicates of each ChIP–chip experiment from independent cell cultures were performed, with the exception of three repetitions of Jurkat ETS1 on the proximal promoter microarray and one repetition of HT29 ETS1 on the proximal promoter microarray. The data discussed in this publication have been deposited in NCBI’s Gene Expression Omnibus (GEO, http://www.ncbi.nlm.nih.gov/geo) and are accessible through GEO Series accession number GSE7449.

Classification of binding events

Data were analyzed with ChIP Analytics (version 1.3, Agilent Technologies) with the Whitehead Error Model. Normalization consisted of subtraction of median signals from negative control features, interarray median normalization, and dye-bias median normalization. A weighted average was used for replicates. A value, X, was calculated for each probe; this value correlates with a log ratio, but includes a correction for low intensities (for equations, see ChIP Analytics 1.3 user’s guide). A P value for each probe, P(X), represented the probability of observing an X value as high as or higher than its own, given the normal distribution of the mean and standard deviation. To incorporate the data of neighboring probes (peaks), a neighbor-averaged value, X̄, was calculated as the mean of the X value from that probe and the neighboring probe on either side. If a neighboring probe was >1 kb away, a value of 0 was substituted for its X. A P value for X̄, P(X̄), was calculated just as for P(X).
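The two per-probe quantities described above can be illustrated with a minimal sketch. It is not the ChIP Analytics implementation, whose equations are given only in the vendor's user guide. The function names, the treatment of a missing neighbor at the end of an array as 0, and the use of a caller-supplied mean and standard deviation are assumptions made for illustration.

```python
import math

def upper_tail_p(x, mu, sigma):
    """One-sided P value: probability of a value >= x under Normal(mu, sigma)."""
    return 0.5 * math.erfc((x - mu) / (sigma * math.sqrt(2.0)))

def neighbor_average(positions, x_values, max_gap=1000):
    """Mean of each probe's X and the X of the probe on either side.

    A neighbor more than max_gap bp away contributes 0, as stated in the
    text. A neighbor that does not exist (first or last probe) also
    contributes 0, which is an assumption of this sketch.
    """
    n = len(x_values)
    averaged = []
    for i in range(n):
        left = 0.0
        if i > 0 and positions[i] - positions[i - 1] <= max_gap:
            left = x_values[i - 1]
        right = 0.0
        if i < n - 1 and positions[i + 1] - positions[i] <= max_gap:
            right = x_values[i + 1]
        averaged.append((left + x_values[i] + right) / 3.0)
    return averaged
```

Substituting 0 for a distant neighbor pulls the average toward background, so an isolated high probe is penalized relative to a probe that sits within a peak of enriched neighbors.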

For the data shown in Figures 2 and 5A and Table 1, a “bound” promoter designation required that one or more probes within a promoter region have a P(X) of <0.001 as determined by the “gene report” output of ChIP Analytics. The P(X) values used in Table 1 and Figure 5A were the lowest P(X) for any probe within that promoter region. Genes in the De Ferrari data set were considered to have evidence of dual occupancy if the corresponding promoters had minimum P(X) values of <0.01 for both ETS1 and ELF1.

The segments shown in Figure 3A were derived from the “segment report” of ChIP Analytics, which classifies a genomic region as a “bound” segment if there is a series of “bound” probes with <1 kb gaps. A “bound” probe must satisfy a significance heuristic of P(X̄) < 0.001 for the neighbor-averaged value and either [P(X) < 0.001 and one neighboring probe with P(X) < 0.01] or [P(X) < 0.005 for that probe and a neighbor, or P(X) < 0.005 for both neighbors]. For each bound segment, the probe in the segment with the lowest P(X) was considered the center of that segment and that P(X) value was used as the P(X) value for the segment.
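The probe-level heuristic and the grouping of bound probes into segments can be written out as a short sketch. It reads the leading condition as applying to the P value of the neighbor-averaged signal and the bracketed conditions as applying to single-probe P values; that reading, the function names, and the interpretation of "one neighboring probe" as "at least one" are assumptions, not a description of the vendor software.

```python
def is_bound_probe(p_avg, p_self, p_left, p_right):
    """Apply the significance heuristic to a single probe.

    p_avg   -- P value of the neighbor-averaged signal (assumed reading)
    p_self  -- single-probe P value of this probe
    p_left, p_right -- single-probe P values of the two neighbors
    """
    if not p_avg < 0.001:
        return False
    best_neighbor = min(p_left, p_right)
    strong_with_support = p_self < 0.001 and best_neighbor < 0.01
    moderate_pair = p_self < 0.005 and best_neighbor < 0.005
    both_neighbors = p_left < 0.005 and p_right < 0.005
    return strong_with_support or moderate_pair or both_neighbors

def bound_segments(positions, bound_flags, max_gap=1000):
    """Group bound probes into (start, end) segments.

    Consecutive bound probes stay in one segment while the gap between
    them is < max_gap bp; positions must be sorted in ascending order.
    """
    segments, current = [], []
    for pos, bound in zip(positions, bound_flags):
        if not bound:
            continue
        if current and pos - current[-1] >= max_gap:
            segments.append((current[0], current[-1]))
            current = []
        current.append(pos)
    if current:
        segments.append((current[0], current[-1]))
    return segments
```

Requiring support from at least one neighbor in every branch reflects the expectation that sheared chromatin fragments overlap several adjacent probes, so a genuine binding event rarely lights up a single probe in isolation.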

Dual-bound ETS1 and ELF1 or ETS1 and RUNX1, or specifically occupied ETS1 but not ELF1 were segregated by examining the P(X) value of all probes from the ELF1 or RUNX1 probe report that lie within 1 kb from the center of the ETS1 segment (ChIP Analytics). Segments were considered dual bound if one of these probes had a P(X) value <0.001 and specific if none of these probes had a P(X) value <0.01. Dual-bound and specific segments were matched to a specific gene by identifying the nearest TSS from the Ensembl database.
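The segregation rule can be sketched as a three-way classifier. The thresholds and the 1-kb window come from the text. The function name, the "unclassified" label for segments that fall between the two cutoffs, and the handling of segments with no nearby probes are assumptions for illustration.

```python
def classify_ets1_segment(center, other_probes, window=1000,
                          dual_cutoff=0.001, specific_cutoff=0.01):
    """Classify an ETS1 segment against a second factor (ELF1 or RUNX1).

    center       -- genomic position of the ETS1 segment center
    other_probes -- iterable of (position, p_value) for the second factor
    Returns "dual", "specific", or "unclassified".
    """
    nearby = [p for pos, p in other_probes if abs(pos - center) <= window]
    if not nearby:
        # No evidence either way; treated as unclassified in this sketch.
        return "unclassified"
    if any(p < dual_cutoff for p in nearby):
        return "dual"
    if all(p >= specific_cutoff for p in nearby):
        return "specific"
    # At least one probe lies between the two cutoffs.
    return "unclassified"
```

Using a stricter cutoff to call dual occupancy than to rule it out leaves a buffer zone (0.001 to 0.01), so segments with marginal signal for the second factor are assigned to neither class.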

Analysis of bound segments by MEME, PATSER, and GOstat

Bound segments were shortened by utilizing only the region spanning the central probe [lowest P(X) value] and ending at probes that displayed a 100-fold increase in P(X) value. If the edge of a segment (gap of at least 1000 bp) was encountered first, the segment was extended by 200 bp and ended. Segments were analyzed by MEME (http://meme.sdsc.edu/meme/meme.html) (Bailey and Elkan 1994) to identify sequences of variable length that occur more often than expected. Such sequences are reported as PWMs and are given an E value (expect value) describing the number of times that PWM would be identified by chance in a set of sequences of that size. MEME was run with default settings, except that the maximum motif length was set at 15 nucleotides. All sequences shown had the lowest E value of all complex sequences returned. (Runs of a single nucleotide were not considered complex.)

Three kilobases of sequence surrounding the TSS of each gene matched to ETS1/ELF1 dual-bound or ETS1 specifically bound segments and 937 randomly selected genes (Ensembl) were analyzed by PATSER (http://rsat.ulb.ac.be/rsat) (Hertz and Stormo 1999). The PATSER program moved a window equal to the length of the ETS PWM (Fig. 4A) along both strand sequences and assigned a score to each position. The position of the highest-scoring PWM match for each sequence relative to the TSS was recorded.
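A sliding-window scan of this kind can be sketched as follows. This is not PATSER itself: the log-odds scoring against a uniform background, the toy four-column matrix, and all names are assumptions, and the actual ETS PWM of Figure 4A is not reproduced here. Sequences are expected to contain only uppercase A, C, G, and T.

```python
import math

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def reverse_complement(seq):
    return "".join(COMPLEMENT[base] for base in reversed(seq))

def toy_column(base, p=0.85):
    """Hypothetical PWM column favoring one base; no probability is zero."""
    rest = (1.0 - p) / 3.0
    return {b: (p if b == base else rest) for b in "ACGT"}

# Hypothetical matrix for the GGAA core only, for illustration.
TOY_PWM = [toy_column(b) for b in "GGAA"]

def best_match(seq, pwm, tss_index):
    """Return (score, offset_from_tss, strand) of the best-scoring window.

    Both strands are scanned; the offset is the forward-strand start of
    the window minus tss_index.
    """
    width = len(pwm)
    log_odds = [{b: math.log2(col[b] / 0.25) for b in "ACGT"} for col in pwm]
    best = None
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for i in range(len(s) - width + 1):
            score = sum(log_odds[j][s[i + j]] for j in range(width))
            # Map a reverse-strand window start back to forward coordinates.
            start = i if strand == "+" else len(seq) - width - i
            if best is None or score > best[0]:
                best = (score, start - tss_index, strand)
    return best
```

Recording only the single best window per sequence makes the positional distributions of the bound and random gene sets directly comparable, at the cost of ignoring promoters with several good matches.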

The genes with the 400 lowest P(X) values for each category (except for ETS1-specific genes, where all 437 genes were used) were analyzed for overrepresented ontologies using GOstat (http://gostat.wehi.edu.au) (Beissbarth and Speed 2004). [For ETS1/ELF1/GABPα co-occupied genes, each P(X) value was <0.001 and the mean P(X) value was used]. A list of the ∼17,000 genes represented on the microarray was used as a background gene list. Random gene lists (400 each) were generated from this background gene list. The maximal P value for returned categories was set to 0.001. Redundant gene categories (differing by less than three genes) were collapsed to one category and uninformative gene categories were not recorded.
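The logic of an overrepresentation test and of collapsing near-identical categories can be sketched as below. The hypergeometric upper-tail probability is used here as a stand-in statistic, and "differing by less than three genes" is read as a symmetric difference of fewer than three genes; both are assumptions of this sketch rather than a description of GOstat's internals.

```python
from math import comb

def hypergeom_upper_tail(k, n, K, N):
    """P(at least k category genes among n selected genes).

    N -- genes in the background list
    K -- background genes annotated to the category
    n -- genes in the selected list
    k -- selected genes annotated to the category
    """
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)

def collapse_redundant(categories, min_diff=3):
    """Keep a category only if it differs from every kept one by >= min_diff genes.

    categories -- list of (name, set_of_genes), most significant first
    """
    kept = []
    for name, genes in categories:
        if all(len(genes ^ other) >= min_diff for _, other in kept):
            kept.append((name, genes))
    return kept
```

Passing categories in order of significance means that, within a group of near-duplicates, the most significant one is the category retained.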

Real-time PCR

Real-time PCR was performed as described previously (Hollenhorst et al. 2004). Serial dilutions of Jurkat ChIP input DNA were used as a standard curve for real-time PCR. Primers designed for the specified genomic regions were found to amplify a single product from genomic input DNA based on a single melting peak. Each ChIP DNA sample was assayed for the levels of two negative control regions, the 3′ ends of the albumin and BCL-XL genes. In all cases, the absolute levels of these control regions varied by less than twofold. The mean level of the control regions was considered the background level of genomic DNA. ChIP enrichments are reported as a ratio of the absolute measurement of each genomic locus to the background level of genomic DNA in the same sample. Primers used to assay genomic loci are listed in Supplementary Table S3.
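The enrichment calculation can be illustrated with a minimal sketch. It assumes the common practice of fitting threshold cycle (Ct) against log10 of input quantity for the standard curve; the text does not specify the fitting procedure, and the function names and example values are hypothetical.

```python
import math

def fit_standard_curve(quantities, cts):
    """Least-squares line Ct = slope * log10(quantity) + intercept."""
    xs = [math.log10(q) for q in quantities]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(cts) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, cts))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def quantity_from_ct(ct, slope, intercept):
    """Invert the standard curve to obtain an absolute quantity."""
    return 10.0 ** ((ct - intercept) / slope)

def chip_enrichment(locus_quantity, control_quantities):
    """Ratio of a locus to the mean of the negative control regions."""
    background = sum(control_quantities) / len(control_quantities)
    return locus_quantity / background

# Hypothetical usage: a three-point dilution series, then one locus
# measured against the two negative control regions.
slope, intercept = fit_standard_curve([1.0, 0.1, 0.01], [20.0, 23.0, 26.0])
locus = quantity_from_ct(21.0, slope, intercept)
controls = [quantity_from_ct(ct, slope, intercept) for ct in (25.9, 26.1)]
enrichment = chip_enrichment(locus, controls)
```

Normalizing to control regions measured in the same ChIP sample corrects for differences in DNA recovery between immunoprecipitations.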

Protein expression and purification

Human ETS1 (p51) was cloned into bacterial expression vector pET28A (Novagen) at a site that introduces a 6× HIS tag at the N terminus. ETS1 protein was expressed and purified as described previously (Jonsen et al. 1996). Protein concentration was determined by comparison to BSA standards by Coomassie brilliant blue-stained SDS-PAGE gels, and activity was determined by binding to the high-affinity ETS1-binding duplex 5′-TCGACGGCCAAGCCGGAAGTGAGTGCC-3′ (Nye et al. 1992).

A fragment of the RUNX1 protein that includes amino acids 1–302 was a gift of Nancy Speck. This fragment retains all of the regions necessary for cooperative DNA binding with ETS1 and was purified from baculovirus-infected SF9 cells (Gu et al. 2000).

DNA-binding assays

Quantitative electrophoretic mobility shift assays were performed as described previously (Jonsen et al. 1996). In brief, indicated concentrations of ETS1 protein were mixed with ³²P-labeled double-stranded oligonucleotides at 1 × 10⁻¹¹ M, then incubated for 1 h on ice. Duplexes, designated either GGAA or GGAG, were composed of the following sequences, respectively: CACAGGAATGCTGGGAATTGTAGTTTTCGCTCTGT; CACAGGAATGCTGGGAGTTGTAGTTTTCGCTCTGT. Reaction mixtures with RUNX1 (amino acids 1–302) at 3 × 10⁻⁸ M were incubated on ice for 30 min before addition of ETS1 protein. Aliquots of binding mixtures were run on a 6% polyacrylamide gel and relative radioactivity in bound or unbound DNA bands was quantified by PhosphorImager (Molecular Dynamics). KDs were calculated as described previously (Goetz et al. 2000) using least squares curve fit of fraction of DNA bound = 1/(1 + KD/[ETS1]).
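The single-parameter fit of the binding isotherm can be sketched in a few lines. The model is the one stated above; the optimizer is a golden-section search over log10(KD) that minimizes the sum of squared residuals, substituted here for whatever nonlinear least-squares routine was actually used. The search bounds, function names, and synthetic data are assumptions.

```python
import math

def fraction_bound(ets1, kd):
    """Binding isotherm: fraction of DNA bound = 1 / (1 + KD / [ETS1])."""
    return 1.0 / (1.0 + kd / ets1)

def fit_kd(concentrations, fractions, lo=1e-12, hi=1e-5, iterations=200):
    """Estimate KD (in M) by least squares over log10(KD).

    Golden-section search assumes a single minimum between lo and hi,
    which holds for well-behaved titration data but is not guaranteed.
    """
    def sse(log_kd):
        kd = 10.0 ** log_kd
        return sum((f - fraction_bound(c, kd)) ** 2
                   for c, f in zip(concentrations, fractions))

    a, b = math.log10(lo), math.log10(hi)
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    for _ in range(iterations):
        c = b - ratio * (b - a)
        d = a + ratio * (b - a)
        if sse(c) < sse(d):
            b = d  # minimum lies in [a, d]
        else:
            a = c  # minimum lies in [c, b]
    return 10.0 ** ((a + b) / 2.0)

# Hypothetical usage with noise-free synthetic data generated at KD = 2 nM.
concs = [1e-10, 1e-9, 1e-8, 1e-7]
observed = [fraction_bound(c, 2e-9) for c in concs]
kd_estimate = fit_kd(concs, observed)
```

Searching in log space suits this problem because ETS1 titrations span several orders of magnitude. The ratio of KD values fitted with and without RUNX1 then gives the fold change in affinity attributable to cooperative binding.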

Acknowledgments

We thank University of Utah colleagues Brian Dalley from the Microarray Core and Cody Haroldsen from the Informatics Core for assistance. This work was supported by the National Institutes of Health (B.J.G., R01GM38663; P.C.H. fellowship support, T32CA93247; P01CA24014 for Huntsman Cancer Institute) and by the American Cancer Society (P.C.H. post-doctoral fellowship, #PF-03-122-01-GMC). We also acknowledge support from the Huntsman Cancer Institute and the U.S. Department of Energy.

