CpG-depleted promoters harbor tissue-specific transcription factor binding signals--implications for motif overrepresentation analyses

Helge G Roider et al. Nucleic Acids Res. 2009 Oct.

Abstract

Motif overrepresentation analysis of proximal promoters is a common approach to characterizing the regulatory properties of co-expressed sets of genes. Here we show that these approaches perform well on mammalian CpG-depleted promoter sets that regulate expression in terminally differentiated tissues such as liver and heart. In contrast, CpG-rich promoters show very little overrepresentation signal, even when associated with genes that display highly constrained spatiotemporal expression. For instance, although approximately 50% of heart-specific genes possess CpG-rich promoters, we find that the frequently observed enrichment of MEF2-binding sites upstream of heart-specific genes is due solely to contributions from CpG-depleted promoters. Similar results are obtained for all sets of tissue-specific genes, indicating that CpG-rich and CpG-depleted promoters differ fundamentally in the distribution of regulatory inputs around the transcription start site (TSS). To avoid diluting the respective transcription factor binding signals, the two promoter types should therefore be treated as separate sets in any motif overrepresentation analysis.
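
The recommendation to analyze the two promoter types separately can be illustrated with a minimal Python sketch that partitions promoters by CpG content before any motif test is run. It assumes that CpG content is measured as the observed/expected CpG ratio, (number of CpGs × length) / (number of Cs × number of Gs), and uses a hypothetical cut-off of 0.5 that mirrors the boundary shown in Figure 4; the classification criterion actually used in the study is described in its Methods and is not reproduced here. The function names and the dictionary of promoter sequences are illustrative.

```python
from typing import Dict, Tuple


def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio: (#CpG * length) / (#C * #G)."""
    s = seq.upper()
    c, g = s.count("C"), s.count("G")
    if c == 0 or g == 0:
        return 0.0
    # "CG" cannot overlap itself, so str.count finds every CpG dinucleotide
    cpg = s.count("CG")
    return cpg * len(s) / (c * g)


def split_by_cpg(promoters: Dict[str, str],
                 threshold: float = 0.5) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Partition promoters into CpG-rich (ratio >= threshold) and CpG-depleted sets.

    The 0.5 default is an assumption for illustration, not the study's criterion.
    """
    rich: Dict[str, str] = {}
    depleted: Dict[str, str] = {}
    for gene, seq in promoters.items():
        target = rich if cpg_obs_exp(seq) >= threshold else depleted
        target[gene] = seq
    return rich, depleted
```

Each resulting set would then be passed to the overrepresentation test on its own, rather than pooling both promoter types into one foreground set.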

Figures

Figure 1.

(a) Bimodal CpG distribution across the promoters of the 50, 200 or 500 most heart-specific genes. The fraction of high-CpG promoters (HCPs) in the 200- and 500-gene sets is larger than expected from the CpG content across all mouse promoters (the black line indicates the expected CpG distribution for random gene sets of size 500). (b) and (c) show the contribution of HCP genes to other tissue sets of the indicated size, based on EST or microarray data, respectively. The HCP contribution usually increases with gene set size and exceeds 20% even for most sets of size 50. Sets with more than 200 genes often contain an excess of HCP genes compared with the 46% HCP fraction across all Ensembl promoters (indicated by horizontal lines).
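
Figure 1 compares the HCP fraction of each tissue set with the 46% background across all Ensembl promoters. One way to quantify such an excess, assumed here purely for illustration (the caption does not say that a formal test was applied), is a one-sided binomial tail probability. The sketch uses only the Python standard library; `hcp_excess` and `binomial_upper_tail` are hypothetical helper names.

```python
from math import comb
from typing import Tuple

BACKGROUND_HCP_FRACTION = 0.46  # HCP fraction across all Ensembl promoters (Figure 1)


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), summed term by term."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def hcp_excess(n_hcp: int, n_genes: int,
               p: float = BACKGROUND_HCP_FRACTION) -> Tuple[float, float]:
    """Return the observed HCP fraction and a one-sided P-value for an HCP excess."""
    return n_hcp / n_genes, binomial_upper_tail(n_hcp, n_genes, p)
```

For example, `hcp_excess(120, 200)` would ask whether 120 HCPs among 200 tissue-specific genes exceed what the 46% background predicts.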

Figure 2.

Enrichment of high-affinity sites for HNF1 and MEF2 near the TSS of liver- and muscle-specific genes with CpG-depleted promoters. (a) and (b) Sequence windows with the highest affinity are preferentially located directly upstream of the TSS (red bars). This signal is due to sites in the CpG-depleted promoters, as no preferential binding pattern is observed when the analysis is restricted to CpG-rich promoters (compare blue and yellow bars for results from CpG-rich and CpG-depleted genes, respectively). (c) and (d) show the corresponding PASTAA enrichment scores (see ‘Materials and Methods’ section) for each sequence window, as well as for the separate sets of high- and low-CpG promoters.
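
The window-based analysis in Figure 2 involves two steps: cutting each promoter into fixed-size windows anchored at the TSS, and scoring each window for binding affinity. PASTAA works with predicted TF-binding affinities (see the paper's Methods); the sketch below swaps in a deliberately simple stand-in, the sum of exp(log-odds score) over all sites on both strands, and is not the affinity model used in the study. The log-odds matrix format (one base-to-score dictionary per motif position) and all function names are assumptions.

```python
import math
from typing import Dict, List, Tuple

PWM = List[Dict[str, float]]  # hypothetical log-odds matrix: one {base: score} per position
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def window_affinity(seq: str, pwm: PWM) -> float:
    """Simple affinity proxy: sum of exp(log-odds) over every site on both strands."""
    seq = seq.upper()
    width = len(pwm)
    total = 0.0
    for strand in (seq, revcomp(seq)):
        for i in range(len(strand) - width + 1):
            site = strand[i:i + width]
            if any(base not in "ACGT" for base in site):
                continue  # skip sites containing N or other ambiguity codes
            total += math.exp(sum(col[base] for col, base in zip(pwm, site)))
    return total


def tss_windows(upstream: str, size: int = 200) -> List[Tuple[int, int, str]]:
    """Cut an upstream sequence whose last base sits at position -1 into TSS-anchored windows.

    Returns (start, end, subsequence) with coordinates relative to the TSS,
    starting with (-size, 0) directly upstream; the most distal window may be shorter.
    """
    n = len(upstream)
    windows = []
    for end in range(0, -n, -size):
        start = max(end - size, -n)
        windows.append((start, end, upstream[n + start:n + end]))
    return windows
```

A per-window profile for one gene is then `[(s, e, window_affinity(sub, pwm)) for s, e, sub in tss_windows(upstream)]`, which can be computed separately for CpG-rich and CpG-depleted genes.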

Figure 3.

TF-binding affinity enrichment near the TSS of tissue-specific genes with either CpG-rich (a) or CpG-depleted (b) promoters. The height of each bar corresponds to the PASTAA enrichment score of the most significant association found for the corresponding tissue. With the exception of testis, no significant enrichment signals are detected when analyzing the tissue sets containing the 500 most specific CpG-rich promoters. In contrast, enrichment peaks strongly for tissue-specific sets of 500 P-value-matched low-CpG promoter (LCP) genes when TF-binding affinities are computed for 200-bp windows directly upstream of the TSSs.
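
Figure 3 reports, per tissue, the score of the most significant TF association. As a simplified illustration of a rank-overlap enrichment test, assumed here and not the published PASTAA procedure, genes can be ranked by predicted affinity for one matrix, and the overlap between the top-ranked genes and a tissue-specific gene set tested with a hypergeometric tail probability, keeping the best P-value over several affinity cut-offs. All names, the cut-offs and the -log10 scoring are hypothetical.

```python
import math
from typing import Dict, Sequence, Set


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(N genes, K tissue-specific, n drawn)."""
    hits = sum(math.comb(K, i) * math.comb(N - K, n - i)
               for i in range(k, min(K, n) + 1))
    return hits / math.comb(N, n)


def best_overlap_score(affinity: Dict[str, float], tissue_genes: Set[str],
                       cutoffs: Sequence[int]) -> float:
    """-log10 of the smallest overlap P-value over several affinity-rank cut-offs."""
    ranked = sorted(affinity, key=affinity.get, reverse=True)  # highest affinity first
    N = len(ranked)
    K = len(tissue_genes.intersection(ranked))  # tissue genes present in the background
    best = 1.0
    for cutoff in cutoffs:
        top = ranked[:cutoff]
        k = sum(1 for g in top if g in tissue_genes)
        best = min(best, hypergeom_sf(k, N, K, len(top)))
    return -math.log10(max(best, 1e-300))  # guard against log10(0) on float underflow
```

Because the minimum is taken over several cut-offs, this score is optimistic and would need a correction (for example, permuting gene labels) before it could be read as a significance level.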

Figure 4.

TF targets have low average CpG content. Yellow and blue bars indicate the number of matrices whose target genes have an average CpG content ≥0.5 and <0.5, respectively. Red bars indicate the overall propensity to find the most significant association between matrices and any of the tissues at a particular window position. About a third of all matrices show their strongest association with any of the tissues when binding affinities are computed for the window from −200 to 0 bp relative to the TSS, indicating a strong location preference for the proximal promoter (red bars). The target genes of the vast majority of matrices have an average CpG content <0.5 (compare yellow and blue bars).
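
Both quantities summarized in Figure 4 can be tallied from per-matrix results, as in the sketch below. It assumes that each matrix already has a list of target genes and, for each window position, its best score over all tissues; how target genes are defined in the study is not stated in this caption, so `targets`, `cpg` and `scores` are hypothetical inputs.

```python
from collections import Counter
from statistics import mean
from typing import Dict, List, Tuple

Window = Tuple[int, int]  # (start, end) relative to the TSS, e.g. (-200, 0)


def average_target_cpg(targets: Dict[str, List[str]],
                       cpg: Dict[str, float]) -> Dict[str, float]:
    """Mean CpG content of each matrix's target genes (matrices without targets are skipped)."""
    return {m: mean(cpg[g] for g in genes) for m, genes in targets.items() if genes}


def count_by_cpg(avg: Dict[str, float], threshold: float = 0.5) -> Tuple[int, int]:
    """Number of matrices with average target CpG content >= threshold and < threshold."""
    high = sum(1 for v in avg.values() if v >= threshold)
    return high, len(avg) - high


def best_window_counts(scores: Dict[str, Dict[Window, float]]) -> Counter:
    """How often each window holds a matrix's strongest association (cf. the red bars).

    scores[matrix][window] is assumed to be the best score over all tissues.
    """
    return Counter(max(by_window, key=by_window.get) for by_window in scores.values())
```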
