A comparative analysis of genome-wide chromatin immunoprecipitation data for mammalian transcription factors

Comparative Study

Hongkai Ji et al. Nucleic Acids Res. 2006.

Abstract

Genome-wide location analysis (ChIP-chip, ChIP-PET) is a powerful technique to study mammalian transcriptional regulation. In order to obtain a basic understanding of the location data generated for mammalian transcription factors and potential issues in their analysis, we conducted a comparative study of eight independent ChIP experiments involving six different transcription factors in human and mouse. Our cross-study comparisons, to the best of our knowledge the first to analyze multiple datasets, revealed the importance of carefully chosen genomic controls in the de novo identification of key transcription factor binding motifs, raised issues about the interpretation of ubiquitously occurring sequence motifs, and demonstrated the clustering tendency of protein-binding regions for certain transcription factors.

Figures

Figure 1

Comparisons of de novo motif discovery from multiple ChIP studies. Eight experiments were examined here, including (A) Gli-chip, (B) ER-chip, (C) p53-PET, (D) Oct4-chip, (E) Sox2-chip, (F) Nanog-chip, (G) Oct4-PET and (H) Nanog-PET. For each factor, representative motifs recovered by de novo discovery are shown. The motif score reported by the Gibbs motif sampler for each motif is shown in parentheses. The complete lists of motifs reported by the Gibbs motif sampler are presented in Supplementary Figures S1–S8. For each reported motif, the relative enrichment levels r1, r2 and r3 were computed by comparing TFBS occurrence rates in ChIP regions to their counterparts in matched genomic controls (labeled ‘Matched’) or random genomic controls (labeled ‘Random’). The enrichment levels of all discovered motifs are compared here, and the data used to generate the figures are listed in Supplementary Tables S1–S8. The motifs responsible for the sequence-specific protein binding are underlined or indicated by arrows. For Nanog-chip and Nanog-PET, the motif responsible for the binding was unknown. In these two cases, the Oct-Sox composite motif is highlighted, and the relative enrichment levels of the previously proposed Nanog motif (4) (labeled ‘Nanog’) are also shown. For Gli-chip, ER-chip, Oct4-chip, Sox2-chip and Nanog-chip, enrichment levels relative to design-based controls were also computed and are shown in Supplementary Figure S9.
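
The caption describes relative enrichment as a comparison of TFBS occurrence rates in ChIP regions against control regions, but it does not give the exact definitions of r1, r2 and r3. The following is a minimal Python sketch of one plausible reading, a ratio of per-base-pair occurrence rates. The function name, its parameters and the ratio form itself are assumptions made for illustration and are not taken from the paper.

```python
def relative_enrichment(chip_sites, chip_bp, control_sites, control_bp):
    """Ratio of motif-site occurrence rates: ChIP regions vs. control regions.

    ASSUMPTION: enrichment is modeled as
        (chip_sites / chip_bp) / (control_sites / control_bp).
    The source does not spell out how r1, r2 and r3 are defined.
    """
    if chip_bp <= 0 or control_bp <= 0:
        raise ValueError("region lengths must be positive")
    if control_sites == 0:
        # No sites in the control: the ratio is unbounded.
        return float("inf")
    # Cross-multiplied form of the rate ratio, which avoids dividing
    # two small floating-point rates.
    return (chip_sites * control_bp) / (control_sites * chip_bp)


if __name__ == "__main__":
    # Hypothetical counts: 30 sites in 10 kb of ChIP regions,
    # 10 sites in 10 kb of matched control regions.
    print(relative_enrichment(30, 10_000, 10, 10_000))
```

Swapping the control counts between a matched set and a random set would give the two kinds of comparison the caption labels ‘Matched’ and ‘Random’.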

Figure 2

Motifs that contain one or more CCCAG components. CCCAG components are highlighted by dashed rectangles. Arrows indicate possible repetitive or palindromic structures. Motifs are indexed by the experiment from which they were recovered (e.g. p53_M8 means that the motif was recovered from the p53 data and was the eighth strongest one in that experiment in terms of motif score).

Figure 3

GC-content and cross-species conservation of ChIP-binding regions. (A) GC-content of ChIP-binding regions. Blue bar, genome-wide GC-content; cyan bar, GC-content for ChIP-binding regions; yellow bar, GC-content for genomic regions surveyed by the ChIP experiments; red bar, GC-content for matched genomic controls. Error bars show three times the standard error of the GC-content estimate. For p53-PET, Oct4-PET and Nanog-PET, the regions surveyed by the ChIP experiments are the whole genome; for ER-chip, they are human chr21 and chr22; for Oct4-chip, Sox2-chip and Nanog-chip, they are promoter regions that span from −8 to +2 kb of TSS; for Gli-chip, they are regions tiled in the custom array. Matched genomic controls used here are the same control regions used for relative enrichment computation. Detailed base occurrence frequencies for A, C, G and T are listed in Supplementary Table S9. (B and C) Cumulative probability functions of conservation scores for ChIP regions and genomes. Conservation scores are defined for each base pair and were linearly scaled to the interval [0, 255]. A larger score corresponds to a more conserved position. Human Genome, genome-wide conservation for human; Human Promoter, conservation for human promoter regions spanning from −8 to +2 kb of TSS; Human chr21 and 22, conservation for human chr21 and chr22; Mouse Genome, genome-wide conservation for mouse; Mouse tiled in Gli, conservation for regions surveyed in the Gli study, i.e. regions tiled in the custom array.
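
As a rough illustration of panel (A), the Python sketch below computes the GC fraction of a sequence together with an error bar of three standard errors. The caption does not say how the standard error was estimated. The binomial formula used here, the function names and the choice to skip non-ACGT characters are all assumptions.

```python
import math


def gc_content(seq):
    """Return (GC fraction, number of counted bases) for a DNA string.

    ASSUMPTION: characters other than A, C, G, T (e.g. N) are skipped.
    """
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no A/C/G/T bases")
    gc = sum(1 for b in bases if b in "GC")
    return gc / len(bases), len(bases)


def gc_with_error_bar(seq, k=3.0):
    """Return (GC fraction, k * standard error).

    ASSUMPTION: a binomial standard error sqrt(p * (1 - p) / n) is used;
    the source only states that the bars show three times the standard error.
    """
    p, n = gc_content(seq)
    se = math.sqrt(p * (1.0 - p) / n)
    return p, k * se


if __name__ == "__main__":
    # Hypothetical toy sequence, not data from the study.
    fraction, bar = gc_with_error_bar("GGCCAATTNN")
    print(f"GC = {fraction:.3f} +/- {bar:.3f}")
```

In practice the sequences of all regions in a set (ChIP-binding regions, surveyed regions or matched controls) would be pooled before the estimate is computed.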

Figure 4

Clustering tendency of binding regions. (A–C) Three examples of binding region clusters. The transcription factor that binds to DNA is shown at the top of each figure. The gene that is bound by the transcription factor is shown at the bottom of the figure. For the two ChIP-chip examples (A and B), binding regions are indicated by high fold enrichment of IP samples versus control samples (i.e. peaks in the figure). For the ChIP-PET example (C), binding regions are indicated by the blocks in the ‘Oct4-PET’ and ‘Nanog-PET’ tracks in the UCSC genome browser. (D–G) Cumulative distribution functions (CDFs) of the observed and simulated peak-to-peak distances. The simulated distances were taken to represent the random distribution. Observed and simulated distributions were fitted by Gamma and Exponential densities, respectively; the fitted CDFs are also shown.
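
To make the comparison in panels (D–G) concrete, the Python sketch below computes distances between adjacent peaks, fits an Exponential distribution to them and evaluates empirical and fitted CDFs. It rests on several assumptions not stated in the caption: peaks lie on a single chromosome, "peak-to-peak distance" means the gap between neighboring peaks, and the Exponential rate is estimated by maximum likelihood (one over the mean). The Gamma fit is omitted because it needs a numerical optimizer. All function names are hypothetical.

```python
import math


def peak_to_peak_distances(positions):
    """Gaps between neighboring peaks (ASSUMPTION: one chromosome)."""
    ordered = sorted(positions)
    return [b - a for a, b in zip(ordered, ordered[1:])]


def fit_exponential_rate(distances):
    """Maximum-likelihood rate of an Exponential distribution: 1 / mean."""
    if not distances:
        raise ValueError("need at least one distance")
    mean = sum(distances) / len(distances)
    if mean <= 0:
        raise ValueError("mean distance must be positive")
    return 1.0 / mean


def exponential_cdf(x, rate):
    """P(D <= x) under an Exponential distribution with the given rate."""
    return 1.0 - math.exp(-rate * x) if x > 0 else 0.0


def empirical_cdf(distances, x):
    """Fraction of observed distances that are <= x."""
    return sum(1 for d in distances if d <= x) / len(distances)


if __name__ == "__main__":
    # Hypothetical peak coordinates in bp, not data from the study.
    peaks = [100, 400, 150, 1000]
    gaps = peak_to_peak_distances(peaks)
    rate = fit_exponential_rate(gaps)
    for x in (100, 300, 600):
        print(x, empirical_cdf(gaps, x), round(exponential_cdf(x, rate), 3))
```

An empirical CDF that rises faster than the Exponential fit at short distances would be consistent with the clustering tendency described in the caption.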

References

    1. Boyer L.A., Lee T.I., Cole M.F., Johnstone S.E., Levine S.S., Zucker J.P., Guenther M.G., Kumar R.M., Murray H.L., Jenner R.G., et al. Core transcriptional regulatory circuitry in human embryonic stem cells. Cell. 2005;122:947–956.
    2. Carroll J.S., Liu X.S., Brodsky A.S., Li W., Meyer C.A., Szary A.J., Eeckhoute J., Shao W., Hestermann E.V., Geistlinger T.R., et al. Chromosome-wide mapping of estrogen receptor binding reveals long-range regulation requiring the forkhead protein FoxA1. Cell. 2005;122:33–43.
    3. Wei C.L., Wu Q., Vega V.B., Chiu K.P., Ng P., Zhang T., Shahab A., Yong H.C., Fu Y., Weng Z., et al. A global map of p53 transcription-factor binding sites in the human genome. Cell. 2006;124:207–219.
    4. Loh Y.H., Wu Q., Chew J.L., Vega V.B., Zhang W., Chen X., Bourque G., George J., Leong B., Liu J., et al. The Oct4 and Nanog transcription network regulates pluripotency in mouse embryonic stem cells. Nature Genet. 2006;38:431–440.
    5. Liu X.S., Brutlag D.L., Liu J.S. An algorithm for finding protein-DNA binding sites with applications to chromatin-immunoprecipitation microarray experiments. Nat. Biotechnol. 2002;20:835–839.
