An algorithm for finding protein–DNA binding sites with applications to chromatin-immunoprecipitation microarray experiments

Nature Biotechnology volume 20, pages 835–839 (2002)

Abstract

Chromatin immunoprecipitation followed by cDNA microarray hybridization (ChIP–array) has become a popular procedure for studying genome-wide protein–DNA interactions and transcription regulation. However, it can map the probable protein–DNA interaction loci only at a resolution of 1–2 kilobases. To pinpoint interaction sites down to the base-pair level, we introduce a computational method, Motif Discovery scan (MDscan), that examines the ChIP–array-selected sequences and searches for DNA sequence motifs representing the protein–DNA interaction sites. MDscan combines the advantages of two widely adopted motif search strategies, word enumeration (refs 1–4) and position-specific weight matrix updating (refs 5–9), and incorporates the ChIP–array ranking information to accelerate searches and enhance their success rates. MDscan correctly identified all the experimentally verified motifs from published ChIP–array experiments in yeast (refs 10–13; STE12, GAL4, RAP1, SCB, MCB, MCM1, SFF, and SWI5), and predicted two motif patterns for the differential binding of Rap1 protein in telomere regions. In our studies, the method was faster and more accurate than several established motif-finding algorithms (refs 5, 8, 9). MDscan can be used to find DNA motifs not only in ChIP–array experiments but also in other experiments in which a subgroup of the sequences can be inferred to contain relatively abundant motif sites. The MDscan web server can be accessed at http://BioProspector.stanford.edu/MDscan/.
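The general idea of pairing word enumeration with weight matrix updating can be illustrated with a small Python sketch. It is not the published MDscan procedure: the function names, the Hamming-distance seeding rule, the uniform background, the pseudocount, the score threshold, and the toy sequences are all assumptions chosen for illustration. The sketch enumerates w-mers in the top-ranked sequences to seed a motif, builds a log-odds matrix from the seed's near matches, and then rescans every sequence to update the matrix.

```python
from collections import Counter
from math import log

BASES = "ACGT"

def hamming(a, b):
    """Number of mismatching positions between two equal-length words."""
    return sum(x != y for x, y in zip(a, b))

def wmers(seq, w):
    """All overlapping words of width w in seq."""
    return [seq[i:i + w] for i in range(len(seq) - w + 1)]

def seed_motif(top_seqs, w, max_mismatch):
    """Word enumeration: find the w-mer with the most near matches
    among the top-ranked sequences (hypothetical seeding rule)."""
    words = [m for s in top_seqs for m in wmers(s, w)]
    best_seed, best_members = None, []
    for seed in sorted(set(words)):  # sorted so that ties break deterministically
        members = [m for m in words if hamming(seed, m) <= max_mismatch]
        if len(members) > len(best_members):
            best_seed, best_members = seed, members
    return best_seed, best_members

def weight_matrix(sites, pseudocount=0.5):
    """Log-odds position weight matrix against an assumed uniform background."""
    total = len(sites) + 4 * pseudocount
    pwm = []
    for j in range(len(sites[0])):
        counts = Counter(site[j] for site in sites)
        pwm.append({b: log(((counts[b] + pseudocount) / total) / 0.25)
                    for b in BASES})
    return pwm

def score(pwm, word):
    """Sum of per-position log-odds for one word."""
    return sum(col[b] for col, b in zip(pwm, word))

def refine(pwm, all_seqs, w, threshold):
    """Matrix updating: keep the best-scoring site of each sequence if it
    exceeds the threshold, then rebuild the matrix from those sites."""
    sites = []
    for s in all_seqs:
        candidates = wmers(s, w)
        if not candidates:
            continue  # sequence shorter than the motif width
        best = max(candidates, key=lambda m: score(pwm, m))
        if score(pwm, best) > threshold:
            sites.append(best)
    new_pwm = weight_matrix(sites) if sites else pwm
    return new_pwm, sites

# Toy sequences ordered from highest to lowest enrichment (made-up data).
ranked = [
    "TTGACGCGTAAC",
    "CCACGCGTTTGA",
    "GGTACGCGTCCA",
    "ATATATTTAGGC",  # intended as background
]
seed, members = seed_motif(ranked[:3], w=6, max_mismatch=1)
pwm = weight_matrix(members)
refined, sites = refine(pwm, ranked, w=6, threshold=3.0)
print(seed, sites)
```

Seeding only from the top-ranked sequences keeps the enumeration step small, while the rescan lets lower-ranked sequences contribute sites once a candidate matrix exists. That division of labor is how this sketch uses ranking information; the scoring function and stopping rules of the actual method are described in the full article.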

References

  1. van Helden, J., Andre, B. & Collado-Vides, J. Extracting regulatory sites from the upstream region of yeast genes by computational analysis of oligonucleotide frequencies. J. Mol. Biol. 281, 827–842 (1998).
  2. Bussemaker, H.J., Li, H. & Siggia, E.D. Building a dictionary for genomes: identification of presumptive regulatory sites by statistical analysis. Proc. Natl. Acad. Sci. USA 97, 10096–10100 (2000).
  3. Sinha, S. & Tompa, M. A statistical method for finding transcription factor binding sites. Proc. Int. Conf. Intell. Syst. Mol. Biol. 8, 344–354 (2000).
  4. Vilo, J., Brazma, A., Jonassen, I., Robinson, A. & Ukkonen, E. Mining for putative regulatory elements in the yeast genome using gene expression data. Proc. Int. Conf. Intell. Syst. Mol. Biol. 8, 384–394 (2000).
  5. Hertz, G.Z., Hartzell, G.W. & Stormo, G.D. Identification of consensus patterns in unaligned DNA sequences known to be functionally related. Comput. Appl. Biosci. 6, 81–92 (1990).
  6. Bailey, T.L. & Elkan, C. Fitting a mixture model by expectation maximization to discover motifs in biopolymers. Proc. Int. Conf. Intell. Syst. Mol. Biol. 2, 28–36 (1994).
  7. Liu, J.S., Neuwald, A.F. & Lawrence, C.E. Bayesian models for multiple local sequence alignment and Gibbs sampling strategies. J. Am. Stat. Assoc. 90, 1156–1170 (1995).
  8. Roth, F.P., Hughes, J.D., Estep, P.W. & Church, G.M. Finding DNA regulatory motifs within unaligned noncoding sequences clustered by whole-genome mRNA quantitation. Nat. Biotechnol. 16, 939–945 (1998).
  9. Liu, X., Brutlag, D.L. & Liu, J.S. BioProspector: discovering conserved DNA motifs in upstream regulatory regions of co-expressed genes. Pac. Symp. Biocomput. 127–138 (2001).
  10. Ren, B. et al. Genome-wide location and function of DNA binding proteins. Science 290, 2306–2309 (2000).
  11. Iyer, V.R. et al. Genomic binding sites of the yeast cell-cycle transcription factors SBF and MBF. Nature 409, 533–538 (2001).
  12. Lieb, J.D., Liu, X., Botstein, D. & Brown, P.O. Promoter-specific binding of Rap1p revealed by genome-wide maps of protein-DNA association. Nat. Genet. 28, 327–334 (2001).
  13. Simon, I. et al. Serial regulation of transcriptional regulators in the yeast cell cycle. Cell 106, 697–708 (2001).
  14. Dolan, J.W., Kirkman, C. & Fields, S. The yeast STE12 protein binds to the DNA sequence mediating pheromone induction. Proc. Natl. Acad. Sci. USA 86, 5703–5707 (1989).
  15. Graham, I.R. & Chambers, A. Use of a selection technique to identify the diversity of binding sites for the yeast RAP1 transcription factor. Nucleic Acids Res. 22, 124–130 (1994).
  16. Buchman, A.R., Kimmerly, W.J., Rine, J. & Kornberg, R.D. Two DNA-binding factors recognize specific sequences at silencers, upstream activating sequences, autonomously replicating sequences, and telomeres in Saccharomyces cerevisiae. Mol. Cell Biol. 8, 210–225 (1988).
  17. Idrissi, F.Z. & Pina, B. Functional divergence between the half-sites of the DNA-binding sequence for the yeast transcriptional regulator Rap1p. Biochem. J. 341, 477–482 (1999).
  18. Spellman, P.T. et al. Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization. Mol. Biol. Cell 9, 3273–3297 (1998).
  19. Lawrence, C.E. et al. Detecting subtle sequence signals: a Gibbs sampling strategy for multiple alignment. Science 262, 208–214 (1993).
  20. Liu, J.S. Monte Carlo Strategies in Scientific Computing (Springer, New York, 2001).

Acknowledgements

The authors thank the Brown lab at Stanford (especially Jason D. Lieb) and the Young lab at MIT (especially Bing Ren) for their valuable data and scientific insight. This work is supported by National Human Genome Research Institute grants R01 HGF02235 and R01 HG02518-01, and National Science Foundation grant DMS-0094613.

Author information

Authors and Affiliations

  1. Stanford Medical Informatics, Stanford University, Stanford, CA 94305
    X. Shirley Liu
  2. Department of Biochemistry, Stanford University, Stanford, CA 94305
    Douglas L. Brutlag
  3. Department of Statistics, Harvard University, 1 Oxford Street, Cambridge, MA 02138
    Jun S. Liu

Corresponding author

Correspondence to Jun S. Liu.

Cite this article

Liu, X., Brutlag, D. & Liu, J. An algorithm for finding protein–DNA binding sites with applications to chromatin-immunoprecipitation microarray experiments. Nat Biotechnol 20, 835–839 (2002). https://doi.org/10.1038/nbt717
