Assessing computational tools for the discovery of transcription factor binding sites

Acknowledgements

We thank Mathieu Blanchette, Ari Frank, Phil Green, Susan Hewitt, S.N. Maheshwari, Larry Ruzzo, Terry Speed, Gary Stormo and the organizers and participants of the 2002 Bellairs Workshop on Computational Biology for their important contributions to this project. Martin Tompa and Nan Li were supported by National Science Foundation (NSF) grant DBI-0218798 and by National Institutes of Health (NIH) grant R01 HG02602. Alexander Favorov, Andrei Mironov and Vsevolod Makeev were supported by Howard Hughes Medical Institute grant 55000309, Ludwig Cancer Research Institute grant CRDF RBO-1268-MO-02, Russian Fund of Basic Research grant 04-07-90270 and support from the Russian Academy of Sciences Presidium Program in Molecular and Cellular Biology, project no. 10. Yutao Fu, Martin C. Frith and Zhiping Weng were supported by NSF grant DBI-0116574 and NIH NHGRI grant 1R01HG03110. Giulio Pavesi and Graziano Pesole were supported by the Italian Ministry of University and Scientific Research's Fondo Italiano per la Ricerca di Base project 'Bioinformatica per la Genomica e la Proteomica' and by Telethon. Nicolas Simonis and Jacques van Helden were supported by the European Communities grant QLRI-199-01333, by the Action de Recherches Concertées de la Communauté Française de Belgique and by the Government of the Brussels Region. Saurabh Sinha was supported by a Keck Foundation Fellowship. Gert Thijs and Bart De Moor were supported by Geconcerteerde Onderzoeks-Acties Mefisto-666 and Ambiorics, InterUniversity Attraction Pole V-22, and several funded projects of the Instituut voor de aanmoediging van Innovatie door Wetenschap en Technologie in Vlaanderen, Fonds voor Wetenschappelijk Onderzoek, and the European Union. Zhou Zhu is a Howard Hughes Medical Institute predoctoral fellow. Zhou Zhu and George Church were supported by the Department of Energy and the Lipper Foundation.

Author information

Authors and Affiliations

  1. Department of Computer Science and Engineering, Box 352350, University of Washington, Seattle, 98195-2350, Washington, USA
    Martin Tompa, Nan Li & William Stafford Noble
  2. Department of Genome Sciences, Box 357730, University of Washington, Seattle, 98195-7730, Washington, USA
    Martin Tompa & William Stafford Noble
  3. Institute for Molecular Biosciences, University of Queensland, Brisbane, Australia
    Timothy L Bailey
  4. Department of Genetics and Lipper Center for Computational Genetics, Harvard Medical School, Boston, 02115, Massachusetts, USA
    George M Church & Zhou Zhu
  5. ESAT-SCD, Katholieke Universiteit Leuven, Kasteelpark Arenberg 10, Leuven, B-3001, Belgium
    Bart De Moor & Gert Thijs
  6. Department of Computer Science and Engineering, University of California, San Diego, 92093, California, USA
    Eleazar Eskin
  7. State Scientific Centre 'GosNIIGenetica,' 1st Dorozhny pr. 1, Moscow, 117545, Russia
    Alexander V Favorov, Vsevolod J Makeev & Andrei A Mironov
  8. Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Vavilova 32, Moscow, 119991, Russia
    Alexander V Favorov & Vsevolod J Makeev
  9. Bioinformatics Program, Boston University, Boston, 02215, Massachusetts, USA
    Martin C Frith, Yutao Fu & Zhiping Weng
  10. Center for Biomolecular Science and Engineering, University of California, Santa Cruz, 95064, California, USA
    W James Kent
  11. Department of Bioengineering and Bioinformatics, Moscow State University, Lab. Bldg B, Vorobiovy Gory 1-33, Moscow, 119992, Russia
    Andrei A Mironov
  12. Department of Computer Science and Communication (D.I.Co), University of Milan, Milan, Italy
    Giulio Pavesi
  13. Department of Biomolecular Science and Biotechnology, University of Milan, Milan, Italy
    Graziano Pesole
  14. INRIA Rocquencourt, Domaine de Voluceau B.P. 105, Le Chesnay, 78153, France
    Mireille Régnier & Mathias Vandenbogaert
  15. SCMB-Université Libre de Bruxelles, Campus Plaine, CP 263, Boulevard du Triomphe, Bruxelles, 1050, Belgium
    Nicolas Simonis & Jacques van Helden
  16. Center for Studies in Physics and Biology, The Rockefeller University, New York, 10021, New York, USA
    Saurabh Sinha
  17. Department of Bioengineering, University of California, San Diego, 92093, California, USA
    Christopher Workman
  18. Bioinformatics Program, University of California, San Diego, 92093, California, USA
    Chun Ye

Authors

  1. Martin Tompa
  2. Nan Li
  3. Timothy L Bailey
  4. George M Church
  5. Bart De Moor
  6. Eleazar Eskin
  7. Alexander V Favorov
  8. Martin C Frith
  9. Yutao Fu
  10. W James Kent
  11. Vsevolod J Makeev
  12. Andrei A Mironov
  13. William Stafford Noble
  14. Giulio Pavesi
  15. Graziano Pesole
  16. Mireille Régnier
  17. Nicolas Simonis
  18. Saurabh Sinha
  19. Gert Thijs
  20. Jacques van Helden
  21. Mathias Vandenbogaert
  22. Zhiping Weng
  23. Christopher Workman
  24. Chun Ye
  25. Zhou Zhu

Corresponding author

Correspondence to Martin Tompa.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

About this article

Cite this article

Tompa, M., Li, N., Bailey, T. et al. Assessing computational tools for the discovery of transcription factor binding sites. Nat Biotechnol 23, 137–144 (2005). https://doi.org/10.1038/nbt1053