Homology-based annotation yields 1,042 new candidate genes in the Drosophila melanogaster genome
- Letter
- Published: March 2001
- Mark Schroeder1 na1,
- Ursula Pieper1,2,
- Alexander Sczyrba1,
- Gulriz Aytekin-Kurban1,
- Stefan Bekiranov1,
- J. Eduardo Fajardo1,
- Narayanan Eswar2,
- Roberto Sanchez2,
- Andrej Sali2 &
- …
- Terry Gaasterland1
Nature Genetics volume 27, pages 337–340 (2001)
Abstract
The approach to annotating a genome critically affects the number and accuracy of genes identified in the genome sequence. Genome annotation based on stringent gene identification is prone to underestimate the complement of genes encoded in a genome. In contrast, over-prediction of putative genes followed by exhaustive computational sequence, motif and structural homology search will find rarely expressed, possibly unique, new genes at the risk of including non-functional genes. We developed a two-stage approach that combines the merits of stringent genome annotation with the benefits of over-prediction. First we identify plausible genes regardless of matches with EST, cDNA or protein sequences from the organism (stage 1). In the second stage, proteins predicted from the plausible genes are compared at the protein level with EST, cDNA and protein sequences, and protein structures from other organisms (stage 2). Remote but biologically meaningful protein sequence or structure homologies provide supporting evidence for genuine genes. The method, applied to the Drosophila melanogaster genome, validated 1,042 novel candidate genes after filtering 19,410 plausible genes, of which 12,124 matched the original 13,601 annotated genes (ref. 1). This annotation strategy is applicable to genomes of all organisms, including human.
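The counts in the abstract and the stage 2 filtering idea can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the `PlausibleGene` record, its field names, the `select_novel_candidates` helper and the toy data are all hypothetical, and the sketch assumes that the 1,042 validated novel candidates come from the plausible genes that did not match the original annotation.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PlausibleGene:
    """A stage 1 prediction (hypothetical record layout, not from the paper)."""
    gene_id: str
    matches_original_annotation: bool
    # Names of cross-species sequence or structure hits; empty if none were found.
    homology_evidence: List[str] = field(default_factory=list)


def select_novel_candidates(genes: List[PlausibleGene]) -> List[PlausibleGene]:
    """Stage 2 idea: keep predictions that are absent from the original
    annotation and are supported by at least one homology hit."""
    return [
        g for g in genes
        if not g.matches_original_annotation and g.homology_evidence
    ]


# Counts reported in the abstract.
PLAUSIBLE_GENES = 19_410
MATCHED_ORIGINAL = 12_124
NOVEL_VALIDATED = 1_042

# Plausible genes with no match in the original annotation.
unmatched = PLAUSIBLE_GENES - MATCHED_ORIGINAL  # 7,286

# Share of unmatched predictions that gained homology support
# (assumes all validated novel candidates come from the unmatched set).
validated_fraction = NOVEL_VALIDATED / unmatched

# Toy data (invented identifiers) to exercise the filter.
toy = [
    PlausibleGene("cand_A", True, ["EST hit"]),
    PlausibleGene("cand_B", False, ["remote protein homolog"]),
    PlausibleGene("cand_C", False, []),
]
novel_ids = [g.gene_id for g in select_novel_candidates(toy)]

print(f"unmatched plausible genes: {unmatched}")
print(f"fraction validated as novel: {validated_fraction:.3f}")
print(f"toy novel candidates: {novel_ids}")
```

Under that assumption, roughly one in seven of the unmatched plausible genes was retained as a novel candidate, which shows how much of the over-prediction from stage 1 the homology filter discards.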
References
- Adams, M.D. et al. The genome sequence of Drosophila melanogaster. Science 287, 2185–2195 (2000).
- Rubin, G.M. et al. A Drosophila complementary DNA resource. Science 287, 2222–2224 (2000).
- Burge, C. & Karlin, S. Prediction of complete gene structures in human genomic DNA. J. Mol. Biol. 268, 78–94 (1997).
- Burge, C.B. & Karlin, S. Finding the genes in genomic DNA. Curr. Opin. Struct. Biol. 8, 346–354 (1998).
- Reese, M.G. et al. Genome annotation assessment in Drosophila melanogaster. Genome Res. 10, 483–501 (2000).
- Boguski, M.S., Tolstoshev, C.M. & Bassett, D.E. Gene discovery in dbEST. Science 265, 1993–1994 (1994).
- Gaasterland, T. & Ragan, M.A. Constructing multigenome views of whole microbial genomes. Microb. Comp. Genomics 3, 177–192 (1998).
- Benson, D.A. et al. GenBank. Nucleic Acids Res. 27, 12–17 (1999).
- Bhat, T.N. et al. The PDB data uniformity project. Nucleic Acids Res. 29, 214–218 (2001).
- Deckert, G. et al. The complete genome of the hyperthermophilic bacterium Aquifex aeolicus. Nature 392, 353–358 (1998).
- Gaasterland, T. et al. MAGPIE/EGRET annotation of the 2.9-Mb Drosophila melanogaster Adh region. Genome Res. 10, 502–510 (2000).
- Sánchez, R. & Sali, A. Large-scale protein structure modeling of the Saccharomyces cerevisiae genome. Proc. Natl. Acad. Sci. USA 95, 13597–13602 (1998).
- Sánchez, R. & Sali, A. ModBase: a database of comparative protein structure models. Bioinformatics 15, 1060–1061 (1999).
- Sánchez, R. & Sali, A. Evaluation of comparative protein structure modeling by MODELLER-3. Proteins Suppl. 1, 50–58 (1997).
- Martí-Renom, M.A. et al. Comparative protein structure modeling of genes and genomes. Annu. Rev. Biophys. Biomol. Struct. 29, 291–325 (2000).
- Reese, M.G., Kulp, D., Tammana, H. & Haussler, D. Genie—gene finding in Drosophila melanogaster. Genome Res. 10, 529–538 (2000).
- Strausberg, R.L., Feingold, E.A., Klausner, R.D. & Collins, F.S. The mammalian gene collection. Science 286, 455–457 (1999).
- Reboul, J. et al. Open-reading-frame sequence tags (OSTs) support the existence of at least 17,300 genes in C. elegans. Nature Genet. 27, 332–336 (2001).
- Burley, S.K. et al. Structural genomics: beyond the human genome project. Nature Genet. 23, 151–157 (1999).
- Altschul, S.F. et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25, 3389–3402 (1997).
- Salamov, A.A. & Solovyev, V.V. Ab initio gene finding in Drosophila genomic DNA. Genome Res. 10, 516–522 (2000).
- Henikoff, J., Henikoff, S. & Pietrokovski, S. New features of the Blocks Database servers. Nucleic Acids Res. 27, 226–228 (1999).
- Hofmann, K., Bucher, P., Falquet, L. & Bairoch, A. The PROSITE database, its status in 1999. Nucleic Acids Res. 27, 215–219 (1999).
- Altschul, S.F. & Koonin, E.V. Iterated profile searches with PSI-BLAST—a tool for discovery in protein databases. Trends Biochem. Sci. 23, 444–447 (1998).
- Sali, A. & Blundell, T.L. Comparative protein modeling by satisfaction of spatial restraints. J. Mol. Biol. 234, 779–815 (1993).
- Bateman, A. et al. Pfam 3.1: 1313 multiple alignments and profile HMMs match the majority of proteins. Nucleic Acids Res. 27, 260–262 (1999).
Acknowledgements
We thank S. Burley, M. Vidal, J. Sorge, J. Goncalves, M. Ashburner, S. Lewis, M. Young and U. Gaul for insights and comments. This work was partially supported by the Mathers, Sinsheimer and Mallinckrodt Foundations, National Cancer Institute Health grant R33CA84699, National Institutes of Health grant P50GM62529, and the National Science Foundation grant DBI-9984882.
Author information
Author notes
- Shuba Gopal and Mark Schroeder: These authors contributed equally to this work.
Authors and Affiliations
- Laboratories of Computational Genomics, The Rockefeller University, New York, New York, USA
Shuba Gopal, Mark Schroeder, Ursula Pieper, Alexander Sczyrba, Gulriz Aytekin-Kurban, Stefan Bekiranov, J. Eduardo Fajardo & Terry Gaasterland
- Biophysics, The Rockefeller University, New York, New York, USA
Ursula Pieper, Narayanan Eswar, Roberto Sanchez & Andrej Sali
Corresponding author
Correspondence to Terry Gaasterland.
About this article
Cite this article
Gopal, S., Schroeder, M., Pieper, U. et al. Homology-based annotation yields 1,042 new candidate genes in the Drosophila melanogaster genome. Nat Genet 27, 337–340 (2001). https://doi.org/10.1038/85922
- Received: 22 December 2000
- Accepted: 07 February 2001
- Issue Date: March 2001
- DOI: https://doi.org/10.1038/85922