GeneID in Drosophila
G Parra et al. Genome Res. 2000 Apr.
Abstract
GeneID is a program, designed with a hierarchical structure, that predicts genes in anonymous genomic sequences. In the first step, splice sites and start and stop codons are predicted and scored along the sequence using position weight matrices (PWMs). In the second step, exons are built from these sites. Each exon is scored as the sum of the scores of its defining sites plus the log-likelihood ratio of a Markov model for coding DNA. In the last step, the gene structure is assembled from the set of predicted exons by maximizing the sum of the scores of the assembled exons. In this paper we describe how the PWMs for sites and the Markov model of coding DNA in Drosophila melanogaster were derived. We also compare other models of coding DNA with the Markov model. Finally, we present and discuss the results obtained when GeneID is used to predict genes in the Adh region. These results show that the accuracy of GeneID predictions is currently comparable to that of other existing tools, but that GeneID is likely to be more efficient in terms of speed and memory usage.
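The first two steps can be illustrated with a small Python sketch. It is not GeneID's code: the names `pwm_score`, `MarkovModel`, `coding_llr`, and `exon_score` are hypothetical, the PWM weights and training sequences are invented toy data, and the coding model is a single homogeneous order-k Markov chain with add-one pseudocounts, whereas gene finders commonly use frame-dependent models. A site score is the sum of per-position log-odds weights, and an exon score is the two site scores plus the log-likelihood ratio of the coding versus non-coding model.

```python
import math
from collections import defaultdict

BASES = "ACGT"  # sequences are assumed to be uppercase A/C/G/T only


def pwm_score(seq, pos, log_odds):
    """Score the window seq[pos:pos+len(log_odds)] with a position weight matrix.

    `log_odds` holds one dict per window position, mapping each base to a
    log-odds weight; the site score is the sum over positions.
    """
    window = seq[pos:pos + len(log_odds)]
    if len(window) != len(log_odds):
        raise ValueError("PWM window runs past the end of the sequence")
    return sum(column[base] for column, base in zip(log_odds, window))


class MarkovModel:
    """Homogeneous order-k Markov chain with add-one (Laplace) pseudocounts."""

    def __init__(self, order, training_seqs):
        self.order = order
        self.counts = defaultdict(lambda: dict.fromkeys(BASES, 1))
        for s in training_seqs:
            for i in range(order, len(s)):
                self.counts[s[i - order:i]][s[i]] += 1

    def log_prob(self, seq):
        """log P(seq[k:] | seq[:k]) under the model, where k = order."""
        total = 0.0
        for i in range(self.order, len(seq)):
            # Contexts never seen in training fall back to a uniform row.
            row = self.counts.get(seq[i - self.order:i]) or dict.fromkeys(BASES, 1)
            total += math.log(row[seq[i]] / sum(row.values()))
        return total


def coding_llr(seq, coding, noncoding):
    """Log-likelihood ratio of the coding versus the non-coding model."""
    return coding.log_prob(seq) - noncoding.log_prob(seq)


def exon_score(left_site_score, right_site_score, exon_seq, coding, noncoding):
    """Sum of the two defining-site scores plus the coding log-likelihood ratio."""
    return left_site_score + right_site_score + coding_llr(exon_seq, coding, noncoding)


# Toy 3-position log-odds matrix favouring "GTA" (invented numbers).
toy_pwm = [
    {"A": -1.0, "C": -1.0, "G": 1.5, "T": -1.0},
    {"A": -1.0, "C": -1.0, "G": -1.0, "T": 1.5},
    {"A": 1.0, "C": -0.5, "G": -0.5, "T": -0.5},
]
coding = MarkovModel(2, ["GCCGCCGCCGCCGCC"])      # toy "coding" training data
noncoding = MarkovModel(2, ["ATATATATATATATAT"])  # toy "non-coding" training data

print(pwm_score("CCGTACC", 2, toy_pwm))                # 4.0
print(coding_llr("GCCGCCGCC", coding, noncoding) > 0)  # True
print(coding_llr("ATATATAT", coding, noncoding) < 0)   # True
```

The toy example uses order 2 only because the training strings are tiny; an order-k chain has 4^k context rows to estimate, so higher orders need much more training data.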
Figures
Figure 1
Predictions obtained by GeneID in the region 462500–477500 of the Adh sequence, compared with the annotation in the standard std3 set. In a first step, GeneID identifies and scores all possible donor (blue) and acceptor (yellow) sites, start codons (green), and stop codons (red) using PWMs; the height of each spike is proportional to the site score. GeneID generated a total of 4704 sites along this 15,000-bp region; only the highest-scoring ones are displayed here. In a second step, GeneID builds all exons compatible with these sites. A total of 11,967 exons were built in this region (not displayed). Exons are scored as the sum of the scores of the defining sites plus the score of their coding potential, measured according to a Markov model of order 5. The coding potential is displayed along the DNA sequence (MM_score): regions in strong red are more likely to be coding than regions in strong blue. From the set of predicted exons, the gene structure is generated by maximizing the sum of the scores of the assembled exons. Exons assembled into the predicted genes are drawn with heights proportional to their scores. A two-color code indicates frame compatibility: two adjacent exons are frame compatible if the right half of the upstream exon (the remainder) matches the color of the left half of the downstream exon (the frame). The diagram was generated with the gff2ps program (available at http://www1.imim.es/~jabril/GFFTOOLS/GFF2PS.html). The input GFF and the configuration files required for gff2ps to generate this diagram can be found at http://www1.imim.es/~gparra/GASP1.
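The assembly step, together with the frame-compatibility rule in the caption, can be sketched as a dynamic program over candidate exons sorted by end coordinate. This is a hypothetical illustration rather than GeneID's own assembly procedure: `Exon`, `compatible`, and `assemble` are invented names, frame and remainder are assumed to be encoded as labels 0, 1, or 2 so that "matching colors" means equal labels, and strands, exon types (first, internal, terminal), and other gene-model constraints are ignored. The double loop makes it quadratic in the number of exons.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Exon:
    start: int      # 0-based, inclusive
    end: int        # 0-based, exclusive
    score: float    # site scores + coding potential, computed beforehand
    frame: int      # hypothetical label (0, 1 or 2) for the left-half colour
    remainder: int  # hypothetical label (0, 1 or 2) for the right-half colour


def compatible(up: Exon, down: Exon, min_intron: int = 0) -> bool:
    """Hypothetical chaining rule: `down` starts at least `min_intron` bases
    after `up` ends, and the upstream remainder matches the downstream frame."""
    return down.start >= up.end + min_intron and up.remainder == down.frame


def assemble(exons: List[Exon], min_intron: int = 0) -> Tuple[float, List[Exon]]:
    """Return the highest-scoring chain of pairwise-compatible exons."""
    order = sorted(exons, key=lambda e: e.end)
    if not order:
        return 0.0, []
    best = [e.score for e in order]  # best chain score ending at order[i]
    prev: List[Optional[int]] = [None] * len(order)
    for i, cur in enumerate(order):
        for j in range(i):  # candidates that end no later than cur
            if compatible(order[j], cur, min_intron) and best[j] + cur.score > best[i]:
                best[i] = best[j] + cur.score
                prev[i] = j
    # Trace back from the end of the best-scoring chain.
    last = max(range(len(order)), key=best.__getitem__)
    total = best[last]
    chain: List[Exon] = []
    node: Optional[int] = last
    while node is not None:
        chain.append(order[node])
        node = prev[node]
    return total, chain[::-1]


e1 = Exon(0, 100, 5.0, frame=0, remainder=1)
e2 = Exon(150, 300, 4.0, frame=1, remainder=2)
e3 = Exon(150, 320, 6.0, frame=0, remainder=0)
e4 = Exon(400, 500, 3.0, frame=2, remainder=0)
total, chain = assemble([e1, e2, e3, e4])
print(total, [(e.start, e.end) for e in chain])  # 12.0 [(0, 100), (150, 300), (400, 500)]
```

In the toy input, `e3` has the highest single score, but its frame label does not match the remainder of `e1` and its remainder does not match the frame of `e4`, so the best chain skips it.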
Comment in
- A biologist's view of the Drosophila genome annotation assessment project.
Ashburner M. Genome Res. 2000 Apr;10(4):391-3. doi: 10.1101/gr.10.4.391. PMID: 10779478. Review.
Similar articles
- MAGPIE/EGRET annotation of the 2.9-Mb Drosophila melanogaster Adh region.
Gaasterland T, Sczyrba A, Thomas E, Aytekin-Kurban G, Gordon P, Sensen CW. Genome Res. 2000 Apr;10(4):502-10. doi: 10.1101/gr.10.4.502. PMID: 10779489. Free PMC article.
- Using database matches with for HMMGene for automated gene detection in Drosophila.
Krogh A. Genome Res. 2000 Apr;10(4):523-8. doi: 10.1101/gr.10.4.523. PMID: 10779492. Free PMC article.
- Ab initio gene finding in Drosophila genomic DNA.
Salamov AA, Solovyev VV. Genome Res. 2000 Apr;10(4):516-22. doi: 10.1101/gr.10.4.516. PMID: 10779491. Free PMC article.
- Annotation of the Drosophila melanogaster euchromatic genome: a systematic review.
Misra S, Crosby MA, Mungall CJ, Matthews BB, Campbell KS, Hradecky P, Huang Y, Kaminker JS, Millburn GH, Prochnik SE, Smith CD, Tupy JL, Whitfied EJ, Bayraktaroglu L, Berman BP, Bettencourt BR, Celniker SE, de Grey AD, Drysdale RA, Harris NL, Richter J, Russo S, Schroeder AJ, Shu SQ, Stapleton M, Yamada C, Ashburner M, Gelbart WM, Rubin GM, Lewis SE. Genome Biol. 2002;3(12):RESEARCH0083. doi: 10.1186/gb-2002-3-12-research0083. Epub 2002 Dec 31. PMID: 12537572. Free PMC article. Review.
- Tracking adaptive evolutionary events in genomic sequences.
Liberles DA, Wayne ML. Genome Biol. 2002;3(6):REVIEWS1018. doi: 10.1186/gb-2002-3-6-reviews1018. Epub 2002 May 29. PMID: 12093382. Free PMC article. Review.
Cited by
- Scrutinizing the immune defence inventory of Camponotus floridanus applying total transcriptome sequencing.
Gupta SK, Kupper M, Ratzka C, Feldhaar H, Vilcinskas A, Gross R, Dandekar T, Förster F. BMC Genomics. 2015 Jul 22;16(1):540. doi: 10.1186/s12864-015-1748-1. PMID: 26198742. Free PMC article.
- EfGD: the Erianthus fulvus genome database.
Qian Z, Li X, He L, Gu S, Shen Q, Rao X, Zhang R, Di Y, Xie L, Wang X, Chen S, Dong Y, Li F. Database (Oxford). 2022 Aug 31;2022:baac076. doi: 10.1093/database/baac076. PMID: 36043401. Free PMC article.
- Genomics-driven discovery of the pneumocandin biosynthetic gene cluster in the fungus Glarea lozoyensis.
Chen L, Yue Q, Zhang X, Xiang M, Wang C, Li S, Che Y, Ortiz-López FJ, Bills GF, Liu X, An Z. BMC Genomics. 2013 May 20;14:339. doi: 10.1186/1471-2164-14-339. PMID: 23688303. Free PMC article.
- Origin and adaptation to high altitude of Tibetan semi-wild wheat.
Guo W, Xin M, Wang Z, Yao Y, Hu Z, Song W, Yu K, Chen Y, Wang X, Guan P, Appels R, Peng H, Ni Z, Sun Q. Nat Commun. 2020 Oct 8;11(1):5085. doi: 10.1038/s41467-020-18738-5. PMID: 33033250. Free PMC article.
- Challenges, Solutions, and Quality Metrics of Personal Genome Assembly in Advancing Precision Medicine.
Xiao W, Wu L, Yavas G, Simonyan V, Ning B, Hong H. Pharmaceutics. 2016 Apr 22;8(2):15. doi: 10.3390/pharmaceutics8020015. PMID: 27110816. Free PMC article. Review.