Evaluation of gene structure prediction programs
Comparative Study
Genomics. 1996 Jun 15;34(3):353-67.
doi: 10.1006/geno.1996.0298.
- PMID: 8786136
M Burset et al. Genomics. 1996.
Abstract
We evaluate a number of computer programs designed to predict the structure of protein coding genes in genomic DNA sequences. Computational gene identification is set to play an increasingly important role in the development of the genome projects, as emphasis turns from mapping to large-scale sequencing. The evaluation presented here serves both to assess the current status of the problem and to identify the most promising approaches to ensure further progress. The programs analyzed were uniformly tested on a large set of vertebrate sequences with simple gene structure, and several measures of predictive accuracy were computed at the nucleotide, exon, and protein product levels. The results indicated that the predictive accuracy of the programs analyzed was lower than originally reported. The accuracy was even lower when considering only those sequences that had recently been entered and that did not show any similarity to previously entered sequences. This indicates that the programs are overly dependent on the particularities of the examples they learn from. For most of the programs, accuracy on this test set ranged from 0.60 to 0.70 as measured by the Correlation Coefficient (where 1.0 corresponds to a perfect prediction and 0.0 is the value expected for a random prediction), and the average percentage of exons exactly identified was less than 50%. Only those programs including protein sequence database searches showed substantially greater accuracy. The accuracy of the programs was severely affected by relatively high rates of sequence errors. Since the set on which the programs were tested included only relatively short sequences with simple gene structure, the accuracy of the programs is likely to be even lower when used for large uncharacterized genomic sequences with complex structure. While in such cases the programs currently available may still be of great use in pinpointing the regions likely to contain exons, they are far from being powerful enough to elucidate their genomic structure completely.
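The abstract summarizes accuracy at the nucleotide level with a Correlation Coefficient (1.0 for a perfect prediction, 0.0 for a random one) and at the exon level as the fraction of exons identified exactly. The following Python sketch shows one way such measures can be computed. It is an illustration under stated assumptions, not the paper's own code: exons are assumed to be half-open (start, end) intervals on a single strand, the correlation is assumed to be the standard Matthews formula, "specificity" is taken in the sense common in gene-finding evaluations, TP/(TP+FP), and the function names `coding_mask`, `nucleotide_accuracy`, and `exon_accuracy` are hypothetical.

```python
"""Sketch of nucleotide- and exon-level accuracy measures for gene prediction.

Assumptions (hypothetical, not taken from the paper): exons are half-open
(start, end) intervals on one strand of a sequence of known length, and the
correlation coefficient is the standard Matthews formula.
"""
from math import sqrt


def coding_mask(exons, length):
    """Return a per-nucleotide list of booleans marking coding positions."""
    mask = [False] * length
    for start, end in exons:
        for i in range(start, end):
            mask[i] = True
    return mask


def nucleotide_accuracy(real, predicted, length):
    """Return (sensitivity, specificity, correlation) at the nucleotide level."""
    r = coding_mask(real, length)
    p = coding_mask(predicted, length)
    pairs = list(zip(r, p))
    tp = sum(a and b for a, b in pairs)          # coding, predicted coding
    tn = sum(not a and not b for a, b in pairs)  # non-coding, predicted non-coding
    fp = sum(b and not a for a, b in pairs)      # non-coding, predicted coding
    fn = sum(a and not b for a, b in pairs)      # coding, predicted non-coding
    sn = tp / (tp + fn) if tp + fn else 0.0
    sp = tp / (tp + fp) if tp + fp else 0.0
    denom = sqrt((tp + fn) * (tn + fp) * (tp + fp) * (tn + fn))
    # Assumption: report 0.0 when a marginal total is zero and CC is undefined.
    cc = (tp * tn - fn * fp) / denom if denom else 0.0
    return sn, sp, cc


def exon_accuracy(real, predicted):
    """Return the fractions of real and of predicted exons matched exactly.

    An exon counts as exactly identified only if both boundaries agree.
    """
    exact = len(set(real) & set(predicted))
    sn = exact / len(real) if real else 0.0
    sp = exact / len(predicted) if predicted else 0.0
    return sn, sp
```

As a toy check, with real exons `[(10, 30), (50, 70)]`, predicted exons `[(10, 30), (55, 75)]`, and a 100 bp sequence, `nucleotide_accuracy` gives sensitivity and specificity of 0.875 and a correlation of 1900/2400 (about 0.79), while `exon_accuracy` gives (0.5, 0.5). A second exon shifted by only five bases still scores well at the nucleotide level but counts as a complete miss at the exon level, which helps explain why an average correlation of 0.60 to 0.70 can coexist with fewer than half of the exons being identified exactly.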
Similar articles
- The Gene-Finder computer tools for analysis of human and model organisms genome sequences. Solovyev V, Salamov A. Proc Int Conf Intell Syst Mol Biol. 1997;5:294-302. PMID: 9322052.
- Computational gene identification: an open problem. Guigó R. Comput Chem. 1997;21(4):215-22. doi: 10.1016/s0097-8485(97)00008-9. PMID: 9415986.
- Finding genes in DNA with a Hidden Markov Model. Henderson J, Salzberg S, Fasman KH. J Comput Biol. 1997 Summer;4(2):127-41. doi: 10.1089/cmb.1997.4.127. PMID: 9228612.
- An Experimental Approach to Genome Annotation: This report is based on a colloquium sponsored by the American Academy of Microbiology held July 19-20, 2004, in Washington, DC. [No authors listed]. Washington (DC): American Society for Microbiology; 2004. PMID: 33001599. Review.
- Computational methods for the identification of genes in vertebrate genomic sequences. Claverie JM. Hum Mol Genet. 1997;6(10):1735-44. doi: 10.1093/hmg/6.10.1735. PMID: 9300666. Review.
Cited by
- Nucleotide-level distance metrics to quantify alternative splicing implemented in TranD. Nanni A, Titus-McQuillan J, Bankole KS, Pardo-Palacios F, Signor S, Vlaho S, Moskalenko O, Morse AM, Rogers RL, Conesa A, McIntyre LM. Nucleic Acids Res. 2024 Mar 21;52(5):e28. doi: 10.1093/nar/gkae056. PMID: 38340337.
- Fine-mapping and evolutionary history of R-BPMV, a dominant resistance gene to Bean pod mottle virus in Phaseolus vulgaris L. Meziadi C, Alvarez-Diaz JC, Thareau V, Gratias A, Marande W, Soler-Garzon A, Miklas PN, Pflieger S, Geffroy V. Theor Appl Genet. 2023 Dec 13;137(1):8. doi: 10.1007/s00122-023-04513-9. PMID: 38092992.
- Genome annotation: From human genetics to biodiversity genomics. Guigó R. Cell Genom. 2023 Aug 1;3(8):100375. doi: 10.1016/j.xgen.2023.100375. eCollection 2023 Aug 9. PMID: 37601977. Review.
- Escherichia coli transcriptome assembly from a compendium of RNA-seq data sets. Tjaden B. RNA Biol. 2023 Jan;20(1):77-84. doi: 10.1080/15476286.2023.2189331. PMID: 36920168.
- Addressing the pervasive scarcity of structural annotation in eukaryotic algae. Kwon T, Hanschen ER, Hovde BT. Sci Rep. 2023 Jan 30;13(1):1687. doi: 10.1038/s41598-023-27881-0. PMID: 36717613.