Positional characterisation of false positives from computational prediction of human splice sites
Comparative Study
T A Thanaraj. Nucleic Acids Res. 2000.
Abstract
The performance of computational tools that can predict human splice sites is reviewed using a test set of EST-confirmed splice sites. The programs (namely HMMgene, NetGene2, HSPL, NNSPLICE, SpliceView and GeneID-3) differ from one another in the degree of discriminatory information used for prediction. The results indicate that, as expected, HMMgene and NetGene2 (which use global as well as local coding information and splice signals), followed by HSPL (which uses local coding information and splice signals), performed better than the other three programs (which use only splice signals). For the former three programs, one in every three false positive splice sites was predicted in the vicinity of true splice sites, while only one in every 12 was expected to occur in such a region by chance. The persistence of this observation was then assessed for programs (namely FEXH, GRAIL2, MZEF, GeneID-3, HMMgene and GENSCAN) that can predict all the potential exons (including optimal and sub-optimal ones). In a high proportion (>50%) of the partially correct predicted exons, the incorrect exon ends were located in the vicinity of the real splice sites. Analysis of the distribution of proximal false positives indicated that the splice signals used by the algorithms are not strong enough to discriminate false predictions, particularly those that occur within ±25 nt of the real sites. It is therefore suggested that specialised statistics that can discriminate real splice sites from proximal false positives be incorporated in gene prediction programs.
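The quantities discussed in the abstract (sensitivity, specificity and the share of false positives that are proximal to a real site) can be illustrated with a minimal sketch. It rests on several assumptions that the abstract does not state: sites are represented as integer sequence positions, a prediction counts as correct only on an exact position match, specificity is taken as TP / (TP + FP) as is common in the gene prediction literature, and "proximal" means within a 25 nt window of the nearest real site. The function names are hypothetical and this is not the author's evaluation code.

```python
from bisect import bisect_left


def distance_to_nearest(position, sorted_sites):
    """Distance in nt from position to the closest entry of sorted_sites."""
    i = bisect_left(sorted_sites, position)
    # Only the neighbours just below and just above the insertion point
    # can be the nearest site.
    neighbours = sorted_sites[max(i - 1, 0):i + 1]
    return min(abs(position - s) for s in neighbours)


def evaluate_predictions(predicted, true_sites, window=25):
    """Summarise splice site predictions against confirmed sites.

    Assumptions (not taken from the paper): exact-match true positives,
    specificity = TP / (TP + FP), proximal = within `window` nt of a real site.
    """
    true_set = set(true_sites)
    predicted_set = set(predicted)
    if not true_set or not predicted_set:
        raise ValueError("need at least one true and one predicted site")

    sorted_true = sorted(true_set)
    true_pos = predicted_set & true_set
    false_pos = predicted_set - true_set
    proximal = {
        p for p in false_pos
        if distance_to_nearest(p, sorted_true) <= window
    }

    return {
        "sensitivity": len(true_pos) / len(true_set),
        "specificity": len(true_pos) / len(predicted_set),
        "proximal_fraction": len(proximal) / len(false_pos) if false_pos else 0.0,
    }


if __name__ == "__main__":
    # Toy positions chosen for illustration only; not data from the study.
    summary = evaluate_predictions(
        predicted=[100, 110, 200, 500, 900],
        true_sites=[100, 200, 300],
    )
    print(summary)
```

In this toy input, one of the three false positives (position 110) lies within 25 nt of a real site, so the proximal fraction is one in three. The numbers are constructed for illustration and carry no evidential weight.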
Figures
Figure 1
(a) Performance of donor site prediction programs in terms of specificity and sensitivity (shown by solid lines). Also shown is the percentage of false positive donor sites that are proximal (shown by dashed lines). (b) Performance of acceptor site prediction programs in terms of specificity and sensitivity (shown by solid lines). Also shown is the percentage of false positive acceptor sites that are proximal (shown by dashed lines).
Figure 2
(a) Proportion of false donor sites with a score ≥ that of real donor sites. (b) Proportion of false acceptor sites with a score ≥ that of real acceptor sites.
Figure 3
(a) Performance of donor site prediction programs in terms of corrected specificity and sensitivity (shown by solid lines). Also shown is the corrected percentage of false positive donor sites that are proximal (shown by dashed lines). Only those false positive donor sites with a score ≥ that of real donor sites were considered. (b) Performance of acceptor site prediction programs in terms of corrected specificity and sensitivity (shown by solid lines). Also shown is the corrected percentage of false positive acceptor sites that are proximal (shown by dashed lines). Only those false positive acceptor sites with a score ≥ that of real acceptor sites were considered.
Similar articles
- Prediction of exact boundaries of exons.
Thanaraj TA, Robinson AJ. Brief Bioinform. 2000 Nov;1(4):343-56. doi: 10.1093/bib/1.4.343. PMID: 11465052
- The prediction of exons through an analysis of spliceable open reading frames.
Hutchinson GB, Hayden MR. Nucleic Acids Res. 1992 Jul 11;20(13):3453-62. doi: 10.1093/nar/20.13.3453. PMID: 1321415 Free PMC article.
- Analysis of canonical and non-canonical splice sites in mammalian genomes.
Burset M, Seledtsov IA, Solovyev VV. Nucleic Acids Res. 2000 Nov 1;28(21):4364-75. doi: 10.1093/nar/28.21.4364. PMID: 11058137 Free PMC article.
- Exonization of transposed elements: A challenge and opportunity for evolution.
Schmitz J, Brosius J. Biochimie. 2011 Nov;93(11):1928-34. doi: 10.1016/j.biochi.2011.07.014. Epub 2011 Jul 26. PMID: 21787833 Review.
- Using MZEF to find internal coding exons.
Zhang MQ. Curr Protoc Bioinformatics. 2002 Aug;Chapter 4:Unit 4.2. doi: 10.1002/0471250953.bi0402s00. PMID: 18792940 Review.
Cited by
- Discovery of Novel Functional Centers With Rationally Designed Amino Acid Motifs.
Wong A, Tian X, Gehring C, Marondedze C. Comput Struct Biotechnol J. 2018 Feb 27;16:70-76. doi: 10.1016/j.csbj.2018.02.007. eCollection 2018. PMID: 29977479 Free PMC article. Review.
- Complex inheritance in Pulmonary Arterial Hypertension patients with several mutations.
Pousada G, Baloira A, Valverde D. Sci Rep. 2016 Sep 15;6:33570. doi: 10.1038/srep33570. PMID: 27630060 Free PMC article.
- TrueSight: a new algorithm for splice junction detection using RNA-seq.
Li Y, Li-Byarlay H, Burns P, Borodovsky M, Robinson GE, Ma J. Nucleic Acids Res. 2013 Feb 1;41(4):e51. doi: 10.1093/nar/gks1311. Epub 2012 Dec 18. PMID: 23254332 Free PMC article.
- Aberrant 5' splice sites in human disease genes: mutation pattern, nucleotide structure and comparison of computational tools that predict their utilization.
Buratti E, Chivers M, Královicová J, Romano M, Baralle M, Krainer AR, Vorechovsky I. Nucleic Acids Res. 2007;35(13):4250-63. doi: 10.1093/nar/gkm402. Epub 2007 Jun 18. PMID: 17576681 Free PMC article.
- Aberrant 3' splice sites in human disease genes: mutation pattern, nucleotide structure and comparison of computational tools that predict their utilization.
Vorechovský I. Nucleic Acids Res. 2006;34(16):4630-41. doi: 10.1093/nar/gkl535. Epub 2006 Sep 8. PMID: 16963498 Free PMC article.