Self-identification of protein-coding regions in microbial genomes
S Audic et al. Proc Natl Acad Sci U S A. 1998.
Abstract
A new method for predicting protein-coding regions in microbial genomic DNA sequences is presented. It uses an ab initio iterative Markov modeling procedure to automatically partition genomic sequences into three subsets, shown to correspond to coding, coding on the opposite strand, and noncoding segments. In contrast to current methods, such as GENEMARK [Borodovsky, M. & McIninch, J. D. (1993) Comput. Chem. 17, 123-133], no training set or prior knowledge of the statistical properties of the studied genome is required. This new method tolerates error rates of 1-2% and can process unassembled sequences. It is thus ideal for the analysis of genome survey and/or fragmented sequence data from uncharacterized microorganisms. The method was validated on 10 complete bacterial genomes (from four major phylogenetic lineages). The results show that protein-coding regions can be identified with an accuracy of up to 90% with a totally automated and objective procedure.
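The iterative partitioning idea can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the initialization, smoothing, or stopping rule, so the random initial labels, add-one smoothing, non-overlapping windows, and the function names `windows`, `train`, `log_likelihood`, and `partition` are all assumptions made for illustration. The sketch cuts a sequence into fixed-size windows, fits one homogeneous Markov chain per class, reassigns each window to the class whose model scores it best, and repeats until the labels stop changing.

```python
import math
import random
from collections import defaultdict


def windows(seq, size):
    """Cut seq into consecutive non-overlapping windows; a trailing partial window is dropped."""
    return [seq[i:i + size] for i in range(0, len(seq) - size + 1, size)]


def train(segments, order):
    """Count (context -> next base) transitions for a homogeneous Markov chain of the given order."""
    counts = defaultdict(lambda: defaultdict(int))
    for seg in segments:
        for i in range(order, len(seg)):
            counts[seg[i - order:i]][seg[i]] += 1
    return counts


def log_likelihood(seg, model, order):
    """Log-likelihood of seg under the model, with add-one smoothing (an assumption of this sketch)."""
    total = 0.0
    for i in range(order, len(seg)):
        ctx_counts = model.get(seg[i - order:i], {})
        n = sum(ctx_counts.values())
        total += math.log((ctx_counts.get(seg[i], 0) + 1) / (n + 4))
    return total


def partition(seq, size=100, order=2, n_classes=3, max_iter=50, seed=0):
    """Iteratively assign each window to one of n_classes Markov models.

    Labels start out random (assumed; the source does not describe the
    initialization) and are refined until they stop changing or max_iter
    is reached. The returned class indices are arbitrary: this sketch does
    not decide which class is coding, reverse coding, or noncoding.
    """
    rng = random.Random(seed)
    wins = windows(seq, size)
    labels = [rng.randrange(n_classes) for _ in wins]
    for _ in range(max_iter):
        # Re-estimate one model per class from the windows currently assigned to it.
        models = [
            train([w for w, lab in zip(wins, labels) if lab == k], order)
            for k in range(n_classes)
        ]
        # Reassign every window to the class whose model scores it highest.
        new_labels = [
            max(range(n_classes), key=lambda k: log_likelihood(w, models[k], order))
            for w in wins
        ]
        if new_labels == labels:
            break
        labels = new_labels
    return labels
```

Calling `partition(genome, size=100, order=2)` on a DNA string returns one class index per window. Mapping those indices to coding, reverse-coding, and noncoding segments would need a further step that this sketch leaves out.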
Figures
Figure 1
Convergence of the iterative homogeneous Markov modeling. The numbers of nucleotides correctly assigned as “coding” or “reverse coding” are plotted to follow the convergence of the iterative procedure. (A) Influence of the Markov chain order. (B) Influence of the window size. (C) Influence of the simulated error rate. (D) Specificity of the recognition of coding (+) and reverse coding (o) segments for 10 genomes of different G+C content. Mj, M. jannaschii; Mg, M. genitalium; Mp, M. pneumoniae; Hi, H. influenzae; Hp, H. pylori; Bs, B. subtilis; Mt, M. thermoautotrophicum; Syn, Synechocystis sp.; Af, A. fulgidus; Ec, E. coli. The discrepancies between the recognition of coding and reverse-coding regions in the Mg and Mp genomes indicate an actual strand asymmetry.
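Panel C varies a simulated error rate. The caption does not say how the errors were generated, so the following is only a sketch under the assumption of independent substitution errors (no insertions or deletions); the function name `add_substitution_errors` and its parameters are hypothetical.

```python
import random


def add_substitution_errors(seq, rate, seed=0):
    """Replace each base with a different base with probability `rate`.

    Assumes substitution-only errors applied independently per position;
    the error model used for the figure is not stated in the source.
    """
    rng = random.Random(seed)
    out = []
    for base in seq:
        if rng.random() < rate:
            # Pick uniformly among the bases that differ from the original.
            out.append(rng.choice([b for b in "ACGT" if b != base]))
        else:
            out.append(base)
    return "".join(out)
```

For example, `add_substitution_errors(genome, 0.02)` would perturb roughly 2% of positions, in line with the 1-2% error rates mentioned in the abstract.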
Similar articles
- How to interpret an anonymous bacterial genome: machine learning approach to gene identification.
Hayes WS, Borodovsky M. Genome Res. 1998 Nov;8(11):1154-71. doi: 10.1101/gr.8.11.1154. PMID: 9847079
- Prokaryotic gene prediction using GeneMark and GeneMark.hmm.
Borodovsky M, Mills R, Besemer J, Lomsadze A. Curr Protoc Bioinformatics. 2003 May;Chapter 4:Unit4.5. doi: 10.1002/0471250953.bi0405s01. PMID: 18428700
- Gene identification in novel eukaryotic genomes by self-training algorithm.
Lomsadze A, Ter-Hovhannisyan V, Chernoff YO, Borodovsky M. Nucleic Acids Res. 2005 Nov 28;33(20):6494-506. doi: 10.1093/nar/gki937. Print 2005. PMID: 16314312 Free PMC article.
- [Gene identification in prokaryotic genomes using hidden Markov model].
Yada T. Tanpakushitsu Kakusan Koso. 1997 Dec;42(17 Suppl):2993-3000. PMID: 9455224 Review. Japanese. No abstract available.
- Comparing genomes in terms of protein structure: surveys of a finite parts list.
Gerstein M, Hegyi H. FEMS Microbiol Rev. 1998 Oct;22(4):277-304. doi: 10.1111/j.1574-6976.1998.tb00371.x. PMID: 10357579 Review.
Cited by
- DNA-energetics-based analyses suggest additional genes in prokaryotes.
Khandelwal G, Gupta J, Jayaram B. J Biosci. 2012 Jul;37(3):433-44. doi: 10.1007/s12038-012-9221-7. PMID: 22750981
- Dictionary-driven prokaryotic gene finding.
Shibuya T, Rigoutsos I. Nucleic Acids Res. 2002 Jun 15;30(12):2710-25. doi: 10.1093/nar/gkf338. PMID: 12060689 Free PMC article.
- Tropheryma whipplei Twist: a human pathogenic Actinobacteria with a reduced genome.
Raoult D, Ogata H, Audic S, Robert C, Suhre K, Drancourt M, Claverie JM. Genome Res. 2003 Aug;13(8):1800-9. doi: 10.1101/gr.1474603. PMID: 12902375 Free PMC article.
- Ab initio gene identification: prokaryote genome annotation with GeneScan and GLIMMER.
Aggarwal G, Ramaswamy R. J Biosci. 2002 Feb;27(1 Suppl 1):7-14. doi: 10.1007/BF02703679. PMID: 11927773
- MetaGene: prokaryotic gene finding from environmental genome shotgun sequences.
Noguchi H, Park J, Takagi T. Nucleic Acids Res. 2006;34(19):5623-30. doi: 10.1093/nar/gkl723. Epub 2006 Oct 5. PMID: 17028096 Free PMC article.