Self-identification of protein-coding regions in microbial genomes
S Audic et al. Proc Natl Acad Sci U S A. 1998.
Abstract
A new method for predicting protein-coding regions in microbial genomic DNA sequences is presented. It uses an ab initio iterative Markov modeling procedure to automatically perform the partition of genomic sequences into three subsets shown to correspond to coding, coding on the opposite strand, and noncoding segments. In contrast to current methods, such as GENEMARK [Borodovsky, M. & McIninch, J. D. (1993) Comput. Chem. 17, 123-133], no training set or prior knowledge of the statistical properties of the studied genome are required. This new method tolerates error rates of 1-2% and can process unassembled sequences. It is thus ideal for the analysis of genome survey and/or fragmented sequence data from uncharacterized microorganisms. The method was validated on 10 complete bacterial genomes (from four major phylogenetic lineages). The results show that protein-coding regions can be identified with an accuracy of up to 90% with a totally automated and objective procedure.
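The iterative partitioning idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the sequence is cut into fixed-size, non-overlapping windows, that windows start with random labels, and that each window is repeatedly reassigned to whichever of three homogeneous Markov models scores it best. The function names (`train_markov`, `log_likelihood`, `iterative_partition`), the pseudocount, and the default window size and chain order are all hypothetical choices made for illustration.

```python
import math
import random

ALPHABET = "ACGT"
UNIFORM_LOGP = math.log(1.0 / len(ALPHABET))  # fallback for unseen contexts


def train_markov(windows, order, pseudocount=1.0):
    """Estimate log transition probabilities P(base | previous `order` bases)."""
    counts = {}
    for w in windows:
        for i in range(order, len(w)):
            row = counts.setdefault(w[i - order:i], dict.fromkeys(ALPHABET, 0.0))
            if w[i] in row:  # ignore ambiguous symbols such as N
                row[w[i]] += 1.0
    model = {}
    for context, row in counts.items():
        total = sum(row.values()) + pseudocount * len(ALPHABET)
        model[context] = {b: math.log((c + pseudocount) / total)
                          for b, c in row.items()}
    return model


def log_likelihood(window, model, order):
    """Log-likelihood of one window under a trained model."""
    ll = 0.0
    for i in range(order, len(window)):
        row = model.get(window[i - order:i])
        ll += row.get(window[i], UNIFORM_LOGP) if row else UNIFORM_LOGP
    return ll


def iterative_partition(sequence, window=100, order=2, n_classes=3,
                        max_iter=50, seed=0):
    """Assumed scheme: random initial labels, then retrain and reassign
    until the labels stop changing or max_iter is reached."""
    rng = random.Random(seed)
    windows = [sequence[i:i + window]
               for i in range(0, len(sequence) - window + 1, window)]
    labels = [rng.randrange(n_classes) for _ in windows]
    for _ in range(max_iter):
        models = [train_markov([w for w, l in zip(windows, labels) if l == c],
                               order)
                  for c in range(n_classes)]
        new_labels = [max(range(n_classes),
                          key=lambda c: log_likelihood(w, models[c], order))
                      for w in windows]
        if new_labels == labels:  # converged
            break
        labels = new_labels
    return windows, labels
```

The three resulting classes carry no meaning by themselves. In this sketch, deciding which class corresponds to coding, reverse-strand coding, or noncoding sequence would need a separate step, for example comparison against known annotations.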
Figures
Figure 1
Convergence of the iterative homogeneous Markov modeling. The numbers of nucleotides correctly assigned as “coding” or “reverse coding” are plotted to follow the convergence of the iterative procedure. (A) Influence of the Markov chain order. (B) Influence of the window size. (C) Influence of the simulated error rate. (D) Specificity of the recognition of coding (+) and reverse coding (o) segments for 10 genomes of different G+C content. Mj, M. jannaschii; Mg, M. genitalium; Mp, M. pneumoniae; Hi, H. influenzae; Hp, H. pylori; Bs, B. subtilis; Mt, M. thermoautotrophicum; Syn, Synechocystis sp.; Af, A. fulgidus; Ec, E. coli. The discrepancies between the recognition of coding and reverse-coding regions in the Mg and Mp genomes indicate an actual strand asymmetry.
Similar articles
- How to interpret an anonymous bacterial genome: machine learning approach to gene identification.
Hayes WS, Borodovsky M. Genome Res. 1998 Nov;8(11):1154-71. doi: 10.1101/gr.8.11.1154. PMID: 9847079
- Prokaryotic gene prediction using GeneMark and GeneMark.hmm.
Borodovsky M, Mills R, Besemer J, Lomsadze A. Curr Protoc Bioinformatics. 2003 May;Chapter 4:Unit4.5. doi: 10.1002/0471250953.bi0405s01. PMID: 18428700
- Gene identification in novel eukaryotic genomes by self-training algorithm.
Lomsadze A, Ter-Hovhannisyan V, Chernoff YO, Borodovsky M. Nucleic Acids Res. 2005 Nov 28;33(20):6494-506. doi: 10.1093/nar/gki937. Print 2005. PMID: 16314312 Free PMC article.
- [Gene identification in prokaryotic genomes using hidden Markov model].
Yada T. Tanpakushitsu Kakusan Koso. 1997 Dec;42(17 Suppl):2993-3000. PMID: 9455224 Review. Japanese. No abstract available.
- Comparing genomes in terms of protein structure: surveys of a finite parts list.
Gerstein M, Hegyi H. FEMS Microbiol Rev. 1998 Oct;22(4):277-304. doi: 10.1111/j.1574-6976.1998.tb00371.x. PMID: 10357579 Review.
Cited by
- MBBC: an efficient approach for metagenomic binning based on clustering.
Wang Y, Hu H, Li X. BMC Bioinformatics. 2015 Feb 5;16:36. doi: 10.1186/s12859-015-0473-8. PMID: 25652152 Free PMC article.
- DNA-energetics-based analyses suggest additional genes in prokaryotes.
Khandelwal G, Gupta J, Jayaram B. J Biosci. 2012 Jul;37(3):433-44. doi: 10.1007/s12038-012-9221-7. PMID: 22750981
- Classifying coding DNA with nucleotide statistics.
Carels N, Frías D. Bioinform Biol Insights. 2009 Oct 28;3:141-54. doi: 10.4137/bbi.s3030. PMID: 20140062 Free PMC article.
- MetaGeneAnnotator: detecting species-specific patterns of ribosomal binding site for precise gene prediction in anonymous prokaryotic and phage genomes.
Noguchi H, Taniguchi T, Itoh T. DNA Res. 2008 Dec;15(6):387-96. doi: 10.1093/dnares/dsn027. Epub 2008 Oct 21. PMID: 18940874 Free PMC article.
- The genome of Borrelia recurrentis, the agent of deadly louse-borne relapsing fever, is a degraded subset of tick-borne Borrelia duttonii.
Lescot M, Audic S, Robert C, Nguyen TT, Blanc G, Cutler SJ, Wincker P, Couloux A, Claverie JM, Raoult D, Drancourt M. PLoS Genet. 2008 Sep 12;4(9):e1000185. doi: 10.1371/journal.pgen.1000185. PMID: 18787695 Free PMC article.