naiveBayesCall: an efficient model-based base-calling algorithm for high-throughput sequencing
Wei-Chun Kao et al. J Comput Biol. 2011 Mar.
Abstract
Immense amounts of raw instrument data (i.e., images of fluorescence) are currently being generated using ultra-high-throughput sequencing platforms. An important computational challenge associated with this rapid advancement is to develop efficient algorithms that can extract accurate sequence information from raw data. To address this challenge, we recently introduced a novel model-based base-calling algorithm that is fully parametric and has several advantages over previously proposed methods. Our original algorithm, called BayesCall, significantly reduced the error rate, particularly in the later cycles of a sequencing run, and also produced useful base-specific quality scores with a high discrimination ability. However, BayesCall is too computationally expensive to be of broad practical use. In this article, we build on our previous model-based approach to devise an efficient base-calling algorithm that is orders of magnitude faster than BayesCall, while still maintaining a comparably high level of accuracy. Our new algorithm is called naiveBayesCall, and it utilizes approximation and optimization methods to achieve scalability. We describe the performance of naiveBayesCall and demonstrate how improved base-calling accuracy may facilitate de novo assembly and SNP detection when the sequence coverage depth is low to moderate.
Similar articles
- OnlineCall: fast online parameter estimation and base calling for Illumina's next-generation sequencing.
  Das S, Vikalo H. Bioinformatics. 2012 Jul 1;28(13):1677-83. doi: 10.1093/bioinformatics/bts256. Epub 2012 May 7. PMID: 22569177. Free PMC article.
- BayesCall: A model-based base-calling algorithm for high-throughput short-read sequencing.
  Kao WC, Stevens K, Song YS. Genome Res. 2009 Oct;19(10):1884-95. doi: 10.1101/gr.095299.109. Epub 2009 Aug 6. PMID: 19661376. Free PMC article.
- TotalReCaller: improved accuracy and performance via integrated alignment and base-calling.
  Menges F, Narzisi G, Mishra B. Bioinformatics. 2011 Sep 1;27(17):2330-7. doi: 10.1093/bioinformatics/btr393. Epub 2011 Jun 30. PMID: 21724593.
- PhredEM: a phred-score-informed genotype-calling approach for next-generation sequencing studies.
  Liao P, Satten GA, Hu YJ. Genet Epidemiol. 2017 Jul;41(5):375-387. doi: 10.1002/gepi.22048. Epub 2017 May 31. PMID: 28560825. Free PMC article.
- Improvement in detection of minor alleles in next generation sequencing by base quality recalibration.
  Ni S, Stoneking M. BMC Genomics. 2016 Feb 27;17:139. doi: 10.1186/s12864-016-2463-2. PMID: 26920804. Free PMC article.