Slider--maximum use of probability information for alignment of short sequence reads and SNP detection
Nawar Malhis et al. Bioinformatics. 2009.
Abstract
Motivation: A plethora of alignment tools has been created, each designed to best fit a different type of alignment condition. While some of these are made for aligning Illumina Sequence Analyzer reads, none fully utilizes its probability (prb) output. In this article, we introduce a new alignment approach (Slider) that reduces the alignment problem space by utilizing each read base's probabilities given in the prb files.
Results: Compared with other aligners, Slider has higher alignment accuracy and efficiency. In addition, because Slider also matches bases whose probabilities are not the highest at their position, it significantly reduces the percentage of base mismatches. As a result, its SNP predictions are more accurate than those of other SNP prediction approaches used today that start from the most probable sequence, including those using base quality.
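The idea of matching bases other than the most probable one can be illustrated with a short Python sketch. It is not Slider's implementation: it assumes the prb values have already been converted to per-base probabilities in A, C, G, T order, and the names `candidate_reads`, `min_prob` and `joint_prob`, the 0.1 cutoff, and the use of a simple product of probabilities as a score are all hypothetical choices made for illustration.

```python
from itertools import product

BASES = "ACGT"  # assumed column order of the per-base probabilities


def candidate_reads(prb, min_prob=0.1):
    """Enumerate candidate sequences for one read.

    prb      -- list of 4-tuples, one per read position, holding the
                probabilities of A, C, G and T (assumed input layout)
    min_prob -- hypothetical cutoff: a base is considered at a position
                only if its probability is at least this value

    Returns a list of (sequence, joint_prob) pairs, where joint_prob is
    the product of the chosen bases' probabilities (an illustrative
    score, not the weighting used in the paper).
    """
    per_position = []
    for probs in prb:
        choices = [(b, p) for b, p in zip(BASES, probs) if p >= min_prob]
        if not choices:
            # No base passes the cutoff: keep only the most probable one.
            choices = [max(zip(BASES, probs), key=lambda bp: bp[1])]
        per_position.append(choices)

    candidates = []
    for combo in product(*per_position):
        seq = "".join(base for base, _ in combo)
        joint_prob = 1.0
        for _, p in combo:
            joint_prob *= p
        candidates.append((seq, joint_prob))
    return candidates
```

For a three-base read whose first and third positions are each ambiguous between two bases, this sketch yields four candidate sequences instead of the single most probable one, so a read whose true base was the second most probable can still produce an exact match.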
Figures
Fig. 1.
Slider scans both the lexicographically sorted reference database and the lexicographically sorted Px_Reads (s.ol0) input table once to generate all exact matches. Exact matches are stored in the sorted s.m0 table. In this example, the input reads are 6 bp long (SZ_r = 6) and are aligned to a reference database of 10 bp oligos (SZ_d = 10) created with a sliding window across the reference. Reads that match are indicated in bold and underlined; an example of a unique match is indicated by a solid line and that of a multiple match by a dashed line.
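The single scan over two sorted tables described in Fig. 1 resembles a merge join. The Python sketch below illustrates that idea under simplifying assumptions and is not Slider's implementation: the function names, the in-memory lists standing in for the s.ol0 and s.m0 tables, and the choice to keep shortened windows near the end of the reference are all hypothetical.

```python
def reference_oligos(reference, oligo_size, read_size):
    """Build a sorted table of (oligo, position) pairs with a sliding window.

    Windows near the end of the reference are shorter than oligo_size but
    never shorter than read_size, so every read-length prefix is covered
    (an illustrative choice, not taken from the paper).
    """
    last_start = len(reference) - read_size
    return sorted(
        (reference[i:i + oligo_size], i) for i in range(last_start + 1)
    )


def exact_matches(sorted_oligos, sorted_reads, read_size):
    """Merge two lexicographically sorted tables and collect exact matches.

    A read matches an oligo when it equals the oligo's first read_size
    bases. Because both inputs are sorted, the oligo cursor only moves
    forward, apart from re-reading a group of equal prefixes.
    """
    matches = []
    i = 0
    for read in sorted_reads:
        # Skip oligos whose prefix sorts before the current read.
        while i < len(sorted_oligos) and sorted_oligos[i][0][:read_size] < read:
            i += 1
        # Collect every oligo whose prefix equals the read.
        j = i
        while j < len(sorted_oligos) and sorted_oligos[j][0][:read_size] == read:
            matches.append((read, sorted_oligos[j][1]))
            j += 1
    return matches
```

With the reference `ACGTACGTACGT`, 10 bp oligos and 6 bp reads, the read `TACGTA` matches at a single position (a unique match), whereas `ACGTAC` matches at two positions (a multiple match), mirroring the two cases distinguished in the figure.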
Fig. 2.
Probability that a given base mismatch is a true SNP as a function of the read sequence weight.
Similar articles
- Technology dictates algorithms: recent developments in read alignment.
Alser M, Rotman J, Deshpande D, Taraszka K, Shi H, Baykal PI, Yang HT, Xue V, Knyazev S, Singer BD, Balliu B, Koslicki D, Skums P, Zelikovsky A, Alkan C, Mutlu O, Mangul S. Genome Biol. 2021 Aug 26;22(1):249. doi: 10.1186/s13059-021-02443-7. PMID: 34446078 Free PMC article. Review.
- High quality SNP calling using Illumina data at shallow coverage.
Malhis N, Jones SJ. Bioinformatics. 2010 Apr 15;26(8):1029-35. doi: 10.1093/bioinformatics/btq092. Epub 2010 Feb 26. PMID: 20190250
- Coverage-based consensus calling (CbCC) of short sequence reads and comparison of CbCC results to identify SNPs in chickpea (Cicer arietinum; Fabaceae), a crop species without a reference genome.
Azam S, Thakur V, Ruperao P, Shah T, Balaji J, Amindala B, Farmer AD, Studholme DJ, May GD, Edwards D, Jones JD, Varshney RK. Am J Bot. 2012 Feb;99(2):186-92. doi: 10.3732/ajb.1100419. Epub 2012 Feb 1. PMID: 22301893
- Robust prediction of consensus secondary structures using averaged base pairing probability matrices.
Kiryu H, Kin T, Asai K. Bioinformatics. 2007 Feb 15;23(4):434-41. doi: 10.1093/bioinformatics/btl636. Epub 2006 Dec 20. PMID: 17182698
- MSAProbs: multiple sequence alignment based on pair hidden Markov models and partition function posterior probabilities.
Liu Y, Schmidt B, Maskell DL. Bioinformatics. 2010 Aug 15;26(16):1958-64. doi: 10.1093/bioinformatics/btq338. Epub 2010 Jun 23. PMID: 20576627
Cited by
- Technology dictates algorithms: recent developments in read alignment.
Alser M, Rotman J, Deshpande D, Taraszka K, Shi H, Baykal PI, Yang HT, Xue V, Knyazev S, Singer BD, Balliu B, Koslicki D, Skums P, Zelikovsky A, Alkan C, Mutlu O, Mangul S. Genome Biol. 2021 Aug 26;22(1):249. doi: 10.1186/s13059-021-02443-7. PMID: 34446078 Free PMC article. Review.
- Review of alignment and SNP calling algorithms for next-generation sequencing data.
Mielczarek M, Szyda J. J Appl Genet. 2016 Feb;57(1):71-9. doi: 10.1007/s13353-015-0292-7. Epub 2015 Jun 9. PMID: 26055432 Review.
- Bioinformatics for next generation sequencing data.
Magi A, Benelli M, Gozzini A, Girolami F, Torricelli F, Brandi ML. Genes (Basel). 2010 Sep 14;1(2):294-307. doi: 10.3390/genes1020294. PMID: 24710047 Free PMC article.
- Benchmarking short sequence mapping tools.
Hatem A, Bozdağ D, Toland AE, Çatalyürek ÜV. BMC Bioinformatics. 2013 Jun 7;14:184. doi: 10.1186/1471-2105-14-184. PMID: 23758764 Free PMC article.
- High Performance Multiple Sequence Alignment System for Pyrosequencing Reads from Multiple Reference Genomes.
Saeed F, Perez-Rathke A, Gwarnicki J, Berger-Wolf T, Khokhar A. J Parallel Distrib Comput. 2012 Jan;72(1):83-93. doi: 10.1016/j.jpdc.2011.08.001. Epub 2011 Sep 16. PMID: 23125479 Free PMC article.