Slider--maximum use of probability information for alignment of short sequence reads and SNP detection

Nawar Malhis et al. Bioinformatics. 2009.

Abstract

Motivation: A plethora of alignment tools have been created, each designed to best fit a different type of alignment condition. While some of these are made for aligning Illumina Sequence Analyzer reads, none fully utilizes its probability (prb) output. In this article, we introduce a new alignment approach (Slider) that reduces the alignment problem space by utilizing each read base's probabilities given in the prb files.

Results: Compared with other aligners, Slider has higher alignment accuracy and efficiency. In addition, because Slider matches bases with probabilities other than the most probable, it significantly reduces the percentage of base mismatches. As a result, its SNP predictions are more accurate than those of other current SNP prediction approaches that start from the most probable sequence, including those using base quality.
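
The idea of matching bases other than the most probable one can be illustrated with a minimal sketch. It is not Slider's actual implementation: it assumes the per-base probabilities for A, C, G and T have already been derived from the prb file, and the function name `candidate_reads` and the cutoff `min_prob` are hypothetical choices made only for this example.

```python
from itertools import product

BASES = "ACGT"


def candidate_reads(base_probs, min_prob=0.1):
    """Yield (sequence, probability) for each combination of plausible bases.

    base_probs: one (pA, pC, pG, pT) tuple per read position (assumed input).
    min_prob:   hypothetical cutoff; bases below it are ignored.
    """
    per_position = []
    for probs in base_probs:
        # Keep every base whose probability reaches the cutoff.
        choices = [(b, p) for b, p in zip(BASES, probs) if p >= min_prob]
        if not choices:
            # Fall back to the single most probable base.
            best = max(range(len(BASES)), key=lambda i: probs[i])
            choices = [(BASES[best], probs[best])]
        per_position.append(choices)

    # Enumerate all combinations and multiply the per-base probabilities.
    for combo in product(*per_position):
        seq = "".join(base for base, _ in combo)
        prob = 1.0
        for _, p in combo:
            prob *= p
        yield seq, prob


# A 3-base read whose middle position is ambiguous between C and T.
example_probs = [
    (0.97, 0.01, 0.01, 0.01),
    (0.05, 0.60, 0.05, 0.30),
    (0.01, 0.01, 0.97, 0.01),
]
candidates = dict(candidate_reads(example_probs, min_prob=0.1))
```

Here `candidates` holds two sequences, `ACG` and `ATG`, so an aligner built this way could match the reference through the second most probable base instead of reporting a mismatch.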

Figures

Fig. 1.

Slider scans both the lexicographically sorted reference database and the lexicographically sorted Px_Reads (s.ol0) input table once to generate all exact matches. Exact matches are stored in the sorted s.m0 table. In this example, the input sequences are 6 bp long (_SZ_r = 6) and are aligned to a reference database of 10 bp (_SZ_d = 10) oligos created with a sliding window across the reference. Reads that match are indicated in bold and underlined; a unique match is marked with a solid line and a multiple match with a dashed line.
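
The single pass over two sorted tables described in this caption resembles a merge join. The sketch below is an illustration under stated assumptions, not Slider's code: the function names `sliding_oligos` and `exact_matches`, the toy reference string, and the choice to match a read against the prefix of each oligo are all hypothetical.

```python
def sliding_oligos(reference, size):
    """Return sorted (oligo, start) pairs from a sliding window of `size` bp."""
    return sorted(
        (reference[i:i + size], i) for i in range(len(reference) - size + 1)
    )


def exact_matches(sorted_reads, sorted_oligos, read_len):
    """Merge-style scan: map each read to the start positions it matches.

    Both inputs must be lexicographically sorted. A read matches an oligo
    when it equals the oligo's first `read_len` bases (an assumption of
    this sketch). Sorting whole oligos also sorts their prefixes.
    """
    matches = {}
    j = 0
    n = len(sorted_oligos)
    for read in sorted_reads:
        # Skip oligos whose prefix sorts before the current read.
        while j < n and sorted_oligos[j][0][:read_len] < read:
            j += 1
        # Collect the run of oligos whose prefix equals the read.
        k = j
        positions = []
        while k < n and sorted_oligos[k][0][:read_len] == read:
            positions.append(sorted_oligos[k][1])
            k += 1
        if positions:
            matches[read] = sorted(positions)
    return matches


reference = "ACGTACGTACGTTT"          # toy reference, 14 bp
oligos = sliding_oligos(reference, 10)  # 10 bp oligos, as in the caption
reads = sorted(["GTACGT", "ACGTAC", "TTTTTT"])  # 6 bp reads
hits = exact_matches(reads, oligos, 6)
```

In this toy run `ACGTAC` is a multiple match (positions 0 and 4), `GTACGT` is a unique match (position 2), and `TTTTTT` has no exact match. Because the oligo pointer only moves forward, each table is traversed once apart from re-reading a run of equal prefixes for duplicate reads.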

Fig. 2.

Probability that a given base mismatch is a true SNP as a function of the read sequence weight.

Cited by

Technology dictates algorithms: recent developments in read alignment.
Alser M, Rotman J, Deshpande D, Taraszka K, Shi H, Baykal PI, Yang HT, Xue V, Knyazev S, Singer BD, Balliu B, Koslicki D, Skums P, Zelikovsky A, Alkan C, Mutlu O, Mangul S. Genome Biol. 2021 Aug 26;22(1):249. doi: 10.1186/s13059-021-02443-7. PMID: 34446078. Free PMC article. Review.
Review of alignment and SNP calling algorithms for next-generation sequencing data.
Mielczarek M, Szyda J. J Appl Genet. 2016 Feb;57(1):71-9. doi: 10.1007/s13353-015-0292-7. Epub 2015 Jun 9. PMID: 26055432. Review.
Bioinformatics for next generation sequencing data.
Magi A, Benelli M, Gozzini A, Girolami F, Torricelli F, Brandi ML. Genes (Basel). 2010 Sep 14;1(2):294-307. doi: 10.3390/genes1020294. PMID: 24710047. Free PMC article.
Benchmarking short sequence mapping tools.
Hatem A, Bozdağ D, Toland AE, Çatalyürek ÜV. BMC Bioinformatics. 2013 Jun 7;14:184. doi: 10.1186/1471-2105-14-184. PMID: 23758764. Free PMC article.
High Performance Multiple Sequence Alignment System for Pyrosequencing Reads from Multiple Reference Genomes.
Saeed F, Perez-Rathke A, Gwarnicki J, Berger-Wolf T, Khokhar A. J Parallel Distrib Comput. 2012 Jan;72(1):83-93. doi: 10.1016/j.jpdc.2011.08.001. Epub 2011 Sep 16. PMID: 23125479. Free PMC article.
