Local alignment of two-base encoded DNA sequence
Nils Homer et al. BMC Bioinformatics. 2009.
Abstract
Background: DNA sequence comparison is based on optimal local alignment of two sequences using a similarity score. However, some new DNA sequencing technologies do not directly measure the base sequence, but rather an encoded form, such as the two-base encoding considered here. In order to compare such data to a reference sequence, the data must be decoded into sequence. The decoding is deterministic, but the possibility of measurement errors requires searching among all possible error modes and resulting alignments to achieve an optimal balance of fewer errors versus greater sequence similarity.
Results: We present an extension of the standard dynamic programming method for local alignment that simultaneously decodes the data and performs the alignment, maximizing a similarity score based on a weighted combination of errors and edits and allowing an affine gap penalty. We also present simulations that demonstrate the performance characteristics of our two-base encoded alignment method and contrast them with standard DNA sequence alignment under the same conditions.
Conclusion: The new local alignment algorithm for two-base encoded data has substantial power to detect and correct measurement errors while identifying underlying sequence variants, which facilitates genome re-sequencing efforts based on this form of sequence data.
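For orientation, the standard dynamic programming method that the abstract says is being extended can be sketched as a local alignment score with an affine gap penalty. This is a minimal illustration of that baseline only, not the authors' two-base encoded algorithm; the function name `local_align_score` and the scoring values are assumptions chosen for the example.

```python
def local_align_score(a, b, match=1, mismatch=-1, gap_open=-2, gap_extend=-1):
    """Best local alignment score of a and b with affine gaps.

    A gap of length k costs gap_open + (k - 1) * gap_extend.
    All scoring values here are illustrative assumptions.
    """
    neg_inf = float("-inf")
    n, m = len(a), len(b)
    # H: best score of an alignment ending at (i, j); 0 means "start fresh here".
    H = [[0] * (m + 1) for _ in range(n + 1)]
    # E: best score ending at (i, j) with a gap in a (consumes b[j-1]).
    E = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    # F: best score ending at (i, j) with a gap in b (consumes a[i-1]).
    F = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    best = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Either extend an existing gap or open a new one.
            E[i][j] = max(E[i][j - 1] + gap_extend, H[i][j - 1] + gap_open)
            F[i][j] = max(F[i - 1][j] + gap_extend, H[i - 1][j] + gap_open)
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0, H[i - 1][j - 1] + s, E[i][j], F[i][j])
            best = max(best, H[i][j])
    return best
```

With these assumed scores, two identical 4-base sequences score 4, and sequences with no base in common score 0. The method described in the abstract additionally tracks the decoding of colors inside the recursion, which this sketch does not attempt.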
Figures
Figure 1
The function Φ. Φ is a function that encodes two bases as a color. Each color is represented by a number ∈ {0, 1, 2, 3}.
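As an illustration of a function of this kind, the sketch below encodes two adjacent bases as a color by XOR-ing 2-bit base codes. The mapping A=0, C=1, G=2, T=3 is the commonly used two-base (color-space) convention and is an assumption here, since the figure itself is not reproduced; the names `phi` and `encode` are hypothetical.

```python
# Assumed 2-bit base codes (common color-space convention, not taken from the figure).
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}

def phi(b1, b2):
    """Encode two adjacent bases as a color in {0, 1, 2, 3}."""
    return BASE_CODE[b1] ^ BASE_CODE[b2]

def encode(seq):
    """Color sequence of a base sequence: one color per adjacent pair of bases."""
    return [phi(x, y) for x, y in zip(seq, seq[1:])]

print(encode("ACGT"))  # [1, 3, 1]
print(encode("ACTT"))  # [1, 2, 0]
```

Under this assumed mapping, changing the single base G to T in ACGT changes two adjacent colors, which is the signature that distinguishes a base substitution from an isolated color error.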
Figure 2
Power evaluation for sequences with errors. We assess the power to align sequences with and without two-base encoding in the presence of a per-base or per-color error rate respectively.
Figure 3
Power evaluation for sequences with errors and base substitutions. We assess the power to align sequences with and without two-base encoding in the presence of errors and base substitutions.
Figure 4
Power evaluation for sequences with errors and a contiguous deletion. We assess the power to align sequences with and without two-base encoding in the presence of errors and a contiguous deletion.
Figure 5
Power evaluation for sequences with errors and a contiguous insertion. We assess the power to align sequences with and without two-base encoding in the presence of errors and a contiguous insertion.
Figure 6
The function Γ. Γ is a function that encodes one base and one color as a base.
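A function of this kind can be sketched as the inverse of the XOR encoding: given the previous base and a color, it returns the next base. The base codes A=0, C=1, G=2, T=3 and the names `gamma` and `decode` are assumptions made for illustration, not definitions taken from the figure.

```python
# Assumed 2-bit base codes (common color-space convention, not taken from the figure).
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
CODE_BASE = "ACGT"

def gamma(base, color):
    """Decode one base and one color into the next base."""
    return CODE_BASE[BASE_CODE[base] ^ color]

def decode(start_base, colors):
    """Decode a color sequence into bases, given the known first base."""
    bases = [start_base]
    for c in colors:
        bases.append(gamma(bases[-1], c))
    return "".join(bases)

print(decode("A", [1, 3, 1]))  # ACGT
print(decode("A", [1, 2, 1]))  # ACTG (second color misread)
```

The second call shows why decoding before alignment is fragile: a single misread color alters every base decoded after it, which is the motivation the abstract gives for decoding and aligning simultaneously.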
Similar articles
- Local alignment of generalized k-base encoded DNA sequence.
Homer N, Nelson SF, Merriman B. BMC Bioinformatics. 2010 Jun 24;11:347. doi: 10.1186/1471-2105-11-347. PMID: 20576157. Free PMC article.
- Glocal alignment: finding rearrangements during alignment.
Brudno M, Malde S, Poliakov A, Do CB, Couronne O, Dubchak I, Batzoglou S. Bioinformatics. 2003;19 Suppl 1:i54-62. doi: 10.1093/bioinformatics/btg1005. PMID: 12855437.
- The tree alignment problem.
Varón A, Wheeler WC. BMC Bioinformatics. 2012 Nov 9;13:293. doi: 10.1186/1471-2105-13-293. PMID: 23140486. Free PMC article.
- A survey of sequence alignment algorithms for next-generation sequencing.
Li H, Homer N. Brief Bioinform. 2010 Sep;11(5):473-83. doi: 10.1093/bib/bbq015. Epub 2010 May 11. PMID: 20460430. Free PMC article. Review.
- Homology assessment and molecular sequence alignment.
Phillips AJ. J Biomed Inform. 2006 Feb;39(1):18-33. doi: 10.1016/j.jbi.2005.11.005. Epub 2005 Dec 9. PMID: 16380300. Review.
Cited by
- Transcriptomics of an extended phenotype: parasite manipulation of wasp social behaviour shifts expression of caste-related genes.
Geffre AC, Liu R, Manfredini F, Beani L, Kathirithamby J, Grozinger CM, Toth AL. Proc Biol Sci. 2017 Apr 12;284(1852):20170029. doi: 10.1098/rspb.2017.0029. PMID: 28404777. Free PMC article.
- eIF2β is critical for eIF5-mediated GDP-dissociation inhibitor activity and translational control.
Jennings MD, Kershaw CJ, White C, Hoyle D, Richardson JP, Costello JL, Donaldson IJ, Zhou Y, Pavitt GD. Nucleic Acids Res. 2016 Nov 16;44(20):9698-9709. doi: 10.1093/nar/gkw657. Epub 2016 Jul 25. PMID: 27458202. Free PMC article.
- Transcript Abundance of Putative Lipid Phosphate Phosphatases During Development of Trypanosoma brucei in the Tsetse Fly.
Alves e Silva TL, Savage AF, Aksoy S. Am J Trop Med Hyg. 2016 Apr;94(4):890-3. doi: 10.4269/ajtmh.15-0566. Epub 2016 Feb 8. PMID: 26856918. Free PMC article.
- Challenges in exome analysis by LifeScope and its alternative computational pipelines.
Pranckevičiene E, Rančelis T, Pranculis A, Kučinskas V. BMC Res Notes. 2015 Sep 7;8:421. doi: 10.1186/s13104-015-1385-4. PMID: 26346699. Free PMC article.
- Identifying Highly Penetrant Disease Causal Mutations Using Next Generation Sequencing: Guide to Whole Process.
Erzurumluoglu AM, Rodriguez S, Shihab HA, Baird D, Richardson TG, Day IN, Gaunt TR. Biomed Res Int. 2015;2015:923491. doi: 10.1155/2015/923491. Epub 2015 Apr 6. PMID: 26106619. Free PMC article. Review.