An Eulerian path approach to global multiple alignment for DNA sequences
Yu Zhang et al. J Comput Biol. 2003.
Abstract
With the rapid increase in the dataset of genome sequences, the multiple sequence alignment problem is increasingly important and frequently involves the alignment of a large number of sequences. Many heuristic algorithms have been proposed to improve the speed of computation and the quality of alignment. We introduce a novel approach that is fundamentally different from all currently available methods. Our motivation comes from the Eulerian method for fragment assembly in DNA sequencing that transforms all DNA fragments into a de Bruijn graph and then reduces sequence assembly to an Eulerian path problem. The paper focuses on global multiple alignment of DNA sequences, where entire sequences are aligned into one configuration. Our main result is an algorithm with almost linear computational speed with respect to the total size (number of letters) of sequences to be aligned. Five hundred simulated sequences (averaging 500 bases per sequence and as low as 70% pairwise identity) have been aligned within three minutes on a personal computer, and the quality of alignment is satisfactory. As a result, accurate and simultaneous alignment of thousands of long sequences within a reasonable amount of time becomes possible. Data from an Arabidopsis sequencing project is used to demonstrate the performance.
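The fragment-assembly idea that the abstract cites as motivation can be sketched in a few lines. This is only an illustration of that idea (fragments to de Bruijn graph to Eulerian path), not the alignment algorithm of the paper. The function names `de_bruijn_edges`, `eulerian_path`, and `spell` are hypothetical, and the sketch assumes error-free fragments whose distinct k-mers form a graph that actually has an Eulerian path.

```python
from collections import defaultdict


def de_bruijn_edges(fragments, k):
    """Build a de Bruijn graph: every distinct k-mer becomes one edge
    from its (k-1)-letter prefix to its (k-1)-letter suffix."""
    kmers = set()
    for frag in fragments:
        for i in range(len(frag) - k + 1):
            kmers.add(frag[i:i + k])
    graph = defaultdict(list)
    for kmer in sorted(kmers):
        graph[kmer[:-1]].append(kmer[1:])
    return graph


def eulerian_path(graph):
    """Hierholzer's algorithm. Assumes an Eulerian path exists
    (an assumption of this sketch, not checked here)."""
    in_deg = defaultdict(int)
    for targets in graph.values():
        for v in targets:
            in_deg[v] += 1
    nodes = set(graph) | set(in_deg)
    # Start at the node with one more outgoing than incoming edge, if any.
    start = next(
        (n for n in sorted(nodes)
         if len(graph.get(n, [])) - in_deg.get(n, 0) == 1),
        min(nodes),
    )
    remaining = {u: list(vs) for u, vs in graph.items()}
    stack, path = [start], []
    while stack:
        u = stack[-1]
        if remaining.get(u):
            stack.append(remaining[u].pop())  # follow an unused edge
        else:
            path.append(stack.pop())  # dead end: node is final in the path
    return path[::-1]


def spell(path):
    """Read the sequence off a node path: first node plus the last
    letter of each following node."""
    return path[0] + "".join(node[-1] for node in path[1:])


fragments = ["ACGTT", "GTTCA", "TCAGG"]  # toy overlapping reads
graph = de_bruijn_edges(fragments, k=3)
print(spell(eulerian_path(graph)))  # ACGTTCAGG
```

Each k-mer is used exactly once as an edge, which is why the reconstruction becomes an Eulerian path problem rather than a search over pairwise fragment overlaps.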
Similar articles
- An Eulerian path approach to local multiple alignment for DNA sequences.
  Zhang Y, Waterman MS. Proc Natl Acad Sci U S A. 2005 Feb 1;102(5):1285-90. doi: 10.1073/pnas.0409240102. Epub 2005 Jan 24. PMID: 15668398. Free PMC article.
- Glocal alignment: finding rearrangements during alignment.
  Brudno M, Malde S, Poliakov A, Do CB, Couronne O, Dubchak I, Batzoglou S. Bioinformatics. 2003;19 Suppl 1:i54-62. doi: 10.1093/bioinformatics/btg1005. PMID: 12855437.
- DNA sequence assembly and multiple sequence alignment by an Eulerian path approach.
  Zhang Y, Waterman MS. Cold Spring Harb Symp Quant Biol. 2003;68:205-12. doi: 10.1101/sqb.2003.68.205. PMID: 15338619. No abstract available.
- Cross-species sequence comparisons: a review of methods and available resources.
  Frazer KA, Elnitski L, Church DM, Dubchak I, Hardison RC. Genome Res. 2003 Jan;13(1):1-12. doi: 10.1101/gr.222003. PMID: 12529301. Free PMC article. Review.
- Repetitive DNA and next-generation sequencing: computational challenges and solutions.
  Treangen TJ, Salzberg SL. Nat Rev Genet. 2011 Nov 29;13(1):36-46. doi: 10.1038/nrg3117. PMID: 22124482. Free PMC article. Review.
Cited by
- De novo repeat classification and fragment assembly.
  Pevzner PA, Tang H, Tesler G. Genome Res. 2004 Sep;14(9):1786-96. doi: 10.1101/gr.2395204. PMID: 15342561. Free PMC article.
- A novel method for multiple alignment of sequences with repeated and shuffled elements.
  Raphael B, Zhi D, Tang H, Pevzner P. Genome Res. 2004 Nov;14(11):2336-46. doi: 10.1101/gr.2657504. PMID: 15520295. Free PMC article.
- Plant Reactome: a knowledgebase and resource for comparative pathway analysis.
  Naithani S, Gupta P, Preece J, D'Eustachio P, Elser JL, Garg P, Dikeman DA, Kiff J, Cook J, Olson A, Wei S, Tello-Ruiz MK, Mundo AF, Munoz-Pomer A, Mohammed S, Cheng T, Bolton E, Papatheodorou I, Stein L, Ware D, Jaiswal P. Nucleic Acids Res. 2020 Jan 8;48(D1):D1093-D1103. doi: 10.1093/nar/gkz996. PMID: 31680153. Free PMC article.
- An Eulerian path approach to local multiple alignment for DNA sequences.
  Zhang Y, Waterman MS. Proc Natl Acad Sci U S A. 2005 Feb 1;102(5):1285-90. doi: 10.1073/pnas.0409240102. Epub 2005 Jan 24. PMID: 15668398. Free PMC article.
- Rational design of DNA sequences for nanotechnology, microarrays and molecular computers using Eulerian graphs.
  Pancoska P, Moravek Z, Moll UM. Nucleic Acids Res. 2004 Aug 27;32(15):4630-45. doi: 10.1093/nar/gkh802. Print 2004. PMID: 15333695. Free PMC article.