The fragment assembly string graph
Eugene W Myers. Bioinformatics. 2005.
Abstract
We present a concept and formalism, the string graph, which represents all that is inferable about a DNA sequence from a collection of shotgun sequencing reads sampled from it. We give time- and space-efficient algorithms for constructing a string graph given the collection of overlaps between the reads and, in particular, present a novel linear expected-time algorithm for transitive reduction in this context. The result demonstrates that the decomposition of reads into k-mers employed in the de Bruijn graph approach described earlier is not essential, and exposes its close connection to the unitig approach we developed at Celera. This paper is a preliminary piece giving the basic algorithm and results that demonstrate the efficiency and scalability of the method. These ideas are being used to build a next-generation whole-genome assembler called BOA (Berkeley Open Assembler) that will easily scale to mammalian genomes.
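The transitive reduction step mentioned in the abstract can be illustrated with a small sketch. It is a simplified, marking-based reduction written for this page, not the algorithm as published. It assumes single-stranded reads, error-free overlaps, and edges labelled with the number of bases by which the second read extends past the first. The names `transitive_reduction`, `VACANT`, `INPLAY`, and `ELIMINATED` are hypothetical, and the sketch omits both strands and any tolerance for inexact overlap lengths.

```python
from collections import defaultdict

# Hypothetical mark states used while scanning one vertex's neighbourhood.
VACANT, INPLAY, ELIMINATED = 0, 1, 2


def transitive_reduction(edges):
    """Return the set of (v, w) edges that are not implied by a two-step path.

    edges: iterable of (v, w, length), where length is the assumed overhang
    of read w beyond read v. Simplified: one strand, exact lengths.
    """
    out = defaultdict(list)
    nodes = set()
    for v, w, length in edges:
        out[v].append((length, w))
        nodes.update((v, w))
    for v in out:
        out[v].sort()  # shortest overhang first

    mark = {n: VACANT for n in nodes}
    kept = set()
    for v in nodes:
        adj = out.get(v, [])
        if not adj:
            continue
        for _, w in adj:
            mark[w] = INPLAY
        longest = adj[-1][0]
        # Any neighbour x of v that is also reached through a nearer
        # neighbour w within the longest overhang is redundant.
        for len_vw, w in adj:
            if mark[w] != INPLAY:
                continue
            for len_wx, x in out.get(w, []):
                if len_vw + len_wx > longest:
                    break  # remaining edges out of w are longer still
                if mark[x] == INPLAY:
                    mark[x] = ELIMINATED
        for _, w in adj:
            if mark[w] != ELIMINATED:
                kept.add((v, w))
            mark[w] = VACANT  # reset for the next vertex
    return kept


if __name__ == "__main__":
    overlaps = [("A", "B", 2), ("B", "C", 3), ("A", "C", 5)]
    print(sorted(transitive_reduction(overlaps)))
```

In this toy input the overlap from A to C is implied by the path through B, so only the A to B and B to C edges survive. Each vertex touches only its own edges and those of its neighbours, which is the locality that makes a fast reduction plausible in this setting.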