Whole-genome sequence assembly for mammalian genomes: Arachne 2
David B Jaffe et al. Genome Res. 2003 Jan.
Abstract
We previously described the whole-genome assembly program Arachne, presenting assemblies of simulated data for small to mid-sized genomes. Here we describe algorithmic adaptations to the program, allowing for assembly of mammalian-size genomes, and also improving the assembly of smaller genomes. Three principal changes were simultaneously made and applied to the assembly of the mouse genome, during a six-month period of development: (1) Supercontigs (scaffolds) were iteratively broken and rejoined using several criteria, yielding a 64-fold increase in length (N50), and apparent elimination of all global misjoins; (2) gaps between contigs in supercontigs were filled (partially or completely) by insertion of reads, as suggested by pairing within the supercontig, increasing the N50 contig length by 50%; (3) memory usage was reduced fourfold. The outcome of this mouse assembly and its analysis are described in (Mouse Genome Sequencing Consortium 2002).
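The N50 values quoted above follow the usual definition: N50 is the length such that contigs (or supercontigs) at least that long contain at least half of all assembled bases. A minimal Python sketch of that calculation follows. The function name `n50` and the example lengths are illustrative assumptions and are not taken from Arachne.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or supercontig lengths.

    Illustrative helper, not Arachne code: sort lengths from longest to
    shortest and return the length at which the running total first
    reaches half of the summed length.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("n50 needs at least one length")
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Made-up lengths: total is 20, so half is 10; 6 + 5 = 11 reaches it at 5.
example_lengths = [2, 3, 4, 5, 6]
print(n50(example_lengths))
```

Because N50 depends only on the length distribution, a 50% gain in contig N50 from gap filling does not by itself say how many gaps were closed.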
Figures
Figure 1.
Joining of supercontigs. Three supercontigs (a, b, c) are seen off the end of supercontig s. There are two or more read pair links from s to each of them. Each has an optimal position relative to s, determined by the insert lengths corresponding to the read pairs. However, each insert length has a standard deviation associated with it, and so the positions of a, b, and c relative to s also have standard deviations. Suppose that we allow each of them to slide from its optimal position by up to 2.5 standard deviations, but that we do not allow overlap between any of the supercontigs. Is there more than one possible order for the supercontigs? Among the possible orders, does a always appear first (after s)? If so, we join supercontig s to supercontig a.
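The test in this caption can be sketched in Python as a brute-force check over candidate orders. This is a reading of the caption, not Arachne's implementation. The `Candidate` class, the function names, and the example numbers are hypothetical. Positions are measured from the end of s, and the sketch assumes that no candidate may start before that end (position 0).

```python
from dataclasses import dataclass
from itertools import permutations

MAX_SLIDE_SD = 2.5  # allowed slide, in standard deviations (from the caption)


@dataclass(frozen=True)
class Candidate:
    """A supercontig linked to the end of s (hypothetical structure)."""
    name: str
    length: float         # length in bases
    optimal_start: float  # best-estimate start, measured from the end of s
    sd: float             # standard deviation of that estimate


def order_is_feasible(order, max_slide_sd=MAX_SLIDE_SD):
    """Check whether the candidates fit in this order without overlap.

    Each candidate is placed as early as its window and the previous
    candidate allow; if that start is already too late, no placement works.
    """
    cursor = 0.0  # assumption: nothing may overlap s, which ends at 0
    for cand in order:
        earliest = cand.optimal_start - max_slide_sd * cand.sd
        latest = cand.optimal_start + max_slide_sd * cand.sd
        start = max(earliest, cursor)
        if start > latest:
            return False
        cursor = start + cand.length
    return True


def unambiguous_first(candidates, max_slide_sd=MAX_SLIDE_SD):
    """Return the name that comes first in every feasible order, else None."""
    firsts = {
        order[0].name
        for order in permutations(candidates)
        if order_is_feasible(order, max_slide_sd)
    }
    return firsts.pop() if len(firsts) == 1 else None


# Made-up example: the windows are far enough apart that a must come first.
clear_case = [
    Candidate("a", length=10_000, optimal_start=1_000, sd=400),
    Candidate("b", length=20_000, optimal_start=15_000, sd=1_000),
    Candidate("c", length=5_000, optimal_start=40_000, sd=2_000),
]
# Made-up example: two short candidates with overlapping windows.
ambiguous_case = [
    Candidate("a", length=1_000, optimal_start=2_000, sd=1_000),
    Candidate("b", length=1_000, optimal_start=2_500, sd=1_000),
]
print(unambiguous_first(clear_case))
print(unambiguous_first(ambiguous_case))
```

Enumerating permutations grows factorially with the number of candidates, which is acceptable for the handful of neighbors in the figure but would need a smarter search for larger sets.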
Figure 2.
A disguised instance where sequence join alone holds together a supercontig. A long supercontig (blue) from one part of the genome subsumes a small foreign inset (red) from a completely different part of the genome, held together by a single point of attachment within a contig (bicolor): in fact only a sequence join ties blue to red. This was not recognized in the version of the code which produced the released mouse assembly (Mouse Genome Sequencing Consortium 2002). Resolution: break at the bicolor juncture, move the red sequence to where it links in another supercontig.
Figure 3.
Positive breaking of supercontigs. Three correlated links are seen between supercontigs S1 and S2. The spread of the connection between S1 and S2 is, in this case, the lesser of 10 kb and 25 kb, which is 10 kb. Because the positive breaking algorithm as applied to mouse required five links with spread at least 50 kb, this connection would not have been sufficient to break the supercontigs. If it were, the respective supercontigs would have been broken at the exact ends of reads (green bars).
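The threshold test in this caption can be sketched in Python as follows. The link representation and the function names are hypothetical. The sketch assumes that the spread on each supercontig is the distance between the outermost link positions on it, which the caption does not define, and it reads "five links" as "at least five links".

```python
MIN_LINKS = 5           # link count required for mouse, per the caption
MIN_SPREAD_BP = 50_000  # minimum spread required for mouse, per the caption


def connection_spread(links):
    """Return the lesser of the two per-supercontig spreads.

    `links` is a list of (position_on_S1, position_on_S2) pairs in bases.
    Assumption: the spread on one supercontig is the distance between the
    outermost link positions on that supercontig.
    """
    s1_positions = [p for p, _ in links]
    s2_positions = [q for _, q in links]
    spread_s1 = max(s1_positions) - min(s1_positions)
    spread_s2 = max(s2_positions) - min(s2_positions)
    return min(spread_s1, spread_s2)


def supports_positive_break(links, min_links=MIN_LINKS,
                            min_spread=MIN_SPREAD_BP):
    """True if the links meet both the count and the spread thresholds."""
    if len(links) < min_links:
        return False
    return connection_spread(links) >= min_spread


# The situation in the figure: three links spanning 10 kb on S1, 25 kb on S2.
figure_links = [(0, 0), (4_000, 12_000), (10_000, 25_000)]
print(connection_spread(figure_links))
print(supports_positive_break(figure_links))
```

With the figure's numbers the connection fails on both counts: it has three links rather than five, and its spread is 10 kb rather than 50 kb.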