An algorithm for automated closure during assembly
Sergey Koren et al. BMC Bioinformatics. 2010.
Abstract
Background: Finishing is the process of improving the quality and utility of draft genome sequences generated by shotgun sequencing and computational assembly. Finishing can involve targeted sequencing. Finishing reads may be incorporated by manual or automated means. One automated method uses targeted addition by local re-assembly of gap regions. An obvious alternative uses de novo assembly of all the reads.
Results: A procedure called the bounding read algorithm was developed for assembly of shotgun reads plus finishing reads and their constraints, targeting repeat regions. The algorithm was implemented within the Celera Assembler software and its pyrosequencing-specific variant, CABOG. The implementation was tested on Sanger and pyrosequencing data from six genomes. The bounding read assemblies were compared to assemblies from two other methods on the same data. The algorithm generates improved assemblies of repeat regions, closing and tiling some gaps while degrading none.
Conclusions: The algorithm is useful for small-genome automated finishing projects. Our implementation is available as open-source from http://wgs-assembler.sourceforge.net under the GNU Public License.
Figures
Figure 1
Use of finishing reads. Two algorithms for assembling shotgun reads and finishing reads. The control treats both read types equally. The bounded algorithm attempts to assemble finishing reads consistently with their bounding constraints. For each algorithm, the figure shows its construction of a scaffold from contigs (rectangles) with 2X in shotgun reads (black lines). Each finishing read (colored line) has a corresponding pair of PCR primer sites (arrows of same color). External to the scaffold is a unitig (grey area) deemed repetitive due to high coverage. (a) A mate pair constraint (curve) localizes one read and the unitig to this gap. Nevertheless, the control algorithm cannot tile this gap with reads. The bounded algorithm localizes two finishing reads by their primer sites. The bounded algorithm does tile the gap with reads, enabling a more accurate consensus sequence. (b) The control cannot localize the unitig or any reads to this gap. It does not close the gap. The bounded algorithm localizes the unitig by finishing reads and their primer sites. It tiles the gap with finishing reads from the unitig. (c) Both algorithms assemble finishing reads from a gap that is not a genomic repeat. In our data sets, most finishing reads fit gaps of this type.
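The localization step described in the caption, where a finishing read is placed in a gap by its primer sites, can be illustrated with a small sketch. This is not the Celera Assembler implementation. It assumes that primer sites have already been mapped to scaffold coordinates, and the names `Contig`, `FinishingRead`, `gaps`, and `bounded_gaps` are hypothetical, introduced only for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Contig:
    name: str
    start: int  # scaffold coordinate, inclusive (assumed layout)
    end: int    # scaffold coordinate, exclusive


@dataclass
class FinishingRead:
    name: str
    left_primer: int   # scaffold position of one primer site (assumed known)
    right_primer: int  # scaffold position of the other primer site


def gaps(contigs: List[Contig]) -> List[Tuple[int, int]]:
    """Return (gap_start, gap_end) intervals between consecutive contigs."""
    ordered = sorted(contigs, key=lambda c: c.start)
    return [(a.end, b.start) for a, b in zip(ordered, ordered[1:]) if b.start > a.end]


def bounded_gaps(read: FinishingRead, contigs: List[Contig]) -> List[Tuple[int, int]]:
    """Return the gaps that overlap the interval bounded by the read's primer sites."""
    lo, hi = sorted((read.left_primer, read.right_primer))
    return [(s, e) for s, e in gaps(contigs) if s < hi and e > lo]


# Hypothetical scaffold: three contigs separated by two gaps.
scaffold = [Contig("A", 0, 1000), Contig("B", 1200, 2500), Contig("C", 2600, 4000)]

spanning = FinishingRead("f1", 950, 1300)  # primers flank the first gap
interior = FinishingRead("f2", 100, 800)   # primers lie inside contig A

print(bounded_gaps(spanning, scaffold))
print(bounded_gaps(interior, scaffold))
```

A read whose primer interval overlaps a gap becomes a candidate for tiling that gap. A read whose primers fall within a single contig bounds no gap and would be handled like an ordinary read in this simplified model.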