De novo genome sequence assembly of a filamentous fungus using Sanger, 454 and Illumina sequence data

doi: 10.1186/gb-2009-10-9-r94. Epub 2009 Sep 11.

Scott Diguistini, Nancy Y Liao, Darren Platt, Gordon Robertson, Michael Seidel, Simon K Chan, T Roderick Docking, Inanc Birol, Robert A Holt, Martin Hirst, Elaine Mardis, Marco A Marra, Richard C Hamelin, Jörg Bohlmann, Colette Breuil, Steven JM Jones

Scott Diguistini et al. Genome Biol. 2009.

Abstract

Sequencing-by-synthesis technologies can reduce the cost of generating de novo genome assemblies. We report a method for assembling draft genome sequences of eukaryotic organisms that integrates sequence information from different sources, and demonstrate its effectiveness by assembling an approximately 32.5 Mb draft genome sequence for the forest pathogen Grosmannia clavigera, an ascomycete fungus. We also developed a method for assessing draft assemblies using Illumina paired end read data and demonstrate how we are using it to guide future sequence finishing. Our results demonstrate that eukaryotic genome sequences can be accurately assembled by combining Illumina, 454 and Sanger sequence data.

Figures

Figure 1

Assembly process overview. Overview of the process for producing de novo assemblies.

Figure 2

Consensus sequence quality. The proportion of 454 read data within the total read collection affected the number of small insertions and deletions (indels) based on analysis of 7,169 unique EST-to-genome alignments. The relative proportions of insertions (blue) and deletions (orange) in the assembly sequence are shown in the inset pie chart. Assemblies are described in Tables 1 and 2; those including 454 read data were assembled with Forge; the Illumina-only assembly was generated with Velvet.

Figure 3

Comparison of Forge Sanger/454/Illumina assemblies against _GC_gb1. Alignments of scaffolds greater than 100 kb - (a) 'Sanger/454/IlluminaDA' (approximately 24 Mb on 80 scaffolds) and (b) 'Sanger/454/IlluminaPA' (approximately 28.7 Mb on 46 scaffolds) - on the y-axis against the manually finished genome sequence (_GC_gb1) on the x-axis.

Figure 4

Assessing the discovery of unique read information between the Illumina and 454 platforms. (a) Raw reads were processed into overlapping 28-bp _k_-mers, and any _k_-mer that varied from all other _k_-mers by at least 1 bp was accepted as new sequence information. The analysis was done separately for unique _k_-mers and those that occurred at least twice (2× _k_-mers). (b) MAQ was then used to map these _k_-mers to the reference genome sequence and the rate at which new coverage was generated was plotted against the number of _k_-mers examined.
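The _k_-mer step in panel (a) can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the function names (`kmers`, `count_kmers`, `split_by_multiplicity`, `discovery_curve`) are hypothetical, "new sequence information" is approximated as a _k_-mer string not seen before, reverse complements are not collapsed, and the MAQ mapping in panel (b) is not reproduced.

```python
from collections import Counter


def kmers(read, k=28):
    """Yield every overlapping k-mer in a read (28 bp by default, as in the caption)."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k]


def count_kmers(reads, k=28):
    """Count how often each k-mer occurs across all reads."""
    counts = Counter()
    for read in reads:
        counts.update(kmers(read, k))
    return counts


def split_by_multiplicity(counts):
    """Separate k-mers seen exactly once from those seen at least twice (2x k-mers)."""
    unique = {km for km, n in counts.items() if n == 1}
    repeated = {km for km, n in counts.items() if n >= 2}
    return unique, repeated


def discovery_curve(reads, k=28):
    """Return (k-mers examined, distinct k-mers seen so far) after each k-mer.

    Assumption: a k-mer counts as new information if its exact string
    has not been seen before.
    """
    seen = set()
    curve = []
    examined = 0
    for read in reads:
        for km in kmers(read, k):
            examined += 1
            seen.add(km)
            curve.append((examined, len(seen)))
    return curve
```

For example, `split_by_multiplicity(count_kmers(reads))` gives the two _k_-mer sets that the caption analyses separately, and a flattening `discovery_curve` suggests that additional reads are contributing little new sequence.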
