Identification of optimum sequencing depth especially for de novo genome assembly of small genomes using next generation sequencing data
Aarti Desai et al. PLoS One. 2013.
Abstract
Next Generation Sequencing (NGS) is a disruptive technology that has found widespread acceptance in the life sciences research community. The high throughput and low cost of sequencing have encouraged researchers to undertake ambitious genomic projects, especially in de novo genome sequencing. Currently, NGS systems generate sequence data as short reads, and de novo genome assembly using these short reads is computationally very intensive. Owing to the lower cost of sequencing and higher throughput, NGS systems now make it possible to sequence genomes at high depth. However, no report is currently available on the impact of high sequence depth on genome assembly using real data sets and multiple assembly algorithms. Some recent studies have evaluated the impact of sequence coverage, error rate and average read length on genome assembly using multiple assembly algorithms; however, these evaluations were performed using simulated datasets. One limitation of simulated datasets is that variables such as error rate, read length and coverage, which are known to impact genome assembly, are carefully controlled. Hence, this study was undertaken to identify the minimum depth of sequencing required for de novo assembly of different-sized genomes using graph-based assembly algorithms and real datasets. Illumina reads for E. coli (4.6 Mb), S. kudriavzevii (11.18 Mb) and C. elegans (100 Mb) were assembled using SOAPdenovo, Velvet, ABySS, Meraculous and IDBA-UD. Our analysis shows that 50X is the optimum read depth for assembling these genomes with all assemblers except Meraculous, which requires 100X read depth. Moreover, de novo assembly from 50X read data requires only 6-40 GB of RAM, depending on the genome size and assembly algorithm used. We believe that this information can be extremely valuable for researchers designing experiments and multiplexing, enabling optimum utilization of sequencing and analysis resources.
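To make the depth figures concrete for experiment planning, the sketch below converts a target depth into a read count using the standard relation depth = (number of reads × read length) / genome size. It is only an illustration: the 100 bp read length is an assumed value that the abstract does not state, the function names are hypothetical, and the genome sizes are the ones quoted above (4.6, 11.18 and 100 megabases).

```python
import math


def reads_for_depth(genome_size_bp, depth, read_length_bp):
    """Number of reads needed to reach `depth`-fold average coverage."""
    return math.ceil(depth * genome_size_bp / read_length_bp)


def depth_from_reads(n_reads, read_length_bp, genome_size_bp):
    """Average depth of coverage obtained from a given number of reads."""
    return n_reads * read_length_bp / genome_size_bp


if __name__ == "__main__":
    # Genome sizes as quoted in the abstract, in base pairs.
    genomes = {
        "E. coli": 4_600_000,
        "S. kudriavzevii": 11_180_000,
        "C. elegans": 100_000_000,
    }
    assumed_read_length = 100  # assumption: not given in the abstract
    for name, size in genomes.items():
        n = reads_for_depth(size, 50, assumed_read_length)
        print(f"{name}: about {n:,} reads for 50X at {assumed_read_length} bp")
```

With a different read length or a paired-end layout the read counts change proportionally, so the numbers printed here should be treated as rough planning estimates only.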
Conflict of interest statement
Competing Interests: All authors are affiliated with Persistent LABS and/or Persistent Systems, Pune. There are no patents, products in development or marketed products to declare. This does not alter the authors' adherence to all the PLOS ONE policies on sharing data and materials, as detailed online in the guide for authors.
Figures
Figure 1. N50 value for the genomes assembled by Velvet, SOAPdenovo, ABySS, Meraculous and IDBA-UD.
A) N50 for the assembled E. coli genome: N50 is the length of the smallest contig in the set of largest contigs that together cover at least 50% of the genome. The N50 values for IDBA-UD, Velvet and SOAPdenovo appeared to plateau at 35X, and that for ABySS at 50X depth of coverage. In contrast, the N50 value of the Meraculous-generated assembly increased up to 150X depth of coverage.
B) N50 for the assembled S. kudriavzevii genome: IDBA-UD and SOAPdenovo attained peak N50 values at 35X and 100X depth of coverage, respectively, whereas the N50 values of the Velvet-, ABySS- and Meraculous-generated assemblies increased up to 150X depth of coverage.
C) N50 for the assembled C. elegans genome: SOAPdenovo, ABySS and IDBA-UD reached peak N50 values at 100X depth of coverage, whereas the N50 value of the Velvet-generated assembly increased approximately 1.5-fold until 150X, with no change thereafter. The Velvet-generated assembly had the best N50 values of the 4 assemblers.
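The N50 definition in the caption can be illustrated with a minimal sketch. By default it uses the total assembly length as the denominator, which is the common convention; passing the hypothetical `genome_size` argument instead measures against the reference genome size, as the caption's wording suggests. Which convention the authors used is not stated here, so both are offered as assumptions.

```python
def n50(contig_lengths, genome_size=None):
    """Length of the smallest contig in the set of largest contigs
    that together reach at least half of the total length.

    The total is the sum of contig lengths unless `genome_size`
    (an assumed reference size in bp) is supplied.
    """
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("contig_lengths must not be empty")
    total = genome_size if genome_size is not None else sum(lengths)
    threshold = total / 2
    running = 0
    for length in lengths:
        running += length
        if running >= threshold:
            return length
    # The contigs cover less than half of the supplied genome size.
    return None


if __name__ == "__main__":
    # Toy contig lengths, not data from the study.
    print(n50([80, 70, 50, 40, 30, 20]))
```

For the toy input the total length is 290, so the threshold is 145; the two largest contigs sum to 150, and the function returns 70.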
Figure 2. Memory requirement for genome assembly.
Memory required to assemble E.coli (A), S.kudriavzevii (B) and C.elegans (C) genomes increased, although not proportionately, with increasing depth of sequencing.
Grants and funding
The authors have no support or funding to report.