Predicting the molecular complexity of sequencing libraries
Timothy Daley et al. Nat Methods. 2013 Apr.
Abstract
Predicting the molecular complexity of a genomic sequencing library is a critical but difficult problem in modern sequencing applications. Methods to determine how deeply to sequence to achieve complete coverage, or to predict the benefits of additional sequencing, are lacking. We introduce an empirical Bayesian method to accurately characterize the molecular complexity of a DNA sample for almost any sequencing application on the basis of limited preliminary sequencing.
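The abstract does not spell out the estimator. As a rough illustration of the kind of extrapolation involved, the classical Good-Toulmin series predicts how many new distinct molecules a further t-fold round of sequencing would add (t = 1 doubles the depth), using only n_j, the number of molecules observed exactly j times in the preliminary run: Δ(t) = Σ_{j≥1} (-1)^(j+1) t^j n_j. The sketch below assumes reads have already been collapsed into hypothetical molecule identifiers, and the function names are illustrative. It is not the authors' implementation: the raw series is reliable only for t ≤ 1 and diverges beyond that, which is, as we understand it, the problem that estimators such as RF and ET in Figure 1 are meant to address.

```python
from collections import Counter

def count_histogram(read_ids):
    """Map j -> n_j, the number of distinct molecules observed exactly j times."""
    per_molecule = Counter(read_ids)       # molecule ID -> times observed
    return Counter(per_molecule.values())  # times observed -> number of molecules

def good_toulmin_new(hist, t):
    """Expected number of *new* distinct molecules if sequencing depth is
    extended by a factor t, via the alternating Good-Toulmin power series.
    Only trustworthy for 0 <= t <= 1; the series diverges for larger t."""
    return sum((-1) ** (j + 1) * t ** j * n_j for j, n_j in hist.items())

# Hypothetical molecule IDs from a tiny preliminary run.
reads = ["a", "a", "b", "c", "c", "c", "d"]
hist = count_histogram(reads)
observed = sum(hist.values())                  # distinct molecules seen so far
print(observed + good_toulmin_new(hist, 0.5))  # predicted distinct at 1.5x depth
```

Here four molecules were observed and the series predicts 0.875 new ones at 1.5x depth, so the script prints 4.875.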
Figures
Figure 1
Two hypothetical libraries, each containing 10 million (M) distinct molecules. (a) In library 1, half of the molecules (5 M) are present at the same abundance and together make up 99% of the library. (b) In library 2, ten thousand molecules represent half the material in the library. (c) Based on a shallow sequencing run (1 M reads), library 1 appears to contain the greater diversity of molecules. (d) After additional sequencing, library 2 yields more distinct observations. (e) Such situations do occur in practice. From an initial 5 M reads, the observed complexity of two BS-seq libraries indicates that the human sperm library is the more complex. The observed complexity curves cross after additional sequencing, with the chimp sperm library yielding more distinct reads. Estimates from the rational function (RF) approximation and Euler's transform (ET), fit to the initial reads, predict the crossing (although ET becomes unstable), whereas the zero-truncated negative binomial (ZTNB) does not.
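The arithmetic behind panels (a) to (d) can be checked directly. If reads are drawn independently with replacement and molecule i makes up a fraction p_i of the library, the expected number of distinct molecules after N reads is Σ_i [1 - (1 - p_i)^N]. The sketch below assumes, beyond what the caption states, that molecules within each group are equally abundant: the remaining 5 M molecules of library 1 share the other 1%, and the remaining 9.99 M molecules of library 2 share the other half. The names `LIBRARY_1`, `LIBRARY_2` and `expected_distinct` are illustrative.

```python
import math

# Each library is a list of (number_of_molecules, fraction_of_material) groups.
# Assumption: molecules within a group are equally abundant.
LIBRARY_1 = [(5_000_000, 0.99), (5_000_000, 0.01)]
LIBRARY_2 = [(10_000, 0.50), (9_990_000, 0.50)]

def expected_distinct(groups, n_reads):
    """Expected number of distinct molecules seen after n_reads draws with replacement."""
    total = 0.0
    for n_molecules, fraction in groups:
        p = fraction / n_molecules  # sampling probability of one molecule per read
        # P(seen at least once) = 1 - (1 - p)^n_reads, computed without cancellation
        p_seen = -math.expm1(n_reads * math.log1p(-p))
        total += n_molecules * p_seen
    return total

for n in (1_000_000, 10_000_000, 100_000_000):
    print(f"{n:>11,} reads: library 1 {expected_distinct(LIBRARY_1, n):>12,.0f}  "
          f"library 2 {expected_distinct(LIBRARY_2, n):>12,.0f}")
```

Under these assumptions library 1 leads at 1 M reads (roughly 0.91 M vs 0.50 M distinct molecules), library 2 overtakes it somewhere between 10 M and 20 M reads, and by 100 M reads library 2 is far ahead (roughly 9.93 M vs 5.91 M). This reproduces the crossing of panels (c) and (d).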
Figure 2
Library complexity can be estimated both in terms of distinct molecules sequenced and in terms of distinct loci identified. (a) A ChIP-seq library (CTCF; mouse B cells) continues to yield additional molecules after 100 million (M) reads have been sequenced; the rational function approximation (RF) remains accurate while the zero-truncated negative binomial (ZTNB) loses accuracy. (b) In the same library, the number of distinct 1 kb genomic windows covered by mapped reads saturates after 25 M reads. RF is accurate and forecasts saturation, while ZTNB significantly overestimates. (c) An RNA-seq library (human adipose-derived mesenchymal stem (ADS) cells) continues to yield additional molecules after 200 M reads; RF remains accurate while ZTNB predicts saturation. (d) In the same library, reads continue to map to new 300 bp windows after 200 M reads; ZTNB incorrectly predicts saturation, while RF does not.
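Figure 2 measures complexity in two ways. A minimal sketch of how both observed curves could be tallied from mapped reads follows. It assumes hypothetical (chrom, start, strand) tuples and the common convention that reads sharing a start position and strand are duplicates of one molecule; the caption does not say how molecules were defined. The `window` parameter sets the bin size: 1000 for the ChIP-seq panels, 300 for the RNA-seq panels. Each point comes from a random subsample, so the result is an observed complexity curve, not a prediction like RF or ZTNB.

```python
import random

def complexity_curve(reads, step, window=1000, seed=0):
    """Return [(n_reads, distinct_molecules, distinct_windows), ...] for growing
    random subsamples of `reads`, where each read is a (chrom, start, strand) tuple."""
    order = list(reads)
    random.Random(seed).shuffle(order)  # visit reads in a reproducible random order
    molecules, windows = set(), set()
    curve = []
    for i, (chrom, start, strand) in enumerate(order, start=1):
        # Assumption: same chrom, start and strand means a duplicate of one molecule.
        molecules.add((chrom, start, strand))
        # Genomic bin (e.g. 1 kb) that the read's start position falls in.
        windows.add((chrom, start // window))
        if i % step == 0 or i == len(order):
            curve.append((i, len(molecules), len(windows)))
    return curve

# Toy, hypothetical mapped reads.
reads = [("chr1", 100, "+"), ("chr1", 100, "+"), ("chr1", 950, "-"),
         ("chr1", 1_500, "+"), ("chr2", 42, "-")]
print(complexity_curve(reads, step=2))
```

Intermediate points depend on the shuffle, but the final point is always (5, 4, 3): five reads, four distinct molecules, three distinct 1 kb windows. Because every molecule falls in exactly one window, the window curve can never exceed the molecule curve, which is why it saturates first in panel (b).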
Similar articles
- Genomic libraries: II. Subcloning, sequencing, and assembling large-insert genomic DNA clones. Quail MA, Matthews L, Sims S, Lloyd C, Beasley H, Baxter SW. Methods Mol Biol. 2011;772:59-81. doi: 10.1007/978-1-61779-228-1_4. PMID: 22065432.
- Maximizing the acquisition of unique reads in noninvasive capture sequencing experiments. Fontsere C, Alvarez-Estape M, Lester J, Arandjelovic M, Kuhlwilm M, Dieguez P, Agbor A, Angedakin S, Ayuk Ayimisin E, Bessone M, Brazzola G, Deschner T, Eno-Nku M, Granjon AC, Head J, Kadam P, Kalan AK, Kambi M, Langergraber K, Lapuente J, Maretti G, Jayne Ormsby L, Piel A, Robbins MM, Stewart F, Vergnes V, Wittig RM, Kühl HS, Marques-Bonet T, Hughes DA, Lizano E. Mol Ecol Resour. 2021 Apr;21(3):745-761. doi: 10.1111/1755-0998.13300. PMID: 33217149.
- Genomics through the lens of next-generation sequencing. Capra JA, Carbone L, Riesenfeld SJ, Wall JD. Genome Biol. 2010;11(6):306. doi: 10.1186/gb-2010-11-6-306. PMID: 20587080.
- Genomic variability and protein species - Improving sequence coverage for proteogenomics. Bischoff R, Permentier H, Guryev V, Horvatovich P. J Proteomics. 2016 Feb 16;134:25-36. doi: 10.1016/j.jprot.2015.09.021. PMID: 26394375. Review.
- [Comparative studies on human and chimpanzee genomes]. Yoko K, Atsushi T, Hideki N, Asao F. Tanpakushitsu Kakusan Koso. 2005 Dec;50(16 Suppl):2072-7. PMID: 16411432. Review. In Japanese.
Cited by
- Chromosome-scale genome assembly of the tropical abalone (Haliotis asinina). Barkan R, Cooke I, Watson SA, Lau SCY, Strugnell JM. Sci Data. 2024 Sep 12;11(1):999. doi: 10.1038/s41597-024-03840-w. PMID: 39266538.
- Temporally discordant chromatin accessibility and DNA demethylation define short and long-term enhancer regulation during cell fate specification. Guerin LN, Scott TJ, Yap JA, Johansson A, Puddu F, Charlesworth T, Yang Y, Simmons AJ, Lau KS, Ihrie RA, Hodges E. bioRxiv [Preprint]. 2024 Aug 27:2024.08.27.609789. doi: 10.1101/2024.08.27.609789. PMID: 39253426.
- The CALERIE™ Genomic Data Resource. Ryan CP, Corcoran DL, Banskota N, Eckstein IC, Floratos A, Friedman R, Kobor MS, Kraus VB, Kraus WE, MacIsaac JL, Orenduff MC, Pieper CF, White JP, Ferrucci L, Horvath S, Huffman KM, Belsky DW. bioRxiv [Preprint]. 2024 Aug 22:2024.05.17.594714. doi: 10.1101/2024.05.17.594714. PMID: 39229162.
- Loss of ARID3A perturbs intestinal epithelial proliferation-differentiation ratio and regeneration. Angelis N, Baulies A, Hubl F, Kucharska A, Kelly G, Llorian M, Boeing S, Li VSW. J Exp Med. 2024 Oct 7;221(10):e20232279. doi: 10.1084/jem.20232279. PMID: 39150450.
- Targeting PRMT5 enhances the radiosensitivity of tumor cells grown in vitro and in vivo. Degorre C, Lohard S, Bobrek CN, Rawal KN, Kuhn S, Tofilon PJ. Sci Rep. 2024 Jul 27;14(1):17316. doi: 10.1038/s41598-024-68405-8. PMID: 39068290.