The role of replicates for error mitigation in next-generation sequencing
O'Rawe, J. et al. Low concordance of multiple variant-calling pipelines: practical implications for exome and genome sequencing. Genome Med. 5, 28 (2013).
Kircher, M., Heyn, P. & Kelso, J. Addressing challenges in the production and analysis of Illumina sequencing data. BMC Genomics 12, 382 (2011).
Metzker, M. L. Sequencing technologies — the next generation. Nature Rev. Genet. 11, 31–46 (2010).
Sboner, A., Mu, X. J., Greenbaum, D., Auerbach, R. K. & Gerstein, M. B. The real cost of sequencing: higher than you think! Genome Biol. 12, 125 (2011).
Ratan, A. et al. Comparison of sequencing platforms for single nucleotide variant calls in a human sample. PLoS ONE 8, e55089 (2013).
Peters, B. A. et al. Accurate whole-genome sequencing and haplotyping from 10 to 20 human cells. Nature 487, 190–195 (2012).
Williams, C. et al. A high frequency of sequence alterations is due to formalin fixation of archival specimens. Am. J. Pathol. 155, 1467–1471 (1999).
Yost, S. E. et al. Identification of high-confidence somatic mutations in whole genome sequence of formalin-fixed breast cancer specimens. Nucleic Acids Res. 40, e107 (2012).
Akbari, M., Hansen, M. D., Halgunset, J., Skorpen, F. & Krokan, H. E. Low copy number DNA template can render polymerase chain reaction error prone in a sequence-dependent manner. J. Mol. Diagn. 7, 36–39 (2005).
Lawrence, M. S. et al. Mutational heterogeneity in cancer and the search for new cancer-associated genes. Nature 499, 214–218 (2013).
Leal, S. M. Detection of genotyping errors and pseudo-SNPs via deviations from Hardy–Weinberg equilibrium. Genet. Epidemiol. 29, 204–214 (2005).
Walsh, P. S., Erlich, H. A. & Higuchi, R. Preferential PCR amplification of alleles: mechanisms and solutions. PCR Methods Appl. 1, 241–250 (1992).
Hutchison, C. A. 3rd, Smith, H. O., Pfannkoch, C. & Venter, J. C. Cell-free cloning using phi29 DNA polymerase. Proc. Natl Acad. Sci. USA 102, 17332–17336 (2005).
Hodges, E. et al. Genome-wide in situ exon capture for selective resequencing. Nature Genet. 39, 1522–1527 (2007).
Aird, D. et al. Analyzing and minimizing PCR amplification bias in Illumina sequencing libraries. Genome Biol. 12, R18 (2011).
Koboldt, D. C., Ding, L., Mardis, E. R. & Wilson, R. K. Challenges of sequencing human genomes. Brief Bioinform. 11, 484–498 (2010).
Xuan, J., Yu, Y., Qing, T., Guo, L. & Shi, L. Next-generation sequencing in the clinic: promises and challenges. Cancer Lett. 340, 284–295 (2012).
Fuller, C. W. et al. The challenges of sequencing by synthesis. Nature Biotech. 27, 1013–1023 (2009).
Roberts, R. J., Carneiro, M. O. & Schatz, M. C. The advantages of SMRT sequencing. Genome Biol. 14, 405 (2013).
Yang, X., Chockalingam, S. P. & Aluru, S. A survey of error-correction methods for next-generation sequencing. Brief Bioinform. 14, 56–66 (2013).
Lynch, M. Rate, molecular spectrum, and consequences of human mutation. Proc. Natl Acad. Sci. USA 107, 961–968 (2010).
Laurie, C. C. et al. Detectable clonal mosaicism from birth to old age and its relationship to cancer. Nature Genet. 44, 642–650 (2012).
Schmitt, M. W. et al. Detection of ultra-rare mutations by next-generation sequencing. Proc. Natl Acad. Sci. USA 109, 14508–14513 (2012).
Luo, C., Tsementzi, D., Kyrpides, N., Read, T. & Konstantinidis, K. T. Direct comparisons of Illumina versus Roche 454 sequencing technologies on the same microbial community DNA sample. PLoS ONE 7, e30087 (2012).
DePristo, M. A. et al. A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nature Genet. 43, 491–498 (2011).
Ajay, S. S., Parker, S. C., Abaan, H. O., Fajardo, K. V. & Margulies, E. H. Accurate and comprehensive sequencing of personal genomes. Genome Res. 21, 1498–1505 (2011).
Meynert, A. M., Bicknell, L. S., Hurles, M. E., Jackson, A. P. & Taylor, M. S. Quantifying single nucleotide variant detection sensitivity in exome sequencing. BMC Bioinformatics 14, 195 (2013).
Leek, J. T. et al. Tackling the widespread and critical impact of batch effects in high-throughput data. Nature Rev. Genet. 11, 733–739 (2010).
Baranzini, S. E. et al. Genome, epigenome and RNA sequences of monozygotic twins discordant for multiple sclerosis. Nature 464, 1351–1356 (2010).
Reumers, J. et al. Optimized filtering reduces the error rate in detecting genomic variants by short-read sequencing. Nature Biotech. 30, 61–68 (2012).
Lam, H. Y. et al. Performance comparison of whole-genome sequencing platforms. Nature Biotech. 30, 78–82 (2012).
Jung, H., Bleazard, T., Lee, J. & Hong, D. Systematic investigation of cancer-associated somatic point mutations in SNP databases. Nature Biotech. 31, 787–789 (2013).
Drmanac, R. et al. Human genome sequencing using unchained base reads on self-assembling DNA nanoarrays. Science 327, 78–81 (2010).
Li, H., Ruan, J. & Durbin, R. Mapping short DNA sequencing reads and calling variants using mapping quality scores. Genome Res. 18, 1851–1858 (2008).
Lee, W. et al. The mutation spectrum revealed by paired genome sequences from a lung cancer patient. Nature 465, 473–477 (2010).
Ball, M. P. et al. A public resource facilitating clinical use of genomes. Proc. Natl Acad. Sci. USA 109, 11920–11927 (2012).
Laurie, C. C. et al. Quality control and quality assurance in genotypic data for genome-wide association studies. Genet. Epidemiol. 34, 591–602 (2010).
Ye, K., Schulz, M. H., Long, Q., Apweiler, R. & Ning, Z. Pindel: a pattern growth approach to detect break points of large deletions and medium sized insertions from paired-end short reads. Bioinformatics 25, 2865–2871 (2009).
Jurka, J. et al. Repbase Update, a database of eukaryotic repetitive elements. Cytogenet. Genome Res. 110, 462–467 (2005).
Li, H. & Durbin, R. Fast and accurate long-read alignment with Burrows–Wheeler transform. Bioinformatics 26, 589–595 (2010).
Langmead, B., Trapnell, C., Pop, M. & Salzberg, S. L. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 10, R25 (2009).
Lindgreen, S. AdapterRemoval: easy cleaning of next-generation sequencing reads. BMC Res. Notes 5, 337 (2012).
Degner, J. F. et al. Effect of read-mapping biases on detecting allele-specific expression from RNA-sequencing data. Bioinformatics 25, 3207–3212 (2009).
McKenna, A. et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20, 1297–1303 (2010).
Genovese, G. et al. Using population admixture to help complete maps of the human genome. Nature Genet. 45, 406–414 (2013).
Pruitt, K. D., Tatusova, T. & Maglott, D. R. NCBI reference sequences (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res. 35, D61–D65 (2007).
Fan, H. C., Wang, J., Potanina, A. & Quake, S. R. Whole-genome molecular haplotyping of single cells. Nature Biotech. 29, 51–57 (2011).
Kitzman, J. O. et al. Haplotype-resolved genome sequencing of a Gujarati Indian individual. Nature Biotech. 29, 59–63 (2011).
Browning, S. R. & Browning, B. L. Haplotype phasing: existing methods and new developments. Nature Rev. Genet. 12, 703–714 (2011).
Bansal, V. & Bafna, V. HapCUT: an efficient and accurate algorithm for the haplotype assembly problem. Bioinformatics 24, i153–i159 (2008).
Chen, R. et al. Personal omics profiling reveals dynamic molecular and medical phenotypes. Cell 148, 1293–1307 (2012).
Roach, J. C. et al. Analysis of genetic inheritance in a family quartet by whole-genome sequencing. Science 328, 636–639 (2010).
Lupski, J. R. et al. Whole-genome sequencing in a patient with Charcot-Marie-Tooth neuropathy. N. Engl. J. Med. 362, 1181–1191 (2010).
Chapman, S. J. & Hill, A. V. Human genetic susceptibility to infectious disease. Nature Rev. Genet. 13, 175–188 (2012).
Ott, J., Kamatani, Y. & Lathrop, M. Family-based designs for genome-wide association studies. Nature Rev. Genet. 12, 465–474 (2011).
Gibson, G. Rare and common variants: twenty arguments. Nature Rev. Genet. 13, 135–145 (2011).
Wang, K., Li, M. & Hakonarson, H. Analysing biological pathways in genome-wide association studies. Nature Rev. Genet. 11, 843–854 (2010).
Schloissnig, S. et al. Genomic variation landscape of the human gut microbiome. Nature 493, 45–50 (2013).
Robins, W. P., Faruque, S. M. & Mekalanos, J. J. Coupling mutagenesis and parallel deep sequencing to probe essential residues in a genome or gene. Proc. Natl Acad. Sci. USA 110, E848–857 (2013).
Conrad, T. M., Lewis, N. E. & Palsson, B. O. Microbial laboratory evolution in the era of genome-scale science. Mol. Syst. Biol. 7, 509 (2011).
Shendure, J. et al. Accurate multiplex polony sequencing of an evolved bacterial genome. Science 309, 1728–1732 (2005).
Barrick, J. E. & Lenski, R. E. Genome dynamics during experimental evolution. Nature Rev. Genet. 14, 827–839 (2013).
Xu, X. et al. The genomic sequence of the Chinese hamster ovary (CHO)-K1 cell line. Nature Biotech. 29, 735–741 (2011).
Lewis, N. E. et al. Genomic landscapes of Chinese hamster ovary cell lines as revealed by the Cricetulus griseus draft genome. Nature Biotech. 31, 759–765 (2013).
Brinkrolf, K. et al. Chinese hamster genome sequenced from sorted chromosomes. Nature Biotech. 31, 694–695 (2013).
Becker, J. et al. Unraveling the Chinese hamster ovary cell line transcriptome by next-generation sequencing. J. Biotechnol. 156, 227–235 (2011).
Kildegaard, H. F., Baycin-Hizal, D., Lewis, N. E. & Betenbaugh, M. J. The emerging CHO systems biology era: harnessing the 'omics revolution for biotechnology. Curr. Opin. Biotechnol. 24, 1102–1107 (2013).
Furey, T. S. ChIP-seq and beyond: new and improved methodologies to detect and characterize protein–DNA interactions. Nature Rev. Genet. 13, 840–852 (2012).
Meaburn, E. & Schulz, R. Next generation sequencing in epigenetics: insights and challenges. Semin. Cell Dev. Biol. 23, 192–199 (2012).
Ley, T. J. et al. DNA sequencing of a cytogenetically normal acute myeloid leukaemia genome. Nature 456, 66–72 (2008).
Rios, J., Stein, E., Shendure, J., Hobbs, H. H. & Cohen, J. C. Identification by whole-genome resequencing of gene defect responsible for severe hypercholesterolemia. Hum. Mol. Genet. 19, 4313–4318 (2010).
Schneeberger, K. et al. SHOREmap: simultaneous mapping and mutation identification by deep sequencing. Nature Methods 6, 550–551 (2009).
Cooper, G. M. & Shendure, J. Needles in stacks of needles: finding disease-causal variants in a wealth of genomic data. Nature Rev. Genet. 12, 628–640 (2011).
Gonzalez-Perez, A. et al. Computational approaches to identify functional genetic variants in cancer genomes. Nature Methods 10, 723–729 (2013).
Reva, B., Antipin, Y. & Sander, C. Predicting the functional impact of protein mutations: application to cancer genomics. Nucleic Acids Res. 39, e118 (2011).
Lewis, N. E. & Abdel-Haleem, A. M. The evolution of genome-scale models of cancer metabolism. Front. Physiol. 4, 237 (2013).
Ala-Korpela, M., Kangas, A. J. & Inouye, M. Genome-wide association studies and systems biology: together at last. Trends Genet. 27, 493–498 (2011).
Moreau, Y. & Tranchevent, L. C. Computational tools for prioritizing candidate genes: boosting disease gene discovery. Nature Rev. Genet. 13, 523–536 (2012).
Zamft, B. M. et al. Measuring cation dependent DNA polymerase fidelity landscapes by deep sequencing. PLoS ONE 7, e43876 (2012).
Drukier, A. et al. New dark matter detectors using DNA for nanometer tracking. arXiv 1206.6809 (2012).
Hubisz, M. J., Lin, M. F., Kellis, M. & Siepel, A. Error and error mitigation in low-coverage genome assemblies. PLoS ONE 6, e17034 (2011).
Macabeo-Ong, M. et al. Effect of duration of fixation on quantitative reverse transcription polymerase chain reaction analyses. Mod. Pathol. 15, 979–987 (2002).
Kerick, M. et al. Targeted high throughput sequencing in clinical cancer settings: formaldehyde fixed-paraffin embedded (FFPE) tumor tissues, input amount and tumor heterogeneity. BMC Med. Genom. 4, 68 (2011).
Lin, M. T. et al. Quantifying the relative amount of mouse and human DNA in cancer xenografts using species-specific variation in gene length. Biotechniques 48, 211–218 (2010).
Innis, M. A., Gelfand, D. H., Sninsky, J. J. & White, T. J. PCR protocols: a guide to methods and applications (Academic Press, 1990).
Wojdacz, T. K., Hansen, L. L. & Dobrovic, A. A new approach to primer design for the control of PCR bias in methylation studies. BMC Res. Notes 1, 54 (2008).
Kanagawa, T. Bias and artifacts in multitemplate polymerase chain reactions (PCR). J. Biosci. Bioeng. 96, 317–323 (2003).
Nagalakshmi, U. et al. The transcriptional landscape of the yeast genome defined by RNA sequencing. Science 320, 1344–1349 (2008).
Pont-Kingdon, G. et al. Design and analytical validation of clinical DNA sequencing assays. Arch. Pathol. Lab Med. 136, 41–46 (2012).
Gogol-Doring, A. & Chen, W. An overview of the analysis of next generation sequencing data. Methods Mol. Biol. 802, 249–257 (2012).
Whiteford, N. et al. Swift: primary data analysis for the Illumina Solexa sequencing platform. Bioinformatics 25, 2194–2199 (2009).
Loman, N. J. et al. Performance comparison of benchtop high-throughput sequencing platforms. Nature Biotech. 30, 434–439 (2012).
Huse, S. M., Huber, J. A., Morrison, H. G., Sogin, M. L. & Welch, D. M. Accuracy and quality of massively parallel DNA pyrosequencing. Genome Biol. 8, R143 (2007).