Genotype and SNP calling from next-generation sequencing data
Metzker, M. Sequencing technologies — the next generation. Nature Rev. Genet. 11, 31–46 (2010). This article provides an excellent Review of NGS technologies and their applications.
Li, R. et al. The sequence and de novo assembly of the giant panda genome. Nature 463, 311–317 (2010).
Ng, S. B. et al. Exome sequencing identifies the cause of a mendelian disorder. Nature Genet. 42, 30–35 (2010).
Nagalakshmi, U. et al. The transcriptional landscape of the yeast genome defined by RNA sequencing. Science 320, 1344–1349 (2008).
Guttman, M. et al. Ab initio reconstruction of cell type-specific transcriptomes in mouse reveals the conserved multi-exonic structure of lincRNAs. Nature Biotech. 28, 503–510 (2010).
Trapnell, C. et al. Transcript assembly and quantification by RNA-seq reveals unannotated transcripts and isoform switching during cell differentiation. Nature Biotech. 28, 511–515 (2010).
Liti, G. et al. Population genomics of domestic and wild yeasts. Nature 458, 337–341 (2009).
Li, Y. et al. Resequencing of 200 human exomes identifies an excess of low-frequency non-synonymous coding variants. Nature Genet. 42, 969–972 (2010).
Durbin, R. M. et al. A map of human genome variation from population-scale sequencing. Nature 467, 1061–1073 (2010). This 1000 Genomes paper provides an application of many of the state-of-the-art methods for analysis of NGS data.
Flicek, P. & Birney, E. Sense from sequence reads: methods for alignment and assembly. Nature Methods 6, S6–S12 (2009).
Kim, S. Y. et al. Design of association studies with pooled or un-pooled next-generation sequencing data. Genet. Epidemiol. 34, 479–491 (2010).
Li, H., Ruan, J. & Durbin, R. M. Mapping short DNA sequencing reads and calling variants using mapping quality scores. Genome Res. 18, 1851–1858 (2008). This paper describes MAQ, a forerunner of efficient, hash-based alignment algorithms for short reads. MAQ also produces genotype calls. The concept of read-mapping quality is introduced in this paper.
Li, J. B. et al. Multiplex padlock targeted sequencing reveals human hypermutable CpG variations. Genome Res. 19, 1606–1615 (2009).
Li, R. et al. SNP detection for massively parallel whole-genome resequencing. Genome Res. 19, 1124–1132 (2009).
Li, R. et al. SOAP2: an improved ultrafast tool for short read alignment. Bioinformatics 25, 1966–1967 (2009).
Ewing, B. & Green, P. Base-calling of automated sequencer traces using phred. II. Error probabilities. Genome Res. 8, 186–194 (1998).
Quinlan, A. R. et al. Pyrobayes: an improved base caller for SNP discovery in pyrosequences. Nature Methods 5, 179–181 (2008).
Wu, H., Irizarry, R. A. & Bravo, H. C. Intensity normalization improves color calling in SOLiD sequencing. Nature Methods 7, 336–337 (2010).
Kircher, M., Stenzel, U. & Kelso, J. Improved base calling for the Illumina Genome Analyzer using machine learning strategies. Genome Biol. 10, R83 (2009).
Kao, W. C., Stevens, K. & Song, Y. S. BayesCall: a model-based basecalling algorithm for high-throughput short-read sequencing. Genome Res. 19, 1884–1895 (2009).
Kao, W. C. & Song, Y. S. naiveBayesCall: an efficient model-based base-calling algorithm for high-throughput sequencing. Lect. Notes Comp. Sci. 6044, 233–247 (2010).
Burrows, M. & Wheeler, D. A block-sorting lossless data compression algorithm. Technical Report 124, Digital Equipment Corporation. HP Labs Technical Reports [online] (1994).
Langmead, B., Trapnell, C., Pop, M. & Salzberg, S. L. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 10, R25 (2009).
Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25, 1754–1760 (2009).
Lunter, G. & Goodson, M. Stampy: a statistical algorithm for sensitive and fast mapping of Illumina sequence reads. Genome Res. 27 Oct 2010 (doi:10.1101/gr.111120.110).
Sundquist, A., Ronaghi, M., Tang, H., Pevzner, P. & Batzoglou, S. Whole-genome sequencing and assembly with high-throughput, short-read technologies. PLoS ONE 2, e484 (2007).
Zerbino, D. R. & Birney, E. Velvet: algorithms for de novo short read assembly using de Bruijn graphs. Genome Res. 18, 821–829 (2008).
Butler, J. et al. ALLPATHS: de novo assembly of whole-genome shotgun microreads. Genome Res. 18, 810–820 (2008).
Simpson, J. T. et al. ABySS: a parallel assembler for short read sequence data. Genome Res. 19, 1117–1123 (2009).
Chaisson, M. J. P., Brinza, D. & Pevzner, P. A. De novo fragment assembly with short mate-paired reads: does the read length matter? Genome Res. 19, 336–346 (2009).
Brockman, W. et al. Quality scores and SNP detection in sequencing-by-synthesis systems. Genome Res. 18, 763–770 (2008).
McKenna, A. et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20, 1297–1303 (2010).
DePristo, M. A. et al. A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nature Genet. 10 Apr 2011 (doi:10.1038/ng.806).
Harismendy, O. et al. Evaluation of next generation sequencing platforms for population targeted sequencing studies. Genome Biol. 10, R32 (2009).
Wang, J. et al. The diploid sequence of an Asian individual. Nature 456, 60–65 (2008).
Hedges, D. et al. Exome sequencing of a multigenerational human pedigree. PLoS ONE 4, e8232 (2009).
Martin, E. R. et al. SeqEM: an adaptive genotype-calling approach for next-generation sequencing studies. Bioinformatics 26, 2803–2810 (2010).
Sherry, S. T. et al. dbSNP: the NCBI database of genetic variation. Nucleic Acids Res. 29, 308–311 (2001).
Dai, J. Y. et al. Imputation methods to improve inference in SNP association studies. Genet. Epidemiol. 30, 690–702 (2006).
Minichiello, M. J. & Durbin, R. Mapping trait loci by use of inferred ancestral recombination graphs. Am. J. Hum. Genet. 79, 910–922 (2006).
Scheet, P. & Stephens, M. A fast and flexible statistical model for large-scale population genotype data: applications to inferring missing genotypes and haplotypic phase. Am. J. Hum. Genet. 78, 629–644 (2006).
Browning, S. R. & Browning, B. L. Rapid and accurate haplotype phasing and missing-data inference for whole-genome association studies by use of localized haplotype clustering. Am. J. Hum. Genet. 81, 1084–1097 (2007).
Marchini, J., Howie, B., Myers, S., McVean, G. & Donnelly, P. A new multipoint method for genome-wide association studies via imputation of genotypes. Nature Genet. 39, 906–913 (2007).
Howie, B. N., Donnelly, P. & Marchini, J. A flexible and accurate genotype imputation method for the next generation of genome-wide association studies. PLoS Genet. 5, e1000529 (2009).
Marchini, J. & Howie, B. Genotype imputation for genome-wide association studies. Nature Rev. Genet. 11, 499–511 (2010). This Review provides a comprehensive overview of available statistical methods for imputing genotypes and discusses various uses of imputation.
Huang, L. et al. The relationship between imputation error and statistical power in genetic association studies in diverse populations. Am. J. Hum. Genet. 85, 692–698 (2009).
Schaid, D. J., Rowland, C. M., Tines, D. E., Jacobson, R. M. & Poland, G. A. Score tests for association between traits and haplotypes when linkage phase is ambiguous. Am. J. Hum. Genet. 70, 425–434 (2002).
Servin, B. & Stephens, M. Imputation-based analysis of association studies: candidate genes and quantitative traits. PLoS Genet. 3, e114 (2007).
Hellmann, I. et al. Population genetic analysis of shotgun assemblies of genomic sequences from multiple individuals. Genome Res. 18, 1020–1029 (2008).
Johnson, P. L. F. & Slatkin, M. Accounting for bias from sequencing error in population genetic estimates. Mol. Biol. Evol. 25, 199–206 (2008).
Johnson, P. L. F. & Slatkin, M. Inference of population genetic parameters in metagenomics: a clean look at messy data. Genome Res. 16, 1320–1327 (2006).
Yi, X. et al. Sequencing of 50 human exomes reveals adaptation to high altitude. Science 329, 75–78 (2010).
Li, H. et al. The sequence alignment/map (SAM) format and SAMtools. Bioinformatics 25, 2078–2079 (2009).
Le, S. Q. & Durbin, R. SNP detection and genotyping from low-coverage sequencing data on multiple diploid samples. Genome Res. 27 Oct 2010 (doi:10.1101/gr.113084.110).