Genotype imputation for genome-wide association studies
Frazer, K., Ballinger, D., Cox, D., Hinds, D., Stuve, L. et al. A second generation human haplotype map of over 3.1 million SNPs. Nature 449, 851–861 (2007).
Marchini, J., Howie, B., Myers, S., McVean, G. & Donnelly, P. A new multipoint method for genome-wide association studies by imputation of genotypes. Nature Genet. 39, 906–913 (2007).
Stephens, M. & Donnelly, P. Inference in molecular population genetics. J. R. Statist. Soc. B 62, 605–635 (2000).
Fearnhead, P. & Donnelly, P. Estimating recombination rates from population genetic data. Genetics 159, 1299–1318 (2001).
Li, N. & Stephens, M. Modeling linkage disequilibrium and identifying recombination hotspots using single-nucleotide polymorphism data. Genetics 165, 2213–2233 (2003).
Rabiner, L. R. A tutorial on hidden Markov models and selected applications in speech recognition. Proc. IEEE 77, 257–286 (1989).
Howie, B. N., Donnelly, P. & Marchini, J. A flexible and accurate genotype imputation method for the next generation of genome-wide association studies. PLoS Genet. 5, e1000529 (2009). This paper describes the IMPUTE v2 method and carries out a comprehensive evaluation of several methods. This reference should be read as the follow-on from Reference 2, which describes IMPUTE v1.
Scheet, P. & Stephens, M. A fast and flexible statistical model for large-scale population genotype data: applications to inferring missing genotypes and haplotypic phase. Am. J. Hum. Genet. 78, 629–644 (2006).
Servin, B. & Stephens, M. Imputation-based analysis of association studies: candidate regions and quantitative traits. PLoS Genet. 3, e114 (2007). The paper that describes the BIMBAM method for Bayesian multi-SNP and single SNP analysis using imputed data. Should be read together with Reference 8, which describes fastPHASE.
Kennedy, J., Mandoiu, I. & Pasaniuc, B. Genotype error detection using hidden Markov models of haplotype diversity. J. Comput. Biol. 15, 1155–1171 (2008).
Browning, S. & Browning, B. Rapid and accurate haplotype phasing and missing-data inference for whole-genome association studies by use of localized haplotype clustering. Am. J. Hum. Genet. 81, 1084–1097 (2007).
Browning, B. & Browning, S. A unified approach to genotype imputation and haplotype-phase inference for large data sets of trios and unrelated individuals. Am. J. Hum. Genet. 84, 210–223 (2009).
Browning, S. Missing data imputation and haplotype phase inference for genome-wide association studies. Hum. Genet. 124, 439–450 (2008). References 12–15 are a series of papers that describe the model underlying the BEAGLE method.
Purcell, S. et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet. 81, 559–575 (2007).
Lin, D., Hu, Y. & Huang, B. Simple and efficient analysis of disease association with missing genotype data. Am. J. Hum. Genet. 82, 444–452 (2008).
Nicolae, D. Testing untyped alleles (TUNA)-applications to genome-wide association studies. Genet. Epidemiol. 30, 718–727 (2006).
Johnson, G. et al. Haplotype tagging for the identification of common disease genes. Nature Genet. 29, 233–237 (2001).
Evans, D., Cardon, L. & Morris, A. Genotype prediction using a dense map of SNPs. Genet. Epidemiol. 27, 375–384 (2004).
de Bakker, P. et al. Efficiency and power in genetic association studies. Nature Genet. 37, 1217–1223 (2005).
Excoffier, L. & Slatkin, M. Maximum-likelihood estimation of molecular haplotype frequencies in a diploid population. Mol. Biol. Evol. 12, 921–927 (1995).
Pastorino, R. et al. Association between protective and deleterious HLA alleles with multiple sclerosis in Central East Sardinia. PLoS ONE 4, e6526 (2009).
Burdick, J., Chen, W., Abecasis, G. & Cheung, V. In silico method for inferring genotypes in pedigrees. Nature Genet. 38, 1002–1004 (2006).
Kong, A. et al. Detection of sharing by descent, long-range phasing and haplotype imputation. Nature Genet. 40, 1068–1075 (2008).
Spencer, C. C. A., Su, Z., Donnelly, P. & Marchini, J. Designing genome-wide association studies: sample size, power, imputation, and the choice of genotyping chip. PLoS Genet. 5, e1000477 (2009).
Pei, Y., Li, J., Zhang, L., Papasian, C. & Deng, H. Analyses and comparison of accuracy of different genotype imputation methods. PLoS ONE 3, e3551 (2008).
Hao, K., Chudin, E., McElwee, J. & Schadt, E. E. Accuracy of genome-wide imputation of untyped markers and impacts on statistical power for association studies. BMC Genet. 10, 27 (2009).
Huang, L., Li, Y., Singleton, A., Hardy, J., Abecasis, G. et al. Genotype-imputation accuracy across worldwide human populations. Am. J. Hum. Genet. 84, 235–250 (2009). A useful reference that illustrates the performance of imputation in a range of worldwide human populations when using the HapMap 2 reference panels.
Pasaniuc, B., Sankararaman, S., Kimmel, G. & Halperin, E. Inference of locus-specific ancestry in closely related populations. Bioinformatics 25, 213–221 (2009).
Zeggini, E. et al. Meta-analysis of genome-wide association data and large-scale replication identifies additional susceptibility loci for type 2 diabetes. Nature Genet. 40, 638–645 (2008).
Zeggini, E. et al. Replication of genome-wide association signals in UK samples reveals risk loci for type 2 diabetes. Science 316, 1336–1341 (2007). One of the earliest examples of the use of imputation in meta-analysis. This paper combined three GWA studies and was able to identify several novel associations.
Lindgren, C. M. et al. Genome-wide association scan meta-analysis identifies three loci influencing adiposity and fat distribution. PLoS Genet. 5, e1000508 (2009).
Wakefield, J. Bayes factors for genome-wide association studies: comparison with P-values. Genet. Epidemiol. 33, 79–86 (2009).
Stephens, M. & Balding, D. Bayesian statistical methods for genetic association studies. Nature Rev. Genet. 10, 681–690 (2009). An excellent Review on the subject of using Bayesian statistical methods in association studies with a particular focus on the calculation, choice of priors and the interpretation of single SNP Bayes factors.
Stephens, M., Smith, N. & Donnelly, P. A new statistical method for haplotype reconstruction from population data. Am. J. Hum. Genet. 68, 978–989 (2001).
Carlson, C. et al. Selecting a maximally informative set of single-nucleotide polymorphisms for association analyses using linkage disequilibrium. Am. J. Hum. Genet. 74, 106–120 (2004).
Elston, R. & Stewart, J. A general model for the genetic analysis of pedigree data. Hum. Hered. 21, 523–542 (1971).
Cooper, J. et al. Meta-analysis of genome-wide association study data identifies additional type 1 diabetes risk loci. Nature Genet. 40, 1399–1401 (2008).
Houlston, R. et al. Meta-analysis of genome-wide association data identifies four new susceptibility loci for colorectal cancer. Nature Genet. 40, 1426–1435 (2008).
De Jager, P. et al. Meta-analysis of genome scans and replication identify CD6, IRF8 and TNFRSF1A as new multiple sclerosis susceptibility loci. Nature Genet. 41, 776–782 (2009).
Loos, R. J. F. et al. Common variants near MC4R are associated with fat mass, weight and risk of obesity. Nature Genet. 40, 768–775 (2008).
de Bakker, P. et al. Practical aspects of imputation-driven meta-analysis of genome-wide association studies. Hum. Mol. Genet. 17, R122–R128 (2008).
Zollner, S. & Pritchard, J. Coalescent-based association mapping and fine mapping of complex trait loci. Genetics 169, 1071–1092 (2005).
Minichiello, M. & Durbin, R. Mapping trait loci by use of inferred ancestral recombination graphs. Am. J. Hum. Genet. 79, 910–922 (2006).
Su, Z., Cardin, N., Wellcome Trust Case Control Consortium, Donnelly, P. & Marchini, J. A Bayesian method for detecting and characterizing allelic heterogeneity and boosting signals in genome-wide association studies. Stat. Sci. 24, 430–450 (2009).
Browning, B. & Browning, S. Efficient multilocus association testing for whole genome association studies using localized haplotype clustering. Genet. Epidemiol. 31, 365–375 (2007).
Leslie, S., Donnelly, P. & McVean, G. A statistical method for predicting classical HLA alleles from SNP data. Am. J. Hum. Genet. 82, 48–56 (2008).
Browning, B. L. & Yu, Z. Simultaneous genotype calling and haplotype phasing improves genotype accuracy and reduces false-positive associations for genome-wide association studies. Am. J. Hum. Genet. 85, 847–861 (2009).
Louis, T. A. Finding the observed information matrix when using the EM algorithm. J. R. Statist. Soc. B 44, 226–233 (1982).
Little, R. J. A. & Rubin, D. B. Statistical Analysis with Missing Data 2nd edn (Wiley, Hoboken, 2002).
Liu, J. Z. et al. Meta-analysis and imputation refines the association of 15q25 with smoking quantity. Nature Genet. 42, 436–440 (2010).