Practical issues in imputation-based association mapping

Yongtao Guan et al. PLoS Genet. 2008 Dec.

Abstract

Imputation-based association methods provide a powerful framework for testing untyped variants for association with phenotypes and for combining results from multiple studies that use different genotyping platforms. Here, we consider several issues that arise when applying these methods in practice, including: (i) factors affecting imputation accuracy, including choice of reference panel; (ii) the effects of imputation accuracy on power to detect associations; (iii) the relative merits of Bayesian and frequentist approaches to testing imputed genotypes for association with phenotype; and (iv) how to quickly and accurately compute Bayes factors for testing imputed SNPs. We find that imputation-based methods can be robust to imputation accuracy and can improve power to detect associations, even when average imputation accuracy is poor. We explain how ranking SNPs for association by a standard likelihood ratio test gives the same results as a Bayesian procedure that uses an unnatural prior assumption (specifically, that difficult-to-impute SNPs tend to have larger effects) and assess the power gained from using a Bayesian approach that does not make this assumption. Within the Bayesian framework, we find that good approximations to a full analysis can be achieved by simply replacing unknown genotypes with a point estimate, their posterior mean. This approximation considerably reduces computational expense compared with published sampling-based approaches, and the methods we present are practical on a genome-wide scale with very modest computational resources (e.g., a single desktop computer). The approximation also facilitates combining information across studies, using only summary data for each SNP. Methods discussed here are implemented in the software package BIMBAM, which is available from http://stephenslab.uchicago.edu/software.html.
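The posterior-mean approximation can be sketched in a few lines. This is a minimal illustration, not BIMBAM's code: it assumes genotype probabilities are stored as an n×3 array, and it uses a standard conjugate Bayes factor for a linear model with a flat prior on the intercept, p(τ) ∝ 1/τ for the residual precision τ, and β | τ ~ N(0, σa²/τ) for the SNP effect, where σa = 0.2 is an arbitrary illustrative value (the function names are hypothetical). Under these assumptions, with centred phenotype and dosage and sums of squares Syy, Sgg and Sgy, BF = (1 + σa² Sgg)^(-1/2) × (1 − Sgy² / ((Sgg + 1/σa²) Syy))^(-(n−1)/2).

```python
import numpy as np


def posterior_mean_genotype(probs):
    """Collapse per-individual genotype probabilities to one dosage each.

    probs: (n, 3) array of P(g = 0), P(g = 1), P(g = 2) per individual,
    rows assumed to sum to 1. Returns the posterior mean genotype (0..2).
    """
    probs = np.asarray(probs, dtype=float)
    return probs @ np.array([0.0, 1.0, 2.0])


def log10_bf_linear(y, g, sigma_a=0.2):
    """log10 Bayes factor for y = mu + beta * g + e against beta = 0.

    Assumed priors (illustrative, not necessarily BIMBAM's exact choice):
    flat prior on mu, p(tau) proportional to 1/tau, and
    beta | tau ~ N(0, sigma_a**2 / tau).
    """
    y = np.asarray(y, dtype=float)
    g = np.asarray(g, dtype=float)
    n = len(y)
    # Centring is equivalent to integrating out the flat-prior intercept.
    yc = y - y.mean()
    gc = g - g.mean()
    syy, sgg, sgy = yc @ yc, gc @ gc, gc @ yc
    a = sgg + 1.0 / sigma_a**2
    # By Cauchy-Schwarz, sgy**2 <= sgg * syy < a * syy, so log1p is defined.
    log_bf = (-0.5 * np.log1p(sigma_a**2 * sgg)
              - 0.5 * (n - 1) * np.log1p(-sgy**2 / (a * syy)))
    return log_bf / np.log(10)
```

BFmean for one SNP is then `log10_bf_linear(y, posterior_mean_genotype(probs))`. One consequence of this particular formula: if every individual receives the same dosage (a SNP imputed with no information), Sgg = Sgy = 0 and the Bayes factor is exactly 1, so the SNP contributes no evidence either way rather than a spurious signal.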

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Figure 1. Graph showing the trade-off between call rate and error rate as the probability threshold for calling an imputed genotype is varied.

Numbers along the line indicate the thresholds that produce the corresponding call rate and error rate; small black points indicate results for intermediate thresholds in increments of 0.02. For example, if we call only those imputed genotypes assigned a probability >0.9 of being correct, then approximately 74% of imputed genotypes are called, and approximately 1% of these called genotypes are incorrect.
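The trade-off in Figure 1 can be tabulated for any set of imputed genotypes whose true values are known (for example, typed SNPs that were masked and then imputed). The sketch below assumes probabilities are stored as an n×3 array and that a genotype is "called" as its most probable value only when that probability exceeds the threshold; the function name and data layout are hypothetical, not BIMBAM's.

```python
import numpy as np


def call_rate_and_error_rate(probs, true_genotypes, threshold):
    """Call rate and error rate for thresholded imputed genotypes.

    probs: (n, 3) array of imputed probabilities for genotypes 0, 1, 2.
    true_genotypes: length-n array of actual genotypes (0, 1 or 2).
    threshold: a genotype is called only if its top probability exceeds it.
    Returns (call_rate, error_rate); error_rate is the fraction of *called*
    genotypes that are wrong, or NaN if nothing is called.
    """
    probs = np.asarray(probs, dtype=float)
    true_genotypes = np.asarray(true_genotypes)
    best = probs.argmax(axis=1)          # most probable genotype
    confidence = probs.max(axis=1)       # its probability
    called = confidence > threshold
    call_rate = float(called.mean())
    if not called.any():
        return call_rate, float("nan")
    error_rate = float((best[called] != true_genotypes[called]).mean())
    return call_rate, error_rate
```

Sweeping the threshold, e.g. over `np.arange(0.34, 1.0, 0.02)`, and plotting error rate against call rate traces a curve like the one in the figure.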

Figure 2. Graphs showing the correspondence of BFmean (x-axis) with BFIS (y-axis, left panel) and BFnaive (y-axis, right panel).

In each case the diagonal blue line is y = x, and the vertical red lines indicate ±2 standard errors of the Bayes factor estimate (for example, in the left panel they span log10(BFIS ± 2 standard errors)).
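A hedged illustration of where such standard errors come from: if the imputation probabilities do not depend on the phenotype, the full Bayes factor equals the average of the complete-data Bayes factor over genotypes drawn from those probabilities, which plain Monte Carlo can estimate together with a standard error. The sketch below is that simple Monte Carlo average, assumed here for illustration; it is not necessarily the importance-sampling scheme behind BFIS. It treats individuals' imputed genotypes as independent and uses an illustrative conjugate linear-model Bayes factor (flat prior on the intercept, p(τ) ∝ 1/τ, β | τ ~ N(0, σa²/τ), arbitrary σa = 0.2); all names are hypothetical.

```python
import numpy as np


def log10_bf_linear(y, g, sigma_a=0.2):
    """Illustrative conjugate linear-model log10 Bayes factor (priors above)."""
    y = np.asarray(y, dtype=float)
    g = np.asarray(g, dtype=float)
    yc, gc = y - y.mean(), g - g.mean()
    syy, sgg, sgy = yc @ yc, gc @ gc, gc @ yc
    a = sgg + 1.0 / sigma_a**2
    log_bf = (-0.5 * np.log1p(sigma_a**2 * sgg)
              - 0.5 * (len(y) - 1) * np.log1p(-sgy**2 / (a * syy)))
    return log_bf / np.log(10)


def mc_log10_bf(y, probs, n_samples=1000, sigma_a=0.2, seed=0):
    """Monte Carlo estimate of the Bayes factor averaged over imputed genotypes.

    Draws each individual's genotype independently from its row of probs
    ((n, 3) array of P(g = 0), P(g = 1), P(g = 2)), computes the
    complete-data Bayes factor per draw, and averages on the raw scale.
    Returns (log10 of the estimate, standard error relative to the estimate).
    """
    rng = np.random.default_rng(seed)
    probs = np.asarray(probs, dtype=float)
    cum = probs.cumsum(axis=1)[:, :2]  # P(g <= 0), P(g <= 1)
    log10_bfs = np.empty(n_samples)
    for s in range(n_samples):
        u = rng.random(len(probs))[:, None]
        g = (u >= cum).sum(axis=1).astype(float)  # inverse-CDF draw of 0, 1 or 2
        log10_bfs[s] = log10_bf_linear(y, g, sigma_a)
    # Average Bayes factors without overflow by factoring out the largest one.
    top = log10_bfs.max()
    w = 10.0 ** (log10_bfs - top)
    est = w.mean()
    rel_se = w.std(ddof=1) / np.sqrt(n_samples) / est
    return top + np.log10(est), rel_se
```

The interval plotted as ±2 standard errors then corresponds to BF × (1 ± 2 × rel_se) on the raw scale.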

Figure 3. Graph showing the correspondence between log10(BFmean) values and the likelihood ratio statistics Λ (left) and Λmean (right).

Each point represents one SNP-phenotype combination, colored according to the average confidence in the imputed genotypes. Specifically, SNPs were colored according to the value of r, defined as the ratio of the variance of the (posterior mean) genotypes to the variance expected if the SNP were typed, 2f(1−f), calculated from the minor allele frequency f assuming Hardy-Weinberg equilibrium. This scaling ensures that r ≈ 1 for typed and confidently imputed SNPs, whereas r is close to 0 for SNPs with low average confidence. Colors indicate ranges of r: red, r ∈ (0, 0.001]; green, r ∈ (0.001, 0.01]; blue, r ∈ (0.01, 0.1]; cyan, r ∈ (0.1, 0.5]; black, r > 0.5.
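The confidence measure r described above is a one-line ratio. A minimal sketch, assuming f is estimated as the mean dosage divided by 2 and the population (ddof=0) variance is used; the paper's exact estimator may differ, and the function names are hypothetical. Because 2f(1−f) is symmetric in f, it does not matter which allele is coded.

```python
import numpy as np

# Upper bounds of the Figure 3 colour bins; r > 0.5 is black.
_R_BINS = [(0.001, "red"), (0.01, "green"), (0.1, "blue"), (0.5, "cyan")]


def imputation_confidence_r(dosages):
    """Variance of posterior mean genotypes relative to HWE variance 2f(1 - f).

    dosages: posterior mean genotypes (0..2) for one SNP across individuals.
    Returns NaN for a monomorphic SNP, where the ratio is undefined.
    """
    d = np.asarray(dosages, dtype=float)
    f = d.mean() / 2.0                # assumed allele-frequency estimate
    expected = 2.0 * f * (1.0 - f)    # genotype variance under HWE
    if expected == 0.0:
        return float("nan")
    return float(d.var() / expected)


def r_color(r):
    """Colour bin for r; values at or below 0.001 (including 0) map to red here."""
    for upper, colour in _R_BINS:
        if r <= upper:
            return colour
    return "black"
```

For a typed SNP in exact Hardy-Weinberg proportions, e.g. dosages [0, 1, 1, 2], r is 1; if every individual receives the same dosage, the variance and hence r are 0.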

Figure 4. Effects of different test statistics on power to detect associations.

Each line shows the trade-off between true and false discoveries when using Bayes factors (black lines) or the likelihood ratio test statistics Λ (solid red lines) and Λmean (dashed red lines) as the threshold for declaring an association is varied. In each setting the Bayes factor performs as well as or better than the likelihood ratio tests (black lines lie on or above the corresponding red lines). The best performance is obtained with the CEU panel, which is well matched to the sample and produces a low imputation error rate (left; imputation error rate 6.2%). The much higher imputation error rate obtained with the YRI panel, which is not well matched to the sample, produces a notable reduction in performance (right; imputation error rate 25%). However, even with this high imputation error rate, using the Bayes factor as a test statistic gives better results than no imputation (blue dotted lines in both panels).
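Curves like those in Figure 4 require simulated data in which the truly associated SNPs are known. A minimal sketch of how one curve can be tabulated, assuming each statistic is ranked from strongest to weakest evidence and every prefix of that ranking is one threshold setting (ties are broken arbitrarily here); the function and argument names are hypothetical.

```python
import numpy as np


def discovery_curve(scores, is_causal):
    """Cumulative true and false discoveries as the threshold is lowered.

    scores: test statistic per SNP (e.g. log10 BF, Lambda or Lambda_mean);
        larger values mean stronger evidence of association.
    is_causal: boolean array, True where the SNP is truly associated.
    Returns (false_discoveries, true_discoveries) after declaring the
    top 1, 2, ..., m SNPs.
    """
    scores = np.asarray(scores, dtype=float)
    is_causal = np.asarray(is_causal, dtype=bool)
    order = np.argsort(-scores, kind="stable")  # strongest evidence first
    hits = is_causal[order]
    true_disc = np.cumsum(hits)
    false_disc = np.cumsum(~hits)
    return false_disc, true_disc
```

Computing this for each statistic on the same simulated SNPs and plotting true against false discoveries gives one line per statistic; a line that lies above another finds more true associations for the same number of false ones.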

Figure 5. Comparison of log10 Bayes factors from the linear model (x-axis) and the logistic model (y-axis).

The blue line is y = x.
