Designing genome-wide association studies: sample size, power, imputation, and the choice of genotyping chip
Chris C A Spencer et al. PLoS Genet. 2009 May.
Abstract
Genome-wide association studies are revolutionizing the search for the genes underlying human complex diseases. The main decisions to be made at the design stage of these studies are the choice of the commercial genotyping chip to be used and the numbers of case and control samples to be genotyped. The most common method of comparing different chips is using a measure of coverage, but this fails to properly account for the effects of sample size, the genetic model of the disease, and linkage disequilibrium between SNPs. In this paper, we argue that the statistical power to detect a causative variant should be the major criterion in study design. Because of the complicated pattern of linkage disequilibrium (LD) in the human genome, power cannot be calculated analytically and must instead be assessed by simulation. We describe in detail a method of simulating case-control samples at a set of linked SNPs that replicates the patterns of LD in human populations, and we used it to assess power for a comprehensive set of available genotyping chips. Our results allow us to compare the performance of the chips to detect variants with different effect sizes and allele frequencies, look at how power changes with sample size in different populations or when using multi-marker tags and genotype imputation approaches, and how performance compares to a hypothetical chip that contains every SNP in HapMap. A main conclusion of this study is that marked differences in genome coverage may not translate into appreciable differences in power and that, when taking budgetary considerations into account, the most powerful design may not always correspond to the chip with the highest coverage. We also show that genotype imputation can be used to boost the power of many chips up to the level obtained from a hypothetical "complete" chip containing all the SNPs in HapMap. 
Our results have been encapsulated into an R software package that allows users to design future association studies and our methods provide a framework with which new chip sets can be evaluated.
Conflict of interest statement
The authors have declared that no competing interests exist.
Figures
Figure 1. Schematic of how power is estimated.
At the top of the figure are the recombination map and haplotypes estimated from the HapMap project. Using this population genetic information, we simulate a case-control sample (grey lines) where the red dots indicate the disease locus and blue dots indicate linked genetic variation. By performing a test of association at each SNP on the genotyping chip, we can estimate power by counting the number of simulations for which a test statistic exceeds a significance threshold (dotted line). We compare genotyping chips by changing the set of SNPs at which we carry out a test. See Methods.
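The counting step in this caption can be illustrated with a minimal Python sketch. It is not the authors' simulation method: it assumes the causal SNP is typed directly (so LD plays no role), a multiplicative per-allele relative risk, Hardy-Weinberg proportions, and a rare disease so that controls carry the risk allele at the population frequency. The function names, the allelic chi-square test, and the `alpha` threshold are choices made for illustration only.

```python
import math
import random


def case_allele_freq(p, rr):
    """Risk-allele frequency in cases under a multiplicative per-allele model
    (assumption: Hardy-Weinberg proportions in the population)."""
    return rr * p / (rr * p + 1.0 - p)


def allelic_p_value(case_alt, n_case_alleles, ctrl_alt, n_ctrl_alleles):
    """P-value of the 1-df Pearson chi-square test on the 2x2 allele-count table."""
    total = n_case_alleles + n_ctrl_alleles
    alt = case_alt + ctrl_alt
    ref = total - alt
    if alt == 0 or ref == 0:
        return 1.0  # monomorphic sample: no evidence of association
    cross = (case_alt * (n_ctrl_alleles - ctrl_alt)
             - ctrl_alt * (n_case_alleles - case_alt))
    chi2 = total * cross ** 2 / (n_case_alleles * n_ctrl_alleles * alt * ref)
    # Upper tail of a chi-square with 1 df: P(Z^2 > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2.0))


def estimate_power(p, rr, n_cases, n_controls, alpha, n_sims=1000, seed=1):
    """Fraction of simulated studies whose test passes the threshold alpha."""
    rng = random.Random(seed)
    q_case = case_allele_freq(p, rr)
    hits = 0
    for _ in range(n_sims):
        # Each individual contributes two independently drawn alleles.
        case_alt = sum(rng.random() < q_case for _ in range(2 * n_cases))
        ctrl_alt = sum(rng.random() < p for _ in range(2 * n_controls))
        if allelic_p_value(case_alt, 2 * n_cases, ctrl_alt, 2 * n_controls) < alpha:
            hits += 1
    return hits / n_sims


if __name__ == "__main__":
    # Illustrative parameter values only; they are not taken from the paper.
    print(estimate_power(p=0.3, rr=1.3, n_cases=1000, n_controls=1000,
                         alpha=5e-7, n_sims=200))
```

Comparing chips in the way the caption describes would replace the single directly typed SNP with tests at every chip SNP in a region simulated with realistic LD, which this sketch does not attempt.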
Figure 2. Plots of power (solid lines) and coverage (dotted line) for increasing sample sizes of cases and controls (x-axis).
From left to right, plots are given for increasing effect sizes (relative risk per allele). Both power and coverage range from 0 to 1 and are given on the y-axis. Results are for single-marker tests of association and for simulations where the risk allele frequency of the causal allele is >0.05. The top row shows power for case-control studies simulated in a Caucasian population based on the CEU HapMap panel. The bottom row relates to case-control studies simulated from the YRI HapMap panel.
Figure 3. Power for Common versus Rare alleles.
Plots of power (solid lines) and coverage (dotted line) for increasing sample sizes of cases and controls (x-axis). From left to right, plots are given for increasing effect sizes (relative risk per allele). Both power and coverage range from 0 to 1 and are given on the y-axis. Results are for single-marker tests of association. The top two rows show the power for rare risk alleles (RAF<0.1) and the bottom two rows show the power for common risk alleles (RAF>0.1). Rows 1 and 3 show power for case-control studies simulated in a Caucasian population based on the CEU HapMap panel. Rows 2 and 4 relate to case-control studies simulated from the YRI HapMap panel.
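One way to see why rare risk alleles are harder to detect at a given relative risk is to work out the allele frequency expected in cases. The derivation below is a sketch under assumptions that are not taken from the paper: a multiplicative model with per-allele relative risk $\gamma$, baseline penetrance $f_0$, population risk-allele frequency $p$, Hardy-Weinberg proportions, and a causal SNP that is tested directly.

```latex
% Penetrances f_0, \gamma f_0, \gamma^2 f_0 for 0, 1, 2 copies of the risk allele
\begin{align*}
P(D) &= f_0\left[(1-p)^2 + 2p(1-p)\gamma + p^2\gamma^2\right]
      = f_0\,(1 - p + \gamma p)^2, \\
p_{\text{case}} &= \frac{p(1-p)\gamma + p^2\gamma^2}{(1 - p + \gamma p)^2}
      = \frac{\gamma p}{1 - p + \gamma p}.
\end{align*}
% Worked values for \gamma = 1.5:
%   p = 0.05:  p_case = 0.075 / 1.025 \approx 0.073  (difference \approx 0.023)
%   p = 0.30:  p_case = 0.450 / 1.150 \approx 0.391  (difference \approx 0.091)
```

With the same relative risk, the case-control frequency difference is about four times smaller for the rarer allele in this example. The sampling variance of a rare allele's frequency is also smaller, but not by enough to compensate, so the standardized difference a test must detect remains lower.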
Figure 4. Histograms of the proportion of SNPs in the 22 1-Mb regions (see Methods) in HapMap Phase II for which the maximum r² with a SNP on the genotyping chip is in one of eleven bins (increasing in correlation (LD) from left to right).
The same histograms are coloured in two ways. The top row shows in red the percentage of the SNPs in each bin detected (see Methods and text) when selected to be the causal SNP in our simulations (the proportion of the total volume of the bars coloured red is therefore an estimate of power). In the bottom row all r² bins above 0.8 are coloured red (the proportion of the total volume of the bars coloured red is therefore an estimate of coverage). Note that the use of HapMap data in choosing SNPs for the Illumina chip leads to a higher proportion of SNPs in high r² bins.
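The coverage measure described in this caption can be sketched in a few lines of Python. This is an illustration rather than the authors' code: it assumes phased haplotypes coded as 0/1 alleles, computes the standard pairwise r² from haplotype frequencies, and counts a SNP as covered when its maximum r² with any chip SNP is at least 0.8. The function names and the inclusive threshold are assumptions.

```python
def r_squared(hap_a, hap_b):
    """Squared correlation (r^2) between two biallelic SNPs, given the 0/1
    alleles each SNP carries on the same set of phased haplotypes."""
    n = len(hap_a)
    p_a = sum(hap_a) / n
    p_b = sum(hap_b) / n
    p_ab = sum(a & b for a, b in zip(hap_a, hap_b)) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return 0.0  # at least one SNP is monomorphic in this sample
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    return d * d / denom


def max_r2_with_chip(target, chip_snps):
    """Largest r^2 between a target SNP and any SNP on the chip."""
    return max((r_squared(target, s) for s in chip_snps), default=0.0)


def coverage(all_snps, chip_indices, threshold=0.8):
    """Fraction of SNPs whose best chip tag reaches the r^2 threshold.
    Chip SNPs tag themselves with r^2 = 1, so they count as covered."""
    chip = [all_snps[i] for i in chip_indices]
    covered = sum(1 for snp in all_snps
                  if max_r2_with_chip(snp, chip) >= threshold)
    return covered / len(all_snps)
```

Because coverage applies a fixed r² cut-off and ignores sample size and effect size, two chips with different coverage can still yield similar power, which is the contrast the two colourings of the histogram are meant to show.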
Similar articles
- Accuracy of genome-wide imputation of untyped markers and impacts on statistical power for association studies.
Hao K, Chudin E, McElwee J, Schadt EE. BMC Genet. 2009 Jun 16;10:27. doi: 10.1186/1471-2156-10-27. PMID: 19531258 Free PMC article.
- Accuracy of genotype imputation in Nelore cattle.
Carvalheiro R, Boison SA, Neves HH, Sargolzaei M, Schenkel FS, Utsunomiya YT, O'Brien AM, Sölkner J, McEwan JC, Van Tassell CP, Sonstegard TS, Garcia JF. Genet Sel Evol. 2014 Oct 10;46(1):69. doi: 10.1186/s12711-014-0069-1. PMID: 25927950 Free PMC article.
- Comparison of the performance of two commercial genome-wide association study genotyping platforms in Han Chinese samples.
Jiang L, Willner D, Danoy P, Xu H, Brown MA. G3 (Bethesda). 2013 Jan;3(1):23-9. doi: 10.1534/g3.112.004069. Epub 2013 Jan 1. PMID: 23316436 Free PMC article.
- Genotype Imputation in Genome-Wide Association Studies.
Naj AC. Curr Protoc Hum Genet. 2019 Jun;102(1):e84. doi: 10.1002/cphg.84. PMID: 31216114 Review.
- Accurate Imputation of Untyped Variants from Deep Sequencing Data.
Torkamaneh D, Belzile F. Methods Mol Biol. 2021;2243:271-281. doi: 10.1007/978-1-0716-1103-6_13. PMID: 33606262 Review.
Cited by
- Genome-wide pathway association studies of multiple correlated quantitative phenotypes using principle component analyses.
Zhang F, Guo X, Wu S, Han J, Liu Y, Shen H, Deng HW. PLoS One. 2012;7(12):e53320. doi: 10.1371/journal.pone.0053320. Epub 2012 Dec 28. PMID: 23285279 Free PMC article.
- Imputation across genotyping arrays for genome-wide association studies: assessment of bias and a correction strategy.
Johnson EO, Hancock DB, Levy JL, Gaddis NC, Saccone NL, Bierut LJ, Page GP. Hum Genet. 2013 May;132(5):509-22. doi: 10.1007/s00439-013-1266-7. Epub 2013 Jan 22. PMID: 23334152 Free PMC article.
- Intergenerational continuity in parents' and adolescents' externalizing problems: The role of life events and their interaction with GABRA2.
Salvatore JE, Meyers JL, Yan J, Aliev F, Lansford JE, Pettit GS, Bates JE, Dodge KA, Rose RJ, Pulkkinen L, Kaprio J, Dick DM. J Abnorm Psychol. 2015 Aug;124(3):709-28. doi: 10.1037/abn0000066. PMID: 26075969 Free PMC article.
- Power and sample size calculations for SNP association studies with censored time-to-event outcomes.
Owzar K, Li Z, Cox N, Jung SH. Genet Epidemiol. 2012 Sep;36(6):538-48. doi: 10.1002/gepi.21645. Epub 2012 Jun 8. PMID: 22685040 Free PMC article.
- Association analysis for udder health based on SNP-panel and sequence data in Danish Holsteins.
Wu X, Lund MS, Sahana G, Guldbrandtsen B, Sun D, Zhang Q, Su G. Genet Sel Evol. 2015 Jun 19;47(1):50. doi: 10.1186/s12711-015-0129-1. PMID: 26087655 Free PMC article.