Designing genome-wide association studies: sample size, power, imputation, and the choice of genotyping chip
Chris C A Spencer et al. PLoS Genet. 2009 May.
Abstract
Genome-wide association studies are revolutionizing the search for the genes underlying human complex diseases. The main decisions to be made at the design stage of these studies are the choice of commercial genotyping chip and the numbers of case and control samples to genotype. The most common way to compare chips is by a measure of coverage, but this fails to properly account for the effects of sample size, the genetic model of the disease, and linkage disequilibrium (LD) between SNPs. In this paper, we argue that the statistical power to detect a causative variant should be the major criterion in study design. Because of the complicated pattern of LD in the human genome, power cannot be calculated analytically and must instead be assessed by simulation. We describe in detail a method of simulating case-control samples at a set of linked SNPs that replicates the patterns of LD in human populations, and we use it to assess power for a comprehensive set of available genotyping chips. Our results allow us to compare how well the chips detect variants with different effect sizes and allele frequencies; to examine how power changes with sample size in different populations, or when using multi-marker tags and genotype imputation approaches; and to compare chip performance with that of a hypothetical chip containing every SNP in HapMap. A main conclusion of this study is that marked differences in genome coverage may not translate into appreciable differences in power and that, when budgetary considerations are taken into account, the most powerful design may not always correspond to the chip with the highest coverage. We also show that genotype imputation can be used to boost the power of many chips up to the level obtained from a hypothetical "complete" chip containing all the SNPs in HapMap.
Our results have been encapsulated in an R software package that allows users to design future association studies, and our methods provide a framework for evaluating new chip sets.
Conflict of interest statement
The authors have declared that no competing interests exist.
Figures
Figure 1. Schematic of how power is estimated.
At the top of the figure are the recombination map and haplotypes estimated by the HapMap project. Using this population genetic information, we simulate a case-control sample (grey lines), in which the red dots indicate the disease locus and the blue dots indicate linked genetic variation. By performing a test of association at each SNP on the genotyping chip, we estimate power as the proportion of simulations in which a test statistic exceeds a significance threshold (dotted line). We compare genotyping chips by changing the set of SNPs at which we carry out the test. See Methods.
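The counting step in Figure 1 can be illustrated with a deliberately simplified Monte Carlo sketch. It is not the authors' simulator: it ignores LD and the HapMap haplotypes entirely and tests only the causal SNP itself (as if the chip genotyped the causal variant). It assumes Hardy-Weinberg equilibrium, a multiplicative per-allele relative risk, a disease prevalence of 1%, a 1-df allelic chi-square test, and a significance level of 5e-8; all function names and default values are hypothetical.

```python
import numpy as np
from scipy.stats import chi2


def case_control_genotype_probs(raf, rr, prevalence):
    """Genotype distributions (0, 1, 2 risk alleles) in cases and controls,
    assuming Hardy-Weinberg equilibrium and a multiplicative per-allele model."""
    hwe = np.array([(1 - raf) ** 2, 2 * raf * (1 - raf), raf ** 2])
    rel_risk = rr ** np.arange(3)                      # 1, rr, rr^2
    # Baseline penetrance is chosen so that the population prevalence is matched.
    penetrance = prevalence / np.dot(hwe, rel_risk) * rel_risk
    if penetrance.max() > 1:
        raise ValueError("prevalence and rr imply a penetrance above 1")
    cases = hwe * penetrance / np.dot(hwe, penetrance)
    controls = hwe * (1 - penetrance) / np.dot(hwe, 1 - penetrance)
    return cases, controls


def allelic_chi2(a, b, c, d):
    """1-df chi-square statistic for 2x2 allele-count tables [[a, b], [c, d]],
    vectorised over simulations; tables with an empty margin give 0."""
    a, b, c, d = (np.asarray(x, dtype=float) for x in (a, b, c, d))
    num = (a + b + c + d) * (a * d - b * c) ** 2
    den = (a + b) * (c + d) * (a + c) * (b + d)
    return np.divide(num, den, out=np.zeros_like(num), where=den > 0)


def simulate_power(n_cases, n_controls, raf, rr, prevalence=0.01,
                   alpha=5e-8, n_sims=1000, seed=0):
    """Proportion of simulated studies in which the test at the causal SNP
    exceeds the significance threshold (hypothetical defaults)."""
    rng = np.random.default_rng(seed)
    p_case, p_ctrl = case_control_genotype_probs(raf, rr, prevalence)
    g_case = rng.multinomial(n_cases, p_case, size=n_sims)      # (n_sims, 3)
    g_ctrl = rng.multinomial(n_controls, p_ctrl, size=n_sims)
    risk_case = g_case[:, 1] + 2 * g_case[:, 2]                 # risk-allele counts
    risk_ctrl = g_ctrl[:, 1] + 2 * g_ctrl[:, 2]
    stats = allelic_chi2(risk_case, 2 * n_cases - risk_case,
                         risk_ctrl, 2 * n_controls - risk_ctrl)
    threshold = chi2.isf(alpha, df=1)
    return float(np.mean(stats > threshold))


print(simulate_power(2000, 2000, raf=0.2, rr=1.3))
```

Comparing chips in the way Figure 1 describes would mean replacing the single causal-SNP test with tests at every chip SNP in a simulated LD region, which needs the haplotype-based simulation described in Methods.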
Figure 2. Plots of power (solid lines) and coverage (dotted lines) for increasing sample sizes of cases and controls (x-axis).
From left to right, plots are given for increasing effect sizes (relative risk per allele). Both power and coverage range from 0 to 1 and are given on the y-axis. Results are for single-marker tests of association, and for simulations in which the frequency of the causal risk allele is >0.05. The top row shows power for case-control studies simulated in a Caucasian population based on the CEU HapMap panel. The bottom row relates to case-control studies simulated from the YRI HapMap panel.
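For intuition about how the curves in Figure 2 depend on sample size and relative risk, power at a single SNP can be approximated analytically. This is not the paper's method, which simulates LD precisely because real LD patterns defeat such shortcuts. The sketch uses the standard noncentral chi-square approximation for the 1-df allelic test and assumes a multiplicative per-allele model, controls carrying the population allele frequency (a rare-disease approximation), a tag SNP represented by scaling the noncentrality parameter by its r² with the causal SNP, and a significance level of 5e-8. Function names and defaults are hypothetical.

```python
from scipy.stats import chi2, ncx2


def case_raf(raf, rr):
    """Risk-allele frequency among cases under a multiplicative per-allele model."""
    return raf * rr / (1 - raf + raf * rr)


def approx_power(n_cases, n_controls, raf, rr, r2=1.0, alpha=5e-8):
    """Approximate power of the 1-df allelic test at a marker with LD r2 to the
    causal SNP. Assumes controls have the population allele frequency and that
    LD scales the noncentrality parameter by r2."""
    p1, p0 = case_raf(raf, rr), raf
    n1, n0 = 2 * n_cases, 2 * n_controls               # numbers of alleles
    pbar = (n1 * p1 + n0 * p0) / (n1 + n0)
    ncp = r2 * (p1 - p0) ** 2 / (pbar * (1 - pbar) * (1 / n1 + 1 / n0))
    threshold = chi2.isf(alpha, df=1)
    if ncp == 0:
        return float(chi2.sf(threshold, df=1))         # no effect: power equals alpha
    return float(ncx2.sf(threshold, df=1, nc=ncp))


sample_sizes = (1000, 2000, 5000, 10000)
for rr in (1.1, 1.2, 1.3, 1.5):
    powers = [approx_power(n, n, raf=0.2, rr=rr) for n in sample_sizes]
    print(f"RR={rr}: " + ", ".join(f"{p:.3f}" for p in powers))
```

Lowering `raf` in the loop gives a rough feel for the rare-versus-common contrast shown in Figure 3, and lowering `r2` mimics a causal SNP that is only partially tagged by the chip.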
Figure 3. Power for common versus rare alleles.
Plots of power (solid lines) and coverage (dotted lines) for increasing sample sizes of cases and controls (x-axis). From left to right, plots are given for increasing effect sizes (relative risk per allele). Both power and coverage range from 0 to 1 and are given on the y-axis. Results are for single-marker tests of association. The top two rows show power for rare risk alleles (risk allele frequency, RAF < 0.1) and the bottom two rows show power for common risk alleles (RAF > 0.1). Rows 1 and 3 show power for case-control studies simulated in a Caucasian population based on the CEU HapMap panel. Rows 2 and 4 relate to case-control studies simulated from the YRI HapMap panel.
Figure 4. Histograms of the proportion of HapMap Phase II SNPs in the 22 1-Mb regions (see Methods) whose maximum r² with a SNP on the genotyping chip falls in each of eleven bins (correlation, or LD, increasing from left to right).
The same histograms are coloured in two ways. The top row shows in red the percentage of SNPs in each bin that were detected (see Methods and text) when selected as the causal SNP in our simulations; the proportion of the total bar volume coloured red is therefore an estimate of power. In the bottom row, all r² bins above 0.8 are coloured red; the proportion of the total bar volume coloured red is therefore an estimate of coverage. Note that the use of HapMap data in choosing SNPs for the Illumina chip leads to a higher proportion of SNPs in high-r² bins.
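The coverage side of Figure 4 (bottom row) is straightforward to compute once LD has been measured. The sketch below assumes a matrix of allele dosages whose rows are phased haplotypes or individuals and whose columns are reference SNPs, with the chip SNPs given as a subset of columns. It measures LD as the squared Pearson correlation of dosages (the usual r² for phased 0/1 haplotypes, and a common proxy for unphased genotypes), uses eleven equal-width bins on [0, 1] (an assumption; the exact bin edges are not stated here), and counts a SNP as covered when its maximum r² is at least 0.8. The function names are hypothetical, and the power side of the figure (top row) still requires the simulation described in Methods.

```python
import numpy as np


def r2_to_chip(ref, chip_idx):
    """Squared Pearson correlation between every reference SNP and every chip SNP.
    ref: (n_haplotypes_or_individuals, n_snps) allele dosages;
    chip_idx: column indices of ref that are on the chip."""
    x = ref - ref.mean(axis=0)
    norms = np.sqrt((x ** 2).sum(axis=0))
    cov = x.T @ x[:, chip_idx]                          # (n_snps, n_chip)
    denom = np.outer(norms, norms[chip_idx])
    r = np.divide(cov, denom, out=np.zeros_like(cov), where=denom > 0)
    return np.clip(r ** 2, 0.0, 1.0)                    # monomorphic SNPs get r2 = 0


def coverage_summary(ref, chip_idx, threshold=0.8, n_bins=11):
    """Maximum r2 of each reference SNP with the chip, the proportion of SNPs in
    each equal-width bin on [0, 1], and coverage (share with max r2 >= threshold)."""
    max_r2 = r2_to_chip(ref, chip_idx).max(axis=1)
    counts, edges = np.histogram(max_r2, bins=n_bins, range=(0.0, 1.0))
    coverage = float(np.mean(max_r2 >= threshold))
    return max_r2, counts / counts.sum(), edges, coverage


# Synthetic example (not HapMap data): columns 10-19 are noisy copies of columns 0-9.
rng = np.random.default_rng(1)
base = rng.integers(0, 3, size=(200, 10)).astype(float)
noisy = np.where(rng.random(base.shape) < 0.1, rng.integers(0, 3, size=base.shape), base)
ref = np.hstack([base, noisy])
_, props, _, cov = coverage_summary(ref, chip_idx=list(range(10)))
print(f"coverage = {cov:.2f}")
```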
Similar articles
- Accuracy of genome-wide imputation of untyped markers and impacts on statistical power for association studies. Hao K, Chudin E, McElwee J, Schadt EE. BMC Genet. 2009 Jun 16;10:27. doi: 10.1186/1471-2156-10-27. PMID: 19531258. Free PMC article.
- Accuracy of genotype imputation in Nelore cattle. Carvalheiro R, Boison SA, Neves HH, Sargolzaei M, Schenkel FS, Utsunomiya YT, O'Brien AM, Sölkner J, McEwan JC, Van Tassell CP, Sonstegard TS, Garcia JF. Genet Sel Evol. 2014 Oct 10;46(1):69. doi: 10.1186/s12711-014-0069-1. PMID: 25927950. Free PMC article.
- Comparison of the performance of two commercial genome-wide association study genotyping platforms in Han Chinese samples. Jiang L, Willner D, Danoy P, Xu H, Brown MA. G3 (Bethesda). 2013 Jan;3(1):23-9. doi: 10.1534/g3.112.004069. Epub 2013 Jan 1. PMID: 23316436. Free PMC article.
- Genotype Imputation in Genome-Wide Association Studies. Naj AC. Curr Protoc Hum Genet. 2019 Jun;102(1):e84. doi: 10.1002/cphg.84. PMID: 31216114. Review.
- Accurate Imputation of Untyped Variants from Deep Sequencing Data. Torkamaneh D, Belzile F. Methods Mol Biol. 2021;2243:271-281. doi: 10.1007/978-1-0716-1103-6_13. PMID: 33606262. Review.
Cited by
- A resampling-based approach to share reference panels. Cavinato T, Rubinacci S, Malaspinas AS, Delaneau O. Nat Comput Sci. 2024 May;4(5):360-366. doi: 10.1038/s43588-024-00630-7. Epub 2024 May 14. PMID: 38745108. Free PMC article.
- Genetic Background of Blood β-Hydroxybutyrate Acid Concentrations in Early-Lactating Holstein Dairy Cows Based on Genome-Wide Association Analyses. Wang Y, Wang Z, Liu W, Xie S, Ren X, Yan L, Liang D, Gao T, Fu T, Zhang Z, Huang H. Genes (Basel). 2024 Mar 26;15(4):412. doi: 10.3390/genes15040412. PMID: 38674346. Free PMC article.
- Susceptibility to Treatment-Resistant Depression Within Families. Cheng CM, Chen MH, Tsai SJ, Chang WH, Tsai CF, Lin WC, Bai YM, Su TP, Chen TJ, Li CT. JAMA Psychiatry. 2024 Jul 1;81(7):663-672. doi: 10.1001/jamapsychiatry.2024.0378. PMID: 38568605.
- The Tofu mutation restores female fertility to Drosophila with a null BEAF mutation. McKowen JK, Dassanayake M, Hart CM. bioRxiv [Preprint]. 2024 Feb 15:2024.02.13.580197. doi: 10.1101/2024.02.13.580197. PMID: 38405992. Free PMC article. Preprint.
- NPC1L1 rs217434 A > G as a Novel Single Nucleotide Polymorphism Related to Dyslipidemia in a Korean Population. Cho D, Huang X, Han Y, Kim M. Biochem Genet. 2024 Oct;62(5):4103-4119. doi: 10.1007/s10528-023-10649-6. Epub 2024 Jan 27. PMID: 38280151.