Designing genome-wide association studies: sample size, power, imputation, and the choice of genotyping chip
Chris C A Spencer et al. PLoS Genet. 2009 May.
Abstract
Genome-wide association studies are revolutionizing the search for the genes underlying human complex diseases. The main decisions to be made at the design stage of these studies are the choice of the commercial genotyping chip to be used and the numbers of case and control samples to be genotyped. The most common method of comparing different chips is using a measure of coverage, but this fails to properly account for the effects of sample size, the genetic model of the disease, and linkage disequilibrium between SNPs. In this paper, we argue that the statistical power to detect a causative variant should be the major criterion in study design. Because of the complicated pattern of linkage disequilibrium (LD) in the human genome, power cannot be calculated analytically and must instead be assessed by simulation. We describe in detail a method of simulating case-control samples at a set of linked SNPs that replicates the patterns of LD in human populations, and we used it to assess power for a comprehensive set of available genotyping chips. Our results allow us to compare the performance of the chips to detect variants with different effect sizes and allele frequencies, look at how power changes with sample size in different populations or when using multi-marker tags and genotype imputation approaches, and how performance compares to a hypothetical chip that contains every SNP in HapMap. A main conclusion of this study is that marked differences in genome coverage may not translate into appreciable differences in power and that, when taking budgetary considerations into account, the most powerful design may not always correspond to the chip with the highest coverage. We also show that genotype imputation can be used to boost the power of many chips up to the level obtained from a hypothetical "complete" chip containing all the SNPs in HapMap. 
Our results have been encapsulated into an R software package that allows users to design future association studies and our methods provide a framework with which new chip sets can be evaluated.
Conflict of interest statement
The authors have declared that no competing interests exist.
Figures
Figure 1. Schematic of how power is estimated.
At the top of the figure are the recombination map and haplotypes estimated from the HapMap project. Using this population genetic information we simulate a case-control sample (grey lines), where the red dots indicate the disease locus and blue dots indicate linked genetic variation. By performing a test of association at each SNP on the genotyping chip, we can estimate power by counting the number of simulations for which a test statistic exceeds a significance threshold (dotted line). We compare genotyping chips by changing the set of SNPs at which we carry out a test. See Methods.
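The counting step described in this caption can be illustrated with a minimal sketch. It is not the authors' simulation method: it ignores LD and HapMap haplotypes entirely and tests the causal SNP directly, assuming a multiplicative per-allele relative risk, Hardy-Weinberg equilibrium, and a rare disease (so controls carry the population allele frequency). The function names, the allelic chi-square test, and the threshold `alpha=5e-7` are illustrative assumptions, not values taken from the paper.

```python
import math
import random


def case_allele_freq(raf, rr):
    """Risk-allele frequency among cases under a multiplicative model (assumed)."""
    return rr * raf / (rr * raf + (1.0 - raf))


def allelic_chi2(a_case, n_case, a_ctrl, n_ctrl):
    """1-df chi-square statistic for a 2x2 table of allele counts.

    a_* are risk-allele counts, n_* are total chromosomes in each group.
    """
    total = n_case + n_ctrl
    a_total = a_case + a_ctrl
    if a_total == 0 or a_total == total:
        return 0.0  # monomorphic sample: no evidence of association
    num = total * (a_case * (n_ctrl - a_ctrl) - a_ctrl * (n_case - a_case)) ** 2
    den = n_case * n_ctrl * a_total * (total - a_total)
    return num / den


def chi2_1df_pvalue(stat):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))


def estimate_power(raf, rr, n_cases, n_controls, alpha=5e-7, n_sims=500, seed=0):
    """Fraction of simulated studies whose test passes the threshold."""
    rng = random.Random(seed)
    p_case = case_allele_freq(raf, rr)
    hits = 0
    for _ in range(n_sims):
        # Each individual contributes two chromosomes.
        a_case = sum(rng.random() < p_case for _ in range(2 * n_cases))
        a_ctrl = sum(rng.random() < raf for _ in range(2 * n_controls))
        stat = allelic_chi2(a_case, 2 * n_cases, a_ctrl, 2 * n_controls)
        if chi2_1df_pvalue(stat) < alpha:
            hits += 1
    return hits / n_sims


if __name__ == "__main__":
    for n in (500, 1000, 2000):
        print(n, estimate_power(raf=0.2, rr=1.3, n_cases=n, n_controls=n, n_sims=200))
```

To compare chips in the way the caption describes, the test would be run at each chip SNP on haplotype-based simulated data rather than at the causal SNP itself; that step needs LD information and is omitted here.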
Figure 2. Plots of power (solid lines) and coverage (dotted line) for increasing sample sizes of cases and controls (x-axis).
From left to right, plots are given for increasing effect sizes (relative risk per allele). Both power and coverage range from 0 to 1 and are given on the y-axis. Results are for single-marker tests of association and for simulations where the risk allele frequency of the causal allele is >0.05. The top row shows power for case-control studies simulated in a Caucasian population based on the CEU HapMap panel. The bottom row relates to case-control studies simulated from the YRI HapMap panel.
Figure 3. Power for Common versus Rare alleles.
Plots of power (solid lines) and coverage (dotted line) for increasing sample sizes of cases and controls (x-axis). From left to right, plots are given for increasing effect sizes (relative risk per allele). Both power and coverage range from 0 to 1 and are given on the y-axis. Results are for single-marker tests of association. The top two rows show the power for rare risk alleles (RAF<0.1) and the bottom two rows show the power for common risk alleles (RAF>0.1). Rows 1 and 3 show power for case-control studies simulated in a Caucasian population based on the CEU HapMap panel. Rows 2 and 4 relate to case-control studies simulated from the YRI HapMap panel.
Figure 4. Histograms of the proportion of SNPs in the 22 1-Mb regions (see Methods) in HapMap Phase II for which the maximum r² with a SNP on the genotyping chip is in one of eleven bins (increasing in correlation (LD) from left to right).
The same histograms are coloured in two ways. The top row shows in red the percentage of the SNPs in each bin detected (see Methods and text) when selected to be the causal SNP in our simulations (the proportion of the total volume of the bars coloured red is therefore an estimate of power). In the bottom row all r² bins above 0.8 are coloured red (the proportion of the total volume of all the bars is therefore an estimate of coverage). Note that the use of HapMap data in choosing SNPs for the Illumina chip leads to a higher proportion of SNPs in high r² bins.
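The coverage measure in this caption (the share of SNPs whose best r² with any chip SNP clears 0.8) can be sketched as follows. This is an illustrative sketch only: the input layout (phased haplotypes as lists of 0/1 alleles), the function names, counting chip SNPs themselves as covered, and treating the cutoff as inclusive are all assumptions rather than details from the paper.

```python
def r_squared(x, y):
    """Pairwise LD r^2 between two biallelic SNPs given 0/1 alleles per haplotype."""
    n = len(x)
    p_a = sum(x) / n
    p_b = sum(y) / n
    if p_a in (0.0, 1.0) or p_b in (0.0, 1.0):
        return 0.0  # r^2 is undefined for a monomorphic SNP; treat as no LD
    p_ab = sum(a & b for a, b in zip(x, y)) / n
    d = p_ab - p_a * p_b
    return d * d / (p_a * (1.0 - p_a) * p_b * (1.0 - p_b))


def coverage(haplotypes, chip_snps, threshold=0.8):
    """Fraction of SNPs whose maximum r^2 with any chip SNP is >= threshold.

    haplotypes: list of haplotypes, each a list of 0/1 alleles (one per SNP).
    chip_snps:  indices of the SNPs present on the (hypothetical) chip.
    """
    n_snps = len(haplotypes[0])
    columns = [[h[j] for h in haplotypes] for j in range(n_snps)]
    covered = 0
    for j in range(n_snps):
        best = max((r_squared(columns[j], columns[k]) for k in chip_snps), default=0.0)
        if best >= threshold:
            covered += 1
    return covered / n_snps


if __name__ == "__main__":
    # Toy panel: SNPs 0 and 1 are in perfect LD, SNP 2 is independent of both.
    panel = [[0, 0, 0], [0, 0, 1], [1, 1, 0], [1, 1, 1]]
    print(coverage(panel, chip_snps=[0]))
    print(coverage(panel, chip_snps=[0, 2]))
```

Coverage depends only on the LD structure and the chip content, which is why, as the abstract argues, it cannot reflect sample size or the genetic model of the disease in the way a power estimate does.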
Similar articles
- Accuracy of genome-wide imputation of untyped markers and impacts on statistical power for association studies.
Hao K, Chudin E, McElwee J, Schadt EE. BMC Genet. 2009 Jun 16;10:27. doi: 10.1186/1471-2156-10-27. PMID: 19531258. Free PMC article.
- Accuracy of genotype imputation in Nelore cattle.
Carvalheiro R, Boison SA, Neves HH, Sargolzaei M, Schenkel FS, Utsunomiya YT, O'Brien AM, Sölkner J, McEwan JC, Van Tassell CP, Sonstegard TS, Garcia JF. Genet Sel Evol. 2014 Oct 10;46(1):69. doi: 10.1186/s12711-014-0069-1. PMID: 25927950. Free PMC article.
- Comparison of the performance of two commercial genome-wide association study genotyping platforms in Han Chinese samples.
Jiang L, Willner D, Danoy P, Xu H, Brown MA. G3 (Bethesda). 2013 Jan;3(1):23-9. doi: 10.1534/g3.112.004069. Epub 2013 Jan 1. PMID: 23316436. Free PMC article.
- Genotype Imputation in Genome-Wide Association Studies.
Naj AC. Curr Protoc Hum Genet. 2019 Jun;102(1):e84. doi: 10.1002/cphg.84. PMID: 31216114. Review.
- Accurate Imputation of Untyped Variants from Deep Sequencing Data.
Torkamaneh D, Belzile F. Methods Mol Biol. 2021;2243:271-281. doi: 10.1007/978-1-0716-1103-6_13. PMID: 33606262. Review.
Cited by
- Evaluation of transethnic fine mapping with population-specific and cosmopolitan imputation reference panels in diverse Asian populations.
Wang X, Cheng CY, Liao J, Sim X, Liu J, Chia KS, Tai ES, Little P, Khor CC, Aung T, Wong TY, Teo YY. Eur J Hum Genet. 2016 Apr;24(4):592-9. doi: 10.1038/ejhg.2015.150. Epub 2015 Jul 1. PMID: 26130488. Free PMC article.
- Breeding for resistance to gastrointestinal nematodes - the potential in low-input/output small ruminant production systems.
Zvinorova PI, Halimani TE, Muchadeyi FC, Matika O, Riggio V, Dzama K. Vet Parasitol. 2016 Jul 30;225:19-28. doi: 10.1016/j.vetpar.2016.05.015. Epub 2016 May 13. PMID: 27369571. Free PMC article. Review.
- Genome-wide association study identified genes associated with ammonia nitrogen tolerance in Litopenaeus vannamei.
Fu S, Liu J. Front Genet. 2022 Aug 22;13:961009. doi: 10.3389/fgene.2022.961009. eCollection 2022. PMID: 36072655. Free PMC article.
- Forward-time simulation of realistic samples for genome-wide association studies.
Peng B, Amos CI. BMC Bioinformatics. 2010 Sep 1;11:442. doi: 10.1186/1471-2105-11-442. PMID: 20809983. Free PMC article.
- Enhancing the power to detect low-frequency variants in genome-wide screens.
Lin CY, Xing G, Ku HC, Elston RC, Xing C. Genetics. 2014 Apr;196(4):1293-302. doi: 10.1534/genetics.113.160739. Epub 2014 Feb 4. PMID: 24496013. Free PMC article.