Designing genome-wide association studies: sample size, power, imputation, and the choice of genotyping chip

Chris C A Spencer et al. PLoS Genet. 2009 May.

Abstract

Genome-wide association studies are revolutionizing the search for the genes underlying human complex diseases. The main decisions to be made at the design stage of these studies are the choice of the commercial genotyping chip to be used and the numbers of case and control samples to be genotyped. The most common method of comparing different chips is using a measure of coverage, but this fails to properly account for the effects of sample size, the genetic model of the disease, and linkage disequilibrium between SNPs. In this paper, we argue that the statistical power to detect a causative variant should be the major criterion in study design. Because of the complicated pattern of linkage disequilibrium (LD) in the human genome, power cannot be calculated analytically and must instead be assessed by simulation. We describe in detail a method of simulating case-control samples at a set of linked SNPs that replicates the patterns of LD in human populations, and we use it to assess power for a comprehensive set of available genotyping chips. Our results allow us to compare the performance of the chips in detecting variants with different effect sizes and allele frequencies, to examine how power changes with sample size in different populations or when using multi-marker tags and genotype imputation approaches, and to see how performance compares to that of a hypothetical chip that contains every SNP in HapMap. A main conclusion of this study is that marked differences in genome coverage may not translate into appreciable differences in power and that, when taking budgetary considerations into account, the most powerful design may not always correspond to the chip with the highest coverage. We also show that genotype imputation can be used to boost the power of many chips up to the level obtained from a hypothetical "complete" chip containing all the SNPs in HapMap. Our results have been encapsulated into an R software package that allows users to design future association studies, and our methods provide a framework with which new chip sets can be evaluated.

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Figure 1. Schematic of how power is estimated.

At the top of the figure are the recombination map and haplotypes estimated from the HapMap project. Using this population genetic information, we simulate a case-control sample (grey lines), where the red dots indicate the disease locus and blue dots indicate linked genetic variation. By performing a test of association at each SNP on the genotyping chip, we can estimate power as the proportion of simulations for which a test statistic exceeds a significance threshold (dotted line). We compare genotyping chips by changing the set of SNPs at which we carry out a test. See Methods.
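The counting step in this schematic can be illustrated with a deliberately simplified sketch. It is not the authors' method: it ignores LD entirely and tests only a directly typed causal SNP, whereas the paper simulates linked SNPs from HapMap haplotypes. The function names, the multiplicative rare-disease model, the allelic 1-df chi-square test, and the default threshold `alpha=1e-4` are all assumptions made for illustration.

```python
import math
import random


def allele_freqs(raf, rel_risk):
    """Risk-allele frequency in cases and in controls.

    Assumption: multiplicative per-allele relative risk and a rare disease,
    so controls are approximated by the general population.
    """
    p_case = raf * rel_risk / (raf * rel_risk + (1.0 - raf))
    return p_case, raf


def allelic_chi2(a_case, n_case, a_ctrl, n_ctrl):
    """1-df chi-square statistic for a 2x2 table of allele counts."""
    b_case, b_ctrl = n_case - a_case, n_ctrl - a_ctrl
    denom = n_case * n_ctrl * (a_case + a_ctrl) * (b_case + b_ctrl)
    if denom == 0:
        return 0.0  # monomorphic sample: no evidence of association
    total = n_case + n_ctrl
    return total * (a_case * b_ctrl - b_case * a_ctrl) ** 2 / denom


def estimate_power(raf, rel_risk, n_cases, n_controls,
                   alpha=1e-4, n_sims=200, seed=0):
    """Fraction of simulated studies whose test passes the threshold."""
    rng = random.Random(seed)
    p_case, p_ctrl = allele_freqs(raf, rel_risk)
    hits = 0
    for _ in range(n_sims):
        # Each individual contributes two alleles drawn independently.
        a_case = sum(rng.random() < p_case for _ in range(2 * n_cases))
        a_ctrl = sum(rng.random() < p_ctrl for _ in range(2 * n_controls))
        chi2 = allelic_chi2(a_case, 2 * n_cases, a_ctrl, 2 * n_controls)
        p_value = math.erfc(math.sqrt(chi2 / 2.0))  # 1-df chi-square tail
        if p_value < alpha:
            hits += 1
    return hits / n_sims
```

Calling `estimate_power` over a range of `n_cases` and `n_controls` values would trace a power curve for this toy single-SNP setting; comparing chips as in the paper would additionally require simulating linked SNPs and restricting the tests to the SNPs present on each chip.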

Figure 2. Plots of power (solid lines) and coverage (dotted line) for increasing sample sizes of cases and controls (x-axis).

From left to right, plots are given for increasing effect sizes (relative risk per allele). Both power and coverage range from 0 to 1 and are given on the y-axis. Results are for a single-marker test of association and for simulations where the risk allele frequency of the causal allele is >0.05. The top row shows power for case-control studies simulated in a Caucasian population based on the CEU HapMap panel. The bottom row relates to case-control studies simulated from the YRI HapMap panel.

Figure 3. Power for Common versus Rare alleles.

Plots of power (solid lines) and coverage (dotted line) for increasing sample sizes of cases and controls (x-axis). From left to right, plots are given for increasing effect sizes (relative risk per allele). Both power and coverage range from 0 to 1 and are given on the y-axis. Results are for a single-marker test of association. The top two rows show the power for rare risk alleles (RAF<0.1) and the bottom two rows show the power for common risk alleles (RAF>0.1). Rows 1 and 3 show power for case-control studies simulated in a Caucasian population based on the CEU HapMap panel. Rows 2 and 4 relate to case-control studies simulated from the YRI HapMap panel.

Figure 4. Histograms of the proportion of SNPs in the 22 1-Mb regions (see Methods) in HapMap Phase II for which the maximum r² with a SNP on the genotyping chip is in one of eleven bins (increasing in correlation (LD) from left to right).

The same histograms are coloured in two ways. The top row shows in red the percentage of the SNPs in each bin detected (see Methods and text) when selected to be the causal SNP in our simulations (the proportion of the total volume of the bars coloured red is therefore an estimate of power). In the bottom row, all r² bins above 0.8 are coloured red (the red proportion of the total volume of all the bars is therefore an estimate of coverage). Note that the use of HapMap data in choosing SNPs for the Illumina chip leads to a higher proportion of SNPs in high r² bins.
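The two colourings described above correspond to two different summaries of the same per-SNP data, which a small sketch can make concrete. The bin edges here (ten equal-width bins plus a final bin for r² = 1) are an assumption, since the caption does not give them, and the function names and toy inputs are hypothetical. Coverage is taken as the fraction of SNPs with maximum r² at or above 0.8.

```python
def r2_bin(max_r2):
    """Index of the bin for a maximum r2 value in [0, 1].

    Assumption: bins [0, 0.1), ..., [0.9, 1.0) and a final bin for r2 = 1.
    """
    return min(int(max_r2 * 10), 10)


def histogram(max_r2_values):
    """Count of SNPs falling in each of the eleven r2 bins."""
    counts = [0] * 11
    for r2 in max_r2_values:
        counts[r2_bin(r2)] += 1
    return counts


def coverage(max_r2_values, threshold=0.8):
    """Fraction of SNPs tagged at or above the r2 threshold."""
    return sum(r2 >= threshold for r2 in max_r2_values) / len(max_r2_values)


def power(detected_flags):
    """Fraction of SNPs detected when simulated as the causal SNP."""
    return sum(detected_flags) / len(detected_flags)


# Toy input: maximum r2 with any chip SNP, and whether each SNP was
# detected in simulation (both lists are made up for illustration).
toy_r2 = [0.95, 0.85, 0.5, 0.2]
toy_detected = [True, True, True, False]
```

In this toy input a SNP with r² of 0.5 is still detected, so `power(toy_detected)` exceeds `coverage(toy_r2)`. That is the kind of gap between the two summaries that the figure is drawn to show: coverage counts only SNPs past a fixed r² cut-off, while power also credits weaker tags that still reach significance.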
