Synthetic associations created by rare variants do not explain most GWAS results

Comment

Naomi R Wray et al. PLoS Biol. 2011.

No abstract available

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Figure 1. LD between causal and genotyped SNPs and synthetic association.

SNPs 1–10 are independent SNPs in a short chromosomal region, with population frequencies indicated by the values in the box. Rare mutations tend to be younger than common mutations. A mutation event in the region creates causal variant C1. C1 has a higher probability of arising on the major allele (dark) of any SNP than on the minor allele (light). However, in the absence of recombination, the most highly associated SNP will be the one where C1 is coupled (see Box 2) with the SNP allele of lowest frequency, SNP 3; recombination between the SNP and the causal variant could break down this synthetic association. An independent mutation event in the region gives rise to a second causal SNP, C2. Again, C2 has a higher probability of arising on the major allele of each SNP. If C2 had been the only mutation in the region, then SNP 10 would be the most highly associated, as its coupled allele has the lowest frequency. However, when both events arise in the same region, the associations at SNPs 3 and 10 are partially masked, as they carry risk variants on both their alleles. C1 and C2 arise on the same background allele for many SNPs, but SNP 8 has the allele of lowest frequency that harbours both risk alleles. In the absence of recombination, and depending on effect size, the highest association might be with SNP 8, rather than SNPs 3 or 10. Individuals are very unlikely to carry both C1 and C2. As more causal variants arise in the region, the most associated SNP will be the one with a detectable difference in the contribution to risk from the risk alleles harboured on each allele. Other representations of synthetic association could be viewed in parallel with this representation.

Figure 2. Frequency distributions of allele frequencies in GWAS results and in simulations.

a) Risk allele frequency of the most associated SNPs listed in the GWAS Catalog for the diseases in Table 3. b) MAF of all SNPs simulated under the coalescence model. c) MAF of the SNPs used in analyses, chosen to be representative of SNPs included in GWAS. d–f) Frequency of the coupled allele of the most associated SNP from simulations of 1, 9, or 36 causal variants in a 100 kb region.

Figure 3. Minimum fold increase in genetic variance at a single rare causal locus given the frequency of the risk allele at the genotyped associated locus.

The minimum fold increase is calculated as 1/r², with r² taken as the maximum r² possible given the frequency of the trait-increasing allele at the genotyped SNP and the frequency of the causal allele (see Box 2).
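
The 1/r² calculation in this caption can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: it assumes the standard upper bound on r² between two biallelic loci given only their allele frequencies, which may differ in detail from the derivation in Box 2. The function names `max_r2` and `min_fold_increase` are hypothetical.

```python
def max_r2(p_snp: float, p_causal: float) -> float:
    """Upper bound on r^2 between two biallelic loci.

    p_snp    -- frequency of the trait-increasing allele at the genotyped SNP
    p_causal -- frequency of the causal allele
    Assumption: the two alleles are in coupling (positive D), so that
    D_max = min(p_snp * (1 - p_causal), p_causal * (1 - p_snp)).
    """
    for p in (p_snp, p_causal):
        if not 0.0 < p < 1.0:
            raise ValueError("allele frequencies must lie strictly between 0 and 1")
    lo, hi = sorted((p_snp, p_causal))
    # With lo <= hi, D_max = lo * (1 - hi), and
    # r^2 = D_max^2 / (lo * (1 - lo) * hi * (1 - hi)) simplifies to:
    return (lo * (1.0 - hi)) / (hi * (1.0 - lo))


def min_fold_increase(p_snp: float, p_causal: float) -> float:
    """Minimum fold increase in genetic variance at the causal locus, 1 / max r^2."""
    return 1.0 / max_r2(p_snp, p_causal)


if __name__ == "__main__":
    # Illustrative frequencies only; these are not values taken from the paper.
    print(min_fold_increase(p_snp=0.5, p_causal=0.01))
```

Under this bound, the larger the gap between the frequency of the associated allele and the frequency of the causal allele, the smaller the maximum r² and the larger the implied fold increase.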

Figure 4. Polygenic analyses following the International Schizophrenia Consortium.

a) The original results for polygenic score analysis in the ISC, when stratified by quintile of risk-increasing allele frequency (Q1 being the lowest risk-increasing allele frequency, Q5 the most common; the range is between 0.02 and 0.98). b) We repeated these analyses on simulated data, generated under a “rare variant only” model and using the same simulation procedure as Dickson et al. (assuming that risk loci harbor 9 causal variants, GRR = 4, MAF 0.005–0.02). The pile-up of signal in the lower quintiles, which is expected under Dickson et al.'s model, is clearly not consistent with the observed ISC results. In the simulations, SNPs are generated through a coalescent process; a subset of SNPs is selected as “genotyped” to represent the marker density, frequency distribution and LD profile observed in the original ISC study (which has properties that are typical of most GWAS, including the under-representation of low frequency variants). The y axis is the –log10P from the logistic regression of case-control status on profile score in an independent “target” case-control sample. The score is calculated as the number of alleles identified as associated (with p-value less than a threshold pT) in the discovery case-control sample association analysis. Values are scaled within each figure so that the maximum value observed across the five significance thresholds (pT = 0.1, 0.2, 0.3, 0.4, and 0.5, plotted left to right in each quintile) is 1 and the minimum is zero.
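
The scoring and scaling steps described in this caption can be sketched as follows. This is a simplified illustration under stated assumptions, not the ISC or authors' pipeline: genotypes are assumed to be coded as counts (0, 1, or 2) of the allele flagged as risk-increasing in the discovery sample, the score is an unweighted allele count, and the function names `profile_scores` and `min_max_scale` are hypothetical. The logistic regression step that produces the –log10P values is not shown.

```python
from typing import List, Sequence


def profile_scores(genotypes: Sequence[Sequence[int]],
                   discovery_p: Sequence[float],
                   p_threshold: float) -> List[int]:
    """Count risk alleles per individual over SNPs with discovery p-value < p_threshold.

    genotypes[i][j] is the number of copies (0, 1, or 2) that individual i carries
    of the allele associated with increased risk at SNP j in the discovery sample.
    """
    selected = [j for j, p in enumerate(discovery_p) if p < p_threshold]
    return [sum(person[j] for j in selected) for person in genotypes]


def min_max_scale(values: Sequence[float]) -> List[float]:
    """Rescale values so the maximum maps to 1 and the minimum to 0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: no spread to rescale, so return zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


if __name__ == "__main__":
    # Toy data for illustration only: two individuals, three SNPs.
    toy_genotypes = [[0, 1, 2], [2, 2, 0]]
    toy_discovery_p = [0.05, 0.3, 0.6]
    for p_t in (0.1, 0.2, 0.3, 0.4, 0.5):
        print(p_t, profile_scores(toy_genotypes, toy_discovery_p, p_t))
```

In the figure, the scaling would be applied to the –log10P values from all quintiles and thresholds within a panel at once, so that bar heights are comparable within that panel but not between panels.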

References

    1. Hindorff L. A, Junkins H. A, Hall P. N, Mehta J. P, Manolio T. A. 2009. A catalog of published genome-wide association studies. Available: http://www.genome.gov/gwastudies. Accessed 7 December 2010.
    2. Maher B. Personal genomes: the case of the missing heritability. Nature. 2008;456:18–21.
    3. Manolio T. A, Collins F. S, Cox N. J, Goldstein D. B, Hindorff L. A, et al. Finding the missing heritability of complex diseases. Nature. 2009;461:747–753.
    4. Dickson S. P, Wang K, Krantz I, Hakonarson H, Goldstein D. B. Rare variants create synthetic genome-wide associations. PLoS Biol. 2011;9:e1001008. doi: 10.1371/journal.pbio.1001008.
    5. Wang K, Dickson S. P, Stolle C. A, Krantz I. D, Goldstein D. B, et al. Interpretation of association signals and identification of causal variants from genome-wide association studies. Am J Hum Genet. 2010;86:730–742.