Quantifying Missing Heritability at Known GWAS Loci

Uncovering the total heritability explained by all true susceptibility variants in a genome-wide association study

Genetic Epidemiology, 2011

Genome-wide association studies (GWAS) have become increasingly popular and have contributed to the discovery of many susceptibility variants. However, a large proportion of the heritability remains unexplained. This observation raises questions about the ability of GWAS to uncover the genetic basis of complex diseases. In this study, we propose a simple and fast statistical framework to estimate the total heritability explained by all true susceptibility variants in a GWAS. Many true risk variants are expected to go undetected in a GWAS because of limited power, and the proposed framework aims to recover this "hidden" heritability. Importantly, only the summary z-statistics are required as input; no raw genotype data are needed. The strategy is to recover the true effect sizes from the observed z-statistics. The methodology does not rely on any distributional assumptions about the effect sizes of variants. Both binary and quantitative traits can be handled, and covariates may be included. Population-based or family-based designs are allowed as long as the summary statistics are available. Simulations showed satisfactory performance of the proposed approach. Application to real data (Crohn's disease, HDL, LDL, and triglycerides) reveals that at least around 10-20% of the variance in liability or phenotype can be explained by GWAS panels. This translates to around 10-40% of the total heritability for the studied traits. Genet. Epidemiol. 2011.
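As an illustration of recovering effect sizes from summary z-statistics alone, the sketch below uses Tweedie's formula, E[μ | z] = z + d/dz log f(z), where z ~ N(μ, 1) given the true non-centrality μ and f is the marginal density of the observed z-scores, estimated here with a Gaussian kernel. This is an assumed technique for illustration, not necessarily the estimator used in the paper. For a quantitative trait with standardized genotype and phenotype, μ²/N approximates the variance a SNP explains; the plug-in total below sums E[μ | z]²/N, ignores LD and the posterior variance of μ, and uses an arbitrary bandwidth of 0.5. All function names are hypothetical.

```python
import math
from statistics import NormalDist
from typing import Callable, List

def kde(values: List[float], bandwidth: float) -> Callable[[float], float]:
    """Gaussian kernel density estimate of the marginal density f(z)."""
    norm = 1.0 / (len(values) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(z: float) -> float:
        return norm * sum(math.exp(-0.5 * ((z - v) / bandwidth) ** 2) for v in values)

    return density

def tweedie_means(z_scores: List[float], bandwidth: float = 0.5, step: float = 1e-4) -> List[float]:
    """Tweedie's formula E[mu | z] = z + d/dz log f(z); derivative by central differences."""
    f = kde(z_scores, bandwidth)
    means = []
    for z in z_scores:
        dlogf = (math.log(f(z + step)) - math.log(f(z - step))) / (2.0 * step)
        means.append(z + dlogf)
    return means

def plug_in_variance_explained(z_scores: List[float], n: int, bandwidth: float = 0.5) -> float:
    """Sum over SNPs of E[mu | z]^2 / N (assumption: no LD, posterior variance ignored)."""
    return sum(m * m / n for m in tweedie_means(z_scores, bandwidth))

if __name__ == "__main__":
    # Hypothetical null panel: 200 z-scores at evenly spaced standard normal quantiles.
    null_z = [NormalDist().inv_cdf((i + 0.5) / 200) for i in range(200)]
    naive = sum(z * z for z in null_z) / 10_000
    print(naive, plug_in_variance_explained(null_z, n=10_000))
```

On purely null z-scores the posterior means shrink strongly toward zero, so the plug-in total falls well below the naive sum of z²/N, while an isolated cluster of large z-scores is left almost unshrunk.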

Finding the missing heritability of complex diseases

Nature, 2009

Genome-wide association studies have identified hundreds of genetic variants associated with complex human diseases and traits, and have provided valuable insights into their genetic architecture. Most variants identified so far confer relatively small increments in risk, and ...

Evaluating the heritability explained by known susceptibility variants: a survey of ten complex diseases

Genetic Epidemiology, 2011

Recently, an increasing number of susceptibility variants have been identified for complex diseases. At the same time, concern about "missing heritability" has emerged. There is, however, no unified way to assess the heritability explained by individual genetic variants for binary outcomes, and a systematic, quantitative assessment of the degree of "missing heritability" for complex diseases is lacking. In this study, we measure the variance in liability explained by individual variants, which can be directly interpreted as the locus-specific heritability. The method is extended to deal with haplotypes, multi-allelic markers, multi-locus genotypes, and markers in linkage disequilibrium. Methods to estimate the standard error and confidence interval are proposed. To assess our current level of understanding of the genetic basis of complex diseases, we conducted a survey of 10 diseases, evaluating the total variance explained by the known variants. The diseases under evaluation included Alzheimer's disease, bipolar disorder, breast cancer, coronary artery disease, Crohn's disease, prostate cancer, schizophrenia, systemic lupus erythematosus (SLE), type 1 diabetes and type 2 diabetes. The median total variance explained across the 10 diseases was 9.81%, while the median variance explained per associated SNP was around 0.25%. Our results suggest that a substantial proportion of heritability remains unexplained for the diseases under study. Programs to implement the methodologies described in this paper are available at
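The idea of variance in liability explained by a single variant can be sketched for one biallelic SNP. The sketch assumes Hardy-Weinberg genotype frequencies, a multiplicative risk model (genotype relative risks 1, RR, RR²), and a normally distributed liability within each genotype with a common residual variance. Each genotype's mean liability is then T − σ·q_g with q_g = Φ⁻¹(1 − risk_g), and rescaling so that between- plus within-genotype variance equals 1 gives V = Var(q) / (1 + Var(q)). These modeling choices and the function names are assumptions for illustration, not the paper's published program.

```python
from statistics import NormalDist
from typing import List

STD_NORMAL = NormalDist()

def genotype_frequencies(p: float) -> List[float]:
    """Hardy-Weinberg frequencies for 0, 1 and 2 copies of the risk allele."""
    return [(1 - p) ** 2, 2 * p * (1 - p), p ** 2]

def genotype_risks(prevalence: float, p: float, rr: float) -> List[float]:
    """Risk per genotype under a multiplicative model (relative risks 1, rr, rr^2),
    with the baseline risk chosen so the population prevalence is matched."""
    w = genotype_frequencies(p)
    baseline = prevalence / (w[0] + w[1] * rr + w[2] * rr ** 2)
    return [baseline, baseline * rr, baseline * rr ** 2]

def liability_variance_explained(prevalence: float, p: float, rr: float) -> float:
    """Share of liability variance explained by the SNP (assumes normal liability
    within each genotype with a common residual variance)."""
    if not (0 < prevalence < 1 and 0 < p < 1 and rr > 0):
        raise ValueError("need 0 < prevalence < 1, 0 < p < 1 and rr > 0")
    w = genotype_frequencies(p)
    risks = genotype_risks(prevalence, p, rr)
    if max(risks) >= 1:
        raise ValueError("a genotype risk reaches 1; use a smaller relative risk")
    # Genotype mean liability is T - sigma * q_g, with q_g = Phi^{-1}(1 - risk_g).
    q = [STD_NORMAL.inv_cdf(1 - r) for r in risks]
    q_mean = sum(wi * qi for wi, qi in zip(w, q))
    var_q = sum(wi * (qi - q_mean) ** 2 for wi, qi in zip(w, q))
    # Total variance sigma^2 * var_q + sigma^2 = 1, so the explained share is:
    return var_q / (1 + var_q)

if __name__ == "__main__":
    print(liability_variance_explained(prevalence=0.01, p=0.3, rr=1.2))
```

With RR = 1 the variance explained is zero; a SNP with risk allele frequency 0.3 and RR = 1.2 for a disease with 1% prevalence explains only a fraction of a percent of liability variance under these assumptions.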

Using genome-wide complex trait analysis to quantify 'missing heritability'

2012

results to produce robust heritability estimates for PD types across cohorts. Our results identify 27% (95% CI 17-38, P = 8.08E-08) of phenotypic variance associated with all types of PD, 15% (95% CI −0.2 to 33, P = 0.09) of phenotypic variance associated with early-onset PD, and 31% (95% CI 17-44, P = 1.34E-05) of phenotypic variance associated with late-onset PD. This is a substantial increase over the genetic variance identified by top GWAS hits alone (between 3 and 5%) and indicates that substantially more risk loci remain to be identified. Our results suggest that although GWASs are a useful tool for identifying the most common variants associated with complex disease, many common variants of small effect remain to be discovered.

Partitioning heritability of regulatory and cell-type-specific variants across 11 common diseases

American Journal of Human Genetics, 2014

Regulatory and coding variants are known to be enriched for associations identified by genome-wide association studies (GWASs) of complex disease, but their contributions to trait heritability are currently unknown. We applied variance-component methods to imputed genotype data for 11 common diseases to partition the heritability explained by genotyped SNPs (h²g) across functional categories (while accounting for shared variance due to linkage disequilibrium). Extensive simulations showed that, in contrast to current estimates from GWAS summary statistics, the variance-component approach partitions heritability accurately under a wide range of complex-disease architectures. Across the 11 diseases, DNaseI hypersensitivity sites (DHSs) from 217 cell types spanned 16% of imputed SNPs (and 24% of genotyped SNPs) but explained an average of 79% (SE = 8%) of h²g from imputed SNPs (5.1× enrichment; p = 3.7 × 10⁻¹⁷) and 38% (SE = 4%) of h²g from genotyped SNPs (1.6× enrichment, p = 1...
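The fold-enrichment figures quoted above are the share of SNP heritability captured by a category divided by the share of SNPs it contains. A minimal sketch of that arithmetic with the rounded percentages from the abstract (the function name is illustrative):

```python
def fold_enrichment(share_of_h2: float, share_of_snps: float) -> float:
    """Fold enrichment of a SNP category: share of h2g divided by share of SNPs."""
    if not 0 < share_of_snps <= 1:
        raise ValueError("share_of_snps must lie in (0, 1]")
    return share_of_h2 / share_of_snps

# Rounded DHS figures from the abstract.
genotyped = fold_enrichment(0.38, 0.24)  # about 1.58, reported as 1.6x
imputed = fold_enrichment(0.79, 0.16)    # about 4.94; reported as 5.1x, presumably from unrounded inputs
print(f"genotyped: {genotyped:.2f}x, imputed: {imputed:.2f}x")
```

The small gap between 4.94 and the reported 5.1× for imputed SNPs is most likely a rounding effect of the percentages quoted in the abstract; the exact inputs are not given there.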

All SNPs Are Not Created Equal: Genome-Wide Association Studies Reveal a Consistent Pattern of Enrichment among Functionally Annotated SNPs

PLoS Genetics, 2013

Recent results indicate that genome-wide association studies (GWAS) have the potential to explain much of the heritability of common complex phenotypes, but methods are lacking to reliably identify the remaining associated single nucleotide polymorphisms (SNPs). We applied stratified False Discovery Rate (sFDR) methods to leverage genic enrichment in GWAS summary statistics data to uncover new loci likely to replicate in independent samples. Specifically, we use linkage disequilibrium-weighted annotations for each SNP in combination with nominal p-values to estimate the True Discovery Rate (TDR = 1 − FDR) for strata determined by different genic categories. We show a consistent pattern of enrichment of polygenic effects in specific annotation categories across diverse phenotypes, with the greatest enrichment for SNPs tagging regulatory and coding genic elements, little enrichment in introns, and negative enrichment for intergenic SNPs. Stratified enrichment directly leads to increased TDR for a given p-value, mirrored by increased replication rates in independent samples. We show this in independent Crohn's disease GWAS, where we find a hundredfold variation in replication rate across genic categories. Applying a well-established sFDR methodology, we demonstrate the utility of stratification for improving the power of GWAS in complex phenotypes, with increases in rejection rates ranging from 20% in height to 300% in schizophrenia, with traditional FDR and sFDR both fixed at 0.05. Our analyses demonstrate an inherent stratification among GWAS SNPs with important conceptual implications that can be leveraged by statistical methods to improve the discovery of loci.
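To make TDR = 1 − FDR concrete within annotation strata, the sketch below uses the simple empirical-Bayes form FDR(t) = π0 · t / F(t), where F is the empirical distribution of p-values in a stratum, with the conservative choice π0 = 1. It omits the LD-weighted annotation scores and is an assumption for illustration rather than the paper's exact sFDR procedure; the stratum names and data are hypothetical.

```python
from typing import Dict, List

def stratum_fdr(p_values: List[float], threshold: float, pi0: float = 1.0) -> float:
    """Empirical-Bayes FDR at threshold t: pi0 * t / F(t), F = empirical CDF of the stratum."""
    hits = sum(1 for p in p_values if p <= threshold)
    if hits == 0:
        return 1.0  # no p-value clears the threshold: report the most conservative value
    ecdf = hits / len(p_values)
    return min(1.0, pi0 * threshold / ecdf)

def stratified_tdr(strata: Dict[str, List[float]], threshold: float) -> Dict[str, float]:
    """TDR = 1 - FDR, computed separately within each annotation stratum."""
    return {name: 1.0 - stratum_fdr(ps, threshold) for name, ps in strata.items()}

if __name__ == "__main__":
    # Hypothetical strata: uniform (null-like) p-values versus the same plus an excess of small ones.
    uniform = [(i + 0.5) / 1000 for i in range(1000)]
    strata = {"intergenic": uniform, "regulatory": uniform + [1e-6] * 90}
    print(stratified_tdr(strata, threshold=0.01))
```

A stratum with uniform p-values yields TDR ≈ 0 at any threshold, whereas a stratum with an excess of small p-values yields a higher TDR at the same threshold, which is how stratification can raise the number of rejections.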

Effect of population stratification on the identification of significant single-nucleotide polymorphisms in genome-wide association studies

2009

The North American Rheumatoid Arthritis Consortium case-control study collected case participants across the United States and control participants from New York. More than 500,000 single-nucleotide polymorphisms (SNPs) were genotyped in the sample of 2000 cases and controls. Careful adjustment for the confounding effect of population stratification must be conducted when analyzing these data; the variance inflation factor (VIF) without adjustment is 1.44. In the primary analyses of these data, a clustering algorithm in the program PLINK was used to reduce the VIF to 1.14, after which genomic control was used to control residual confounding. Here we use stratification scores to achieve a unified and coherent control for confounding. We used the first 10 principal components, calculated genome-wide using a set of 81,500 loci that had been selected to have low pair-wise linkage disequilibrium, as risk factors in a logistic model to calculate the stratification score. We then divided the data into five strata based on quantiles of the stratification score. The VIF of these stratified data is 1.04, indicating substantial control of stratification. However, after control for stratification, we find that there are no significant loci associated with rheumatoid arthritis outside of the HLA region. In particular, we find no evidence for association of TRAF1-C5 with rheumatoid arthritis.
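Genomic control, mentioned above, estimates an inflation factor λ as the median of the observed 1-df association chi-square statistics divided by the median of the null χ²₁ distribution (about 0.455), then divides every statistic by λ. Whether the VIF values quoted in the abstract were computed exactly this way is an assumption; the sketch uses only the Python standard library, and its function names are hypothetical.

```python
from statistics import NormalDist, median
from typing import List

# Median of a 1-df chi-square: the square of the 75th percentile of N(0, 1), about 0.4549.
CHI2_1DF_MEDIAN = NormalDist().inv_cdf(0.75) ** 2

def inflation_factor(chi2_stats: List[float]) -> float:
    """Genomic-control lambda: observed median chi-square over the null median."""
    return median(chi2_stats) / CHI2_1DF_MEDIAN

def genomic_control(chi2_stats: List[float]) -> List[float]:
    """Deflate each 1-df statistic by lambda (statistics are left unchanged when lambda < 1)."""
    lam = max(1.0, inflation_factor(chi2_stats))
    return [x / lam for x in chi2_stats]

if __name__ == "__main__":
    # Hypothetical null statistics at evenly spaced chi-square(1) quantiles, inflated by 1.44.
    null = [NormalDist().inv_cdf((1 + (i + 0.5) / 1001) / 2) ** 2 for i in range(1001)]
    inflated = [1.44 * x for x in null]
    print(inflation_factor(inflated), inflation_factor(genomic_control(inflated)))
```

Genomic control applies one uniform correction to every SNP, whereas the stratification-score approach described above adjusts the association model itself; the two are complementary rather than interchangeable.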

Effects of covariates and interactions on a genome-wide association analysis of rheumatoid arthritis

BMC Proceedings, 2009

While genetic and environmental factors and their interactions influence susceptibility to rheumatoid arthritis (RA), causative genetic variants have not been identified. The purpose of the present study was to assess the effects of covariates and genotype × sex interactions on the genome-wide association analysis (GWAA) of RA using Genetic Analysis Workshop 16 Problem 1 data and a logistic regression approach as implemented in PLINK. After accounting for the effects of population stratification, effects of covariates and genotype × sex interactions on the GWAA of RA were assessed by conducting association and interaction analyses. We found significant allelic associations, covariate effects, and genotype × sex interaction effects on RA. Several top single-nucleotide polymorphisms (SNPs) (~22 SNPs) showed significant associations with strong p-values (ranging from p < 1 × 10⁻⁴ to p < 1 × 10⁻²⁴). Only three SNPs, on chromosomes 4, 13, and 20, were significant after Bonferroni correction, and none of these three SNPs showed significant genotype × sex interactions. Of the 30 top SNPs with significant interactions (p < 1 × 10⁻⁴ to p < 1 × 10⁻⁶), ~23 SNPs showed additive interactions and ~5 SNPs showed only dominance interactions. The SNPs showing significant associations in the regular logistic regression failed to show significant interactions. In contrast, the SNPs that showed significant interactions failed to show significant associations in models that did not incorporate interactions. It is important to consider genotype × sex interactions in addition to associations in a GWAA of RA. Furthermore, the association between SNPs and RA susceptibility varies significantly between men and women.
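The gap between the roughly 22 SNPs with small p-values and the three that survived Bonferroni correction comes from dividing the family-wise α by the number of tests performed. A minimal sketch, assuming a hypothetical panel of 500,000 tests and made-up SNP identifiers:

```python
from typing import Dict, List, Optional

def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance level that keeps the family-wise error rate at alpha."""
    if n_tests < 1:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests

def bonferroni_hits(p_values: Dict[str, float], alpha: float = 0.05,
                    n_tests: Optional[int] = None) -> List[str]:
    """SNP IDs whose p-value clears the Bonferroni threshold.
    n_tests defaults to the number of p-values supplied."""
    m = n_tests if n_tests is not None else len(p_values)
    cutoff = bonferroni_threshold(alpha, m)
    return sorted(snp for snp, p in p_values.items() if p <= cutoff)

if __name__ == "__main__":
    # Hypothetical SNP IDs; with 500,000 tests the cutoff is 0.05 / 500,000 = 1e-7.
    example = {"rsA": 1e-4, "rsB": 1e-24, "rsC": 5e-8}
    print(bonferroni_hits(example, n_tests=500_000))
```

A SNP at p = 1 × 10⁻⁴ is far from the cutoff in this setting, so only the most extreme associations survive.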

A Multi-SNP Locus-Association Method Reveals a Substantial Fraction of the Missing Heritability

The American Journal of Human Genetics, 2012

There are many known examples of multiple semi-independent associations at individual loci; such associations might arise either because of true allelic heterogeneity or because of imperfect tagging of an unobserved causal variant. This phenomenon is of great importance in monogenic traits but has not yet been systematically investigated and quantified in complex-trait genome-wide association studies (GWASs). Here, we describe a multi-SNP association method that estimates the effect of loci harboring multiple association signals by using GWAS summary statistics. Applying the method to a large anthropometric GWAS meta-analysis (from the Genetic Investigation of Anthropometric Traits consortium study), we show that for height, body mass index (BMI), and waist-to-hip ratio (WHR), 3%, 2%, and 1%, respectively, of additional phenotypic variance can be explained on top of the previously reported 10% (height), 1.5% (BMI), and 1% (WHR). The method also permitted a substantial increase (by up to 50%) in the number of loci that replicate in a discovery-validation design. Specifically, we identified 74 loci at which the multi-SNP, a linear combination of SNPs, explains significantly more variance than does the best individual SNP. A detailed analysis of multi-SNPs shows that most of the additional variability explained is derived from SNPs that are not in linkage disequilibrium with the lead SNP, suggesting a major contribution of allelic heterogeneity to the missing heritability.
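From summary statistics, the association of the best linear combination of SNPs at a locus can be tested with the joint statistic z′R⁻¹z, where z holds the marginal z-scores and R is the LD correlation matrix among the SNPs; with standardized variables, χ²/N approximates the variance explained. The sketch compares this with the lead SNP's z²/N. It assumes this standard estimator, which may differ in detail from the one used in the paper, and uses hypothetical function names plus a hand-written solver to stay dependency-free.

```python
from typing import List

def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("LD matrix is singular; drop perfectly correlated SNPs")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def multi_snp_chi2(z: List[float], ld: List[List[float]]) -> float:
    """Joint chi-square of the best linear combination of SNPs: z' R^{-1} z."""
    return sum(zi * wi for zi, wi in zip(z, solve(ld, z)))

def extra_variance_explained(z: List[float], ld: List[List[float]], n: int) -> float:
    """Approximate R^2 gain of the multi-SNP over the lead SNP, using R^2 ~ chi2 / N."""
    return (multi_snp_chi2(z, ld) - max(zi * zi for zi in z)) / n

if __name__ == "__main__":
    ld = [[1.0, 0.5], [0.5, 1.0]]
    print(multi_snp_chi2([3.0, 3.0], ld), extra_variance_explained([3.0, 3.0], ld, n=1000))
```

For two SNPs with z = 3 each and r = 0.5, the joint statistic is 12 versus 9 for the lead SNP; with r = 0 it would be 18, which matches the observation that signals not in LD with the lead SNP contribute most of the extra variance.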