Genetic variance estimation with imputed variants finds negligible missing heritability for human height and body mass index

Nat Genet. 2015 Oct;47(10):1114-20.

doi: 10.1038/ng.3390. Epub 2015 Aug 31.

Jian Yang, Andrew Bakshi, Zhihong Zhu, Gibran Hemani, Anna A E Vinkhuyzen, Sang Hong Lee, Matthew R Robinson, John R B Perry, Ilja M Nolte, Jana V van Vliet-Ostaptchouk, Harold Snieder; LifeLines Cohort Study; Tonu Esko, Lili Milani, Reedik Mägi, Andres Metspalu, Anders Hamsten, Patrik K E Magnusson, Nancy L Pedersen, Erik Ingelsson, Nicole Soranzo, Matthew C Keller, Naomi R Wray, Michael E Goddard, Peter M Visscher

Jian Yang et al. Nat Genet. 2015 Oct.

Abstract

We propose a method (GREML-LDMS) to estimate heritability for human complex traits in unrelated individuals using whole-genome sequencing data. We demonstrate using simulations based on whole-genome sequencing data that ∼97% and ∼68% of variation at common and rare variants, respectively, can be captured by imputation. Using the GREML-LDMS method, we estimate from 44,126 unrelated individuals that all ∼17 million imputed variants explain 56% (standard error (s.e.) = 2.3%) of variance for height and 27% (s.e. = 2.5%) of variance for body mass index (BMI), and we find evidence that height- and BMI-associated variants have been under natural selection. Considering the imperfect tagging of imputation and potential overestimation of heritability from previous family-based studies, heritability is likely to be 60-70% for height and 30-40% for BMI. Therefore, the missing heritability is small for both traits. For further discovery of genes associated with complex traits, a study design with SNP arrays followed by imputation is more cost-effective than whole-genome sequencing at current prices.

Conflict of interest statement

Competing Financial Interests

The authors declare no competing financial interests.

Figures

Figure 1

Estimates of heritability using sequence variants under different simulation scenarios based on the UK10K-WGS data. Each column represents the mean estimate from 200 simulations. Each error bar is the s.e. of the mean. The true heritability parameter is 0.8 for the simulated trait (see Online Methods for the 4 simulation scenarios).

Figure 2

Fitting region-specific LD heterogeneity of the genome using a sliding-window approach. Shown are the results for chromosome 22 from the UK10K-WGS data as an example. The LD score of each variant is defined as the sum of LD $r^2$ between the target variant and all variants (including the target variant) within ±10 Mb. For the GREML-LDMS analysis, the region-specific LD heterogeneity is fitted by segments with an average length of 100 kb (the blue dots) using a sliding-window approach as described in Online Methods.
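The LD score definition in this legend can be illustrated with a minimal sketch. It assumes genotypes coded as allele counts and a brute-force pairwise loop; the function names `pearson_r2` and `ld_scores` and the toy data are hypothetical and are not taken from the paper or from any GCTA interface.

```python
from statistics import mean


def pearson_r2(x, y):
    """Squared Pearson correlation between two genotype vectors."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # monomorphic variant: r^2 undefined, treated as 0 in this sketch
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return (sxy * sxy) / (sxx * syy)


def ld_scores(genotypes, positions, window_bp=10_000_000):
    """LD score per variant: sum of r^2 with every variant within
    +/- window_bp of it, the variant itself included."""
    scores = []
    for j, pos_j in enumerate(positions):
        total = 0.0
        for k, pos_k in enumerate(positions):
            if abs(pos_k - pos_j) <= window_bp:
                total += pearson_r2(genotypes[j], genotypes[k])
        scores.append(total)
    return scores


# Toy data (illustrative only): rows are variants, columns are individuals.
genotypes = [
    [0, 1, 2, 1, 0, 2],
    [0, 1, 2, 1, 0, 2],  # identical to variant 0
    [2, 1, 0, 1, 2, 0],  # mirror image of variant 0, so r^2 is still 1
    [0, 2, 1, 0, 1, 2],  # located far outside the window of the others
]
positions = [1_000_000, 1_500_000, 2_000_000, 50_000_000]
scores = ld_scores(genotypes, positions)
print(scores)
```

The quadratic loop is only suitable for toy inputs; real sequence data would need a windowed or vectorized implementation.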

Figure 3

Proportion of variation at sequence variants captured by 1KGP imputation in the UK10K-WGS data. The results are the averages from 200 simulations (Online Methods). Panel (a): estimates of the proportion of phenotypic variance explained by 1KGP-imputed variants in different MAF groups from GREML-MS. The 1KGP imputation was based on variants on the Illumina CoreExome array extracted from the UK10K-WGS data. The column in purple represents the variance explained by the causal variants. The other four columns represent the estimates using 1KGP-imputed variants filtered at 3 levels of imputation accuracy (IMPUTE-INFO) threshold. Each error bar is the s.e.m. Without filtering variants for IMPUTE-INFO (columns in yellow), the sum of the estimates is 96.2% for common variants and 73.4% for rare variants. Panel (b): estimates of the proportion of variation at sequence variants captured by 1KGP imputation (the estimate of phenotypic variance explained by the 1KGP-imputed variants, summed over MAF groups, divided by that explained by the causal variants) based on different types of SNP genotyping arrays. Common: MAF > 0.01; Rare: 0.01 ≥ MAF > 0.0003.

Figure 4

Evidence for height- and BMI-associated genetic variants being under natural selection. Results shown in panels (a) and (b) are from the GREML-LDMS analyses (Online Methods). Panel (a): the estimate of the cumulative contribution of variants with MAF ≤ θ to the genetic variance, i.e. $\hat{\sigma}_v^2(\mathrm{MAF} \le \theta) / \hat{\sigma}_v^2(\mathrm{MAF} \le 0.5)$. The dashed line represents the expectation under a neutral model. Panel (b): the estimate of $h^2_{\mathrm{1KGP}}$ for variants in each MAF group. Each error bar is the s.e. of the estimate. Results shown in panel (c) are from genome-wide association analyses in the combined data set (Online Methods). $b_m$ is defined as the effect size of the minor allele of a variant. Variants are stratified into 100 MAF bins (100 quantiles of the MAF distribution). Plotted is the mean of $\hat{b}_m$ against $\log_{10}$(mean MAF) in each bin. The correlation between mean $\hat{b}_m$ and $\log_{10}$(mean MAF) is 0.77 ($P < 1.0 \times 10^{-6}$) for height and −0.39 ($P = 8.0 \times 10^{-6}$) for BMI. Shown in panel (d) are the results from the latest GIANT consortium meta-analyses for height and BMI (see Web Resources) for common SNPs (MAF > 0.01). There are ~2.5M SNPs stratified into 20 MAF bins. The correlation between mean $\hat{b}_m$ and $\log_{10}$(mean MAF) is 0.89 ($P_{\mathrm{permu}} < 1.0 \times 10^{-6}$) for height and −0.87 ($P_{\mathrm{permu}} < 1.0 \times 10^{-6}$) for BMI. The mean $\hat{b}_m$ appears smaller in panel (c) than in panel (d) because each bin in panel (c) spans a smaller MAF range and contains more variants than the bins in panel (d).
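The cumulative-contribution quantity plotted in panel (a) is a running sum of per-MAF-bin variance estimates divided by the total. A minimal sketch follows, assuming the variance component of each MAF bin has already been estimated; the function name `cumulative_contribution` is hypothetical, and the numbers are made up for illustration and are not the paper's estimates.

```python
def cumulative_contribution(bin_upper_maf, bin_variance):
    """Return (theta, fraction) pairs, where fraction is the share of the
    total genetic variance attributed to variants with MAF <= theta."""
    total = sum(bin_variance)
    if total <= 0:
        raise ValueError("total genetic variance must be positive")
    curve = []
    running = 0.0
    # Sort bins by their upper MAF bound so the running sum is cumulative in theta.
    for theta, variance in sorted(zip(bin_upper_maf, bin_variance)):
        running += variance
        curve.append((theta, running / total))
    return curve


# Made-up per-bin variance estimates (NOT results from the paper).
upper_maf = [0.01, 0.1, 0.2, 0.3, 0.4, 0.5]
variance = [0.05, 0.10, 0.10, 0.10, 0.10, 0.10]

curve = cumulative_contribution(upper_maf, variance)
for theta, fraction in curve:
    print(f"MAF <= {theta}: {fraction:.3f}")
```

By construction the last point, at θ = 0.5, equals 1, which matches the denominator $\hat{\sigma}_v^2(\mathrm{MAF} \le 0.5)$ in the legend.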

Figure 5

Single-variant tagging of sequence variants by 1KGP-imputed variants. Single-variant tagging is defined as the squared correlation ($r^2_{\max}$) between a sequence variant and the best tagging variant from 1KGP imputation within ±1 Mb. Shown are the average $r^2_{\max}$ of variants in MAF bins for 10,000 sequence variants randomly sampled from the UK10K-WGS data. The 1KGP imputation analyses are based on variants on Illumina OmniExpress (red) and Illumina CoreExome (blue) arrays extracted from the UK10K-WGS data (see Online Methods for details about the imputation analyses based on the UK10K-WGS data). Panel (a): rare variants. Panel (b): common variants.
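The single-variant tagging statistic defined here can be sketched as a maximum over nearby imputed variants. The sketch below assumes imputed variants are supplied as (position, dosage vector) pairs; the names `squared_correlation` and `r2_max` and all data values are hypothetical illustrations, not the authors' pipeline.

```python
def squared_correlation(x, y):
    """Squared Pearson correlation of two equal-length numeric vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # no variation: treated as no tagging in this sketch
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy * sxy / (sxx * syy)


def r2_max(seq_genotype, seq_position, imputed, window_bp=1_000_000):
    """Best single-variant tagging: the largest r^2 between one sequence
    variant and any imputed variant within +/- window_bp of it."""
    candidates = [
        squared_correlation(seq_genotype, dosage)
        for position, dosage in imputed
        if abs(position - seq_position) <= window_bp
    ]
    return max(candidates, default=0.0)  # 0.0 if no imputed variant is in range


# Toy data (illustrative only).
seq_genotype = [0, 1, 2, 1, 0, 2]
imputed = [
    (5_200_000, [0, 1, 2, 1, 0, 1]),              # in range, imperfect tag
    (5_900_000, [0.1, 0.9, 1.8, 1.0, 0.2, 1.9]),  # in range, dosage values
    (9_000_000, [0, 1, 2, 1, 0, 2]),              # perfect tag, but out of range
]
within_1mb = r2_max(seq_genotype, 5_000_000, imputed)
within_5mb = r2_max(seq_genotype, 5_000_000, imputed, window_bp=5_000_000)
print(within_1mb, within_5mb)
```

Averaging `r2_max` over sequence variants grouped by MAF bin would give the kind of per-bin summary the legend describes.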
