Heritability and genomics of gene expression in peripheral blood

doi: 10.1038/ng.2951. Epub 2014 Apr 13.

Fred A Wright, Patrick F Sullivan, Andrew I Brooks, Fei Zou, Wei Sun, Kai Xia, Vered Madar, Rick Jansen, Wonil Chung, Yi-Hui Zhou, Abdel Abdellaoui, Sandra Batista, Casey Butler, Guanhua Chen, Ting-Huei Chen, David D'Ambrosio, Paul Gallins, Min Jin Ha, Jouke Jan Hottenga, Shunping Huang, Mathijs Kattenberg, Jaspreet Kochar, Christel M Middeldorp, Ani Qu, Andrey Shabalin, Jay Tischfield, Laura Todd, Jung-Ying Tzeng, Gerard van Grootheest, Jacqueline M Vink, Qi Wang, Wei Wang, Weibo Wang, Gonneke Willemsen, Johannes H Smit, Eco J de Geus, Zhaoyu Yin, Brenda W J H Penninx, Dorret I Boomsma

Fred A Wright et al. Nat Genet. 2014 May.

Abstract

We assessed gene expression profiles in 2,752 twins, using a classic twin design to quantify expression heritability and quantitative trait loci (eQTLs) in peripheral blood. The most highly heritable genes (∼777) were grouped into distinct expression clusters, enriched in gene-poor regions, associated with specific gene function or ontology classes, and strongly associated with disease designation. The design enabled a comparison of twin-based heritability to estimates based on dizygotic identity-by-descent sharing and distant genetic relatedness. Consideration of sampling variation suggests that previous heritability estimates have been upwardly biased. Genotyping of 2,494 twins enabled powerful identification of eQTLs, which we further examined in a replication set of 1,895 unrelated subjects. A large number of non-redundant local eQTLs (6,756) met replication criteria, whereas a relatively small number of distant eQTLs (165) met quality control and replication standards. Our results provide a new resource toward understanding the genetic control of transcription.

Conflicts of Interest

Dr Sullivan was on the scientific advisory board (SAB) of Expression Analysis (Durham, NC, USA). The other authors report no conflicts of interest.

Figures

Figure 1

Transcriptome-wide estimates of heritability, based on _n_=2752 twins. (a) Manhattan plot of _h_² _P_-values for the highest-_h_² transcript of each of 18,392 genes. The inset (showing _PADI2_) illustrates that the evidence for heritability rests on a higher correlation between MZ pairs (blue) than between DZ pairs (red). (b) Clustering of the 777 genes with _h_² q < 0.05. The most heritable genes belong to the cluster with the lowest inter-gene correlation, but many significant genes belong to clusters with high inter-gene correlation. (c) Among 43,628 transcripts, the significant proportion (in terms of false discovery q-value) depends on mean transcript expression, increasing rapidly for transcripts above an approximate detection threshold (expression ≥ 3.584, determined as the 90th percentile of chrY expression in females).
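
Panel (a) rests on the contrast between MZ and DZ co-twin correlations. As a rough illustration only, Falconer's classical approximation turns that contrast into a heritability estimate, _h_² ≈ 2(r_MZ - r_DZ). The sketch below assumes a purely additive model with no shared-environment effect; it is not the estimator used in the study, and the helper names (`twin_correlation`, `falconer_h2`) and the toy twin-pair values are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def twin_correlation(pairs):
    """Double-entry co-twin correlation: each (a, b) pair is also counted as (b, a),
    so the result does not depend on which twin is listed first."""
    firsts = [a for a, _ in pairs] + [b for _, b in pairs]
    seconds = [b for _, b in pairs] + [a for a, _ in pairs]
    return pearson(firsts, seconds)


def falconer_h2(mz_pairs, dz_pairs):
    """Falconer's approximation h2 = 2 * (r_MZ - r_DZ), clipped to [0, 1]."""
    r_mz = twin_correlation(mz_pairs)
    r_dz = twin_correlation(dz_pairs)
    return max(0.0, min(1.0, 2.0 * (r_mz - r_dz)))


# Hypothetical expression values for one transcript (one tuple per twin pair).
mz = [(1.0, 1.0), (2.0, 2.0), (3.0, 3.0), (4.0, 4.0)]
dz = [(1.0, 2.0), (2.0, 1.0), (3.0, 4.0), (4.0, 3.0)]
print(round(falconer_h2(mz, dz), 3))  # r_MZ = 1.0, r_DZ = 0.6 -> 0.8
```

Clipping to [0, 1] is a presentation choice in this sketch: the raw formula can fall below 0 or exceed 1 when sampling noise pushes r_DZ above r_MZ or below r_MZ / 2.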

Figure 2

Gene density and other predictors of heritability, using _n_=2616 paired co-twins and 18,392 genes. (a) Mean _h_² (corrected for gene expression level) vs. density of protein-coding genes per autosome, showing that heritability is considerably higher for gene-poor chromosomes. Plot symbol area is proportional to the number of array genes per chromosome. (b) Histograms of the permuted enrichment z-statistics for two predictors listed in Table 2. Observed values (blue dots) are extreme compared to the permutations.
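
The permutation comparison in panel (b) can be illustrated generically: compute an enrichment statistic for genes that carry a predictor, then shuffle the predictor labels across genes to build a null distribution. This is a minimal sketch assuming a difference-in-means statistic on per-gene _h_²; the predictors in Table 2 and the study's exact statistic are not reproduced here, and all names and values are hypothetical.

```python
import random
from statistics import mean, pstdev


def enrichment_stat(h2, in_set):
    """Mean h2 of genes carrying the predictor minus mean h2 of all other genes."""
    inside = [h for h, flag in zip(h2, in_set) if flag]
    outside = [h for h, flag in zip(h2, in_set) if not flag]
    return mean(inside) - mean(outside)


def permutation_z(h2, in_set, n_perm=1000, seed=0):
    """Standardize the observed statistic against a label-permutation null.

    Returns (z, p): z = (observed - null mean) / null SD, and a one-sided
    empirical p-value with the usual +1 correction.
    """
    rng = random.Random(seed)
    observed = enrichment_stat(h2, in_set)
    labels = list(in_set)
    null = []
    for _ in range(n_perm):
        rng.shuffle(labels)  # break the gene-predictor link; set size is unchanged
        null.append(enrichment_stat(h2, labels))
    z = (observed - mean(null)) / pstdev(null)
    p = (1 + sum(s >= observed for s in null)) / (n_perm + 1)
    return z, p


# Hypothetical per-gene h2 values; the first three genes carry the predictor.
h2 = [0.9, 0.8, 0.85, 0.1, 0.2, 0.15, 0.05, 0.1, 0.2, 0.1]
flags = [True] * 3 + [False] * 7
z, p = permutation_z(h2, flags)
print(f"z = {z:.2f}, empirical p = {p:.4f}")
```

Shuffling labels (rather than resampling _h_² values) keeps both the set size and the overall _h_² distribution fixed, so only the gene-to-predictor assignment is randomized.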

Figure 3

Apparent heritability and local IBD effects vs. true underlying distributions. (a) For the twin-based _h_² estimates (_n_=2752, 8818 expressed genes shown), subtracting the effects of sampling variation produces an estimated true distribution (blue). Re-simulating from this fitted, assumed-true distribution closely approximates the observed _h_² distribution (black curve). (b) The analogous expressed-gene results for local IBD effect estimation. (c) Proportions of all 18,392 genes exceeding _h_² thresholds for the observed data and for the estimated “true” _h_² distribution. The MuTHER study (_n_=856) reported many more extreme _h_² values, but this is consistent with greater sampling variation due to its smaller sample size. (d) The analogous figure using only expressed genes from both studies.
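
Why sampling variation inflates the tails of the apparent _h_² distribution can be seen in a small simulation: give every gene the same true _h_², estimate it from simulated twin pairs, and compare the spread of the estimates at the two sample sizes mentioned in the caption. This sketch assumes a purely additive model (MZ correlation _h_², DZ correlation _h_²/2), an even MZ/DZ split, a true _h_² of 0.2, and the Falconer estimator; it is not the deconvolution procedure used for panel (a).

```python
import random
from math import sqrt
from statistics import pstdev


def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def twin_r(pairs):
    """Double-entry co-twin correlation (order of twins within a pair is irrelevant)."""
    xs = [a for a, _ in pairs] + [b for _, b in pairs]
    ys = [b for _, b in pairs] + [a for a, _ in pairs]
    return pearson(xs, ys)


def simulate_pairs(rng, n_pairs, r):
    """Draw n_pairs of standardized twin values with within-pair correlation r."""
    pairs = []
    for _ in range(n_pairs):
        shared = rng.gauss(0.0, 1.0)
        a = sqrt(r) * shared + sqrt(1 - r) * rng.gauss(0.0, 1.0)
        b = sqrt(r) * shared + sqrt(1 - r) * rng.gauss(0.0, 1.0)
        pairs.append((a, b))
    return pairs


def apparent_h2(n_twins, true_h2=0.2, n_genes=200, seed=1):
    """Unclipped Falconer estimates for n_genes genes that all share the same true h2."""
    rng = random.Random(seed)
    n_pairs = n_twins // 4  # assumed: half the twins in MZ pairs, half in DZ pairs
    estimates = []
    for _ in range(n_genes):
        r_mz = twin_r(simulate_pairs(rng, n_pairs, true_h2))
        r_dz = twin_r(simulate_pairs(rng, n_pairs, true_h2 / 2))
        estimates.append(2 * (r_mz - r_dz))
    return estimates


small = apparent_h2(856)
large = apparent_h2(2752)
for label, est in (("n=856", small), ("n=2752", large)):
    frac = sum(e > 0.5 for e in est) / len(est)
    print(f"{label}: SD of h2 estimates = {pstdev(est):.3f}, share > 0.5 = {frac:.3f}")
```

Every simulated gene has the same true value, so any gene that appears above 0.5 does so purely through sampling noise; the smaller sample produces a wider spread and therefore more such apparent extremes.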

Figure 4

Comparison and replication of eQTL results. (a) Number of unique genes with evidence of local association (q < 0.01, SNP within a ±1 Mb window of the gene), depicted for published leukocyte eQTL studies (LCLs, monocytes, and PBLs), as well as for subsamples of the NTR data (PBLs) using only genotyped markers and moderate QC (_n_=2494, 43,628 transcripts examined). Sample sizes are corrected for the number of covariates used. The “NTR with final QC” value applies q < 0.001. (b) Overlap of local eQTL findings with two other large blood studies, at q < 0.01. (c) Number of unique genes with evidence (q < 0.01) for distant (greater than 1 Mb) association. The implausible non-monotone pattern for NTR on original expression values illustrates the importance of robust association methods. Using the final QC on NTR data and q < 0.001 drops the number of distant eQTLs from over 800 to ~300. The results suggest that many distant associations remain to be discovered, but careful QC is essential. (d) Overlap of distant eQTL findings (q < 0.001) with previous studies (within 1 Mb).
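
The local/distant split and the q-value thresholds in this caption can be sketched as two small helpers. This assumes the ±1 Mb window is measured from the annotated gene start and end (the caption does not say whether the gene span or the transcription start site is used), and it uses Benjamini-Hochberg adjusted p-values as one standard way to obtain FDR q-values; the study's exact FDR procedure may differ. The `Gene` and `Snp` records, `classify`, `bh_qvalues`, and the example coordinates are hypothetical.

```python
from dataclasses import dataclass

WINDOW_BP = 1_000_000  # the 1 Mb window from the caption


@dataclass
class Gene:
    name: str
    chrom: str
    start: int
    end: int


@dataclass
class Snp:
    snp_id: str
    chrom: str
    pos: int


def classify(snp, gene, window=WINDOW_BP):
    """'local' if the SNP lies within `window` bp of the gene span, else 'distant'."""
    if snp.chrom != gene.chrom:
        return "distant"
    if gene.start - window <= snp.pos <= gene.end + window:
        return "local"
    return "distant"


def bh_qvalues(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        q[i] = running_min
    return q


gene = Gene("GENE_A", "chr19", 8_000_000, 8_050_000)
print(classify(Snp("snp1", "chr19", 8_900_000), gene))  # local
print(classify(Snp("snp2", "chr19", 9_100_000), gene))  # distant
print(classify(Snp("snp3", "chr1", 8_000_000), gene))   # distant
print([round(q, 4) for q in bh_qvalues([0.01, 0.04, 0.03, 0.5])])  # [0.04, 0.0533, 0.0533, 0.5]
```

Local and distant pairs would then be thresholded separately, for example at q < 0.01 and q < 0.001 as in the caption.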

Figure 5

Properties of distant eQTLs. (a) 348 eQTLs (gene-SNP pairs) were significant (q < 0.001) and passed the QC procedures; of these, 165 replicated (q < 0.1) in 1,895 NESDA individuals. (b) The 304 SNPs in significant eQTLs were examined for overlap with regulatory features, including DNase/FAIRE and transcription factor binding sites, using the Ensembl Variant Effect Predictor (version 2.8). Most features were not enriched, although the 3 SNPs annotated as 5′ UTR variants all overlap with regulatory features, representing a significant enrichment compared to the total 18.4% overlap of distant eQTL SNPs with regulatory features. (c) The _π_1 value represents the estimated proportion of the transcriptome influenced by the 304 QC-passing SNPs in significant eQTLs. Across all significant bins the cumulative proportion is only ~3%. (d) A distant eQTL hotspot on chr19 was associated with the expression of 12 distant genes and one local gene (MYO1F). The partial correlation graph suggests that MYO1F expression is independent of the expression of the other distant genes given the expression of the transcription factor SOX13.
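
A _π_1 value like the one in panel (c) is commonly estimated with Storey's approach: null p-values are uniform, so the fraction of p-values above a cutoff λ, rescaled by 1/(1 - λ), estimates the null proportion _π_0, and _π_1 = 1 - _π_0. Applied, for example, to one SNP's association p-values across all transcripts, this gives the share of the transcriptome that SNP appears to influence. This is a minimal sketch with a single fixed λ = 0.5 (an assumption; smoothed or bootstrap choices of λ are also used), with synthetic p-values and a hypothetical `storey_pi1` helper.

```python
def storey_pi1(pvalues, lam=0.5):
    """Estimated non-null proportion pi1 = 1 - pi0 (Storey), with a fixed lambda.

    pi0 is estimated from the p-values above lam, which should be mostly nulls
    because null p-values are uniform on [0, 1]; pi0 is capped at 1.
    """
    m = len(pvalues)
    pi0 = sum(p > lam for p in pvalues) / (m * (1.0 - lam))
    return 1.0 - min(1.0, pi0)


# Synthetic example: 100 evenly spread "null" p-values plus 100 strong signals.
null_p = [(i + 0.5) / 100 for i in range(100)]
signal_p = [0.001] * 100
print(storey_pi1(null_p))             # 0.0
print(storey_pi1(null_p + signal_p))  # 0.5
```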
