Integrative modeling of eQTLs and cis-regulatory elements suggests mechanisms underlying cell type specificity of eQTLs

Christopher D Brown et al. PLoS Genet. 2013.

Abstract

Genetic variants in cis-regulatory elements or trans-acting regulators frequently influence the quantity and spatiotemporal distribution of gene transcription. Recent interest in expression quantitative trait locus (eQTL) mapping has paralleled the adoption of genome-wide association studies (GWAS) for the analysis of complex traits and disease in humans. Under the hypothesis that many GWAS associations tag non-coding SNPs with small effects, and that these SNPs exert phenotypic control by modifying gene expression, it has become common to interpret GWAS associations using eQTL data. To fully exploit the mechanistic interpretability of eQTL-GWAS comparisons, an improved understanding of the genetic architecture and causal mechanisms of cell type specificity of eQTLs is required. We address this need by performing an eQTL analysis in three parts: first we identified eQTLs from eleven studies on seven cell types; then we integrated eQTL data with cis-regulatory element (CRE) data from the ENCODE project; finally we built a set of classifiers to predict the cell type specificity of eQTLs. The cell type specificity of eQTLs is associated with eQTL SNP overlap with hundreds of cell type specific CRE classes, including enhancer, promoter, and repressive chromatin marks, regions of open chromatin, and many classes of DNA binding proteins. These associations provide insight into the molecular mechanisms generating the cell type specificity of eQTLs and the mode of regulation of corresponding eQTLs. Using a random forest classifier with cell specific CRE-SNP overlap as features, we demonstrate the feasibility of predicting the cell type specificity of eQTLs. We then demonstrate that CREs from a trait-associated cell type can be used to annotate GWAS associations in the absence of eQTL data for that cell type. 
We anticipate that such integrative, predictive modeling of cell specificity will improve our ability to understand the mechanistic basis of human complex phenotypic variation.

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Figure 1. Uniform analysis of multi-cell type eQTL data sets.

Studies are labeled by their acronym from Table 1. (A) Plot of formula image (y-axis) as a function of formula image (x-axis), for each study as a separate line of a different color, as indicated in panels B, D, and E. Dashed line represents formula image. (B) Plot of formula image eQTL counts as a function of formula image, for all studies. (C) eQTL count (x-axis) by tier, for tiers 1–4 (light blue, dark blue, light green, and dark green, respectively), with separate bars for each study (y-axis). (D) Fraction of genes with a significant eQTL SNP (y-axis; thresholded at formula image), as a function of sample size (x-axis). Each study is plotted in a distinct color, as indicated with labels. Studies with replicate expression measurements are depicted as triangles, studies without as circles. (E) Fraction of genes with a significant eQTL that have more than one independently associated SNP (y-axis; thresholded at formula image), as a function of sample size (x-axis). Each study is plotted in a distinct color. Studies with replicate expression measurements are depicted as triangles, studies without as circles. (F) Histogram of eQTL counts by tier (y-axis; colors as in panel C), summed across studies, as a function of their distance to the gene transcription start and end sites (x-axis; gene split into 10 bins). P (grey) line depicts the counts of first tier eQTL SNPs from a permutation, to illustrate the background distribution of tested SNPs.

Figure 2. Cell type specific eQTL replication frequencies.

(A, B, C) eQTL replication frequency (y-axis) as a function of discovery significance (x-axis: formula image). SNPs were grouped into 30 equally spaced bins by BF. (D, E, F) eQTL replication frequency (y-axis; thresholded at formula image) as a function of SNP position (formula image) (x-axis). Cis-eQTL SNPs within 250 kb of the TSS were grouped into formula image equally spaced bins. (A, D) Replication frequencies for CAP_LCL eQTLs in Stranger_LCL (blue) and Merck_liver (red). (B, E) Replication frequencies for UChicago_liver eQTLs in Merck_liver (blue) and Stranger_LCL (red). (C, F) Replication frequencies for Myers_brain eQTLs in Harvard_cerebellum (blue) and Stranger_LCL (red). In all panels, bold lines depict the percentage of SNP-gene pairs with formula image per bin, and ribbons depict the formula image confidence interval.
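
The per-bin replication frequency and its ribbon can be sketched as below. This is only an illustration under stated assumptions: the interval level and method are not recoverable from the caption, so a 95% Wilson score interval is assumed, and the names `wilson_interval` and `replication_by_bin` are hypothetical, not the authors' code.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion.

    z=1.96 corresponds to roughly 95% coverage (an assumption here).
    """
    if n == 0:
        return (0.0, 1.0)  # no data in the bin: the interval is uninformative
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return (max(0.0, center - half), min(1.0, center + half))

def replication_by_bin(records, lo, hi, n_bins):
    """Group (discovery_stat, replicated) pairs into equally spaced bins.

    Returns one (bin_left_edge, frequency_or_None, (ci_low, ci_high)) per bin.
    """
    width = (hi - lo) / n_bins
    counts = [[0, 0] for _ in range(n_bins)]  # [replicated, total] per bin
    for stat, replicated in records:
        if not (lo <= stat <= hi):
            continue  # skip values outside the plotted range
        idx = min(int((stat - lo) / width), n_bins - 1)
        counts[idx][1] += 1
        counts[idx][0] += int(replicated)
    rows = []
    for i, (rep, total) in enumerate(counts):
        freq = rep / total if total else None
        rows.append((lo + i * width, freq, wilson_interval(rep, total)))
    return rows

# Toy data: four SNP-gene pairs, two bins over [0, 2].
rows = replication_by_bin(
    [(0.5, True), (0.7, False), (1.5, True), (1.9, True)], 0.0, 2.0, 2
)
```

With sparse bins the interval is wide, which is why the ribbons in such plots tend to broaden where few SNP-gene pairs fall.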

Figure 3. eQTL SNPs are enriched within activating cis-regulatory elements.

(A–I) CAP_LCL eQTL SNP (formula image) overlap with predicted cis-regulatory elements. Each row of panels depicts overlap with distinct CRE data sets: (A–C) DNAse hypersensitive sites, (D–F) p300 binding sites, (G–I) chromHMM predicted active promoters. In each panel, SNPs are grouped into 25 equally spaced bins within the 50 kb upstream and downstream of the TSS and TES, and 10 bins between the TSS and TES. Each bin is plotted along the x-axis. Bold lines depict the percentage, per bin, of SNPs overlapping the CRE class; ribbons depict the formula image confidence interval. Each column of panels depicts a distinct SNP set contrast. (A,D,G) Observed eQTL SNPs (blue) and randomly drawn cis-linked SNPs at expressed genes (red). (B,E,H) eQTL SNPs that replicate in Stranger_LCL (formula image) (green) and SNPs that fail to replicate (purple). (C,F,I) CAP_LCL eQTL SNP overlap with CREs derived from the LCL line GM12878 (orange) and HepG2 cells (brown).
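
One way to read the binning scheme in this caption (25 bins per 50 kb flank, 10 bins across the gene body) is sketched below. It is a sketch under assumptions: coordinates are taken to be oriented so that the TSS precedes the TES (minus-strand genes would need flipping first), the boundary conventions are a guess, and `position_bin` is a hypothetical helper name.

```python
def position_bin(pos, tss, tes, flank=50_000, n_flank=25, n_body=10):
    """Map a SNP position to a bin index relative to one gene.

    Bins 0..24 cover the upstream flank, 25..34 the gene body
    (TSS to TES, inclusive), and 35..59 the downstream flank.
    Returns None when the SNP lies outside the plotted window.
    """
    if tes <= tss:
        raise ValueError("expected tss < tes; flip minus-strand coordinates first")
    if tss - flank <= pos < tss:
        # Upstream flank: equal-width bins of flank / n_flank bases.
        return (pos - (tss - flank)) * n_flank // flank
    if tss <= pos <= tes:
        # Gene body: bin width scales with gene length.
        idx = (pos - tss) * n_body // (tes - tss)
        return n_flank + min(idx, n_body - 1)
    if tes < pos <= tes + flank:
        # Downstream flank, counted from the first base after the TES.
        return n_flank + n_body + (pos - tes - 1) * n_flank // flank
    return None

# A SNP in the middle of a 20 kb gene lands in the sixth gene-body bin.
example_bin = position_bin(110_000, tss=100_000, tes=120_000)
```

Scaling the gene-body bins by gene length lets genes of different sizes share one x-axis, while the flanks stay in absolute distance.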

Figure 4. eQTL SNPs are depleted within repressive chromatin contexts.

(A–I) CAP_LCL eQTL SNP (formula image) overlap with predicted cis-regulatory elements. (A–C) eQTL SNP overlap with chromHMM predicted heterochromatin, (D–F) eQTL SNP overlap with chromHMM predicted repressive chromatin, (G–I) eQTL SNP-TSS pairs with an intervening CTCF binding site. In each panel, SNPs are grouped into 25 equally spaced bins within the 50 kb upstream and downstream of the TSS and TES, and 10 bins between the TSS and TES. Each bin is plotted along the x-axis. Bold lines depict the percentage per bin; ribbons depict the formula image confidence interval. Each column of panels depicts a distinct SNP set contrast. (A,D,G) Observed eQTL SNPs (blue) and randomly drawn cis-linked SNPs at expressed genes (red). (B,E,H) eQTL SNPs that replicate in Stranger_LCL (formula image) (green) and SNPs that fail to replicate (purple). (C,F,I) CAP_LCL eQTL SNP overlap with CREs derived from the LCL line GM12878 (orange) and HepG2 cells (brown).

Figure 5. Cell specificity of eQTL SNP-CRE overlap illustrated with DNAse hypersensitivity data.

Percentage (dots) and formula image confidence interval (lines) of (A) CAP_LCL, (B) UChicago_liver, and (C) Harvard_cerebellum eQTL SNPs overlapping DHS sites (y-axis) derived from the LCL cell line GM12878 (red), the HepG2 cell line (blue), and the cerebellum (green).
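
The overlap percentages compared here amount to asking, for each cell type's set of hypersensitive sites, what fraction of eQTL SNPs fall inside any site. A minimal sketch follows, assuming a single chromosome, half-open `[start, end)` intervals, and made-up toy coordinates; `merge_intervals` and `overlap_fraction` are hypothetical helper names.

```python
from bisect import bisect_right

def merge_intervals(intervals):
    """Sort half-open [start, end) intervals and merge any that touch or overlap."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged

def overlap_fraction(snp_positions, intervals):
    """Fraction of SNP positions that fall inside at least one interval."""
    if not snp_positions:
        return 0.0
    merged = merge_intervals(intervals)
    starts = [start for start, _ in merged]
    hits = 0
    for pos in snp_positions:
        # Locate the last interval that starts at or before this position.
        i = bisect_right(starts, pos) - 1
        if i >= 0 and pos < merged[i][1]:
            hits += 1
    return hits / len(snp_positions)

# Toy coordinates only; real inputs would be per-chromosome peak calls.
snps = [100, 250, 400, 900]
sites_by_cell = {
    "GM12878": [(90, 120), (110, 130), (240, 260)],
    "HepG2": [(850, 950)],
}
fractions = {cell: overlap_fraction(snps, sites) for cell, sites in sites_by_cell.items()}
```

Merging the intervals first keeps the binary search valid even when peak calls overlap one another.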

Figure 6. SORT1 eQTL illustrates mechanisms underlying cell specificity of eQTLs.

Associations between (A) UChicago_liver and (B) CAP_LCL SORT1 expression and cis-linked SNPs (left y-axis; formula image), plotted as points by SNP genomic coordinates (x-axis). Blue line overlaying the Manhattan plot is the estimate of the local recombination rate (right y-axis; cM/Mb). Points are colored by level of LD (see legend below) with the reference SNP (purple diamond). Below each Manhattan plot are boxes depicting the location of chromHMM predicted promoters (red), enhancers (orange), and insulators (blue). Below CRE predictions are RefSeq gene models.

Figure 7. Data integration predicts cell type specificity of eQTLs.

ROC curves depicting the performance of a random forest classifier to predict within cell type reproducibility (red), between cell type reproducibility (blue), and within cell type specific reproducibility (green). Predictions plotted separately for (A) LCL/LCL/Liver, (B) Liver/Liver/LCL, and (C) Brain/Brain/LCL. The classifier was trained on a diverse collection of CREs (see Methods and Supplement for complete data set description). True positive rates (y-axis) and false positive rates (x-axis) were quantified by ten-fold cross-validation.
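
How true and false positive rates are traced out from held-out classifier scores can be sketched as follows. This does not reproduce the random forest itself: the scores are assumed to be predictions pooled over the ten held-out folds, the fold assignment shown (example `i` goes to fold `i % k`) is one simple choice rather than the authors' scheme, and all function names are hypothetical.

```python
def kfold_indices(n, k=10):
    """Yield (train, test) index lists; example i is held out in fold i % k."""
    for fold in range(k):
        test = [i for i in range(n) if i % k == fold]
        train = [i for i in range(n) if i % k != fold]
        yield train, test

def roc_curve(labels, scores):
    """Return (fpr, tpr) lists obtained by lowering the score threshold."""
    pos = sum(1 for y in labels if y)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("both classes must be present")
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        j = i
        # Consume every example tied at this score before emitting a point.
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            if ranked[j][1]:
                tp += 1
            else:
                fp += 1
            j += 1
        fpr.append(fp / neg)
        tpr.append(tp / pos)
        i = j
    return fpr, tpr

def auc(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum(
        (fpr[k] - fpr[k - 1]) * (tpr[k] + tpr[k - 1]) / 2
        for k in range(1, len(fpr))
    )

# Toy held-out predictions: 1 = eQTL replicated, 0 = did not replicate.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
fpr, tpr = roc_curve(labels, scores)
area = auc(fpr, tpr)
```

Pooling held-out scores across folds before building the curve gives one ROC per comparison, which matches the single curve per color in each panel; averaging per-fold curves would be an alternative design.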
