Integrative modeling of eQTLs and cis-regulatory elements suggests mechanisms underlying cell type specificity of eQTLs
Christopher D Brown et al. PLoS Genet. 2013.
Abstract
Genetic variants in cis-regulatory elements or trans-acting regulators frequently influence the quantity and spatiotemporal distribution of gene transcription. Recent interest in expression quantitative trait locus (eQTL) mapping has paralleled the adoption of genome-wide association studies (GWAS) for the analysis of complex traits and disease in humans. Under the hypothesis that many GWAS associations tag non-coding SNPs with small effects, and that these SNPs exert phenotypic control by modifying gene expression, it has become common to interpret GWAS associations using eQTL data. To fully exploit the mechanistic interpretability of eQTL-GWAS comparisons, an improved understanding of the genetic architecture and causal mechanisms of cell type specificity of eQTLs is required. We address this need by performing an eQTL analysis in three parts: first we identified eQTLs from eleven studies on seven cell types; then we integrated eQTL data with cis-regulatory element (CRE) data from the ENCODE project; finally we built a set of classifiers to predict the cell type specificity of eQTLs. The cell type specificity of eQTLs is associated with eQTL SNP overlap with hundreds of cell type specific CRE classes, including enhancer, promoter, and repressive chromatin marks, regions of open chromatin, and many classes of DNA binding proteins. These associations provide insight into the molecular mechanisms generating the cell type specificity of eQTLs and the mode of regulation of corresponding eQTLs. Using a random forest classifier with cell specific CRE-SNP overlap as features, we demonstrate the feasibility of predicting the cell type specificity of eQTLs. We then demonstrate that CREs from a trait-associated cell type can be used to annotate GWAS associations in the absence of eQTL data for that cell type. 
We anticipate that such integrative, predictive modeling of cell specificity will improve our ability to understand the mechanistic basis of human complex phenotypic variation.
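The CRE-SNP overlap features mentioned in the abstract can be illustrated with a small sketch. This is not the authors' pipeline: the track names, the coordinates, and the helper functions `merge_intervals`, `overlaps`, and `overlap_features` are all hypothetical, and the sketch assumes a single chromosome with half-open `[start, end)` intervals. Each SNP gets one binary feature per (cell type, CRE class) track.

```python
from bisect import bisect_right

def merge_intervals(intervals):
    """Sort and merge overlapping or touching half-open [start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(s, e) for s, e in merged]

def overlaps(position, merged):
    """Return True if position lies inside one of the merged intervals."""
    starts = [s for s, _ in merged]
    i = bisect_right(starts, position) - 1  # last interval starting at or before position
    return i >= 0 and position < merged[i][1]

def overlap_features(snp_positions, cre_tracks):
    """Return track names and one row of 0/1 overlap features per SNP."""
    names = sorted(cre_tracks)
    merged = {name: merge_intervals(cre_tracks[name]) for name in names}
    rows = [[int(overlaps(pos, merged[name])) for name in names]
            for pos in snp_positions]
    return names, rows

# Toy, made-up coordinates (assumption: one chromosome, two CRE tracks).
cre_tracks = {
    "GM12878_DHS": [(100, 200), (150, 260), (900, 950)],
    "HepG2_DHS": [(500, 600)],
}
names, rows = overlap_features([120, 255, 260, 550, 1000], cre_tracks)
for pos, row in zip([120, 255, 260, 550, 1000], rows):
    print(pos, dict(zip(names, row)))
```

A feature matrix of this shape is what a classifier such as a random forest would consume; the classifier itself is omitted here to keep the sketch dependency-free.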
Conflict of interest statement
The authors have declared that no competing interests exist.
Figures
Figure 1. Uniform analysis of multi-cell type eQTL data sets.
Studies are labeled by their acronym from Table 1. (A) Plot of […] (y-axis) as a function of […] (x-axis), with each study drawn as a separate line of a different color, as indicated in panels B, D, and E. Dashed line represents […]. (B) Plot of eQTL counts as a function of […], for all studies. (C) eQTL count (x-axis) by tier, for tiers 1–4 (light blue, dark blue, light green, and dark green, respectively), with separate bars for each study (y-axis). (D) Fraction of genes with a significant eQTL SNP (y-axis; thresholded at […]), as a function of sample size (x-axis). Each study is plotted in a distinct color, as indicated with labels. Studies with replicate expression measurements are depicted as triangles, studies without as circles. (E) Fraction of genes with a significant eQTL that have more than one independently associated SNP (y-axis; thresholded at […]), as a function of sample size (x-axis). Each study is plotted in a distinct color. Studies with replicate expression measurements are depicted as triangles, studies without as circles. (F) Histogram of eQTL counts by tier (y-axis; colors as in panel C), summed across studies, as a function of their distance to the gene transcription start and end sites (x-axis; gene split into 10 bins). P (grey) line depicts the counts of first tier eQTL SNPs from a permutation, to illustrate the background distribution of tested SNPs.
Figure 2. Cell type specific eQTL replication frequencies.
(A, B, C) eQTL replication frequency (y-axis) as a function of discovery significance (x-axis: […]). SNPs were grouped into 30 equally spaced bins by BF. (D, E, F) eQTL replication frequency (y-axis; thresholded at […]) as a function of SNP position ([…]) (x-axis). Cis-eQTL SNPs within 250 kb of the TSS were grouped into […] equally spaced bins. (A, D) Replication frequencies for CAP_LCL eQTLs in Stranger_LCL (blue) and Merck_liver (red). (B, E) Replication frequencies for UChicago_liver eQTLs in Merck_liver (blue) and Stranger_LCL (red). (C, F) Replication frequencies for Myers_brain eQTLs in Harvard_cerebellum (blue) and Stranger_LCL (red). In all panels, bold lines depict the percentage of SNP-gene pairs with […] per bin, and ribbons depict the […] confidence interval.
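The per-bin replication frequencies and confidence ribbons described for Figure 2 could be computed along the following lines. This is a sketch under stated assumptions: the replication threshold and the confidence level are not recoverable from the caption, so replication is taken as a precomputed boolean per SNP-gene pair, and the Wilson score interval with `z=1.96` (roughly 95%) is an illustrative choice, not necessarily the authors' method. The function names are hypothetical.

```python
from math import sqrt

def wilson_interval(hits, total, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is an assumed level)."""
    if total == 0:
        return None
    p = hits / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half = z * sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return centre - half, centre + half

def replication_by_bin(discovery_scores, replicated, n_bins=30):
    """Group pairs into equally spaced score bins; return (frequency, interval) per bin."""
    lo, hi = min(discovery_scores), max(discovery_scores)
    width = (hi - lo) / n_bins or 1.0  # guard against all scores being identical
    totals = [0] * n_bins
    hits = [0] * n_bins
    for score, rep in zip(discovery_scores, replicated):
        idx = min(int((score - lo) / width), n_bins - 1)  # top edge joins the last bin
        totals[idx] += 1
        hits[idx] += int(rep)
    return [(h / t, wilson_interval(h, t)) if t else (None, None)
            for h, t in zip(hits, totals)]

# Toy data: six SNP-gene pairs, two bins.
scores = [0.0, 1.0, 2.0, 6.0, 8.0, 10.0]
flags = [False, False, True, True, True, False]
per_bin = replication_by_bin(scores, flags, n_bins=2)
```

Empty bins are reported as `(None, None)` rather than as zero, so that sparse regions of the score range are not mistaken for regions with no replication.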
Figure 3. eQTL SNPs are enriched within activating cis-regulatory elements.
(A–I) CAP_LCL eQTL SNP ([…]) overlap with predicted cis-regulatory elements. Each row of panels depicts overlap with distinct CRE data sets: (A–C) DNase hypersensitive sites, (D–F) p300 binding sites, (G–I) chromHMM predicted active promoters. In each panel, SNPs are grouped into 25 equally spaced bins within the 50 kb upstream and downstream of the TSS and TES, and 10 bins between the TSS and TES. Each bin is plotted along the x-axis. Bold lines depict the percentage, per bin, of SNPs overlapping the CRE class; ribbons depict the […] confidence interval. Each column of panels depicts a distinct SNP set contrast. (A,D,G) Observed eQTL SNPs (blue) and randomly drawn cis-linked SNPs at expressed genes (red). (B,E,H) eQTL SNPs that replicate in Stranger_LCL ([…]) (green) and SNPs that fail to replicate (purple). (C,F,I) CAP_LCL eQTL SNP overlap with CREs derived from the LCL line GM12878 (orange) and HepG2 cells (brown).
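The gene-relative binning described in this caption (25 bins in each 50 kb flank, 10 bins across the gene body) can be sketched as follows. The function name `gene_relative_bin`, the strand handling, and the treatment of boundary positions are assumptions made for illustration; the caption does not specify them.

```python
def gene_relative_bin(snp_pos, tss, tes, flank=50_000, n_flank=25, n_body=10):
    """Map a SNP to a bin index, ordered upstream -> gene body -> downstream.

    Returns an index in 0 .. 2*n_flank + n_body - 1, or None when the SNP
    lies outside the flanking windows. Strand is inferred from tss vs. tes.
    """
    strand = 1 if tes >= tss else -1
    offset = (snp_pos - tss) * strand  # distance from the TSS along transcription
    length = abs(tes - tss)
    if offset < -flank or offset > length + flank:
        return None
    if offset < 0:  # upstream flank
        return min(int((offset + flank) * n_flank / flank), n_flank - 1)
    if offset <= length:  # gene body, TSS and TES inclusive
        if length == 0:
            return n_flank
        return n_flank + min(int(offset * n_body / length), n_body - 1)
    # downstream flank
    downstream = int((offset - length) * n_flank / flank)
    return n_flank + n_body + min(downstream, n_flank - 1)

# Toy gene on the plus strand: TSS at 100 kb, TES at 120 kb.
example_bins = [gene_relative_bin(p, 100_000, 120_000)
                for p in (50_000, 99_999, 100_000, 120_000, 120_001, 170_000, 170_001)]
```

Scaling the gene body to a fixed number of bins lets genes of different lengths be pooled on one x-axis, which is presumably why the caption bins the body separately from the fixed-width flanks.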
Figure 4. eQTL SNPs are depleted within repressive chromatin contexts.
(A–I) CAP_LCL eQTL SNP ([…]) overlap with predicted cis-regulatory elements. (A–C) eQTL SNP overlap with chromHMM predicted heterochromatin, (D–F) eQTL SNP overlap with chromHMM predicted repressive chromatin, (G–I) eQTL SNP-TSS pairs with an intervening CTCF binding site. In each panel, SNPs are grouped into 25 equally spaced bins within the 50 kb upstream and downstream of the TSS and TES, and 10 bins between the TSS and TES. Each bin is plotted along the x-axis. Bold lines depict bin percentage; ribbons depict the […] confidence interval. Each column of panels depicts a distinct SNP set contrast. (A,D,G) Observed eQTL SNPs (blue) and randomly drawn cis-linked SNPs at expressed genes (red). (B,E,H) eQTL SNPs that replicate in Stranger_LCL ([…]) (green) and SNPs that fail to replicate (purple). (C,F,I) CAP_LCL eQTL SNP overlap with CREs derived from the LCL line GM12878 (orange) and HepG2 cells (brown).
Figure 5. Cell specificity of eQTL SNP-CRE overlap illustrated with DNase hypersensitivity data.
Percentage (dots) and confidence interval (lines) of (A) CAP_LCL, (B) UChicago_liver, and (C) Harvard_cerebellum eQTL SNPs overlapping DHS sites (y-axis) derived from the LCL cell line GM12878 (red), the HepG2 cell line (blue), and the cerebellum (green).
Figure 6. SORT1 eQTL illustrates mechanisms underlying cell specificity of eQTLs.
Associations between (A) UChicago_liver and (B) CAP_LCL SORT1 expression and cis-linked SNPs (left y-axis; […]), plotted as points by SNP genomic coordinates (x-axis). The blue line overlaying each Manhattan plot is the estimate of the local recombination rate (right y-axis; cM/Mb). Points are colored by level of LD (see legend below) with the reference SNP (purple diamond). Below each Manhattan plot are boxes depicting the location of chromHMM predicted promoters (red), enhancers (orange), and insulators (blue). Below the CRE predictions are RefSeq gene models.
Figure 7. Data integration predicts cell type specificity of eQTLs.
ROC curves depicting the performance of a random forest classifier to predict within cell type reproducibility (red), between cell type reproducibility (blue), and within cell type specific reproducibility (green). Predictions are plotted separately for (A) LCL/LCL/Liver, (B) Liver/Liver/LCL, and (C) Brain/Brain/LCL. The classifier was trained on a diverse collection of CREs (see Methods and Supplement for complete data set description). True positive rates (y-axis) and false positive rates (x-axis) were quantified by ten-fold cross validation.
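For readers unfamiliar with how an ROC curve is traced from classifier output, here is a minimal sketch. It is independent of the random forest itself: it assumes each eQTL already has a held-out prediction score (for example from cross validation) and a 0/1 label, and that both classes are present. The function names and the toy data are hypothetical.

```python
def roc_curve(labels, scores):
    """Return (fpr, tpr) lists obtained by lowering the score threshold step by step.

    labels are 0/1 integers; tied scores are processed together so that the
    curve does not depend on their order.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        j = i
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            tp += ranked[j][1]
            fp += 1 - ranked[j][1]
            j += 1
        tpr.append(tp / pos)
        fpr.append(fp / neg)
        i = j
    return fpr, tpr

def auc(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((fpr[k] - fpr[k - 1]) * (tpr[k] + tpr[k - 1]) / 2
               for k in range(1, len(fpr)))

# Toy held-out predictions: 1 = eQTL replicated, 0 = did not replicate.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
fpr, tpr = roc_curve(labels, scores)
area = auc(fpr, tpr)
```

In this toy case eight of the nine positive-negative pairs are ranked correctly, so the area is 8/9.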
Grants and funding
- K99 HG006265/HG/NHGRI NIH HHS/United States
- HG006265/HG/NHGRI NIH HHS/United States
- U01 HL69757/HL/NHLBI NIH HHS/United States
- R00 HG006265/HG/NHGRI NIH HHS/United States
- U01 HL069757/HL/NHLBI NIH HHS/United States