Candidate causal regulatory effects by integration of expression QTLs with complex trait genetic associations

Alexandra C Nica et al. PLoS Genet. 2010.

Abstract

The recent success of genome-wide association studies (GWAS) is now followed by the challenge of determining how the reported susceptibility variants mediate complex traits and diseases. Expression quantitative trait loci (eQTLs) have been implicated in disease associations through overlaps between eQTLs and GWAS signals. However, the abundance of eQTLs and the strong correlation structure of the genome (linkage disequilibrium, LD) make it likely that some of these overlaps are coincidental and not driven by the same functional variants. In the present study, we propose an empirical methodology, which we call Regulatory Trait Concordance (RTC), that accounts for local LD structure and integrates eQTL and GWAS results in order to reveal the subset of association signals that are due to cis eQTLs. We simulate genomic regions with various LD patterns and either one or two causal variants, and show that our score outperforms SNP correlation metrics, be they statistical (r²) or historical (D'). Following the observation of a significant excess of regulatory signals among currently published GWAS loci, we apply our method with the goal of prioritizing relevant genes for each of the respective complex traits. We detect several potential disease-causing regulatory effects, with a strong enrichment for immunity-related conditions, consistent with the nature of the cells tested (lymphoblastoid cell lines, LCLs). Furthermore, we present an extension of the method in trans, where interrogating the whole genome for downstream effects of the disease variant can be informative regarding its unknown primary biological effect. We conclude that integrating cellular phenotype associations with organismal complex traits will facilitate the biological interpretation of the genetic effects on these traits.
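The abstract describes RTC only in words. The following is a minimal sketch of the idea, not the authors' code: within one hotspot interval, the expression of the eQTL gene is regressed on each SNP in turn, the association remaining at the eQTL is re-measured, and the score reflects how highly the disease SNP's (dSNP's) correction ranks among all SNPs at removing the eQTL signal. Several details are assumptions for illustration: the score form (N − rank)/N, the use of squared correlation instead of P-values (same ordering at a fixed sample size), the exclusion of the eQTL itself from the ranking, and all function names (`residualize`, `r_squared`, `rtc_score`).

```python
from statistics import fmean


def residualize(y, x):
    """Residuals of y after ordinary least-squares regression on one predictor x."""
    mx, my = fmean(x), fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:  # monomorphic SNP: only the mean can be removed
        return [yi - my for yi in y]
    beta = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return [yi - my - beta * (xi - mx) for xi, yi in zip(x, y)]


def r_squared(y, x):
    """Squared Pearson correlation, used here as a proxy for association strength."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy * sxy / (sxx * syy)


def rtc_score(expression, genotypes, eqtl, dsnp):
    """Hypothetical RTC-style score for one hotspot interval.

    expression: expression values of the eQTL gene, one per individual.
    genotypes: dict mapping SNP id -> genotype dosages (0/1/2) in the interval.
    eqtl, dsnp: ids of the eQTL SNP and the disease (GWAS) SNP.
    """
    # Association left at the eQTL after regressing out each other SNP in turn.
    # Assumption: the eQTL itself is excluded, since correcting for it
    # trivially removes its own signal.
    remaining = {
        snp: r_squared(residualize(expression, g), genotypes[eqtl])
        for snp, g in genotypes.items()
        if snp != eqtl
    }
    n_snps = len(remaining)
    # rank 0 means the dSNP correction removes more eQTL signal than any other SNP's
    rank = sum(1 for v in remaining.values() if v < remaining[dsnp])
    return (n_snps - rank) / n_snps


# Toy interval: expression is driven entirely by "causal"; "eqtl" tags it.
genotypes = {
    "eqtl": [0, 0, 1, 1, 2, 2, 1, 2],
    "causal": [0, 0, 1, 1, 2, 2, 0, 2],
    "other": [0, 1, 0, 1, 0, 1, 0, 1],
}
expression = [2.0 * g for g in genotypes["causal"]]
print(rtc_score(expression, genotypes, "eqtl", "causal"))  # 1.0
print(rtc_score(expression, genotypes, "eqtl", "other"))   # 0.5
```

When the dSNP is the variant that drives expression, correcting for it removes the eQTL signal best and the score reaches its maximum; an unrelated SNP ranks lower.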


Conflict of interest statement

The authors have declared that no competing interests exist.

Figures


Figure 1. Excess of regulatory variants among GWAS signals.

QQ plot depicting the excess of significant regulatory signal in GWAS data (976 NHGRI SNPs). For both the cis and trans analyses, the −log10(P-values) of the best association per SNP are plotted. In red, the distribution of these values for the GWAS SNPs is compared with the median of 1,000 sets of 976 random SNPs with the same minor allele frequency (MAF) distribution. In black, the estimated upper limit of the 95% confidence interval is plotted.
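The null curves in this QQ plot can be sketched rank by rank: sort the −log10(P) values of each random SNP set, then take the median and an upper percentile across sets at every rank. This sketch assumes the random sets are already MAF-matched and of the same size as the GWAS set, and it takes the 97.5th percentile as the upper limit of a two-sided 95% interval; the caption does not state which percentile was used. Function names are hypothetical.

```python
import math
from statistics import median


def neg_log10_sorted(pvalues):
    """-log10(P) values sorted from strongest to weakest association."""
    return sorted((-math.log10(p) for p in pvalues), reverse=True)


def empirical_quantile(values, q):
    """Linearly interpolated empirical quantile, 0 <= q <= 1."""
    s = sorted(values)
    pos = q * (len(s) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def qq_envelope(gwas_pvalues, random_sets, upper_q=0.975):
    """Compare GWAS SNPs against MAF-matched random SNP sets, rank by rank.

    gwas_pvalues: best eQTL P-value per GWAS SNP.
    random_sets: list of P-value lists, each as long as gwas_pvalues.
    Returns one (observed, null_median, null_upper) tuple per rank.
    """
    observed = neg_log10_sorted(gwas_pvalues)
    null = [neg_log10_sorted(s) for s in random_sets]
    rows = []
    for i, obs in enumerate(observed):
        at_rank = [s[i] for s in null]  # i-th strongest value in every random set
        rows.append((obs, median(at_rank), empirical_quantile(at_rank, upper_q)))
    return rows


rows = qq_envelope([1e-3, 0.5], [[0.2, 0.9], [0.1, 0.8], [0.3, 0.7]])
for obs, med, upper in rows:
    print(f"{obs:.3f} {med:.3f} {upper:.3f}")
```

Points where the observed value exceeds the upper curve correspond to the excess of regulatory signal shown in red.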


Figure 2. Cis regulatory enrichment stratified by immunity relatedness.

The −log10(P-values) of the best association per SNP are plotted for the GWAS SNPs and for a set of random SNPs. As expected given the tissue (LCLs), immunity-related phenotypes are mainly responsible for the enrichment.


Figure 3. RTC score distribution.

The RTC score is uniformly distributed for simulated eQTLs and disease SNPs (dSNPs) tagging two different causal variants in the same interval (left panel). The RTC score is skewed toward high values for simulated eQTLs and dSNPs tagging the same functional variant (middle panel). In non-simulated data, where the GWAS trait is itself gene expression (GenCord LCL samples, right panel), the RTC score is sensitive to associations tagging a common functional variant.


Figure 4. Properties of the RTC score when varying r².

Simulation results depicting the relationship between the RTC score and the r² between the eQTL and the dSNP when they tag different causal SNPs (H0: left panel) versus one causal SNP (H1: right panel). As expected, the RTC score increases with the r² between the eQTL and the dSNP, but when both tag the same functional variant, a range of lower pairwise r² values can also yield a high RTC score. Therefore, r² alone is insufficient to detect shared causal effects.


Figure 5. RTC score properties when varying D'.

Simulation results depicting the relationship between the RTC score and the D' between the eQTL and the dSNP when they tag different causal SNPs (H0: left panel) versus one causal SNP (H1: right panel). D' is not correlated with the RTC score, so on its own it does not produce high scores in the absence of a common functional variant. Under H1, most high-scoring pairs have high D', but under perfect historical correlation (D' = 1) it is impossible to distinguish causal from coincidental effects using D' alone.
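To make the contrast between the statistical (r²) and historical (D') measures concrete, here is a sketch of the standard two-locus definitions computed from phased haplotypes. It assumes two biallelic SNPs with alleles coded 0/1; the captions do not state how the paper computed these values, and `ld_stats` is a hypothetical name. The usage shows how D' can equal 1 while r² stays low.

```python
def ld_stats(haplotypes):
    """Return (r2, |D'|) for two biallelic SNPs from phased haplotypes.

    haplotypes: list of (allele_a, allele_b) pairs coded 0/1, one per chromosome.
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n  # frequency of allele 1 at SNP A
    p_b = sum(b for _, b in haplotypes) / n  # frequency of allele 1 at SNP B
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    if p_a in (0.0, 1.0) or p_b in (0.0, 1.0):
        raise ValueError("both SNPs must be polymorphic")
    d = p_ab - p_a * p_b  # coefficient of linkage disequilibrium D
    # D is normalized by its maximum attainable value given the allele frequencies
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return r2, abs(d) / d_max


# Allele 1 at SNP B only ever occurs with allele 1 at SNP A:
# no recombinant haplotype is missing history-wise (D' = 1), yet r2 is low.
r2, dprime = ld_stats([(0, 0), (0, 0), (1, 0), (1, 1)])
print(round(r2, 3), dprime)  # 0.333 1.0
```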


Figure 6. RTC score properties when varying the median r² of the hotspot interval.

Simulation results depicting the relationship between the RTC score and the local LD structure (median r²) under the null hypothesis (different causal SNPs, left panel) and the alternative hypothesis (same causal SNP, right panel). Under H0, the RTC score is evenly distributed, so intervals with overall low LD do not produce high RTC scores. Under H1, the RTC score performs best in intervals with overall low LD, where the correlation between the eQTL and other non-disease SNPs decays much faster, making the dSNP correction stand out.
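The interval-level LD summary on the x axis can be sketched as the median r² over all SNP pairs in a hotspot interval. This sketch assumes unphased genotype dosages (0/1/2) and uses their squared correlation as an approximation to haplotype r²; interval boundaries are taken as given, and the function names are hypothetical.

```python
from itertools import combinations
from statistics import fmean, median


def dosage_r2(x, y):
    """Squared Pearson correlation between two genotype-dosage vectors."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:  # monomorphic SNP carries no LD information
        return 0.0
    return sxy * sxy / (sxx * syy)


def interval_median_r2(genotypes):
    """Median r2 over all SNP pairs in one hotspot interval.

    genotypes: dict mapping SNP id -> dosage vector for the same individuals.
    """
    return median(dosage_r2(x, y) for x, y in combinations(genotypes.values(), 2))


interval = {
    "a": [0, 1, 2, 0, 1, 2],
    "b": [0, 1, 2, 0, 1, 2],  # perfect proxy of "a"
    "c": [0, 0, 0, 2, 2, 2],  # uncorrelated with "a" and "b"
}
print(interval_median_r2(interval))  # 0.0
```

One tightly linked pair does not raise the median when most pairs in the interval are uncorrelated, which is the low-LD setting where the caption reports the best performance.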


Figure 7. Overrepresentation of immunity-related high-scoring cis signals.

Distribution of the best RTC score per GWAS SNP, stratified by immunity relatedness. The histogram contains results from the analysis of 130 hotspot intervals with colocalizing disease SNPs and cis eQTLs. We observe a significant overrepresentation of high-scoring (RTC ≥ 0.9) candidate genes (black bars) for immunity-related complex traits compared with non-immunity-related ones (grey bars) (Fisher's exact test, P = 0.0125).
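The enrichment test above compares a 2×2 table, assumed here to be laid out as immunity-related versus other traits (rows) by high-scoring (RTC ≥ 0.9) versus lower-scoring signals (columns). The counts behind P = 0.0125 are not given in this caption, so the usage below runs the sketch on the textbook "lady tasting tea" table instead. This is a plain two-sided Fisher's exact test; in practice SciPy's `scipy.stats.fisher_exact` provides the same test.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test P-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same margins
    that are no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # P(top-left cell == x) given fixed row and column totals
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # small relative tolerance so tables tied with the observed one are counted
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))


print(round(fisher_exact_two_sided(3, 1, 1, 3), 4))  # 0.4857
```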


Figure 8. The RTC method compared to standard LD measurements in the observed data.

Neither r² nor D' between the eQTL and the GWAS SNP directly predicts a high RTC score. Highlighted here are the results from the cis and trans analyses. We obtain high-scoring results (RTC score ≥ 0.8, in blue) for pairs with a high correlation between the disease SNP and the eQTL, as expected, but also for pairs with low statistical correlation (r², top panel). As shown in the bottom panel, many of these high-scoring pairs are historically correlated (D' = 1), but so are many more pairs by chance. Additionally, we detect high-scoring pairs with low D'. Hence, no obvious combination of the two LD measures predicts a high RTC score.


Figure 9. The fraction of eQTL variance explained away by the dSNP versus the RTC score.

We contrast the LR adjusted R² at the eQTL after versus before correcting for the dSNP, and observe that while most high-scoring pairs correspond to the cases with the least variance left unexplained, other interesting cases would be missed by relying solely on an arbitrary variance threshold.
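A minimal sketch of the before/after comparison, assuming that "LR" denotes an ordinary linear regression of expression on a single genotype-dosage predictor and that the "after" model is fitted to expression residuals once the dSNP has been regressed out. The degrees-of-freedom convention and all function names are assumptions for illustration.

```python
from statistics import fmean


def residualize(y, x):
    """Residuals of y after ordinary least-squares regression on one predictor x."""
    mx, my = fmean(x), fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        return [yi - my for yi in y]
    beta = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return [yi - my - beta * (xi - mx) for xi, yi in zip(x, y)]


def r_squared(y, x):
    """R^2 of a simple linear regression of y on x (squared correlation)."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy * sxy / (sxx * syy)


def adjusted_r2(r2, n, k=1):
    """Adjusted R^2 for a regression with k predictors on n samples."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


def eqtl_adj_r2_before_after(expression, eqtl, dsnp):
    """Adjusted R^2 of expression ~ eQTL before and after regressing out the dSNP."""
    n = len(expression)
    before = adjusted_r2(r_squared(expression, eqtl), n)
    after = adjusted_r2(r_squared(residualize(expression, dsnp), eqtl), n)
    return before, after


eqtl = [0, 0, 1, 1, 2, 2, 1, 2]
dsnp = [0, 0, 1, 1, 2, 2, 0, 2]
expression = [2.0 * g for g in dsnp]  # toy data: expression driven by the dSNP
before, after = eqtl_adj_r2_before_after(expression, eqtl, dsnp)
print(round(before, 3), round(after, 3))
```

Adjusted R² can fall below zero once the dSNP has explained the eQTL signal away entirely, which is one reason a fixed variance threshold is a blunt filter compared with a rank-based score.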
