Orienting the causal relationship between imprecisely measured traits using GWAS summary data

Gibran Hemani et al. PLoS Genet. 2017.

Erratum in

Abstract

Inference about the causal structure that induces correlations between two traits can be achieved by combining genetic associations with a mediation-based approach, as is done in the causal inference test (CIT). However, we show that measurement error in the phenotypes can lead the CIT to infer the wrong causal direction, and that increasing the sample size has the adverse effect of increasing confidence in the wrong answer. This problem is likely to be general to other mediation-based approaches. Here we introduce an extension to Mendelian randomisation (MR), a method that uses genetic associations in an instrumentation framework, that enables inference of the causal direction between traits. It has two advantages: first, it can be performed using only summary-level data from genome-wide association studies; second, it is less susceptible to bias in the presence of measurement error or unmeasured confounding. We apply the method to infer the causal direction between DNA methylation and gene expression levels. Our results demonstrate that, in general, DNA methylation is more likely to be the causal factor, but this result is highly susceptible to bias induced by systematic differences in measurement error between the platforms, and by horizontal pleiotropy. We emphasise that, where possible, implementing MR and appropriate sensitivity analyses alongside other approaches such as CIT is important to triangulate reliable conclusions about causality.
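The abstract does not spell out how the direction is inferred from summary data. The core idea behind the MR Steiger test named in the figure captions is to ask whether the SNP explains more variance in the putative exposure than in the putative outcome. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: it assumes continuous traits, recovers each SNP-trait correlation from the GWAS t-statistic via r² = t² / (t² + n − 2), and compares the two correlations with a Fisher z-test for correlations estimated in independent samples. The function names `r_from_t` and `steiger_direction` and the example numbers are hypothetical.

```python
import math


def r_from_t(t: float, n: int) -> float:
    """Absolute SNP-trait correlation implied by a single-SNP regression
    t-statistic (beta / se) in a sample of size n (continuous trait assumed)."""
    return math.sqrt(t * t / (t * t + n - 2))


def steiger_direction(beta_x, se_x, n_x, beta_y, se_y, n_y):
    """Compare the SNP's correlation with trait x and with trait y.

    Returns the direction suggested by the larger correlation, a z statistic
    for the difference (Fisher z-transform, independent samples assumed) and
    a two-sided normal p-value.
    """
    r_gx = r_from_t(beta_x / se_x, n_x)
    r_gy = r_from_t(beta_y / se_y, n_y)
    z = (math.atanh(r_gx) - math.atanh(r_gy)) / math.sqrt(
        1.0 / (n_x - 3) + 1.0 / (n_y - 3)
    )
    p = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    direction = "x->y" if r_gx > r_gy else "y->x"
    return direction, z, p


# Hypothetical summary statistics for one SNP in two GWAS of 1,000 people each.
direction, z, p = steiger_direction(0.2, 0.02, 1000, 0.05, 0.02, 1000)
print(direction, round(z, 2), p)
```

Only betas, standard errors and sample sizes are used, which is what makes the comparison possible from published GWAS summary data alone.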


Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1

Fig 1. Gene expression levels (blue blocks) and DNA methylation levels (green triangles) may be correlated but the causal structure is unknown.

If a SNP (yellow circle) is associated with both DNA methylation and gene expression levels, then it can be used as an instrument, but there are three basic competing models for these variables, and the causal inference test (CIT) attempts to distinguish between them. a) Gene expression causes methylation. The left figure shows the SNP influencing gene expression levels, which in turn influence methylation levels. The right figure shows the directed acyclic graph that represents this model. Faded symbols represent the measured values, whereas solid symbols represent the true values. b) The same as in (a), except that the causal direction is from DNA methylation to gene expression. c) A model of confounding, where gene expression and DNA methylation are not causally related, but the SNP influences each of them through separate pathways or through a confounder.

Fig 2

Fig 2. The CIT was performed on simulated variables where the exposure influenced the outcome and the exposure was instrumented by a SNP.

The CIT test statistic for the hypothesis that the exposure causes the outcome (the true model) is shown in red, and for the hypothesis that the outcome causes the exposure (the false model) in green. Rows of plots represent the sample sizes used in the simulations. As measurement imprecision increases (decreasing values on the x-axis), the test statistic for the incorrect model gets stronger and the test statistic for the correct model gets weaker.
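The following sketch does not reimplement the CIT. It only illustrates, under assumptions of our own choosing, the attenuation that underlies this behaviour: a SNP `g` (dosage 0/1/2, allele frequency 0.5) causes a true exposure `x`, which causes an outcome `y`, and `x` is observed with independent additive noise. As the noise grows, the SNP's observed association with the exposure falls below its association with the outcome, which is the pattern a mediation-style test reads as y causing x. The effect sizes, noise model and function names are hypothetical.

```python
import random


def corr(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    var_a = sum((u - ma) ** 2 for u in a)
    var_b = sum((v - mb) ** 2 for v in b)
    return cov / (var_a * var_b) ** 0.5


def simulate(n, noise_sd_x, seed=1):
    """Simulate g -> x -> y and return (r^2(g, x_obs), r^2(g, y)).

    x_obs is the true exposure plus independent Gaussian measurement noise;
    y is generated from the true x and observed without error.
    """
    rng = random.Random(seed)
    g = [int(rng.random() < 0.5) + int(rng.random() < 0.5) for _ in range(n)]
    x = [0.5 * gi + rng.gauss(0.0, 1.0) for gi in g]        # true exposure
    y = [xi + rng.gauss(0.0, 1.0) for xi in x]              # outcome caused by true x
    x_obs = [xi + rng.gauss(0.0, noise_sd_x) for xi in x]   # imprecisely measured exposure
    return corr(g, x_obs) ** 2, corr(g, y) ** 2


for sd in (0.0, 2.0):
    r2_gx, r2_gy = simulate(20000, sd)
    print(f"noise sd {sd}: r2(g, x_obs)={r2_gx:.3f}  r2(g, y)={r2_gy:.3f}")
```

Under these settings the population values are about 0.111 versus 0.059 with no noise, and about 0.024 versus 0.059 with a noise standard deviation of 2, so the ordering of the two SNP associations flips. A larger sample only estimates the flipped ordering more precisely, which mirrors the observation that bigger samples increase confidence in the wrong answer.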

Fig 3

Fig 3. Outcomes were simulated to be either unrelated to the exposure (bottom plot, false positive rates on the y-axis) or causally influenced by it (top plot, true positive rates on the y-axis), with varying degrees of measurement imprecision applied to the exposure variable (x-axis).

Results for MR and CIT were compared for varying sample sizes (columns of boxes).
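One reason MR is less affected by imprecision in the exposure can be shown on the estimation side. Assuming classical measurement error (noise independent of the SNP and of everything else), the observational regression of `y` on the noisy exposure is attenuated towards zero, whereas the single-SNP Wald ratio (SNP-outcome association divided by SNP-exposure association) stays consistent, because the noise cancels out of cov(g, x_obs). This is a hypothetical simulation that reuses the same toy model as above; it does not reproduce the true and false positive rates in the figure.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((u - ma) * (v - mb) for u, v in zip(a, b)) / (n - 1)


def compare_estimators(n=20000, noise_sd_x=2.0, effect=1.0, seed=2):
    """Return (OLS slope of y on x_obs, single-SNP Wald ratio) for a simulated
    g -> x -> y model in which only the exposure x is measured with error."""
    rng = random.Random(seed)
    g = [int(rng.random() < 0.5) + int(rng.random() < 0.5) for _ in range(n)]
    x = [0.5 * gi + rng.gauss(0.0, 1.0) for gi in g]
    y = [effect * xi + rng.gauss(0.0, 1.0) for xi in x]
    x_obs = [xi + rng.gauss(0.0, noise_sd_x) for xi in x]
    ols = cov(x_obs, y) / cov(x_obs, x_obs)  # attenuated by var(x) / var(x_obs)
    wald = cov(g, y) / cov(g, x_obs)         # (g-y slope) / (g-x_obs slope)
    return ols, wald


ols, wald = compare_estimators()
print(f"OLS {ols:.2f}  Wald ratio {wald:.2f}  (true effect 1.0)")
```

With these parameters the OLS slope converges to 1.125 / 5.125 ≈ 0.22, while the Wald ratio converges to the true effect of 1. Noise of this kind still widens the Wald ratio's confidence interval, which is one reason power falls as imprecision increases.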

Fig 4

a) We can predict the values the MR Steiger test would take (z-axis) for different potential values of measurement error (x and y axes), drawn here as the blue surface. Where $\rho_{g,y} > \rho_{g,x}$, denoted by the range of values where the blue surface lies above the black plane, those values of measurement error would lead our observed MR Steiger test to infer the wrong causal direction. Where the blue surface lies below the black plane, the measurement error values support the inferred causal direction of X to Y. A measure of reliability, therefore, is the ratio of the negative and positive volumes of the total space bound by the blue and black surfaces, $R = |V_{z<0}| / V_{z \ge 0}$. In this case, where $\rho^2_{g,x} = 0.01$ and $\rho^2_{x,y} = 0.1$, $R = 4.40$, which means that 4.40 times as much of the possible measurement error values support the X→Y direction of causality as the Y→X direction. b) Plots depicting the parameter space in which the function $d = \mathrm{cor}(x, x_O) - \mathrm{cor}(x, y)\,\mathrm{cor}(y, y_O)$ is negative. When d is negative, the MR Steiger test is liable to infer the wrong direction of causality. Shaded regions show the parameter space where d is negative. The graph shows that d is positive over the majority of the function's parameter space, especially where causal relationships are relatively weak.
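The claim in panel (b) can be checked numerically with a small grid evaluation of d. This is a sketch under our own assumptions: all three correlations are positive, each is sampled on a regular grid of cell midpoints in (0, 1), and the axes are treated as independent, which may not match the exact ranges plotted in the figure. The function names are hypothetical.

```python
def d_value(r_x_xo, r_xy, r_y_yo):
    """d = cor(x, x_O) - cor(x, y) * cor(y, y_O); negative d means the
    MR Steiger test is liable to infer the wrong direction."""
    return r_x_xo - r_xy * r_y_yo


def fraction_negative(steps=50, max_r_xy=1.0):
    """Share of a regular grid of (cor(x, x_O), cor(x, y), cor(y, y_O))
    values on which d < 0. Each correlation takes cell midpoints in (0, 1);
    cor(x, y) is additionally scaled so that it never exceeds max_r_xy."""
    mids = [(i + 0.5) / steps for i in range(steps)]
    r_xy_values = [max_r_xy * m for m in mids]
    negative = sum(
        1
        for a in mids
        for b in r_xy_values
        for c in mids
        if d_value(a, b, c) < 0
    )
    return negative / steps ** 3


print(round(fraction_negative(), 3))              # all causal strengths
print(round(fraction_negative(max_r_xy=0.3), 3))  # weak causal effects only
```

For independent uniform correlations a, b and c, the probability that a < bc equals E[bc] = 1/4, so the grid fraction should be close to 0.25; restricting cor(x, y) to at most 0.3 lowers the expected fraction to 0.075, consistent with the statement that d is mostly positive when causal relationships are weak.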

Fig 5

a) Outcome y was simulated to be caused by exposure x as shown in the graph, with varying degrees of measurement error applied to both. CIT and MR were used to infer evidence for causality between the exposure and outcome, and to infer the direction of causality. The columns of graphs denote intervals for the value of $d = \rho_{x,x_O} - \rho_{x,y}\,\rho_{y,y_O}$, such that when d is negative we expect the MR Steiger test to be more likely to be wrong about the direction of causality. Rows of graphs represent the sample size used in the simulations. For the CIT method, outcome 1 denoted evidence for causality with the correct model, outcomes 2 or 3 denoted evidence for causality with an incorrect model, and outcome 4 denoted no evidence for causality. b) As in (a), except that the simulated model was non-causal, and a genetic confounder induces an association between x and y. Neither CIT nor MR is able to identify this model, so any significant associations in MR are deemed to be incorrect, while outcomes 1 or 2 for the CIT are deemed to be incorrect.
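Why the sign of d governs the inferred direction can be seen with path tracing. This short derivation is a sketch under assumptions we add here: all variables are standardised, the SNP g affects y only through x, the measurement errors in $x_O$ and $y_O$ are independent of everything else, and all correlations are positive.

$$\rho_{g,x_O} = \rho_{g,x}\,\rho_{x,x_O}, \qquad \rho_{g,y_O} = \rho_{g,x}\,\rho_{x,y}\,\rho_{y,y_O}$$

$$\rho_{g,x_O} - \rho_{g,y_O} = \rho_{g,x}\left(\rho_{x,x_O} - \rho_{x,y}\,\rho_{y,y_O}\right) = \rho_{g,x}\, d$$

Because $\rho_{g,x} > 0$, in the population the SNP correlates more strongly with the measured exposure than with the measured outcome, and the MR Steiger comparison points from x to y, exactly when d > 0. When d < 0, the imprecision in x is large relative to the product of the x-y correlation and the precision of y, and the comparison points the wrong way.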

Fig 6

Fig 6. Using 458 putative associations between DNA methylation and gene expression, we used the MR Steiger test to infer the direction of causality between them.

a) The rightmost bar shows the proportion of associations inferred for each of the two possible causal directions (colour key), assuming no measurement error in either gene expression or DNA methylation levels. The proportions change when we assume different levels of measurement error in gene expression levels (x-axis) or DNA methylation levels (columns of boxes). If one platform has systematically higher measurement error than the other, its trait will appear less likely to be the causal factor. b) The relationship between the Pearson correlation between DNA methylation and gene expression levels (x-axis) and the causal estimate (scaled to standard deviation units, y-axis). c) Distribution of estimated causal effect sizes, stratified into associations inferred to be due to DNA methylation causing expression (blue) and expression causing DNA methylation (red).
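The sensitivity analysis in panel (a) can be sketched as follows, assuming that an assumed level of measurement error is expressed as the correlation between the true and measured trait, and that each observed SNP-trait correlation is divided by it before the two are compared (the path-tracing relation $\rho_{g,x_O} = \rho_{g,x}\rho_{x,x_O}$). The correlation pairs below are invented for illustration and are not the 458 associations from the study; the function names are hypothetical.

```python
def inferred_direction(r_g_meth, r_g_expr, rel_meth=1.0, rel_expr=1.0):
    """Direction suggested by comparing SNP-trait correlations after dividing
    each by an assumed cor(true trait, measured trait); 1.0 means no error."""
    adj_meth = abs(r_g_meth) / rel_meth
    adj_expr = abs(r_g_expr) / rel_expr
    return "meth->expr" if adj_meth > adj_expr else "expr->meth"


def proportion_meth_causal(pairs, rel_meth=1.0, rel_expr=1.0):
    """Share of (r_g_meth, r_g_expr) pairs called as methylation -> expression."""
    calls = [inferred_direction(m, e, rel_meth, rel_expr) for m, e in pairs]
    return calls.count("meth->expr") / len(calls)


# Hypothetical SNP-methylation and SNP-expression correlations (not study data).
pairs = [(0.5, 0.3), (0.4, 0.35), (0.3, 0.45), (0.6, 0.2)]
print(proportion_meth_causal(pairs))                # 0.75: no error assumed
print(proportion_meth_causal(pairs, rel_expr=0.7))  # 0.5: noisier expression assumed
```

Assuming more error in expression raises its corrected SNP association, so fewer pairs are called as methylation causing expression, in line with the caption's point that the noisier platform looks less likely to be causal. Assumed reliabilities smaller than the observed SNP-trait correlation would push a corrected correlation above 1 and are not admissible.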
