Pairwise Likelihood Ratios for Estimation of Non-Gaussian Structural Equation Models

J Mach Learn Res. 2013 Jan;14(Jan):111-152.

Aapo Hyvärinen et al. J Mach Learn Res. 2013 Jan.

Abstract

We present new measures of the causal direction, or direction of effect, between two non-Gaussian random variables. They are based on the likelihood ratio under the linear non-Gaussian acyclic model (LiNGAM). We also develop simple first-order approximations of the likelihood ratio and analyze them based on related cumulant-based measures, which can be shown to find the correct causal directions. We show how to apply these measures to estimate LiNGAM for more than two variables, and even in the case of more variables than observations. We further extend the method to cyclic and nonlinear models. The proposed framework is statistically at least as good as existing ones in the cases of few data points or noisy data, and it is computationally and conceptually very simple. Results on simulated fMRI data indicate that the method may be useful in neuroimaging where the number of time points is typically quite small.
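The abstract does not spell out the first-order approximation. A minimal sketch, assuming it takes the tanh-based form R = ρ · E{x tanh(y) − tanh(x) y} for standardized variables (ρ is the correlation coefficient), with R > 0 read as x → y for sparse variables. The function names and the simulated Laplacian data are illustrative assumptions, not taken from the source:

```python
import numpy as np


def standardize(v):
    """Return v shifted to zero mean and scaled to unit variance."""
    v = np.asarray(v, dtype=float)
    return (v - v.mean()) / v.std()


def pairwise_lr_tanh(x, y):
    """Assumed tanh-based approximation of the pairwise likelihood ratio.

    A positive value is read as x -> y, a negative value as y -> x
    (for sparse, i.e. super-Gaussian, variables).
    """
    x, y = standardize(x), standardize(y)
    rho = np.mean(x * y)  # sample correlation of the standardized pair
    return rho * np.mean(x * np.tanh(y) - np.tanh(x) * y)


# Hypothetical toy data: x causes y, both disturbances are Laplacian (sparse).
rng = np.random.default_rng(0)
n = 100_000
x = rng.laplace(size=n)
y = 0.6 * x + rng.laplace(size=n)

r_xy = pairwise_lr_tanh(x, y)
print("estimated direction:", "x -> y" if r_xy > 0 else "y -> x")
```

The measure is antisymmetric by construction, so swapping the arguments only flips its sign; one evaluation per pair is enough.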

Keywords: Bayesian network; causality; independent component analysis; non-Gaussianity; structural equation model.

Figures

Figure 1

Intuitive illustration of the nonlinear correlations. Here, x → y and the variables are very sparse. The nonlinear correlation E{x^3 y} is larger than E{x y^3} because when both variables are simultaneously large (the “arm” of the distribution on the right and the left), x attains larger values than y due to regression towards the mean.
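The comparison of the two nonlinear correlations in this caption can be checked numerically. The sketch below simulates a sparse pair in which x causes y and estimates both fourth-order cross-moments; the Laplacian disturbances, the coefficients 0.6 and 0.8, and the function name are illustrative assumptions:

```python
import numpy as np


def nonlinear_correlations(x, y):
    """Return sample estimates of (E{x^3 y}, E{x y^3}) for standardized x, y."""
    x = (x - x.mean()) / x.std()
    y = (y - y.mean()) / y.std()
    return np.mean(x**3 * y), np.mean(x * y**3)


# Hypothetical toy data: x -> y with sparse (Laplacian) cause and disturbance.
rng = np.random.default_rng(1)
n = 100_000
x = rng.laplace(size=n)
y = 0.6 * x + 0.8 * rng.laplace(size=n)

ex3y, exy3 = nonlinear_correlations(x, y)
print(f"E[x^3 y] = {ex3y:.3f}, E[x y^3] = {exy3:.3f}")
```

For a positively correlated sparse pair, the larger of the two moments has the cubed variable as the candidate cause.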

Figure 2

The pdf for robust modelling of skewed densities. Left: the pdf corresponding to the derivative of log-pdf in (16) is plotted (solid curve) with α and β chosen so that the density is standardized. For comparison, the Gaussian density of the same mean and variance is plotted as well (dashed). Right: the logarithms of the same density functions.

Figure 3

Simulation 1. Results of basic simulation with sparse, non-skewed data without noise. Top left: Mean of rank-correlation coefficients between the estimated causal ordering and the true ordering. The error bars are standard errors of the mean. Top right: The proportion of (really existing) connections for which the method estimated the direction correctly (chance level is 50%). Bottom left: The proportion of data sets for which the method estimated the first variable in the causal ordering correctly, that is, the variable with no parents. Bottom right: Computation times of one run of the different algorithms in milliseconds; note the logarithmic scale. Different colours are different data-generating scenarios. The algorithms used are as follows: “tanh”: LR approximations in (18) based on tanh nonlinearity, combined with deflation in DirectLiNGAM; “nodf”: no deflation in likelihood ratio approximations, that is, ordering based on the LR approximation matrix in (18) without any recomputation of the matrix; “mxnt”: maximum entropy approximation in (3) for likelihood ratios; “ICA”: LiNGAM estimated by ICA; “kdir”: kernel-based DirectLiNGAM.

Figure 4

Simulation 2, with noise. Legend as in Figure 3, and with T = 10,000. The noise standard deviations were all equal to one.

Figure 5

Simulation 3, with skewed data. Legend as in Figure 3, with the following new algorithms: “skew”: cumulant-based LR approximation in (10), combined with deflation in DirectLiNGAM; “skw2”: the robust LR approximation proposed in Section 2.9.2; and “D-R”: the measure by Dodge and Rousson (2001).
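Equation (10) is not reproduced on this page. A minimal sketch of a skewness-based direction measure, assuming the third-order form R = ρ · E{x^2 y − x y^2} on standardized variables whose signs have been flipped so that both skewnesses are positive; the function name and the exponential toy data are illustrative assumptions:

```python
import numpy as np


def pairwise_lr_skew(x, y):
    """Assumed skewness-based measure; a positive value is read as x -> y."""
    x = (x - x.mean()) / x.std()
    y = (y - y.mean()) / y.std()
    # Assumed convention: flip signs so both sample skewnesses are positive.
    if np.mean(x**3) < 0:
        x = -x
    if np.mean(y**3) < 0:
        y = -y
    rho = np.mean(x * y)  # sample correlation after the sign flips
    return rho * np.mean(x**2 * y - x * y**2)


# Hypothetical toy data: x -> y with skewed (exponential) cause and disturbance.
rng = np.random.default_rng(2)
n = 200_000
x = rng.exponential(size=n)
y = 0.6 * x + 0.8 * rng.exponential(size=n)

r_skew = pairwise_lr_skew(x, y)
print("estimated direction:", "x -> y" if r_skew > 0 else "y -> x")
```

Because the signs are normalized inside the function, negating either input leaves the estimated direction unchanged.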

Figure 6

Simulation 4, with skewed data with noise. Legend as in Figure 5.

Figure 7

Simulation 5, with the two-stage pruning method and only sparse graphs. Legend as in Figure 3, but now including the new algorithm “icth”, which prunes the graph based on inverse covariance and then estimates the directions using the same method as “tanh”. (Note that only “icth” uses information on the pruned inverse covariance; the other methods are as in Simulation 1.)
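The two-stage idea in this caption can be sketched as follows: first discard pairs whose partial correlation (read off the inverse covariance matrix) is near zero, then orient the remaining pairs with a pairwise measure. The pruning threshold, the tanh-based form of the measure, the function names, and the chain-structured toy data are all assumptions for illustration; the source does not state how the pruning is carried out.

```python
import numpy as np


def pairwise_lr_tanh(x, y):
    """Assumed tanh-based measure on standardized inputs; > 0 is read as x -> y."""
    rho = np.mean(x * y)
    return rho * np.mean(x * np.tanh(y) - np.tanh(x) * y)


def two_stage_directions(X, threshold=0.05):
    """Prune pairs by partial correlation, then orient the surviving pairs.

    `threshold` is a hypothetical cut-off on the absolute partial correlation.
    Returns a list of (cause, effect) index pairs.
    """
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0)  # standardize each column
    precision = np.linalg.inv(np.cov(Z, rowvar=False))
    scale = np.sqrt(np.diag(precision))
    partial_corr = -precision / np.outer(scale, scale)

    edges = []
    n_vars = Z.shape[1]
    for i in range(n_vars):
        for j in range(i + 1, n_vars):
            if abs(partial_corr[i, j]) < threshold:
                continue  # stage 1: pair pruned, no direct connection assumed
            r = pairwise_lr_tanh(Z[:, i], Z[:, j])  # stage 2: orient the pair
            edges.append((i, j) if r > 0 else (j, i))
    return edges


# Hypothetical toy data: a sparse chain x0 -> x1 -> x2 with Laplacian noise.
rng = np.random.default_rng(0)
n = 100_000
x0 = rng.laplace(size=n)
x1 = 0.8 * x0 + rng.laplace(size=n)
x2 = 0.8 * x1 + rng.laplace(size=n)

edges = two_stage_directions(np.column_stack([x0, x1, x2]))
print(edges)
```

The sketch applies the measure to raw pairs; the deflation step named in the Figure 3 caption is omitted here.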

Figure 8

Simulation 6, with skewed data, the two-stage pruning method and only sparse graphs. Legend as in Figure 5, but now including the new algorithm “icsk” which prunes the graph based on inverse covariance and estimates the directions based on the skewness cumulant, and “ics2” which uses the robust skewness measure.

Figure 9

Overview of Simulations 1–6. Median correlations (blue, solid) and average directions correct (green, dashed) are plotted averaged over different scenarios and similar simulations.

Figure 10

Simulation 7, with more variables than observations. Legend as in Figure 3. Rank correlations and causal directions correct are omitted because we only computed the first two variables for lack of computation time.

Figure 11

Simulation 8, with cyclic sparse graphs. Legend (sample sizes and dimensions) as in Figure 7.

Figure 12

Simulation 9, with nonlinear model. The new algorithms are “nlme”, the proposed likelihood ratio method extended to the nonlinear case using maximum entropy approximation in (22); “mad”, a simplified and robustified approximation of the likelihood ratio in (24); “hsic”, the original nonlinear method using independence (Hoyer et al., 2009). Blue: γ = 0.5, T = 200, Green: γ = 2, T = 200, Red: γ = 0.5, T = 500, Cyan: γ = 2, T = 500.

Figure 13

Simulation 10. Like Simulation 1 but with Laplacian disturbances used in generating the data, and the “skew” method added.

Figure 14

Simulation 11. Like Simulation 1, with n = 4, T = 500, but with a latent variable added. The four scenarios (curves) correspond to different strengths of the latent variable, starting with zero strength in blue curve.

Figure 15

Results on simulated fMRI data. The z-scores of the different measures used to determine the directionality, computed over subjects and connections, are shown as violin plots (i.e., histograms rotated to be horizontal and made symmetric). If the directions are found completely correctly, the violin plots are concentrated at the top. The blue dots show the percentage of correctly estimated directions. First, we have three pairwise methods, and for comparison, two methods by Patel, as well as ICA-based LiNGAM. Each panel is one simulation.

Figure 16

Results on simulated fMRI data. The z-scores of the different measures used to determine the directionality, computed over subjects and connections, are shown as violin plots (i.e., histograms rotated to be horizontal and made symmetric). If the directions are found completely correctly, the violin plots are concentrated at the top. The blue dots show the percentage of correctly estimated directions. First, we have three pairwise methods, and for comparison, two methods by Patel, as well as ICA-based LiNGAM. Each panel is one simulation.
