ppcor: An R Package for a Fast Calculation to Semi-partial Correlation Coefficients
The R package ppcor provides four functions: pcor(), pcor.test(), spcor(), and spcor.test(). The function pcor() (spcor()) calculates the partial (semi-partial) correlations of all pairs of variables in a matrix or data frame and returns matrices of the test statistics and p-values for each pairwise partial (semi-partial) correlation. To compute the partial (semi-partial) correlation coefficient of a single pair of variables given one or more other variables, pcor.test() (spcor.test()) can be used instead. The following examples show how to use these functions. First, after loading the package, the test data y.data are created with the following R code.
R> library(ppcor)
R> y.data <- data.frame(
+    hl = c(7,15,19,15,21,22,57,15,20,18),
+    disp = c(0,0.964,0,0,0.921,0,0,1.006,0,1.011),
+    deg = c(9,2,3,4,1,3,1,3,6,1),
+    BC = c(1.78e-02,1.05e-06,1.37e-05,7.18e-03,0,0,0,4.48e-03,2.10e-06,0)
+  )
The test data, y.data, consist of 10 samples of four variables: hl, disp, deg, and BC. The data set is available from Drummond et al. (2006) and Kim and Yi (2007). The original data cover the relationship between sequence and functional evolution in yeast proteins; here we use only a small part of the full data for illustrative purposes. The names hl, disp, deg, and BC stand for half life, dispensability, degree, and betweenness centrality, respectively. Please refer to Drummond et al. (2006) and Kim and Yi (2007) for more details.
We can then calculate the partial correlation of every pair of variables, given the remaining variables, with
R> pcor(x=y.data,method="spearman")
Then we obtain the following output:
$estimate
             hl       disp        deg         BC
hl    1.0000000 -0.7647345 -0.1367596 -0.7860646
disp -0.7647345  1.0000000 -0.4845966 -0.4506273
deg  -0.1367596 -0.4845966  1.0000000  0.4010940
BC   -0.7860646 -0.4506273  0.4010940  1.0000000

$p.value
             hl       disp       deg         BC
hl   0.00000000 0.02708081 0.7467551 0.02071908
disp 0.02708081 0.00000000 0.2236095 0.26248897
deg  0.74675508 0.22360945 0.0000000 0.32471409
BC   0.02071908 0.26248897 0.3247141 0.00000000

$statistic
             hl      disp        deg        BC
hl    0.0000000 -2.907150 -0.3381686 -3.114899
disp -2.9071501  0.000000 -1.3569947 -1.236464
deg  -0.3381686 -1.356995  0.0000000  1.072529
BC   -3.1148991 -1.236464  1.0725286  0.000000

$n
[1] 10

$gp
[1] 2

$method
[1] "spearman"
The output has six components: estimate, the matrix of partial correlation coefficients; p.value, the p-value of the test for each coefficient; statistic, the test statistic from which the p-value is computed; n, the total number of samples; gp, the number of given (controlled) variables; and method, the correlation method used (Pearson's, Kendall's, or Spearman's). If users are interested only in the partial correlation between hl and disp given deg and BC, it can be computed with
R> pcor.test(x=y.data$hl,y=y.data$disp,z=y.data[,c("deg","BC")],
+    method="spearman")
Then we obtain the following output:
    estimate    p.value statistic  n gp   Method
1 -0.7647345 0.02708081  -2.90715 10  2 spearman
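To make the components of this output more concrete, the following base-R sketch recomputes a matrix of Spearman partial correlations from the inverse of the rank covariance matrix, together with a t statistic and two-sided p-value on n - 2 - gp degrees of freedom. It does not call ppcor. The helper name partial_from_inverse is hypothetical, and the assumption that this computation corresponds to what pcor() reports is ours, so it should be read as an illustration rather than as the package's implementation.

```r
# Test data from the text (10 samples, 4 variables).
y.data <- data.frame(
  hl   = c(7, 15, 19, 15, 21, 22, 57, 15, 20, 18),
  disp = c(0, 0.964, 0, 0, 0.921, 0, 0, 1.006, 0, 1.011),
  deg  = c(9, 2, 3, 4, 1, 3, 1, 3, 6, 1),
  BC   = c(1.78e-02, 1.05e-06, 1.37e-05, 7.18e-03, 0, 0, 0,
           4.48e-03, 2.10e-06, 0)
)

# Hypothetical helper (not part of ppcor): partial correlation of every
# pair of columns given all remaining columns, from the inverse of the
# covariance matrix (rank covariance when method = "spearman").
partial_from_inverse <- function(x, method = "spearman") {
  x    <- as.matrix(x)
  n    <- nrow(x)
  gp   <- ncol(x) - 2                    # number of controlled variables
  icvx <- solve(cov(x, method = method)) # inverse covariance matrix
  est  <- -cov2cor(icvx)                 # -w_ij / sqrt(w_ii * w_jj)
  diag(est) <- 1
  stat <- est * sqrt((n - 2 - gp) / (1 - est^2))
  diag(stat) <- 0                        # diagonal is not a test
  pval <- 2 * pt(-abs(stat), df = n - 2 - gp)
  diag(pval) <- 0
  list(estimate = est, p.value = pval, statistic = stat, n = n, gp = gp)
}

res <- partial_from_inverse(y.data)
print(round(res$estimate, 7))
print(round(res$p.value, 7))
```

If the assumption holds, the printed matrices can be compared entry by entry with the estimate and p.value components shown above; any disagreement would indicate that the sketch differs from the package in some detail (for example, in how ties or near-singular matrices are handled).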
Similarly, the semi-partial correlations can be calculated with
R> spcor(x=y.data,method="spearman")
Then we obtain the following output:
$estimate
              hl       disp         deg         BC
hl    1.00000000 -0.4254609 -0.04949092 -0.4558649
disp -0.59319449  1.0000000 -0.27689034 -0.2522965
deg  -0.06380762 -0.2560457  1.00000000  0.2023709
BC   -0.42262366 -0.1677612  0.14551866  1.0000000

$p.value
            hl      disp       deg        BC
hl   0.0000000 0.2933025 0.9073559 0.2562889
disp 0.1211334 0.0000000 0.5067562 0.5466351
deg  0.8806850 0.5404845 0.0000000 0.6307871
BC   0.2968811 0.6912998 0.7309799 0.0000000

$statistic
             hl       disp        deg         BC
hl    0.0000000 -1.1515898 -0.1213762 -1.2545787
disp -1.8048658  0.0000000 -0.7058372 -0.6386584
deg  -0.1566153 -0.6488095  0.0000000  0.5061789
BC   -1.1422336 -0.4168368  0.3602815  0.0000000

$n
[1] 10

$gp
[1] 2

$method
[1] "spearman"
The semi-partial correlation of hl with disp given deg and BC is calculated with
R> spcor.test(x=y.data$hl,y=y.data$disp,z=y.data[,c("deg","BC")],
+    method="spearman")
Then we obtain the following output:
    estimate   p.value statistic  n gp   Method
1 -0.4254609 0.2933025  -1.15159 10  2 spearman
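The difference between the partial and the semi-partial correlation can be illustrated with a small regression-based sketch in base R. It assumes that the Spearman variants amount to Pearson-type calculations on average ranks: the partial correlation removes deg and BC from both hl and disp, whereas the semi-partial correlation of hl with disp removes them from disp only. The variable names (r, e_hl, e_disp, and so on) are ours, and the sketch is not taken from the ppcor source.

```r
# Test data from the text (10 samples, 4 variables).
y.data <- data.frame(
  hl   = c(7, 15, 19, 15, 21, 22, 57, 15, 20, 18),
  disp = c(0, 0.964, 0, 0, 0.921, 0, 0, 1.006, 0, 1.011),
  deg  = c(9, 2, 3, 4, 1, 3, 1, 3, 6, 1),
  BC   = c(1.78e-02, 1.05e-06, 1.37e-05, 7.18e-03, 0, 0, 0,
           4.48e-03, 2.10e-06, 0)
)

# Assumption: work on average ranks to obtain Spearman-type coefficients.
r <- as.data.frame(lapply(y.data, rank))

# Residuals after regressing on the controlled variables deg and BC.
e_hl   <- resid(lm(hl   ~ deg + BC, data = r))
e_disp <- resid(lm(disp ~ deg + BC, data = r))

partial     <- cor(e_hl, e_disp)  # deg and BC removed from both variables
semipartial <- cor(r$hl, e_disp)  # deg and BC removed from disp only

# t statistic and two-sided p-value on n - 2 - gp degrees of freedom.
n  <- nrow(r)
gp <- 2
t_stat <- semipartial * sqrt((n - 2 - gp) / (1 - semipartial^2))
p_val  <- 2 * pt(-abs(t_stat), df = n - 2 - gp)

print(c(partial = partial, semipartial = semipartial,
        statistic = t_stat, p.value = p_val))
```

Under the stated assumption, the semi-partial value should be comparable with the estimate returned by spcor.test() above. Its absolute value can never exceed that of the partial correlation, because it equals the partial correlation multiplied by sd(e_hl) / sd(r$hl), which is at most 1.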
It should be noted that, without a general matrix formula for the semi-partial correlation, users would have to call spcor.test() for every pair of variables using two nested loops. To see how much faster the general matrix formula is, we compared computation times on a generated data matrix of size 500 × 100 (i.e., 500 samples and 100 variables). With spcor(), the total computation time was 0.02 seconds, whereas it took 135.33 seconds with spcor.test() in two loops. This demonstrates that the general matrix formula dramatically reduces the computational burden of calculating higher-order semi-partial correlations. Note that this simulation was run on a desktop with an Intel Core 2 Duo CPU at 3.00 GHz.
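The contrast between the two strategies can be sketched in base R without ppcor. The function spcor_matrix() below obtains all semi-partial correlations from one matrix inversion, while spcor_loops() fits one regression per ordered pair of variables. Both function names, the simulated data, and the reduced problem size (200 × 8 rather than 500 × 100) are our assumptions, and the matrix expression is our own derivation for Pearson correlations rather than a copy of the package code. The timings will differ from those reported above.

```r
set.seed(1)
n <- 200
p <- 8
x <- matrix(rnorm(n * p), n, p)
x[, 2] <- x[, 2] + 0.5 * x[, 1]   # introduce some dependence

# Matrix approach: entry [i, j] is the correlation of variable i with
# variable j after all remaining variables are removed from j only.
spcor_matrix <- function(x) {
  cvx  <- cov(x)
  icvx <- solve(cvx)
  d    <- diag(icvx)
  partial <- -cov2cor(icvx)
  # shrink[i, j] = s_ii * (w_ii - w_ij^2 / w_jj)
  shrink <- diag(cvx) * (d - sweep(icvx^2, 2, d, "/"))
  diag(shrink) <- 1               # placeholder; diagonal is overwritten
  sp <- partial / sqrt(shrink)
  diag(sp) <- 1
  sp
}

# Loop approach: one least-squares fit per ordered pair (i, j).
spcor_loops <- function(x) {
  p  <- ncol(x)
  sp <- diag(p)
  for (i in seq_len(p)) {
    for (j in seq_len(p)) {
      if (i == j) next
      z   <- cbind(1, x[, -c(i, j), drop = FALSE])
      e_j <- lm.fit(z, x[, j])$residuals   # j with the others removed
      sp[i, j] <- cor(x[, i], e_j)
    }
  }
  sp
}

t_matrix <- system.time(sp_m <- spcor_matrix(x))
t_loops  <- system.time(sp_l <- spcor_loops(x))

print(c(matrix = t_matrix[["elapsed"]], loops = t_loops[["elapsed"]]))
print(max(abs(sp_m - sp_l)))      # largest disagreement between the two
```

The loop version performs p(p - 1) regressions, so its cost grows quickly with the number of variables, whereas the matrix version needs a single inversion. Note that the resulting matrix is not symmetric in general, since entry [i, j] and entry [j, i] remove the controlled variables from different members of the pair.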