spearmanr — SciPy v1.15.3 Manual
scipy.stats.spearmanr(a, b=None, axis=0, nan_policy='propagate', alternative='two-sided')
Calculate a Spearman correlation coefficient with associated p-value.
The Spearman rank-order correlation coefficient is a nonparametric measure of the monotonicity of the relationship between two datasets. Like other correlation coefficients, this one varies between -1 and +1 with 0 implying no correlation. Correlations of -1 or +1 imply an exact monotonic relationship. Positive correlations imply that as x increases, so does y. Negative correlations imply that as x increases, y decreases.
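The sketch below illustrates the rank-based definition with made-up data. It checks that an exactly monotonic but nonlinear relationship gives a coefficient of +1 or -1, and that the coefficient equals the Pearson correlation of the ranks. The ranks are computed with scipy.stats.rankdata and the Pearson correlation with np.corrcoef. The data, seed, and variable names are illustrative assumptions.

```python
import numpy as np
from scipy import stats

# Strictly increasing and strictly decreasing, but nonlinear, relationships.
x = np.arange(1, 11, dtype=float)
y_up = np.exp(x)
y_down = -x**3

rho_up = stats.spearmanr(x, y_up).statistic      # exact monotonic increase
rho_down = stats.spearmanr(x, y_down).statistic  # exact monotonic decrease
r_pearson = np.corrcoef(x, y_up)[0, 1]           # Pearson only measures linearity

# Spearman's rho equals Pearson's r computed on the ranks of the data.
rng = np.random.default_rng(12345)
a = rng.standard_normal(30)
b = a + rng.standard_normal(30)
rho = stats.spearmanr(a, b).statistic
r_ranks = np.corrcoef(stats.rankdata(a), stats.rankdata(b))[0, 1]

print(rho_up, rho_down, r_pearson)
print(rho, r_ranks)
```

The Pearson coefficient of x and exp(x) stays well below 1 because the relationship is not linear, even though it is perfectly monotonic.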
The p-value roughly indicates the probability of an uncorrelated system producing datasets that have a Spearman correlation at least as extreme as the one computed from these datasets. Although calculation of the p-value does not make strong assumptions about the distributions underlying the samples, it is only accurate for very large samples (>500 observations). For smaller sample sizes, consider a permutation test (see Examples section below).
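The following is a minimal sketch of the large-sample approximation. It assumes, as SciPy's current implementation does, that the p-value comes from a Student t distribution with n - 2 degrees of freedom, applied to t = rho * sqrt((n - 2) / (1 - rho**2)). The data are those of the first example in the Examples section.

```python
import numpy as np
from scipy import stats

x = [1, 2, 3, 4, 5]
y = [5, 6, 7, 8, 7]
res = stats.spearmanr(x, y)

n = len(x)
rho = res.statistic
dof = n - 2  # degrees of freedom of the t approximation

# Transform rho into a t statistic, then take the two-sided tail probability.
t_stat = rho * np.sqrt(dof / ((1.0 - rho) * (1.0 + rho)))
p_manual = 2 * stats.t.sf(abs(t_stat), dof)

print(p_manual, res.pvalue)
```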
Parameters:
a, b : 1D or 2D array_like, b is optional
One or two 1-D or 2-D arrays containing multiple variables and observations. When these are 1-D, each represents a vector of observations of a single variable. For the behavior in the 2-D case, see under axis, below. Both arrays need to have the same length in the axis dimension.
axis : int or None, optional
If axis=0 (default), then each column represents a variable, with observations in the rows. If axis=1, the relationship is transposed: each row represents a variable, while the columns contain observations. If axis=None, then both arrays will be raveled.
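This sketch shows how a and b are combined. With axis=0, their columns are joined into a single set of variables. axis=1 is the transposed layout. The shapes and seed are arbitrary choices for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
a = rng.standard_normal((20, 2))  # 20 observations of 2 variables
b = rng.standard_normal((20, 3))  # 20 observations of 3 more variables

# axis=0: columns are variables, so a and b together contribute 5 variables.
res_ab = stats.spearmanr(a, b)
res_stacked = stats.spearmanr(np.column_stack((a, b)))

# axis=1: rows are variables, so pass the transposed arrays.
res_rows = stats.spearmanr(a.T, b.T, axis=1)

print(res_ab.statistic.shape)  # (5, 5)
```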
nan_policy : {‘propagate’, ‘raise’, ‘omit’}, optional
Defines how to handle input that contains nan. The following options are available (default is ‘propagate’):
- ‘propagate’: returns nan
- ‘raise’: throws an error
- ‘omit’: performs the calculations ignoring nan values
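Here is a small sketch of the three policies, using made-up data that contain one nan. In this two-variable case, 'omit' drops the observation pair that contains the nan, so the result should match a call on the remaining pairs.

```python
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, np.nan, 4.0, 5.0, 6.0])
y = np.array([2.0, 1.0, 3.0, 5.0, 4.0, 7.0])

# 'propagate' (the default): the statistic and p-value are nan.
res_propagate = stats.spearmanr(x, y)

# 'omit': the observation pair containing nan is ignored.
res_omit = stats.spearmanr(x, y, nan_policy='omit')
keep = ~(np.isnan(x) | np.isnan(y))
res_reference = stats.spearmanr(x[keep], y[keep])

# 'raise': a ValueError is raised.
try:
    stats.spearmanr(x, y, nan_policy='raise')
    raised = False
except ValueError:
    raised = True

print(res_propagate.statistic, res_omit.statistic, raised)
```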
alternative : {‘two-sided’, ‘less’, ‘greater’}, optional
Defines the alternative hypothesis. Default is ‘two-sided’. The following options are available:
- ‘two-sided’: the correlation is nonzero
- ‘less’: the correlation is negative (less than zero)
- ‘greater’: the correlation is positive (greater than zero)
Added in version 1.7.0.
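The large-sample test is assumed to use a symmetric Student t approximation, as in SciPy's current implementation. Under that assumption, a positive statistic has a 'greater' p-value equal to half the two-sided one, and a 'less' p-value equal to its complement. This quick check reuses the data from the first example in the Examples section.

```python
import numpy as np
from scipy import stats

x = [1, 2, 3, 4, 5]
y = [5, 6, 7, 8, 7]  # positively correlated with x

p_two = stats.spearmanr(x, y).pvalue
p_greater = stats.spearmanr(x, y, alternative='greater').pvalue
p_less = stats.spearmanr(x, y, alternative='less').pvalue

print(p_two, p_greater, p_less)
```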
Returns:
res : SignificanceResult
An object containing attributes:
statistic : float or ndarray (2-D square)
Spearman correlation matrix or correlation coefficient (if only 2 variables are given as parameters). The correlation matrix is square, with length equal to the total number of variables (columns or rows) in a and b combined.
pvalue : float or ndarray (2-D square)
The p-value for a hypothesis test whose null hypothesis is that two samples have no ordinal correlation. See alternative above for alternative hypotheses. pvalue has the same shape as statistic.
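This sketch shows how the return shape depends on the number of variables. Two variables give scalars, while three or more give square matrices whose off-diagonal entries match the corresponding pairwise calls. The random data and seed are illustrative assumptions.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
data = rng.standard_normal((40, 3))  # 40 observations of 3 variables

res_two = stats.spearmanr(data[:, 0], data[:, 2])  # 2 variables -> scalars
res_three = stats.spearmanr(data)                   # 3 variables -> 3x3 matrices

print(np.ndim(res_two.statistic), res_three.statistic.shape, res_three.pvalue.shape)
```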
Raises:
ValueError
If axis is not 0, 1 or None, or if the number of dimensions of a is greater than 2, or if b is None and the number of dimensions of a is less than 2.
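This sketch triggers each of the three documented error conditions. The helper raises_value_error is hypothetical and exists only for this illustration.

```python
import numpy as np
from scipy import stats

def raises_value_error(*args, **kwargs):
    """Hypothetical helper: True if spearmanr raises ValueError for these inputs."""
    try:
        stats.spearmanr(*args, **kwargs)
    except ValueError:
        return True
    return False

rng = np.random.default_rng(2)
two_d = rng.standard_normal((10, 2))
three_d = rng.standard_normal((10, 2, 2))

bad_axis = raises_value_error(two_d, axis=2)       # axis is not 0, 1 or None
too_many_dims = raises_value_error(three_d)        # a has more than 2 dimensions
single_variable = raises_value_error(two_d[:, 0])  # b is None and a is 1-D

print(bad_axis, too_many_dims, single_variable)
```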
Warns:
ConstantInputWarning
Emitted if an input is a constant array. The correlation coefficient is not defined in this case, so np.nan is returned.
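This sketch captures the warning for a constant input using the standard warnings module. The input values are made up.

```python
import warnings

import numpy as np
from scipy import stats

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # make sure the warning is recorded
    res = stats.spearmanr([1, 2, 3, 4], [7, 7, 7, 7])  # second input is constant

constant_warned = any(issubclass(w.category, stats.ConstantInputWarning)
                      for w in caught)
print(res.statistic, res.pvalue, constant_warned)
```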
Examples
>>> import numpy as np
>>> from scipy import stats
>>> res = stats.spearmanr([1, 2, 3, 4, 5], [5, 6, 7, 8, 7])
>>> res.statistic
0.8207826816681233
>>> res.pvalue
0.08858700531354381
>>> rng = np.random.default_rng()  # unseeded, so your values will differ
>>> x2n = rng.standard_normal((100, 2))
>>> y2n = rng.standard_normal((100, 2))
>>> res = stats.spearmanr(x2n)
>>> res.statistic, res.pvalue
(-0.07960396039603959, 0.4311168705769747)
>>> res = stats.spearmanr(x2n[:, 0], x2n[:, 1])
>>> res.statistic, res.pvalue
(-0.07960396039603959, 0.4311168705769747)
>>> res = stats.spearmanr(x2n, y2n)
>>> res.statistic
array([[ 1.        , -0.07960396, -0.08314431,  0.09662166],
       [-0.07960396,  1.        , -0.14448245,  0.16738074],
       [-0.08314431, -0.14448245,  1.        ,  0.03234323],
       [ 0.09662166,  0.16738074,  0.03234323,  1.        ]])
>>> res.pvalue
array([[0.        , 0.43111687, 0.41084066, 0.33891628],
       [0.43111687, 0.        , 0.15151618, 0.09600687],
       [0.41084066, 0.15151618, 0.        , 0.74938561],
       [0.33891628, 0.09600687, 0.74938561, 0.        ]])
>>> res = stats.spearmanr(x2n.T, y2n.T, axis=1)
>>> res.statistic
array([[ 1.        , -0.07960396, -0.08314431,  0.09662166],
       [-0.07960396,  1.        , -0.14448245,  0.16738074],
       [-0.08314431, -0.14448245,  1.        ,  0.03234323],
       [ 0.09662166,  0.16738074,  0.03234323,  1.        ]])
>>> res = stats.spearmanr(x2n, y2n, axis=None)
>>> res.statistic, res.pvalue
(0.044981624540613524, 0.5270803651336189)
>>> res = stats.spearmanr(x2n.ravel(), y2n.ravel())
>>> res.statistic, res.pvalue
(0.044981624540613524, 0.5270803651336189)
>>> rng = np.random.default_rng()
>>> xint = rng.integers(10, size=(100, 2))
>>> res = stats.spearmanr(xint)
>>> res.statistic, res.pvalue
(0.09800224850707953, 0.3320271757932076)
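Integer data like those in the example above usually contain ties. Tied values receive the average of the ranks they span, and the coefficient is the Pearson correlation of those ranks. The textbook shortcut 1 - 6*sum(d**2) / (n*(n**2 - 1)) is exact only without ties, so it can differ slightly. Here is a minimal sketch with made-up data.

```python
import numpy as np
from scipy import stats

x = np.array([1, 2, 2, 3, 4])  # contains a tie
y = np.array([1, 3, 2, 5, 4])

rx = stats.rankdata(x)  # average ranks: [1, 2.5, 2.5, 4, 5]
ry = stats.rankdata(y)

rho = stats.spearmanr(x, y).statistic
r_ranks = np.corrcoef(rx, ry)[0, 1]

# Shortcut formula based on rank differences; exact only when there are no ties.
n = len(x)
d = rx - ry
rho_shortcut = 1 - 6 * np.sum(d**2) / (n * (n**2 - 1))

print(rho, r_ranks, rho_shortcut)
```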
For small samples, consider performing a permutation test instead of relying on the asymptotic p-value. To calculate the null distribution of the statistic (over all possible pairings between observations in samples x and y), only one of the two inputs needs to be permuted.
>>> x = [1.76405235, 0.40015721, 0.97873798,
...      2.2408932, 1.86755799, -0.97727788]
>>> y = [2.71414076, 0.2488, 0.87551913,
...      2.6514917, 2.01160156, 0.47699563]
>>> def statistic(x):  # permute only x
...     return stats.spearmanr(x, y).statistic
>>> res_exact = stats.permutation_test((x,), statistic,
...                                    permutation_type='pairings')
>>> res_asymptotic = stats.spearmanr(x, y)
>>> res_exact.pvalue, res_asymptotic.pvalue  # asymptotic pvalue is too low
(0.10277777777777777, 0.07239650145772594)
For a more detailed example, see Spearman correlation coefficient.