pandas.DataFrame.corr — pandas 2.2.3 documentation
DataFrame.corr(method='pearson', min_periods=1, numeric_only=False)
Compute pairwise correlation of columns, excluding NA/null values.
Parameters:
method : {‘pearson’, ‘kendall’, ‘spearman’} or callable
Method of correlation:
- pearson : standard correlation coefficient
- kendall : Kendall Tau correlation coefficient
- spearman : Spearman rank correlation
- callable : a callable that takes two 1d ndarrays as input and returns a float. Note that the matrix returned by corr will have 1 along the diagonal and will be symmetric regardless of the callable’s behavior.
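As a rough illustration of the method options, the sketch below compares Pearson and Spearman on made-up data and then passes a deliberately asymmetric callable. The data and the helper name last_difference are hypothetical and chosen only for this example. 'kendall' is left out here on the assumption that it may need SciPy to be installed.

```python
import numpy as np
import pandas as pd

# Illustrative data (not from the pandas docs): y is a monotonic but
# nonlinear function of x.
df = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0, 5.0]})
df["y"] = df["x"] ** 3

pearson = df.corr(method="pearson")    # measures linear association
spearman = df.corr(method="spearman")  # measures rank (monotonic) association

# Hypothetical, deliberately asymmetric callable: f(a, b) == -f(b, a).
def last_difference(a, b):
    return float(a[-1] - b[-1])

custom = df.corr(method=last_difference)

# The off-diagonal entries match each other and the diagonal is 1,
# even though the callable itself is not symmetric.
print(custom)
```

On this data the Spearman coefficient is 1 because the relationship is monotonic, while the Pearson coefficient stays below 1 because it is not linear.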
min_periods : int, optional
Minimum number of observations required per pair of columns to have a valid result. Currently only available for Pearson and Spearman correlation.
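One way to see which pairs min_periods will blank out is to count the rows in which both columns are non-null. The sketch below does this with a matrix product of the non-null mask; the data and the names valid and pair_counts are illustrative assumptions, not part of the pandas API.

```python
import numpy as np
import pandas as pd

# Illustrative data with missing values in different rows per column.
df = pd.DataFrame({
    "a": [1.0, 2.0, np.nan, 4.0, 5.0],
    "b": [2.0, np.nan, 6.0, 8.0, 11.0],
    "c": [5.0, 3.0, 4.0, 1.0, 2.0],
})

# Number of rows where both columns of a pair are present.
valid = df.notna().astype(int)
pair_counts = valid.T @ valid
print(pair_counts)

# Pairs with fewer than 4 shared observations become NaN.
res = df.corr(min_periods=4)
print(res)
```

Here columns a and b share only three complete rows, so their entry is NaN with min_periods=4, while the pairs involving c share four rows and keep a value.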
numeric_only : bool, default False
Include only float, int or boolean data.
Added in version 1.5.0.
Changed in version 2.0.0: The default value of numeric_only is now False.
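A minimal sketch of what numeric_only changes in practice, using a made-up frame with one string column. With the default numeric_only=False, I would expect pandas 2.x to raise when it meets the non-numeric column; the exact exception type is an assumption here, so the example catches both ValueError and TypeError.

```python
import pandas as pd

# Illustrative frame: two numeric columns, one boolean, one string.
df = pd.DataFrame({
    "height": [1.5, 1.6, 1.7, 1.8],
    "weight": [50, 58, 69, 80],
    "member": [True, False, True, True],
    "name": ["a", "b", "c", "d"],
})

# Only float, int and boolean columns are kept; "name" is dropped.
numeric = df.corr(numeric_only=True)
print(numeric.columns.tolist())

# With the default (numeric_only=False) the string column is not skipped.
try:
    df.corr()
except (ValueError, TypeError) as exc:
    print("corr() without numeric_only failed:", type(exc).__name__)
```

Selecting the columns explicitly, for example df[["height", "weight"]].corr(), is an alternative that makes the intended inputs visible at the call site.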
Returns:
DataFrame
Correlation matrix.
Notes
Pearson, Kendall and Spearman correlation are currently computed using pairwise complete observations.
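To make "pairwise complete observations" concrete, the sketch below compares one entry of df.corr() with the same coefficient computed after dropping rows that are incomplete for that pair only, and with the result of dropping every row that has any missing value (listwise deletion). The data is illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": [1.0, 2.0, np.nan, 4.0, 5.0],
    "b": [2.0, np.nan, 6.0, 8.0, 11.0],
    "c": [5.0, 3.0, 4.0, 1.0, 2.0],
})

# Entry as computed by corr(): uses every row where both a and c are present,
# even if b is missing in that row.
pairwise = df.corr().loc["a", "c"]

# Same pair computed by hand on the rows complete for (a, c).
manual = df[["a", "c"]].dropna().corr().loc["a", "c"]

# Listwise deletion: drop any row with a missing value in any column first.
listwise = df.dropna().corr().loc["a", "c"]

print(pairwise, manual, listwise)
```

The first two values agree, while the listwise value differs because it discards a row in which only b is missing.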
- Pearson correlation coefficient
- Kendall rank correlation coefficient
- Spearman’s rank correlation coefficient
Examples
>>> import numpy as np
>>> import pandas as pd
>>> def histogram_intersection(a, b):
...     v = np.minimum(a, b).sum().round(decimals=1)
...     return v
>>> df = pd.DataFrame([(.2, .3), (.0, .6), (.6, .0), (.2, .1)],
...                   columns=['dogs', 'cats'])
>>> df.corr(method=histogram_intersection)
      dogs  cats
dogs   1.0   0.3
cats   0.3   1.0

>>> df = pd.DataFrame([(1, 1), (2, np.nan), (np.nan, 3), (4, 4)],
...                   columns=['dogs', 'cats'])
>>> df.corr(min_periods=3)
      dogs  cats
dogs   1.0   NaN
cats   NaN   1.0