DataFrame.corr(method="kendall") calculation is slow

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(1000, 300))

df.corr(method="kendall")

21.6 s ± 686 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

DataFrame.corr(method="kendall") doesn't scale particularly well, perhaps because it's the only named correlation method that isn't Cythonized at the moment (we just call kendalltau from scipy repeatedly in a Python for loop: https://github.com/pandas-dev/pandas/blob/master/pandas/core/frame.py#L7454). It may be worthwhile to try to implement something more efficient within _libs/algos.pyx.
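For reference, here is a minimal sketch of the pairwise pattern described above. It is not the actual pandas source: the function name `kendall_corr_loop` is hypothetical, and it assumes numeric columns with no missing values. It only illustrates why the cost grows quadratically with the number of columns: one `scipy.stats.kendalltau` call per unordered column pair, which for the 300-column frame above is 300 * 299 / 2 = 44,850 calls.

```python
import numpy as np
import pandas as pd
from scipy.stats import kendalltau


def kendall_corr_loop(df):
    """Kendall tau matrix built with one scipy call per column pair.

    Hypothetical helper for illustration only; assumes numeric data
    without NaNs and does not mirror pandas internals exactly.
    """
    cols = df.columns
    mat = df.to_numpy(dtype=float)
    k = mat.shape[1]
    out = np.eye(k)  # a column is perfectly correlated with itself
    for i in range(k):
        for j in range(i + 1, k):
            # kendalltau returns (statistic, p-value); only the statistic is kept
            tau, _ = kendalltau(mat[:, i], mat[:, j])
            out[i, j] = tau
            out[j, i] = tau  # the matrix is symmetric
    return pd.DataFrame(out, index=cols, columns=cols)


small = pd.DataFrame({"a": [1, 2, 3, 4], "b": [4, 3, 2, 1], "c": [1, 2, 3, 4]})
result = kendall_corr_loop(small)
```

On a small frame this should agree with `small.corr(method="kendall")`. The nested Python loop, not any single `kendalltau` call, is what a Cython implementation in _libs/algos.pyx would replace.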

Relevant discussion: #28151