BUG/PERF: MultiIndex.value_counts returning flat index by lukemanley · Pull Request #49558 · pandas-dev/pandas

I believe it is a bug that MultiIndex.value_counts returns a Series indexed by a flat index of tuples rather than by a MultiIndex. A flat index drops information that the MultiIndex carries, such as nullable dtypes and index names.

This PR changes MultiIndex.value_counts to return a Series indexed by a MultiIndex.
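
A minimal sketch of what a flat index loses, and how to inspect the index that value_counts returns. The level names `num` and `letter` are made up for illustration. What the last two prints show depends on whether the installed pandas includes this change, so no expected output is given.

```python
import pandas as pd

# Hypothetical example data; the level names are illustrative only.
mi = pd.MultiIndex.from_arrays(
    [[1, 1, 2], ["a", "a", "b"]],
    names=["num", "letter"],
)

# A flat index of tuples has a single object-dtype level and no level names.
flat = mi.to_flat_index()
print(type(flat).__name__, flat.dtype, list(flat.names))

# With this change, the result is indexed by a MultiIndex that keeps the
# level names; without it, the result is indexed by a flat index of tuples.
counts = mi.value_counts()
print(type(counts.index).__name__, list(counts.index.names))
print(counts.sort_index().tolist())
```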

This also improves performance (roughly 5x in the benchmark below):

import pandas as pd
import numpy as np

mi = pd.MultiIndex.from_product([np.arange(1000), np.arange(1000)])

%timeit mi.value_counts()

974 ms ± 28.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)  <- main
182 ms ± 10.2 ms per loop (mean ± std. dev. of 7 runs, 10 loops each) <- PR
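
The `%timeit` line above is an IPython magic. A rough equivalent for a plain Python session, using the standard-library `timeit` module, might look like the sketch below. The smaller size `n = 200` is an assumption made to keep the run short, so the numbers are not comparable to the timings above.

```python
import timeit

import numpy as np
import pandas as pd

n = 200  # assumption: smaller than the 1000 used above, to keep the run quick
mi = pd.MultiIndex.from_product([np.arange(n), np.arange(n)])

# Time a single call, repeated three times, and keep the fastest run.
elapsed = min(timeit.repeat(mi.value_counts, number=1, repeat=3))
print(f"{elapsed * 1e3:.1f} ms per call (best of 3)")
```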