BUG/PERF: MultiIndex.value_counts returning flat index by lukemanley · Pull Request #49558 · pandas-dev/pandas
- Tests added and passed if fixing a bug or adding a new feature
- All code checks passed.
- Added type annotations to new arguments/methods/functions.
- Added an entry in the latest doc/source/whatsnew/v2.0.0.rst file if fixing a bug or adding a new feature.
I believe it is a bug that MultiIndex.value_counts returns a Series indexed by a flat index of tuples rather than by a MultiIndex. The flat index discards per-level information such as nullable dtypes and index (level) names.
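To make the information loss concrete, here is a small illustration using MultiIndex.to_flat_index(), which converts a MultiIndex into a flat Index of tuples. The example data is made up, and this only mimics the shape of the old result; it is not the code path value_counts used on main.

```python
import pandas as pd

# A two-level MultiIndex with named levels (illustrative data, not from the PR).
mi = pd.MultiIndex.from_arrays(
    [[1, 1, 2], ["x", "x", "y"]],
    names=["num", "tag"],
)

# Collapse it into a flat Index whose elements are tuples.
flat = mi.to_flat_index()

print(mi.names)    # the MultiIndex keeps one name per level
print(flat.names)  # the flat index has a single, unnamed level
print(flat.dtype)  # the tuples are stored with object dtype
```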
This PR changes MultiIndex.value_counts to return a Series indexed by a MultiIndex.
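For illustration, one way to obtain counts indexed by a MultiIndex is to group a placeholder Series on every level and take the group sizes. This is a minimal sketch under that assumption: multiindex_value_counts is a hypothetical helper, not a pandas API, and it is not necessarily how this PR implements the change.

```python
import numpy as np
import pandas as pd


def multiindex_value_counts(
    mi: pd.MultiIndex, sort: bool = True, dropna: bool = True
) -> pd.Series:
    """Hypothetical helper: count each distinct tuple in ``mi``.

    Grouping a placeholder Series on every level of ``mi`` yields a result
    indexed by a MultiIndex that carries the original level names, rather
    than by a flat Index of tuples.
    """
    placeholder = pd.Series(np.ones(len(mi)), index=mi)
    counts = placeholder.groupby(
        level=list(range(mi.nlevels)), dropna=dropna
    ).size()
    if sort:
        # Most frequent combinations first, like value_counts' default.
        counts = counts.sort_values(ascending=False)
    return counts


mi = pd.MultiIndex.from_arrays(
    [["a", "a", "b", "a"], [1, 1, 2, 3]],
    names=["letter", "number"],
)
counts = multiindex_value_counts(mi)
print(counts)
```

Because the grouping keys are the levels themselves, the result index is a MultiIndex that keeps the level names (here letter and number). Ties in the sorted output have no guaranteed order.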
This also improves performance considerably (roughly 5x faster in the benchmark below):
import pandas as pd
import numpy as np
mi = pd.MultiIndex.from_product([np.arange(1000), np.arange(1000)])
%timeit mi.value_counts()
974 ms ± 28.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) <- main
182 ms ± 10.2 ms per loop (mean ± std. dev. of 7 runs, 10 loops each) <- PR