DEPR: groupby.grouper by rhshadrach · Pull Request #56521 · pandas-dev/pandas
Using result_index is not currently reliable:
import pandas as pd

df = pd.DataFrame(
    {
        "a": pd.Categorical([1], categories=[1, 2]),
        "b": pd.Categorical([1], categories=[1, 2]),
        "c": 5,
    }
)
gb = df.groupby(["a", "b"])
# result_index only contains the observed combination...
print(gb.grouper.result_index)
# MultiIndex([(1, 1)], names=['a', 'b'])
# ...while the reduction also reports the unobserved categories.
print(gb.size())
# a  b
# 1  1    1
#    2    0
# 2  1    0
#    2    0
# dtype: int64
though #55738 will fix this in all cases I know of. An alternative is to perform a reduction and take the index from the result. I recommend size for this, as it requires less computation than other reductions.
import numpy as np

size = 100_000
df = pd.DataFrame({
    'a': np.random.randint(0, 100, size),
    'b': np.random.randint(0, 100, size),
    'c': np.random.random(size),
})
%timeit df.groupby(['a', 'b']).size().index
# 3.68 ms ± 6.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit df.groupby(['a', 'b'])._grouper.result_index
# 3.36 ms ± 31.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
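As a minimal sketch of the reduction-based alternative, applied to the categorical example from the top of this description: the index is taken from the result of size() rather than from the grouper. Passing observed=False explicitly is my assumption here, so that the unobserved category combinations are kept regardless of the default in the installed pandas version.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "a": pd.Categorical([1], categories=[1, 2]),
        "b": pd.Categorical([1], categories=[1, 2]),
        "c": 5,
    }
)

# Assumption: observed=False is set explicitly so unobserved category
# combinations are retained whatever the version's default is.
gb = df.groupby(["a", "b"], observed=False)

# Take the index from a cheap reduction instead of reading it off the grouper.
result_index = gb.size().index
print(len(result_index))
# 4
```

This only uses public API, so it does not depend on the grouper attribute at all.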
We do not need to deprecate grouper for #55738, only certain grouper attributes (see #56149).