DEPR: groupby.grouper by rhshadrach · Pull Request #56521 · pandas-dev/pandas

Using result_index is not currently reliable. With unobserved categories, it can omit groups that appear in the actual result:

import pandas as pd

df = pd.DataFrame(
    {
        "a": pd.Categorical([1], categories=[1, 2]),
        "b": pd.Categorical([1], categories=[1, 2]),
        "c": 5,
    }
)
gb = df.groupby(["a", "b"])
print(gb.grouper.result_index)
# MultiIndex([(1, 1)], names=['a', 'b'])
print(gb.size())
# a  b
# 1  1    1
#    2    0
# 2  1    0
#    2    0
# dtype: int64

That said, #55738 will fix this in all cases I know of. Until then, an alternative is to perform a reduction and take the index from the result. I recommend size for this, as it requires less computation than other reductions, and the timings show it costs only slightly more than accessing result_index directly:

import numpy as np
import pandas as pd
size = 100_000
df = pd.DataFrame({
    "a": np.random.randint(0, 100, size),
    "b": np.random.randint(0, 100, size),
    "c": np.random.random(size),
})
%timeit df.groupby(['a', 'b']).size().index
# 3.68 ms ± 6.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit df.groupby(['a', 'b'])._grouper.result_index
# 3.36 ms ± 31.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
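
As a minimal sketch of the reduction-based alternative, here it is applied to the categorical example from earlier. Passing observed=False explicitly is my assumption, made so that unobserved categories are kept regardless of the default in a given pandas version; the original snippet relied on the default.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "a": pd.Categorical([1], categories=[1, 2]),
        "b": pd.Categorical([1], categories=[1, 2]),
        "c": 5,
    }
)

# Assumption: observed=False is passed explicitly so that unobserved
# category combinations are kept in the result.
gb = df.groupby(["a", "b"], observed=False)

# Take the index from an inexpensive reduction instead of reading
# result_index from the grouper.
result_index = gb.size().index
print(list(result_index))
# [(1, 1), (1, 2), (2, 1), (2, 2)]
```

The index obtained this way matches the one on gb.size() by construction, so it includes the unobserved combinations that result_index dropped in the example above.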

We do not need to deprecate grouper for #55738, only certain grouper attributes (see #56149).