BUG: groupby.describe with as_index=False incorrect · Issue #49256 · pandas-dev/pandas (original) (raw)

df = DataFrame(
    [[1, 2, "foo"], [1, np.nan, "bar"], [3, np.nan, "baz"]],
    columns=["A", "B", "C"],
)
gb = df.groupby("A", as_index=False)
print(gb.describe())

#       A                                ...    B                             
#   count mean  std  min  25%  50%  75%  ... mean std  min  25%  50%  75%  max
# 0   2.0  1.0  0.0  1.0  1.0  1.0  1.0  ...  2.0 NaN  2.0  2.0  2.0  2.0  2.0
# 1   1.0  3.0  NaN  3.0  3.0  3.0  3.0  ...  NaN NaN  NaN  NaN  NaN  NaN  NaN

Instead, column A should just have the groupers

   A     B                                  
     count mean std  min  25%  50%  75%  max
0  1   1.0  2.0 NaN  2.0  2.0  2.0  2.0  2.0
1  3   0.0  NaN NaN  NaN  NaN  NaN  NaN  NaN

We get different, but also incorrect, output for SeriesGroupBy:

df = DataFrame(
    {
        "foo1": ["one", "two", "two", "three", "one", "two"],
        "foo2": [1, 2, 4, 4, 5, 6],
    }
)
gb = df.groupby("foo1", as_index=False)['foo2']
print(gb.describe())
# foo1   0         one
#        1       three
#        2         two
# count  0         2.0
#        1         1.0
#        2         3.0
# mean   0         3.0
#        1         4.0
#        2         4.0
# std    0    2.828427
#        1         NaN
#        2         2.0
# min    0         1.0
#        1         4.0
#        2         2.0
# 25%    0         2.0
#        1         4.0
#        2         3.0
# 50%    0         3.0
#        1         4.0
#        2         4.0
# 75%    0         4.0
#        1         4.0
#        2         5.0
# max    0         5.0
#        1         4.0
#        2         6.0
# dtype: object