BUG: GroupBy.describe produces inconsistent results for empty datasets · Issue #41575 · pandas-dev/pandas


In [1]: import pandas as pd

In [2]: df = pd.DataFrame(columns=['A', 'B', 'C'])
In [3]: df.groupby('A').B.describe()

ValueError                                Traceback (most recent call last)
----> 1 df.groupby('A').B.describe()

~/.../python3.8/site-packages/pandas/core/groupby/generic.py in describe(self, **kwargs)
    674         if self.axis == 1:
    675             return result.T
--> 676         return result.unstack()
    677
    678     def value_counts(

~/.../python3.8/site-packages/pandas/core/series.py in unstack(self, level, fill_value)
   3827         from pandas.core.reshape.reshape import unstack
   3828
-> 3829         return unstack(self, level, fill_value)
   3830
   3831     # ----------------------------------------------------------------------

~/.../python3.8/site-packages/pandas/core/reshape/reshape.py in unstack(obj, level, fill_value)
    422         # Give nicer error messages when unstack a Series whose
    423         # Index is not a MultiIndex.
--> 424         raise ValueError(
    425             f"index must be a MultiIndex to unstack, {type(obj.index)} was passed"
    426         )

ValueError: index must be a MultiIndex to unstack, <class 'pandas.core.indexes.base.Index'> was passed

In [4]: df.groupby('A').describe()
Out[4]: Series([], dtype: float64)

Problem description

SeriesGroupBy.describe raises a ValueError when called on an empty dataset, whereas DataFrameGroupBy.describe succeeds but returns an empty Series instead of a DataFrame.
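A small probe along these lines can make the inconsistency easy to compare across pandas versions. It is only a sketch: the `outcome` helper is a hypothetical name introduced here, and the code records whatever each call does without assuming the result. On pandas 1.2.4, per the session above, the first call raises ValueError and the second returns a Series.

```python
import pandas as pd


def outcome(call):
    """Return the result's type name, or the exception type if the call raises.

    Hypothetical helper for this report; not part of pandas.
    """
    try:
        return type(call()).__name__
    except Exception as exc:  # broad on purpose: we only want to report what happened
        return f"raises {type(exc).__name__}"


# Same empty frame as in the reproduction above.
df = pd.DataFrame(columns=["A", "B", "C"])

outcomes = {
    "SeriesGroupBy.describe": outcome(lambda: df.groupby("A").B.describe()),
    "DataFrameGroupBy.describe": outcome(lambda: df.groupby("A").describe()),
}

for name, result in outcomes.items():
    print(f"{name}: {result}")
```

If both lines print `DataFrame`, the installed version already behaves as this report expects.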

Expected Output

I would expect both of these to return an empty DataFrame with the appropriate columns.

In [3]: df.groupby('A').B.describe()
Out[3]:
Empty DataFrame
Columns: [count, mean, std, min, 25%, 50%, 75%, max]
Index: []

In [4]: df.groupby('A').describe()
Out[4]:
Empty DataFrame
Columns: [(B, count), (B, mean), (B, std), (B, min), (B, 25%), (B, 50%), (B, 75%), (B, max), (C, count), (C, mean), (C, std), (C, min), (C, 25%), (C, 50%), (C, 75%), (C, max)]
Index: []
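Until both methods behave this way, a user-side guard could produce the expected shapes for empty input. The following is a minimal sketch, not the pandas fix: `describe_groups` and `STAT_COLUMNS` are hypothetical names, the statistic labels are copied from the expected output above, and the choice of float64 dtype for the empty result is an assumption.

```python
import pandas as pd

# Column labels taken from the expected output in this report.
STAT_COLUMNS = ["count", "mean", "std", "min", "25%", "50%", "75%", "max"]


def describe_groups(df, key, column=None):
    """Group `df` by `key` and describe one column, or all other columns.

    Hypothetical workaround: for an empty frame, build the expected empty
    DataFrame directly instead of calling GroupBy.describe.
    """
    if df.empty:
        index = pd.Index([], name=key)
        if column is not None:
            # Series case: flat columns, one per statistic.
            columns = pd.Index(STAT_COLUMNS)
        else:
            # DataFrame case: (value column, statistic) pairs.
            value_columns = [c for c in df.columns if c != key]
            columns = pd.MultiIndex.from_product([value_columns, STAT_COLUMNS])
        # float64 is an assumption for the dtype of the empty result.
        return pd.DataFrame(index=index, columns=columns, dtype="float64")

    grouped = df.groupby(key)
    return grouped[column].describe() if column is not None else grouped.describe()


empty = pd.DataFrame(columns=["A", "B", "C"])
series_case = describe_groups(empty, "A", "B")
frame_case = describe_groups(empty, "A")
print(series_case)
print(frame_case)
```

Non-empty input falls through to the regular `describe` call, so the guard only changes behavior in the case this report covers.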

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit : 2cb9652
python : 3.8.6.final.0
python-bits : 64
OS : Linux
OS-release : 5.10.26-1rodete1-amd64
Version : #1 SMP Debian 5.10.26-1rodete1 (2021-04-12)
machine : x86_64
processor :
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.2.4
numpy : 1.19.5
pytz : 2019.3
dateutil : 2.8.1
pip : 20.2.1
setuptools : 49.2.1
Cython : 0.29.13
pytest : 4.6.11
hypothesis : None
sphinx : 1.8.5
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : 2.8.6 (dt dec pq3 ext lo64)
jinja2 : 2.11.2
IPython : 7.19.0
pandas_datareader: None
bs4 : None
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 3.0.0
pyxlsb : None
s3fs : None
scipy : 1.4.1
sqlalchemy : 1.3.20
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
numba : None