Categorical dtype doesn't survive groupby of first, max, min, value_counts etc.: unwanted coercion to object · Issue #18502 · pandas-dev/pandas
Code Sample, a copy-pastable example if possible
In [1]: import pandas as pd
   ...: df = pd.DataFrame(dict(payload=[-1, -2, -1, -2],
   ...:                        col=pd.Categorical(["foo", "bar", "bar", "qux"], ordered=True)))
   ...: df
Out[1]:
   col  payload
0  foo       -1
1  bar       -2
2  bar       -1
3  qux       -2
In [2]: df.groupby("payload").first().col.dtype
Out[2]: dtype('O')
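Until the aggregation keeps the dtype, one workaround is to cast the aggregated column back to the original dtype. A minimal sketch, assuming a pandas version whose astype accepts a CategoricalDtype. Because the original dtype is passed explicitly, categories and ordering come back whether or not the installed version already preserves them:

```python
import pandas as pd

df = pd.DataFrame(dict(payload=[-1, -2, -1, -2],
                       col=pd.Categorical(["foo", "bar", "bar", "qux"], ordered=True)))

# On affected versions this column comes back as object dtype
first = df.groupby("payload").first()

# Re-apply the original CategoricalDtype (same categories, same ordering)
first["col"] = first["col"].astype(df["col"].dtype)

print(first["col"].dtype)
```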
Problem description
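Aggregating a grouped categorical column (with first(), and likewise max, min, value_counts, etc.) shouldn't coerce it to object. Categorical dtypes should be preserved as long as possible, for efficiency (integer codes instead of Python objects) and for correctness (categories and their ordering are otherwise lost).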
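For an ordered categorical, min and max are well defined because the integer codes follow the category order. A sketch of a workaround under that assumption: aggregate the codes per group and rebuild a Categorical with the original dtype. The helper name `group_extreme` is hypothetical, and the sketch assumes the column has no missing values (those are stored as code -1).

```python
import pandas as pd


def group_extreme(df, key, col, how="max"):
    """Hypothetical helper: groupwise min/max of an ordered categorical column.

    Works on the integer codes, whose order matches the category order, and
    rebuilds a Categorical with the original dtype. Assumes `col` has no
    missing values (code -1 would always win `min`).
    """
    if how not in ("min", "max"):
        raise ValueError("how must be 'min' or 'max'")
    dtype = df[col].dtype
    if not isinstance(dtype, pd.CategoricalDtype) or not dtype.ordered:
        raise TypeError("min/max need an ordered categorical column")
    # Code order equals category order, so min/max of codes picks min/max category
    agg_codes = getattr(df[col].cat.codes.groupby(df[key]), how)()
    values = pd.Categorical.from_codes(agg_codes.to_numpy(), dtype=dtype)
    return pd.Series(values, index=agg_codes.index, name=col)


df = pd.DataFrame(dict(payload=[-1, -2, -1, -2],
                       col=pd.Categorical(["foo", "bar", "bar", "qux"], ordered=True)))
print(group_extreme(df, "payload", "col", "max"))
print(group_extreme(df, "payload", "col", "min"))
```

For the sample frame this gives qux and foo for max, and bar and bar for min, with the original CategoricalDtype in both cases.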
Expected Output
The result dtype should be CategoricalDtype(categories=['bar', 'foo', 'qux'], ordered=True), as it already is for head(), which keeps the categorical dtype:
In [6]: df.groupby("payload").head().col.dtype
Out[6]: CategoricalDtype(categories=['bar', 'foo', 'qux'], ordered=True)
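The expected behaviour can be phrased as a check that compares the column dtype before and after grouping. A minimal sketch; `keeps_dtype` is a hypothetical helper, and which methods pass depends on the installed pandas version, so no particular result is claimed here for first, min or max:

```python
import pandas as pd


def keeps_dtype(df, key, col, method):
    """Hypothetical check: does df.groupby(key).<method>() keep df[col]'s dtype?"""
    try:
        result = getattr(df.groupby(key), method)()
    except (TypeError, NotImplementedError):
        # Some versions may refuse the operation on a categorical outright
        return False
    # Compare with the CategoricalDtype on the left so its __eq__ is used
    return bool(df[col].dtype == result[col].dtype)


df = pd.DataFrame(dict(payload=[-1, -2, -1, -2],
                       col=pd.Categorical(["foo", "bar", "bar", "qux"], ordered=True)))
for method in ["head", "first", "min", "max"]:
    print(method, keeps_dtype(df, "payload", "col", method))
```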
Output of pd.show_versions()
INSTALLED VERSIONS
------------------
commit: None
python: 3.4.3.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.0-98-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
pandas: 0.21.0
pytest: 3.2.5
pip: 9.0.1
setuptools: 36.5.0
Cython: 0.20.1post0
numpy: 1.13.3
scipy: 0.13.3
pyarrow: None
xarray: 0.9.6
IPython: 6.2.0
sphinx: None
patsy: None
dateutil: 2.6.1
pytz: 2017.3
blosc: None
bottleneck: None
tables: 3.1.1
numexpr: 2.6.4
feather: None
matplotlib: 1.3.1
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: 4.2.1
html5lib: 0.999
sqlalchemy: 0.8.4
pymysql: None
psycopg2: None
jinja2: 2.7.2
s3fs: 0.1.2
fastparquet: None
pandas_gbq: None
pandas_datareader: None