Unexpected dtype when using .loc to set Categorical value for column in 1-row DataFrame · Issue #25495 · pandas-dev/pandas
Code Sample, a copy-pastable example if possible
In [1]: import pandas as pd

In [2]: df = pd.DataFrame({'Alpha': [u'a'], 'Numeric': [0]})

In [3]: df.loc[:,'Alpha']
Out[3]:
0    a
Name: Alpha, dtype: object

In [4]: codes = pd.Categorical(df['Alpha'], categories = [u'a',u'b',u'c'])

In [5]: codes
Out[5]:
[a]
Categories (3, object): [a, b, c]

In [6]: df.loc[:,'Alpha'] = codes

In [7]: df.loc[:,'Alpha']
Out[7]:
0    a
Name: Alpha, dtype: object
Problem description
When I try to set the column of a one-row DataFrame to a pandas.core.arrays.categorical.Categorical, it is returned as a pandas.core.series.Series of dtype('O') rather than a pandas.core.series.Series of CategoricalDtype(categories=[u'a', u'b', u'c'], ordered=False). I get the latter return value when I set the column using df['Alpha'] = codes or df.Alpha = codes. I can't replicate this inconsistency with DataFrames containing more than one row.
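For anyone trying to reproduce this, the following is a minimal sketch that puts the two assignment styles side by side on copies of the same one-row frame. The names via_loc, via_getitem and results are only illustrative and are not part of the report. The dtype produced by the .loc path is the behaviour in question and may differ between pandas versions, so the script prints it rather than assuming a value.

```python
import pandas as pd

# Same one-row frame and Categorical as in the report.
df = pd.DataFrame({'Alpha': ['a'], 'Numeric': [0]})
codes = pd.Categorical(df['Alpha'], categories=['a', 'b', 'c'])

# Path 1: assign through .loc with a full-column slice.
via_loc = df.copy()
via_loc.loc[:, 'Alpha'] = codes

# Path 2: assign through plain column indexing.
via_getitem = df.copy()
via_getitem['Alpha'] = codes

# Collect the resulting dtypes so the two paths can be compared directly.
results = {
    'loc': via_loc['Alpha'].dtype,
    'getitem': via_getitem['Alpha'].dtype,
}
for name, dtype in results.items():
    print(name, dtype)
```

If the two printed dtypes differ, the inconsistency described above is present in the installed version.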
Expected Output
Out[7]:
0    a
Name: Alpha, dtype: category
Categories (3, object): [a, b, c]
Output of pd.show_versions()
[paste the output of pd.show_versions() here below this line]
INSTALLED VERSIONS
commit: None
python: 2.7.15.final.0
python-bits: 64
OS: Darwin
OS-release: 18.2.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: None.None
pandas: 0.24.1
pytest: None
pip: 19.0.3
setuptools: 40.6.3
Cython: None
numpy: 1.15.4
scipy: 1.2.0
pyarrow: None
xarray: None
IPython: 5.8.0
sphinx: None
patsy: 0.5.1
dateutil: 2.7.5
pytz: 2018.7
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.2.3
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml.etree: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None