series.dt.tz_localize() on Categorical operates on categories, not values · Issue #27952 · pandas-dev/pandas
Code Sample, a copy-pastable example if possible
import pandas as pd

datetimes = pd.Series(['2019-01-01', '2019-01-01', '2019-01-02'], dtype='datetime64[ns]')
categorical = datetimes.astype('category')
categorical.dt.tz_localize(None)
Produces:
0 2019-01-01
1 2019-01-02
dtype: datetime64[ns]
Problem description
.dt.tz_localize() is operating on categorical.cat.categories. It should be operating on categorical.astype('datetime64[ns]').values. This is just plain wrong.
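To make the mismatch concrete, here is a minimal sketch using the same data as the code sample. It only compares the number of categories with the number of rows, so it should behave the same whether or not the installed pandas version has this bug; the variable names n_categories and n_rows are mine, not part of pandas.

```python
import pandas as pd

# Same data as in the code sample above.
datetimes = pd.Series(['2019-01-01', '2019-01-01', '2019-01-02'], dtype='datetime64[ns]')
categorical = datetimes.astype('category')

# The categories hold only the distinct values...
n_categories = len(categorical.cat.categories)
# ...while the Series itself has one entry per row.
n_rows = len(categorical)

print(n_categories, n_rows)
```

A result with as many rows as n_categories rather than n_rows is the symptom reported here.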
Expected Output
According to the Categorical docs, "The returned Series (or DataFrame) is of the same type as if you used the .str. / .dt. on a Series of that type (and not of type category!)." So I think the expected output is the same as calling it on the original datetime Series:
>>> datetimes.dt.tz_localize(None)
0 2019-01-01
1 2019-01-01
2 2019-01-02
dtype: datetime64[ns]
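A possible workaround, offered as a sketch rather than an official fix: cast the categorical back to datetime64[ns] before using .dt, so the accessor sees one value per row. This relies only on the astype conversion already mentioned in the problem description; the names as_datetime and result are illustrative.

```python
import pandas as pd

datetimes = pd.Series(['2019-01-01', '2019-01-01', '2019-01-02'], dtype='datetime64[ns]')
categorical = datetimes.astype('category')

# Cast back to datetime first so .dt works on the values, not the categories.
as_datetime = categorical.astype('datetime64[ns]')
result = as_datetime.dt.tz_localize(None)

print(result)
```

The cost is materializing the full datetime column, which gives up the memory savings of the categorical for the duration of the call.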
Output of pd.show_versions()
INSTALLED VERSIONS
------------------
commit : None
python : 3.7.2.final.0
python-bits : 64
OS : Linux
OS-release : 5.2.8-200.fc30.x86_64
machine : x86_64
processor :
byteorder : little
LC_ALL : None
LANG : C.UTF-8
LOCALE : en_US.UTF-8
pandas : 0.25.0
numpy : 1.17.0
pytz : 2019.2
dateutil : 2.8.0
pip : 19.0.2
setuptools : 40.8.0
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.3.0
html5lib : 1.0.1
pymysql : None
psycopg2 : 2.8.3 (dt dec pq3 ext lo64)
jinja2 : None
IPython : None
pandas_datareader: None
bs4 : 4.7.1
bottleneck : None
fastparquet : 0.3.2
gcsfs : None
lxml.etree : 4.3.0
matplotlib : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
s3fs : None
scipy : None
sqlalchemy : None
tables : None
xarray : None
xlrd : 1.2.0
xlwt : None
xlsxwriter : None