Lost timezone information after groupby transform · Issue #27496 · pandas-dev/pandas

Code Sample, a copy-pastable example if possible

Similar to issue #24198, observed with pandas 0.25.0.

import pandas as pd

df = pd.DataFrame({'time': [pd.Timestamp('2010-07-15 03:14:45'), pd.Timestamp('2010-11-19 18:47:06')], 'timezone': ['Etc/GMT+4', 'US/Eastern']})

df['time_tz'] = (df.groupby(['timezone'])['time'].transform(lambda x: x.dt.tz_localize(x.name, ambiguous='NaT')))

df['time_tz_desired'] = (df.groupby(['timezone'])['time'].apply(lambda x: x.dt.tz_localize(x.name, ambiguous='NaT')))

print(df)

Problem description

The result of transform should keep its timezone, but it does not: the time_tz column comes back as timezone-naive datetime64[ns].
With apply, the returned column has object dtype instead of a datetime dtype.

df.info() shows:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2 entries, 0 to 1
Data columns (total 4 columns):
time               2 non-null datetime64[ns]
timezone           2 non-null object
time_tz            2 non-null datetime64[ns]
time_tz_desired    2 non-null object
dtypes: datetime64[ns](2), object(2)
memory usage: 192.0+ bytes
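
A possible explanation for the object dtype in the apply case, offered as an assumption rather than a confirmed diagnosis of this issue: a timezone-aware datetime column carries a single timezone in its dtype, so a Series whose values are localized to different zones falls back to object. The minimal sketch below shows that fallback without any groupby involved; the variable names are illustrative only.

```python
import pandas as pd

# Two timestamps localized to the two zones used in the report.
a = pd.Timestamp("2010-07-15 03:14:45", tz="Etc/GMT+4")
b = pd.Timestamp("2010-11-19 18:47:06", tz="US/Eastern")

# Different zones in one Series: no single tz-aware dtype fits.
mixed = pd.Series([a, b])

# Same zone throughout: a tz-aware datetime dtype is inferred.
same = pd.Series([a, a])

print(mixed.dtype)
print(same.dtype)
```

If that is what happens here, the per-element timezone information is still present inside the object column; only the column-level datetime dtype is lost.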

Expected Output

transform should preserve the timezone:

df.groupby(['timezone'])['time'].transform(lambda x: x.dt.tz_localize(x.name, ambiguous='NaT'))

apply should return a datetime dtype rather than object:

df.groupby(['timezone'])['time'].apply(lambda x: x.dt.tz_localize(x.name, ambiguous='NaT'))
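
Until the behavior above is available, one workaround is to localize each group in its own zone and then convert everything to UTC, so the combined column has one timezone and keeps a tz-aware datetime dtype. This is a sketch under that assumption, not the fix for this issue; `localize_to_utc` and the `time_utc` column are hypothetical names introduced here.

```python
import pandas as pd

df = pd.DataFrame({
    "time": [pd.Timestamp("2010-07-15 03:14:45"),
             pd.Timestamp("2010-11-19 18:47:06")],
    "timezone": ["Etc/GMT+4", "US/Eastern"],
})


def localize_to_utc(frame, time_col="time", tz_col="timezone"):
    """Hypothetical helper: localize each group in its own zone, then convert to UTC."""
    pieces = []
    # Grouping by a single column name yields (zone name, Series) pairs.
    for tz_name, group in frame.groupby(tz_col)[time_col]:
        localized = group.dt.tz_localize(tz_name, ambiguous="NaT")
        pieces.append(localized.dt.tz_convert("UTC"))
    # Restore the original row order before assigning back.
    return pd.concat(pieces).reindex(frame.index)


df["time_utc"] = localize_to_utc(df)
print(df.dtypes)
```

The trade-off is that the original zone is no longer part of the values themselves; it stays available in the timezone column if local times need to be rebuilt later.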

Output of pd.show_versions()

INSTALLED VERSIONS

commit : None
python : 3.7.3.final.0
python-bits : 64
OS : Windows
OS-release : 10
machine : AMD64
processor : Intel64 Family 6 Model 78 Stepping 3, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : None.None

pandas : 0.25.0
numpy : 1.16.3
pytz : 2019.1
dateutil : 2.8.0
pip : 19.1.1
setuptools : 40.8.0
Cython : 0.29.10
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.3.4
html5lib : 1.0.1
pymysql : 0.9.3
psycopg2 : 2.8.2 (dt dec pq3 ext lo64)
jinja2 : 2.10.1
IPython : 7.5.0
pandas_datareader: None
bs4 : 4.7.1
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : 4.3.4
matplotlib : 3.0.3
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
s3fs : None
scipy : 1.3.0
sqlalchemy : 1.3.3
tables : None
xarray : None
xlrd : None
xlwt : None
xlsxwriter : None