Lost timezone after groupby transform · Issue #24198 · pandas-dev/pandas
Code Sample, a copy-pastable example if possible
import pandas as pd
df = pd.DataFrame({'end_time': [pd.to_datetime('now', utc=True).tz_convert('Asia/Singapore')], 'id': [1]})
df['max_end_time'] = df.groupby('id').end_time.transform(max)
df.info()
shows
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1 entries, 0 to 0
Data columns (total 3 columns):
end_time 1 non-null datetime64[ns, Asia/Singapore]
id 1 non-null int64
max_end_time 1 non-null datetime64[ns]
dtypes: datetime64[ns, Asia/Singapore](1), datetime64[ns](1), int64(1)
memory usage: 104.0 bytes
df.to_dict()
shows
{'end_time': {0: Timestamp('2018-12-10 17:08:52.630644+0800', tz='Asia/Singapore')}, 'id': {0: 1}, 'max_end_time': {0: Timestamp('2018-12-10 09:08:52.630644')}}
Problem description
After a groupby-transform operation on a tz-aware datetime column, the timezone is silently dropped and the timestamps are converted to tz-naive UTC values.
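To make the "converted to UTC" part concrete, here is a small check using the two timestamps printed by `df.to_dict()` above. It is only a sketch: the variable names are made up for illustration, and the values are copied from the output in this report.

```python
import pandas as pd

# Values copied from the df.to_dict() output in this report.
aware = pd.Timestamp("2018-12-10 17:08:52.630644", tz="Asia/Singapore")
naive = pd.Timestamp("2018-12-10 09:08:52.630644")  # what transform returned

# Convert the tz-aware value to UTC, then drop the tz information.
as_utc_naive = aware.tz_convert("UTC").tz_localize(None)

# True if the naive result is the same instant expressed as UTC wall time.
same_instant = as_utc_naive == naive
print(same_instant)
```

So the instant itself appears to be preserved; what gets lost is the timezone attached to it.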
Expected Output
assert (df['end_time'] == df['max_end_time']).all()
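Until this is fixed, a possible workaround is to re-attach the timezone by hand. The sketch below assumes the tz-naive result really is in UTC, as observed in this report; `restore_tz` is a hypothetical helper written for this example, not a pandas function. On versions where the timezone is kept, it leaves the result unchanged.

```python
import pandas as pd


def restore_tz(result, reference):
    """Re-attach reference's timezone if result came back tz-naive.

    Assumption: a tz-naive result holds UTC wall times, as seen in this report.
    """
    tz = reference.dt.tz
    if tz is not None and result.dt.tz is None:
        # Label the naive values as UTC, then convert to the original zone.
        return result.dt.tz_localize("UTC").dt.tz_convert(tz)
    return result


df = pd.DataFrame({
    "end_time": [pd.Timestamp("2018-12-10 17:08:52.630644", tz="Asia/Singapore")],
    "id": [1],
})
transformed = df.groupby("id")["end_time"].transform("max")
df["max_end_time"] = restore_tz(transformed, df["end_time"])
```

With the helper applied, both columns should carry the `Asia/Singapore` timezone and compare equal element by element.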
Output of pd.show_versions()
```
INSTALLED VERSIONS
------------------
commit: None
python: 3.7.1.final.0
python-bits: 64
OS: Linux
OS-release: 4.9.85-38.58.amzn1.x86_64
machine: x86_64
processor:
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.23.4
pytest: 3.10.0
pip: 18.1
setuptools: 40.5.0
Cython: 0.29
numpy: 1.15.4
scipy: 1.1.0
pyarrow: None
xarray: None
IPython: 7.1.1
sphinx: None
patsy: 0.5.1
dateutil: 2.7.5
pytz: 2018.7
blosc: None
bottleneck: None
tables: None
numexpr: 2.6.8
feather: None
matplotlib: 3.0.1
openpyxl: None
xlrd: 1.1.0
xlwt: None
xlsxwriter: None
lxml: None
bs4: 4.6.3
html5lib: None
sqlalchemy: 1.2.13
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None