tz info is lost in min/max of single column data frame · Issue #24465 · pandas-dev/pandas
Code Sample
import pandas as pd

# creating data frame
df = pd.DataFrame({"date": ['25-12-2018', '31-12-2018']})

# turning the date string into a tz-aware datetime object
df["date"] = pd.to_datetime(df["date"], utc=True)

# we have indeed a column with tz-aware datetimes
print(df.dtypes)
# date    datetime64[ns, UTC]
# dtype: object

# when finding the maximum datetime, the tz-information is lost (which is a bug I think):
last_date1 = df.max()
print(last_date1)
# date   2018-12-31
# dtype: datetime64[ns]

# the tz-information is correctly retained when finding the maximum as follows:
last_date2 = df["date"].max()
print(last_date2)
# 2018-12-31 00:00:00+00:00
Problem description
When taking the maximum of a single-column DataFrame that holds tz-aware datetimes, I would expect a tz-aware result. Instead, df.max() returns a Series with dtype datetime64[ns], so the tz info is lost. I think that is a bug: the tz info should be retained.
When taking the maximum of the column explicitly (i.e. on the Series object, df["date"].max()), the problem does not occur and a tz-aware result is returned.
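Until the DataFrame reduction keeps the timezone, one possible workaround is to reduce each column through its Series, since that path is the one reported above as returning a tz-aware value. The snippet below is only a sketch: `columnwise_max` is a hypothetical helper name, not a pandas function, and the explicit `format="%d-%m-%Y"` is an addition made here so that the day-first strings are parsed unambiguously.

```python
import pandas as pd

# Same data as in the code sample above. The explicit format is an
# assumption added for this sketch; the original report relied on inference.
df = pd.DataFrame({"date": ["25-12-2018", "31-12-2018"]})
df["date"] = pd.to_datetime(df["date"], format="%d-%m-%Y", utc=True)


def columnwise_max(frame):
    """Hypothetical helper: take the maximum of each column via Series.max().

    Returns a plain dict so that no DataFrame-level reduction is involved.
    """
    return {name: frame[name].max() for name in frame.columns}


last_dates = columnwise_max(df)
print(last_dates["date"])
```

Returning a dict rather than a Series is a deliberate choice in this sketch: it avoids any further dtype coercion and leaves each value as the Timestamp that Series.max() produced.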
Expected Output
last_date1 = df.max()
print(last_date1)
# date   2018-12-31 00:00:00+00:00
# dtype: datetime64[ns, UTC]
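To see whether a given pandas installation shows the behaviour described in this report, a small check along the following lines may help. It is a sketch under assumptions: `frame_max_keeps_tz` is a hypothetical name introduced here, and it inspects only the value that the DataFrame-level max() returns for one column. No particular result is claimed, because the outcome depends on the installed pandas version.

```python
import pandas as pd


def frame_max_keeps_tz(frame, column):
    """Hypothetical check: does frame[[column]].max() return a tz-aware value?

    The single-column selection mirrors the single-column DataFrame
    from the report.
    """
    reduced = frame[[column]].max()   # DataFrame-level reduction
    value = reduced[column]           # the reduced value for that column
    return getattr(value, "tzinfo", None) is not None


# Same data as in the report; the explicit format is an assumption added
# so that the day-first strings are parsed unambiguously.
df = pd.DataFrame({"date": ["25-12-2018", "31-12-2018"]})
df["date"] = pd.to_datetime(df["date"], format="%d-%m-%Y", utc=True)

keeps_tz = frame_max_keeps_tz(df, "date")
print("DataFrame.max() keeps tz:", keeps_tz)
```

If this prints False, the installation behaves as reported here (tz info dropped); True would match the expected output above.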
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.7.1.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 158 Stepping 10, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None
pandas: 0.23.4
pytest: None
pip: 18.1
setuptools: 40.6.3
Cython: None
numpy: 1.15.4
scipy: None
pyarrow: None
xarray: None
IPython: 7.2.0
sphinx: None
patsy: None
dateutil: 2.7.5
pytz: 2018.7
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None