Series (but not DataFrame) combine_first() loses timezone information · Issue #21469 · pandas-dev/pandas

Code Sample, a copy-pastable example if possible

import pandas as pd

dts1 = pd.date_range('20150101', '20150105', tz='UTC')
df1 = pd.DataFrame({'DATE': dts1})
dts2 = pd.date_range('20150103', '20150105', tz='UTC')
df2 = pd.DataFrame({'DATE': dts2})
df = df1.combine_first(df2)
df.DATE[0].tz  # returns <UTC> (fixed by #10567)

ser = df1['DATE'].combine_first(df2['DATE'])
ser[0].tz  # returns None, should be as above

Problem description

Calling Series.combine_first on two tz-localized datetime Series returns a non-localized Series.
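The same symptom can also be checked at the dtype level instead of on a single element. The following is only a sketch: the names `left`, `right`, `combined`, and `tz_preserved` are illustrative and not part of the report, and the printed result depends on the installed pandas version.

```python
import pandas as pd

# Same data as in the report, built directly as Series
left = pd.Series(pd.date_range("20150101", "20150105", tz="UTC"))
right = pd.Series(pd.date_range("20150103", "20150105", tz="UTC"))

combined = left.combine_first(right)

# A tz-naive (NumPy) dtype has no `tz` attribute, so fall back to None
input_tz = getattr(left.dtype, "tz", None)
output_tz = getattr(combined.dtype, "tz", None)
tz_preserved = input_tz is not None and str(input_tz) == str(output_tz)

print("dtype in: ", left.dtype)
print("dtype out:", combined.dtype)
print("timezone preserved:", tz_preserved)
```

On an affected version the last line would be expected to print `False`.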

#10567 fixed this for DataFrame.combine_first on DataFrames with tz-aware datetime columns, but the fix does not cover Series. The behavior is the same on at least 0.19.2 and the latest master, so it appears the Series case was never addressed by #10567.
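Until the Series case is handled, a possible workaround is to restore the timezone after the call. This is a minimal sketch, not a fix in pandas itself: `combine_first_keep_tz` is a hypothetical helper name, and it assumes that a tz-naive result holds UTC wall times, which has not been verified on every affected version.

```python
import pandas as pd


def combine_first_keep_tz(left, right):
    """Hypothetical helper: run combine_first, then restore the tz if it was dropped."""
    result = left.combine_first(right)
    tz = getattr(left.dtype, "tz", None)
    if tz is not None and getattr(result.dtype, "tz", None) is None:
        # Assumption: the naive values are UTC wall times, so localize
        # to UTC first and then convert back to the original timezone.
        result = pd.to_datetime(result).dt.tz_localize("UTC").dt.tz_convert(tz)
    return result


left = pd.Series(pd.date_range("20150101", "20150105", tz="UTC"))
right = pd.Series(pd.date_range("20150103", "20150105", tz="UTC"))
result = combine_first_keep_tz(left, right)
print(result.dt.tz)
```

On versions where Series.combine_first already preserves the timezone, the `if` branch is skipped and the helper simply returns the combined Series.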

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: 576d5c6
python: 3.6.5.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 62 Stepping 4, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.24.0.dev0+103.g576d5c6b7
pytest: 3.6.0
pip: 10.0.1
setuptools: 39.2.0
Cython: 0.28.3
numpy: 1.14.2
scipy: 1.0.0
pyarrow: 0.8.0
xarray: 0.10.6
IPython: 6.4.0
sphinx: 1.7.5
patsy: 0.5.0
dateutil: 2.7.3
pytz: 2018.4
blosc: None
bottleneck: 1.2.1
tables: 3.4.3
numexpr: 2.6.4
feather: 0.4.0
matplotlib: 2.2.2
openpyxl: 2.5.3
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.5
lxml: 4.1.1
bs4: 4.6.0
html5lib: 1.0.1
sqlalchemy: 1.2.8
pymysql: 0.8.1
psycopg2: None
jinja2: 2.10
s3fs: 0.1.5
fastparquet: 0.1.5
pandas_gbq: None
pandas_datareader: None