0.23.4 changed read_csv parsing for mixed-timezone datetimes · Issue #24987 · pandas-dev/pandas
Previously, a column in a CSV with mixed timezones would (I think) convert each value to UTC and discard the tzinfo.
import io
import pandas as pd
content = """
Sat, 22 Apr 2017 15:11:58 -0500
Fri, 21 Apr 2017 14:20:57 -0500
Thu, 9 Mar 2017 11:15:30 -0600"""
df = pd.read_csv(io.StringIO(content), parse_dates=True, header=None, names=['day', 'datetime'], index_col='datetime')
On 0.23.4 that's
In [7]: df.index
Out[7]: DatetimeIndex(['2017-04-22 20:11:58', '2017-04-21 19:20:57', '2017-03-09 17:15:30'], dtype='datetime64[ns]', name='datetime', freq=None)
On 0.24 that's
In [7]: df.index
Out[7]: Index([2017-04-22 15:11:58-05:00, 2017-04-21 14:20:57-05:00, 2017-03-09 11:15:30-06:00], dtype='object', name='datetime')
I'm not sure what the expected behavior is here, but I think the old behavior is as good as any.
I haven't verified, but #22380 seems like a likely candidate for introducing the change.
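For anyone who needs the 0.23.4-style result in the meantime, here is a minimal sketch of one possible workaround. It skips `parse_dates` and converts the column explicitly with `pd.to_datetime(..., utc=True)`, then drops the timezone. The intermediate names (`raw`, `utc`) are just for illustration, and the `.str.strip()` call is there only because the sample values carry a leading space after the comma. I'm assuming that getting UTC wall-clock values without tzinfo is what you want.

```python
import io

import pandas as pd

content = """
Sat, 22 Apr 2017 15:11:58 -0500
Fri, 21 Apr 2017 14:20:57 -0500
Thu, 9 Mar 2017 11:15:30 -0600"""

# Read the datetime column as plain strings (no parse_dates).
raw = pd.read_csv(io.StringIO(content), header=None, names=['day', 'datetime'])

# Convert every value to UTC explicitly; utc=True accepts mixed offsets.
utc = pd.to_datetime(raw['datetime'].str.strip(), utc=True)

# Drop the timezone to get naive UTC timestamps, as 0.23.4 produced,
# and use them as the index.
df = raw.drop(columns='datetime')
df.index = pd.DatetimeIndex(utc.dt.tz_localize(None), name='datetime')

print(df.index)
```

If you would rather keep the values timezone-aware, leave out the `tz_localize(None)` step and the index stays in UTC.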