BUG: iterator of DatetimeIndex broken with tzoffset timezone · Issue #8890 · pandas-dev/pandas
Summary:
The trigger is the tzoffset timezone. This bug can be reproduced as follows:
In [86]: index = pd.date_range("2012-01-01", periods=3, freq='H', tz=dateutil.tz.tzoffset(None, -28800))
In [87]: index
Out[87]:
<class 'pandas.tseries.index.DatetimeIndex'>
[2012-01-01 00:00:00-08:00, ..., 2012-01-01 02:00:00-08:00]
Length: 3, Freq: H, Timezone: tzoffset(None, -28800)
In [88]: index[0]
Out[88]: Timestamp('2012-01-01 00:00:00-0800', tz='tzoffset(None, -28800)', offset='H')
In [90]: list(iter(index))[0]
Out[90]: Timestamp('2011-12-31 16:00:00-0800', tz='tzoffset(None, -28800)', offset='H')
In [91]: list(iter(index))[0] == index[0]
Out[91]: False
In 0.14 this last comparison gives True.
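The session above can be condensed into a small script that compares iteration against positional indexing. This is only a sketch: the helper name iteration_matches_indexing is made up for illustration, and the frequency is passed as a pd.Timedelta rather than 'H' so the same call should be accepted by both older and newer pandas releases. On an affected version the check is expected to come out False.

```python
import dateutil.tz
import pandas as pd


def iteration_matches_indexing(index):
    """Return True if iterating over `index` yields the same Timestamps
    as accessing each position with index[i].

    Hypothetical helper, not part of pandas.
    """
    iterated = list(index)
    indexed = [index[i] for i in range(len(index))]
    return len(iterated) == len(indexed) and all(
        a == b for a, b in zip(iterated, indexed)
    )


# Same index as in the report: fixed UTC-8 offset via dateutil's tzoffset.
index = pd.date_range(
    "2012-01-01",
    periods=3,
    freq=pd.Timedelta(hours=1),
    tz=dateutil.tz.tzoffset(None, -28800),
)

# Prints whether iteration and positional indexing agree on this install.
print(iteration_matches_indexing(index))
```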
This appears when iterating over the index (for time in index: ...) or when using DataFrame.iterrows() (#8951).
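The DataFrame.iterrows() path can be checked the same way. The sketch below assumes a throwaway frame with a single hypothetical column named value; it compares the row labels produced by iterrows() with the labels obtained by positional indexing on df.index.

```python
import dateutil.tz
import pandas as pd

tz = dateutil.tz.tzoffset(None, -28800)
index = pd.date_range(
    "2012-01-01", periods=3, freq=pd.Timedelta(hours=1), tz=tz
)

# "value" is an arbitrary column used only for illustration.
df = pd.DataFrame({"value": [1, 2, 3]}, index=index)

# Labels as produced by iterrows(), which iterates over the index.
iterrows_labels = [label for label, _row in df.iterrows()]

# Labels as produced by positional indexing, which the report shows is correct.
expected_labels = [df.index[i] for i in range(len(df.index))]

labels_match = iterrows_labels == expected_labels
print(labels_match)
```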
Original report:
I see there are already a number of issues here related to datetime and time indexes, and I suspect that mine has a lot in common with them. The cure for one of them will probably solve all of them. So here it goes.
My code was using pandas 0.13.1 without an issue. I recently upgraded to 0.15.1.
This is where my code acts unexpectedly:
time_points = df.index[df['candidate'] == 1]
for time in time_points:
[...]
The index is in the US/Pacific timezone. When the for loop yields time, it is still in the US/Pacific timezone, but the time is shifted by 8h. So, while the actual time is 2014-11-23 23:25:02.916000-08:00, time is set to 2014-11-23 15:49:12.972000-08:00.
I have made sure that the type of index is pandas.Timestamp, and I can't find an elegant workaround that works in both versions of pandas.
Any thoughts on this?
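One possible stopgap, based only on the observation above that index[0] returns the expected value while iteration does not, is to walk the index by position instead of iterating over it directly. The generator name iter_by_position and the sample data below are hypothetical, and this has not been verified against every affected release.

```python
import dateutil.tz
import pandas as pd


def iter_by_position(index):
    """Yield each element of `index` via index[i] instead of iter(index).

    Hypothetical workaround helper, not part of pandas.
    """
    for i in range(len(index)):
        yield index[i]


# Made-up data standing in for the frame in the report.
tz = dateutil.tz.tzoffset(None, -28800)
idx = pd.date_range(
    "2014-11-23 21:00", periods=4, freq=pd.Timedelta(hours=1), tz=tz
)
df = pd.DataFrame({"candidate": [0, 1, 0, 1]}, index=idx)

time_points = df.index[df["candidate"] == 1]
for time in iter_by_position(time_points):
    print(time)
```

Positional access is slower than plain iteration on large indexes, so this is only meant as a temporary measure until a fixed release is available.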