Add notion of pytz time zone normalization · Issue #18595 · pandas-dev/pandas
Though part of the solution to #18523 will be to introduce a new time zone comparison operation, there's another, less pressing issue that's also brought up there:
```python
import pandas as pd

idx1 = pd.date_range('2011-01-01', periods=3, freq='H', tz='Europe/Paris')
idx2 = pd.date_range(start=idx1[0], end=idx1[-1], freq='H')

print(idx1.tz)  # <DstTzInfo 'Europe/Paris' LMT+0:09:00 STD>
print(idx2.tz)  # <DstTzInfo 'Europe/Paris' CET+1:00:00 STD>
```
The problem comes from the fact that pytz's `localize` spawns concrete, offset-specific `tzinfo` objects for each date, and these don't compare equal to one another (nor are they the same instance). As such, when you pull `idx1[0]` and `idx1[-1]`, you get concrete timestamps, each of which has the `<DstTzInfo 'Europe/Paris' CET+1:00:00 STD>` zone attached. The new `DatetimeIndex` therefore picks up that concrete instance, which is not actually consistent with the abstract notion of the time zone in `idx1`. At the series level, it makes sense to use a "normalized" version of pytz zones, so I recommend adding a function like this:
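```python
import pytz
from pytz.tzinfo import BaseTzInfo


def is_pytz_zone(tzi):
    # Assumed implementation: pytz zone objects derive from BaseTzInfo
    return isinstance(tzi, BaseTzInfo)


def tz_normalize(tzi):
    # Map any offset-specific pytz tzinfo back to the canonical zone object
    if is_pytz_zone(tzi):
        return pytz.timezone(str(tzi))
    return tzi
```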
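The mechanism can be checked with pytz alone, without pandas. This is a sketch that assumes a pytz version behaving as described above; `canonical_zone` is a hypothetical name for the same lookup-by-name idea as `tz_normalize`. It localizes a winter and a summer date, shows that the attached `tzinfo` objects are neither the zone object nor each other, and shows that looking the zone up again by name recovers one canonical object (pytz caches zones, so the same name returns the same instance).

```python
from datetime import datetime

import pytz

paris = pytz.timezone('Europe/Paris')

# localize() attaches an offset-specific tzinfo, not the zone object itself
winter = paris.localize(datetime(2011, 1, 1)).tzinfo  # CET, UTC+1
summer = paris.localize(datetime(2011, 7, 1)).tzinfo  # CEST, UTC+2


def canonical_zone(tzi):
    # Hypothetical helper mirroring tz_normalize: str() of a pytz zone is its
    # name, and pytz.timezone() returns a cached object for each name
    return pytz.timezone(str(tzi))


print(winter is paris, summer is paris, winter is summer)  # False False False
print(canonical_zone(winter) is canonical_zone(summer) is paris)  # True
```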
You can solve the above problem either by normalizing the time zone on construction or by changing `.tz` from an attribute to a property:
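```python
import warnings

# Excerpt from the DatetimeIndex class body
@property
def tz(self):
    # Always expose the normalized (canonical) zone
    return tz_normalize(self._tz)

@tz.setter
def tz(self, value):
    msg = ('Directly setting DatetimeIndex.tz is deprecated; instead, use '
           'tz_localize() or tz_convert() as appropriate')
    warnings.warn(msg, DeprecationWarning, stacklevel=2)
    self._tz = value
```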
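The property/setter pattern can be tried in isolation on a toy class. This is only a sketch: `ToyIndex` is a hypothetical stand-in for `DatetimeIndex`, and `warnings.warn` with `DeprecationWarning` is assumed in place of whatever deprecation helper pandas would actually use. The getter normalizes whatever zone is stored, and the setter still works but warns.

```python
import warnings
from datetime import datetime

import pytz
from pytz.tzinfo import BaseTzInfo


def tz_normalize(tzi):
    # Map offset-specific pytz tzinfos back to the cached canonical zone
    if isinstance(tzi, BaseTzInfo):
        return pytz.timezone(str(tzi))
    return tzi


class ToyIndex:
    """Hypothetical stand-in for DatetimeIndex; only tz handling is modeled."""

    def __init__(self, tz):
        self._tz = tz  # stored as given, possibly offset-specific

    @property
    def tz(self):
        # Readers always see the normalized zone
        return tz_normalize(self._tz)

    @tz.setter
    def tz(self, value):
        # Still allowed, but steer users toward the proper interface
        warnings.warn('Directly setting ToyIndex.tz is deprecated; use '
                      'tz_localize() or tz_convert() instead',
                      DeprecationWarning, stacklevel=2)
        self._tz = value


paris = pytz.timezone('Europe/Paris')
cet = paris.localize(datetime(2011, 1, 1)).tzinfo  # offset-specific CET tzinfo

idx = ToyIndex(cet)
print(idx.tz is paris)  # True: the getter normalizes

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')
    idx.tz = pytz.timezone('Europe/Berlin')
print(caught[0].category.__name__)  # DeprecationWarning
```

Keeping the raw value in `_tz` and normalizing only on read means construction code does not have to change, which is the trade-off against normalizing once on construction.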
As the property implementation shows, this approach also lets you enforce that people use the proper interface (`tz_localize()` or `tz_convert()`) for changing an index's time zone.
Issue raised per discussion with @jreback and @MridulS at PyData NYC sprints.