BUG: Time zone information lost for some dateutil time zones · Issue #9663 · pandas-dev/pandas

The dateutil package lets you create time zone (tzfile) objects in two ways: by using dateutil.tz.gettz to read time zone data from the file system (/usr/share/zoneinfo), or by using dateutil.zoneinfo.gettz to read time zone data from a tar file distributed with the dateutil package.

The tslib.maybe_get_tz function doesn't handle the dateutil.tz.gettz variant.

from datetime import datetime

import pandas as pd
import pandas.tslib as tslib

import dateutil.tz
import dateutil.zoneinfo

tz1 = dateutil.tz.gettz('America/New_York')
tz2 = dateutil.zoneinfo.gettz('America/New_York')

d1 = datetime(2015, 1, 1, tzinfo=tz1)
d2 = datetime(2015, 1, 1, tzinfo=tz2)

maybe_get_tz returns None for tz1, but works correctly for tz2:

>>> tslib.maybe_get_tz('dateutil/' + tz1._filename)
>>> tslib.maybe_get_tz('dateutil/' + tz2._filename)
tzfile('America/New_York')

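As a result, a DatetimeIndex built from such datetimes loses its time zone information. In the output below, d1 is converted to UTC and reported with Timezone: None, while d2 keeps its time zone.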

>>> pd.to_datetime([d1])
<class 'pandas.tseries.index.DatetimeIndex'>
[2015-01-01 05:00:00]
Length: 1, Freq: None, Timezone: None
>>> pd.to_datetime([d2])
<class 'pandas.tseries.index.DatetimeIndex'>
[2015-01-01 00:00:00-05:00]
Length: 1, Freq: None, Timezone: tzfile('America/New_York')

I think that if maybe_get_tz were to first try dateutil.zoneinfo.gettz and then fall back on dateutil.tz.gettz, the problem would be solved.
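The fallback order suggested above can be sketched roughly as follows. This is only an illustration of the idea, not the actual pandas code: the helper name gettz_with_fallback and its loaders parameter are hypothetical, and the lookup functions are passed in so the sketch does not depend on how tslib is organized.

```python
from datetime import tzinfo
from typing import Callable, Iterable, Optional


def gettz_with_fallback(
    name: str,
    loaders: Iterable[Callable[[str], Optional[tzinfo]]],
) -> Optional[tzinfo]:
    """Hypothetical helper: return the first non-None result from the loaders.

    Each loader takes a time zone name (or file name) and returns a tzinfo
    object, or None if it cannot resolve the name. Loaders are tried in the
    order given, so earlier ones take priority.
    """
    for load in loaders:
        tz = load(name)
        if tz is not None:
            return tz
    # No loader could resolve the name.
    return None
```

Under this sketch, maybe_get_tz would presumably pass [dateutil.zoneinfo.gettz, dateutil.tz.gettz] as the loaders, so the second lookup is tried only when the first returns None.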

This was a regression between 0.14 and 0.15.