Timezones silently dropped in parsing · Issue #18702 · pandas-dev/pandas

TLDR: pandas should pass a tzinfos kwarg to the dateutil parser using sensible defaults.

dateutil has a bug that silently drops most timezone abbreviations, and pandas inherits that bug. The following was run on a machine located in US/Pacific:

>>> pd.Timestamp('2017-12-08 08:20 PM PST')     # <-- only parsed correctly because PST is the machine's local timezone
Timestamp('2017-12-08 20:20:00-0800', tz='tzlocal()')
>>> pd.Timestamp('2017-12-08 08:20 PM EST')     # <-- timezone silently dropped
Timestamp('2017-12-08 20:20:00')

A partial fix is in progress over at dateutil. Its most likely outcome is that these cases will raise in the future unless a tzinfos kwarg is explicitly passed to dateutil.parser.parse. The question for pandas is then which tzinfos to pass (a suggestion to handle the most common use cases by default within dateutil went nowhere).

The tzinfos kwarg is a dictionary mapping a timezone abbreviation (a string) to a tzinfo object, e.g.

import dateutil.tz

# Map each abbreviation to the tzinfo object it should resolve to.
unambiguous_tzinfos = {
    'PDT': dateutil.tz.gettz('US/Pacific'),
    'PT': dateutil.tz.gettz('US/Pacific'),
    'MDT': dateutil.tz.gettz('US/Mountain'),
    'MT': dateutil.tz.gettz('US/Mountain'),
    'ET': dateutil.tz.gettz('US/Eastern'),
    'CET': dateutil.tz.gettz('Europe/Amsterdam'),
    'NZDT': dateutil.tz.gettz('Pacific/Auckland')}

This example includes only abbreviations for which no alternatives are listed here. For example, "CST" is excluded since it could also be "China Standard Time", and "EST" is excluded since it could refer to "Australian Eastern Standard Time". Note that this is only a subset of the unambiguous abbreviations.
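
For illustration, here is a minimal sketch of how such a mapping could be passed through to dateutil. The helper name `parse_with_defaults` and the two-entry `default_tzinfos` dict are hypothetical, not anything pandas or dateutil provides, and the zones are written with their IANA names (America/New_York, America/Los_Angeles) rather than the US/* aliases used above, on the assumption that they refer to the same zones.

```python
from datetime import timedelta

from dateutil import parser, tz

# Hypothetical default mapping: a small subset of the abbreviations above.
default_tzinfos = {
    'ET': tz.gettz('America/New_York'),
    'PT': tz.gettz('America/Los_Angeles'),
}


def parse_with_defaults(text, tzinfos=None):
    """Parse `text`, resolving known abbreviations via `tzinfos`.

    Hypothetical helper: falls back to `default_tzinfos` when the caller
    does not supply a mapping of their own.
    """
    if tzinfos is None:
        tzinfos = default_tzinfos
    return parser.parse(text, tzinfos=tzinfos)


winter = parse_with_defaults('2017-12-08 08:20 PM ET')
summer = parse_with_defaults('2017-07-01 08:20 PM PT')

# The abbreviation is no longer dropped: both results are timezone-aware,
# and the offset follows the zone's DST rules for the parsed date.
print(winter.utcoffset() == timedelta(hours=-5))
print(summer.utcoffset() == timedelta(hours=-7))
```

Because the values are full tzinfo objects rather than fixed offsets, an abbreviation like "PT" resolves to the correct offset for the date being parsed instead of a single hard-coded one.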