AmbiguousTimeError on groupby when including a DST change · Issue #14682 · pandas-dev/pandas
A small, complete example of the issue
#!/usr/bin/env python
import pandas as pd
df = pd.DataFrame([1477786980, 1477790580], columns=['ts'])
df['date'] = pd.to_datetime(df.ts, unit='s').dt.tz_localize('UTC').dt.tz_convert('Europe/Madrid')
df.set_index('date', inplace=True)

dfo = df.groupby(pd.TimeGrouper('5min'))
Expected Output
ts
date
2016-10-30 02:20:00+02:00 1
2016-10-30 02:25:00+02:00 0
2016-10-30 02:30:00+02:00 0
2016-10-30 02:35:00+02:00 0
2016-10-30 02:40:00+02:00 0
2016-10-30 02:45:00+02:00 0
2016-10-30 02:50:00+02:00 0
2016-10-30 02:55:00+02:00 0
2016-10-30 02:00:00+01:00 0
2016-10-30 02:05:00+01:00 0
2016-10-30 02:10:00+01:00 0
2016-10-30 02:15:00+01:00 0
2016-10-30 02:20:00+01:00 1
Output of pd.show_versions()
>>> pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 2.7.12.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.0-47-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: None.None
pandas: 0.19.1
nose: 1.3.7
pip: 9.0.1
setuptools: 28.6.1
Cython: 0.25.1
numpy: 1.11.2
scipy: None
statsmodels: None
xarray: None
IPython: None
sphinx: 1.4.8
patsy: None
dateutil: 2.4.2
pytz: 2016.7
blosc: None
bottleneck: None
tables: None
numexpr: 2.6.1
matplotlib: None
openpyxl: 2.2.6
xlrd: None
xlwt: None
xlsxwriter: None
lxml: 3.5.0
bs4: 4.4.1
html5lib: 0.999
httplib2: 0.9.1
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.8
boto: None
pandas_datareader: None
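The above code raises an AmbiguousTimeError exception when grouping by a timezone-aware datetime index that spans a DST change. In the example, the two Unix timestamps fall on either side of the recent DST change in Europe (2016-10-30), so they share the same local wall-clock time but have different UTC offsets (+02:00 and +01:00).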
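To make the ambiguity concrete, here is a minimal standard-library sketch. It assumes fixed offsets of +02:00 and +01:00 as stand-ins for Europe/Madrid before and after the change (the offsets shown in the expected output); the names `CEST`, `CET`, `before` and `after` are illustrative only.

```python
from datetime import datetime, timedelta, timezone

# Assumption: fixed offsets stand in for Europe/Madrid before (+02:00)
# and after (+01:00) the DST change, as in the expected output.
CEST = timezone(timedelta(hours=2))
CET = timezone(timedelta(hours=1))

# The two Unix timestamps from the example.
before = datetime.fromtimestamp(1477786980, tz=CEST)
after = datetime.fromtimestamp(1477790580, tz=CET)

print(before.isoformat())  # 2016-10-30T02:23:00+02:00
print(after.isoformat())   # 2016-10-30T02:23:00+01:00

# Without the offset, both instants show the same wall-clock time,
# even though they are one hour apart in absolute time.
same_wall_clock = before.replace(tzinfo=None) == after.replace(tzinfo=None)
seconds_apart = (after - before).total_seconds()
print(same_wall_clock, seconds_apart)  # True 3600.0
```

Any step that drops the offset and later re-attaches the time zone, as the naive `2016-10-30 02:20:00` in the error message suggests happened here, cannot tell which of the two instants was meant.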
The stack trace is:
Traceback (most recent call last):
  File "./t.py", line 7, in <module>
    dfo = df.groupby(pd.TimeGrouper('5min'))
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/generic.py", line 3984, in groupby
    **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/groupby.py", line 1501, in groupby
    return klass(obj, by, **kwds)
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/groupby.py", line 370, in __init__
    mutated=self.mutated)
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/groupby.py", line 2382, in _get_grouper
    binner, grouper, obj = key._get_grouper(obj)
  File "/usr/local/lib/python2.7/dist-packages/pandas/tseries/resample.py", line 1062, in _get_grouper
    r._set_binner()
  File "/usr/local/lib/python2.7/dist-packages/pandas/tseries/resample.py", line 237, in _set_binner
    self.binner, self.grouper = self._get_binner()
  File "/usr/local/lib/python2.7/dist-packages/pandas/tseries/resample.py", line 245, in _get_binner
    binner, bins, binlabels = self._get_binner_for_time()
  File "/usr/local/lib/python2.7/dist-packages/pandas/tseries/resample.py", line 660, in _get_binner_for_time
    return self.groupby._get_time_bins(self.ax)
  File "/usr/local/lib/python2.7/dist-packages/pandas/tseries/resample.py", line 1118, in _get_time_bins
    base=self.base)
  File "/usr/local/lib/python2.7/dist-packages/pandas/tseries/resample.py", line 1262, in _get_range_edges
    closed=closed, base=base)
  File "/usr/local/lib/python2.7/dist-packages/pandas/tseries/resample.py", line 1326, in _adjust_dates_anchored
    return (Timestamp(fresult).tz_localize(first_tzinfo),
  File "pandas/tslib.pyx", line 621, in pandas.tslib.Timestamp.tz_localize (pandas/tslib.c:13694)
  File "pandas/tslib.pyx", line 4308, in pandas.tslib.tz_localize_to_utc (pandas/tslib.c:74816)
pytz.exceptions.AmbiguousTimeError: Cannot infer dst time from Timestamp('2016-10-30 02:20:00'), try using the 'ambiguous' argument
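The last frame can be reproduced in isolation. The sketch below localizes the naive timestamp from the error message directly and then uses the `ambiguous` argument that the message points to. The variable names are illustrative, and the exception is caught broadly because the exact exception class may differ between pandas versions.

```python
import pandas as pd

# The naive timestamp named in the error message.
naive = pd.Timestamp("2016-10-30 02:20:00")

# By default, localizing an ambiguous wall-clock time raises.
try:
    naive.tz_localize("Europe/Madrid")
    raised = False
except Exception as exc:  # exact class may vary by pandas version
    raised = True
    print(type(exc).__name__, exc)

# ambiguous=True selects the DST (summer time) reading,
# ambiguous=False selects the standard time reading.
first = naive.tz_localize("Europe/Madrid", ambiguous=True)
second = naive.tz_localize("Europe/Madrid", ambiguous=False)
print(first)   # 2016-10-30 02:20:00+02:00
print(second)  # 2016-10-30 02:20:00+01:00
```

This only illustrates why the call inside `_adjust_dates_anchored` fails; a user of `groupby` has no way to pass `ambiguous` through to that internal call.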
The code works if the series does not span a DST change (e.g. the same timestamps one day earlier):
#!/usr/bin/env python
import pandas as pd
df = pd.DataFrame([1477700580, 1477704180], columns=['ts'])
df['date'] = pd.to_datetime(df.ts, unit='s').dt.tz_localize('UTC').dt.tz_convert('Europe/Madrid')
df.set_index('date', inplace=True)

dfo = df.groupby(pd.TimeGrouper('5min'))
print dfo.count()
This prints:
ts
date
2016-10-29 02:20:00+02:00 1
2016-10-29 02:25:00+02:00 0
2016-10-29 02:30:00+02:00 0
2016-10-29 02:35:00+02:00 0
2016-10-29 02:40:00+02:00 0
2016-10-29 02:45:00+02:00 0
2016-10-29 02:50:00+02:00 0
2016-10-29 02:55:00+02:00 0
2016-10-29 03:00:00+02:00 0
2016-10-29 03:05:00+02:00 0
2016-10-29 03:10:00+02:00 0
2016-10-29 03:15:00+02:00 0
2016-10-29 03:20:00+02:00 1
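As a possible workaround (not a fix for the underlying bug), the expected output listed earlier in this report can likely be obtained by grouping while the index is still in UTC, where no wall-clock time is ambiguous, and converting the resulting bin labels to Europe/Madrid afterwards. This is a sketch under assumptions: it uses `pd.Grouper(freq='5min')` in place of `pd.TimeGrouper('5min')`, since the latter is not available in recent pandas releases, and the name `counts` is illustrative.

```python
import pandas as pd

# Same timestamps as the failing example, but keep the index in UTC.
df = pd.DataFrame([1477786980, 1477790580], columns=["ts"])
df["date"] = pd.to_datetime(df.ts, unit="s").dt.tz_localize("UTC")
df = df.set_index("date")

# Bin in UTC, where every 5-minute edge is unambiguous.
counts = df.groupby(pd.Grouper(freq="5min")).count()

# Convert only the bin labels to local time for display.
counts.index = counts.index.tz_convert("Europe/Madrid")
print(counts)
```

Because the bin edges are computed in UTC, this yields 13 five-minute bins running from 02:20+02:00 to 02:20+01:00. For frequencies that do not divide the UTC offset evenly (daily bins, for example), UTC-based edges would not line up with local ones, so this approach is limited to cases like the one above.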