AmbiguousTimeError merging two timezone-aware DataFrames with DST change · Issue #18885 · pandas-dev/pandas (original) (raw)

When merging two DataFrames by a timezone-aware datetime column, if the datetime values doesn't include a DST change, there's no problem:

df1 = pd.DataFrame([pd.to_datetime('2017-10-30 02:00:00+01:00'), pd.to_datetime('2017-10-30 03:00:00+01:00'), pd.to_datetime('2017-10-30 04:00:00+01:00')],columns=['date']) df1['date'] = df1['date'].dt.tz_localize('UTC').dt.tz_convert('Europe/Madrid') df1['value'] = 1

df2 = pd.DataFrame([pd.to_datetime('2017-10-30 04:00:00+01:00'), pd.to_datetime('2017-10-30 05:00:00+01:00'), pd.to_datetime('2017-10-30 06:00:00+01:00')],columns=['date']) df2['date'] = df2['date'].dt.tz_localize('UTC').dt.tz_convert('Europe/Madrid') df2['value'] = 2

pd.merge(df1, df2, how='outer', on='date')

Output

                       date  value_x  value_y
0 2017-10-30 02:00:00+01:00      1.0      NaN
1 2017-10-30 03:00:00+01:00      1.0      NaN
2 2017-10-30 04:00:00+01:00      1.0      2.0
3 2017-10-30 05:00:00+01:00      NaN      2.0
4 2017-10-30 06:00:00+01:00      NaN      2.0

This is correct. But if the datetime values include a date with DST change, we get an AmbiguousTimeError exception:

df1 = pd.DataFrame([pd.to_datetime('2017-10-29 02:00:00+02:00'), pd.to_datetime('2017-10-29 02:00:00+01:00'), pd.to_datetime('2017-10-29 03:00:00+01:00')],columns=['date']) df1['date'] = df1['date'].dt.tz_localize('UTC').dt.tz_convert('Europe/Madrid') df1['value'] = 1

df2 = pd.DataFrame([pd.to_datetime('2017-10-29 03:00:00+01:00'), pd.to_datetime('2017-10-29 04:00:00+01:00'), pd.to_datetime('2017-10-29 05:00:00+01:00')],columns=['date']) df2['date'] = df2['date'].dt.tz_localize('UTC').dt.tz_convert('Europe/Madrid') df2['value'] = 2

pd.merge(df1, df2, how='outer', on='date')

Expected output

                       date  value_x   value_y
0 2017-10-29 02:00:00+02:00      1.0       NaN
1 2017-10-29 02:00:00+01:00      1.0       NaN
2 2017-10-29 03:00:00+01:00      1.0       2.0
3 2017-10-29 04:00:00+01:00      NaN       2.0
4 2017-10-29 05:00:00+01:00      NaN       2.0

Actual output

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "(...)/venv/lib/python3.6/site-packages/pandas/core/reshape/merge.py", line 58, in merge
    return op.get_result()
  File "(...)//venv/lib/python3.6/site-packages/pandas/core/reshape/merge.py", line 604, in get_result
    self._maybe_add_join_keys(result, left_indexer, right_indexer)
  File "(...)//venv/lib/python3.6/site-packages/pandas/core/reshape/merge.py", line 715, in _maybe_add_join_keys
    key_col = Index(lvals).where(~mask, rvals)
  File "(...)//venv/lib/python3.6/site-packages/pandas/core/indexes/datetimelike.py", line 809, in where
    result = self._ensure_localized(result)
  File "(...)//venv/lib/python3.6/site-packages/pandas/core/indexes/datetimelike.py", line 230, in _ensure_localized
    result = result.tz_localize(self.tz)
  File "(...)//venv/lib/python3.6/site-packages/pandas/util/_decorators.py", line 118, in wrapper
    return func(*args, **kwargs)
  File "(...)//venv/lib/python3.6/site-packages/pandas/core/indexes/datetimes.py", line 1858, in tz_localize
    errors=errors)
  File "pandas/_libs/tslib.pyx", line 3593, in pandas._libs.tslib.tz_localize_to_utc
pytz.exceptions.AmbiguousTimeError: Cannot infer dst time from Timestamp('2017-10-29 02:00:00'), try using the 'ambiguous' argument

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.2.final.0
python-bits: 64
OS: Darwin
OS-release: 17.3.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: es_ES.UTF-8
LOCALE: es_ES.UTF-8

pandas: 0.21.1
pytest: 3.2.5
pip: 9.0.1
setuptools: 36.8.0
Cython: None
numpy: 1.13.3
scipy: None
pyarrow: None
xarray: None
IPython: None
sphinx: 1.5.3
patsy: None
dateutil: 2.6.1
pytz: 2017.3
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: 0.9.6
lxml: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None