merge_asof fails when using left_index/right_index with a tolerance parameter · Issue #15135 · pandas-dev/pandas

Problem description

Using the DataFrames trades and quotes from the merge_asof documentation as an example, there's an issue when using merge_asof to merge on indexes using left_index/right_index (rather than column names) and additionally specifying a tolerance parameter.

Admittedly, the left_index/right_index parameters are not documented for this function (in the prototype description), but they are still valid parameters (and have an example in the docs). It could be seen as inconsistent that they sometimes work as expected and sometimes raise a cryptic error message.

For example, first set the 'time' column of each DataFrame as the index:

quotes = quotes.set_index('time')
trades = trades.set_index('time')

Then pd.merge_asof(trades, quotes, left_index=True, right_index=True, by='ticker') works exactly as pd.merge_asof(trades, quotes, on='time', by='ticker') does on the original, un-indexed DataFrames (as expected).

However, pd.merge_asof(trades, quotes, left_index=True, right_index=True, by='ticker', tolerance=pd.Timedelta('2ms')) (specifying a tolerance) raises an error:

/.../anaconda3/lib/python3.4/site-packages/pandas/tools/merge.py in _get_merge_keys(self)
   1147 
   1148             else:
-> 1149                 raise MergeError("key must be integer or timestamp")
   1150 
   1151         # validate allow_exact_matches

MergeError: key must be integer or timestamp

I would have expected it to work exactly as pd.merge_asof(trades, quotes, on='time', by='ticker', tolerance=pd.Timedelta('2ms')) does on the un-indexed DataFrames. Failing that, an error telling me that I cannot use a tolerance when merging on indexes would be welcome.
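In the meantime, a possible workaround is to move the index back into a column, merge with on=, and restore the index afterwards. The sketch below is self-contained: the frames reuse the values from the merge_asof documentation example, and merge_asof_on_index is a hypothetical helper name, not part of pandas. It assumes both indexes share the same name and are already sorted.

```python
import pandas as pd

# Frames with the values from the merge_asof documentation example.
quotes = pd.DataFrame({
    "time": pd.to_datetime([
        "2016-05-25 13:30:00.023", "2016-05-25 13:30:00.023",
        "2016-05-25 13:30:00.030", "2016-05-25 13:30:00.041",
        "2016-05-25 13:30:00.048", "2016-05-25 13:30:00.049",
        "2016-05-25 13:30:00.072", "2016-05-25 13:30:00.075",
    ]),
    "ticker": ["GOOG", "MSFT", "MSFT", "MSFT", "GOOG", "AAPL", "GOOG", "MSFT"],
    "bid": [720.50, 51.95, 51.97, 51.99, 720.50, 97.99, 720.50, 52.01],
    "ask": [720.93, 51.96, 51.98, 52.00, 720.93, 98.01, 720.88, 52.03],
})
trades = pd.DataFrame({
    "time": pd.to_datetime([
        "2016-05-25 13:30:00.023", "2016-05-25 13:30:00.038",
        "2016-05-25 13:30:00.048", "2016-05-25 13:30:00.048",
        "2016-05-25 13:30:00.048",
    ]),
    "ticker": ["MSFT", "MSFT", "GOOG", "GOOG", "AAPL"],
    "price": [51.95, 51.95, 720.77, 720.92, 98.00],
    "quantity": [75, 155, 100, 100, 100],
})

# Same setup as in the report: 'time' becomes the index of both frames.
quotes = quotes.set_index("time")
trades = trades.set_index("time")


def merge_asof_on_index(left, right, **kwargs):
    """Hypothetical helper: as-of merge two frames on identically named indexes.

    The index is temporarily turned into a column so that the on= code path
    (which accepts a tolerance in the report above) is used.
    """
    key = left.index.name
    merged = pd.merge_asof(left.reset_index(), right.reset_index(), on=key, **kwargs)
    return merged.set_index(key)


merged = merge_asof_on_index(
    trades, quotes, by="ticker", tolerance=pd.Timedelta("2ms")
)
print(merged)
```

The result is indexed by time again. Trades with no quote for the same ticker within the preceding 2ms (the second MSFT trade and the AAPL trade) get NaN for bid and ask.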

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.4.4.final.0
python-bits: 64
OS: Linux
OS-release: 3.16.0-4-amd64
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: en_GB.UTF-8
LOCALE: en_GB.UTF-8

pandas: 0.19.2
nose: 1.3.4
pip: 9.0.1
setuptools: 31.0.1.post20161215
Cython: 0.23.4
numpy: 1.11.3
scipy: 0.18.1
statsmodels: 0.6.1
xarray: 0.7.2
IPython: 5.1.0
sphinx: 1.2.3
patsy: 0.4.1
dateutil: 2.4.2
pytz: 2015.7
blosc: None
bottleneck: None
tables: 3.1.1
numexpr: 2.6.1
matplotlib: 1.5.1
openpyxl: 1.8.5
xlrd: 0.9.3
xlwt: None
xlsxwriter: 0.6.7
lxml: 3.6.4
bs4: 4.3.2
html5lib: 0.99999
httplib2: None
apiclient: None
sqlalchemy: 0.9.9
pymysql: None
psycopg2: None
jinja2: 2.8
boto: 2.36.0
pandas_datareader: None