merge_asof fails when using left_index/right_index with a tolerance parameter · Issue #15135 · pandas-dev/pandas
Problem description
Using the DataFrames trades and quotes from the merge_asof documentation as an example, there's an issue when using merge_asof to merge on indexes using left_index/right_index (rather than column names) and additionally specifying a tolerance parameter.
Admittedly, the left_index/right_index parameters are not documented for this function (in the prototype description), but they are still valid parameters (and have an example in the docs). It could be seen as inconsistent that they sometimes work as expected and sometimes raise a cryptic error message.
For example, first set the 'time' column of each DataFrame as the index:
quotes = quotes.set_index('time')
trades = trades.set_index('time')
Then pd.merge_asof(trades, quotes, left_index=True, right_index=True, by='ticker') works exactly as pd.merge_asof(trades, quotes, on='time', by='ticker') does (as expected).
However, pd.merge_asof(trades, quotes, left_index=True, right_index=True, by='ticker', tolerance=pd.Timedelta('2ms')) (specifying a tolerance) raises an error:
/.../anaconda3/lib/python3.4/site-packages/pandas/tools/merge.py in _get_merge_keys(self)
1147
1148 else:
-> 1149 raise MergeError("key must be integer or timestamp")
1150
1151 # validate allow_exact_matches
MergeError: key must be integer or timestamp
I would have expected it to work exactly as pd.merge_asof(trades, quotes, on='time', by='ticker', tolerance=pd.Timedelta('2ms')) does. Failing that, an error telling me that I cannot use a tolerance when merging on indexes would be welcome.
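For anyone trying to reproduce this, the steps above can be sketched as a single script. This is only a minimal sketch: the trades and quotes frames below are small hand-built stand-ins modeled on the documentation example (the values are illustrative, not taken from my session), and the names tol, on_columns, on_index and workaround are made up for the example. It catches ValueError on the assumption that MergeError subclasses it. The last call shows a possible workaround, not a fix: move the index back into a column, merge with on='time', then restore the index.

```python
import pandas as pd

# Illustrative stand-ins for the documentation's quotes/trades frames.
# Both are already sorted by time, which merge_asof requires.
quotes = pd.DataFrame({
    "time": pd.to_datetime([
        "2016-05-25 13:30:00.023",
        "2016-05-25 13:30:00.023",
        "2016-05-25 13:30:00.030",
        "2016-05-25 13:30:00.041",
        "2016-05-25 13:30:00.048",
    ]),
    "ticker": ["GOOG", "MSFT", "MSFT", "MSFT", "GOOG"],
    "bid": [720.50, 51.95, 51.97, 51.99, 720.50],
    "ask": [720.93, 51.96, 51.98, 52.00, 720.93],
})
trades = pd.DataFrame({
    "time": pd.to_datetime([
        "2016-05-25 13:30:00.023",
        "2016-05-25 13:30:00.038",
        "2016-05-25 13:30:00.048",
        "2016-05-25 13:30:00.048",
    ]),
    "ticker": ["MSFT", "MSFT", "GOOG", "GOOG"],
    "price": [51.95, 51.95, 720.77, 720.92],
    "quantity": [75, 155, 100, 100],
})

tol = pd.Timedelta("2ms")

# Column-based merge: the form that works.
on_columns = pd.merge_asof(trades, quotes, on="time", by="ticker", tolerance=tol)

# Index-based merge with a tolerance: the form that raises in this report.
trades_idx = trades.set_index("time")
quotes_idx = quotes.set_index("time")
try:
    on_index = pd.merge_asof(
        trades_idx, quotes_idx,
        left_index=True, right_index=True,
        by="ticker", tolerance=tol,
    )
except ValueError as exc:  # assumes MergeError is a ValueError subclass
    on_index = None
    print("index-based merge failed:", exc)

# Possible workaround: merge on the column, then restore the index.
workaround = pd.merge_asof(
    trades_idx.reset_index(), quotes_idx.reset_index(),
    on="time", by="ticker", tolerance=tol,
).set_index("time")
```

With these frames, the 13:30:00.038 MSFT trade has no MSFT quote within 2ms (the latest one is 8ms earlier), so its bid and ask should come out as NaN in every variant that succeeds.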
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.4.4.final.0
python-bits: 64
OS: Linux
OS-release: 3.16.0-4-amd64
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: en_GB.UTF-8
LOCALE: en_GB.UTF-8
pandas: 0.19.2
nose: 1.3.4
pip: 9.0.1
setuptools: 31.0.1.post20161215
Cython: 0.23.4
numpy: 1.11.3
scipy: 0.18.1
statsmodels: 0.6.1
xarray: 0.7.2
IPython: 5.1.0
sphinx: 1.2.3
patsy: 0.4.1
dateutil: 2.4.2
pytz: 2015.7
blosc: None
bottleneck: None
tables: 3.1.1
numexpr: 2.6.1
matplotlib: 1.5.1
openpyxl: 1.8.5
xlrd: 0.9.3
xlwt: None
xlsxwriter: 0.6.7
lxml: 3.6.4
bs4: 4.3.2
html5lib: 0.99999
httplib2: None
apiclient: None
sqlalchemy: 0.9.9
pymysql: None
psycopg2: None
jinja2: 2.8
boto: 2.36.0
pandas_datareader: None