Merge_asof() Requires specific int type, not reflected in error or documentation · Issue #28870 · pandas-dev/pandas

Code Sample, a copy-pastable example if possible

```python
import pandas as pd

left = pd.DataFrame({'ts_int': [0, 100, 200], 'left_val': [1, 2, 3]})
right = pd.DataFrame({'ts_int': [50, 150, 250], 'right_val': [1, 2, 3]})

# On this Windows build (numpy 1.17), astype(int) produces int32, not int64
left['ts_int'] = left['ts_int'].astype(int)
right['ts_int'] = right['ts_int'].astype(int)

pd.merge_asof(left, right, on='ts_int', tolerance=100)
```

```
pandas.errors.MergeError: key must be integer, timestamp or float
```

```python
>>> print(left['ts_int'].dtype)
int32
>>> print(right['ts_int'].dtype)
int32
```

Problem description

`merge_asof()` raises `MergeError: key must be integer, timestamp or float` even when the merge key has a valid integer dtype.

This happens with every integer dtype except int64, because of the check performed when a `tolerance` is passed (pandas/core/reshape/merge.py:1641). That check uses `is_int64_dtype(lt)`, which of course returns False for any integer dtype other than int64, so such keys end up at the branch that raises this error. However, the requirement that the `on` key be int64 is not documented, nor is it reflected in the error message.
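To make the failure mode concrete, the sketch below mimics the order of the dtype checks described above using simplified stand-ins. `old_tolerance_branch`, `relaxed_tolerance_branch` and `describe` are hypothetical helpers, not pandas internals, and treating "datetimelike" as numpy kinds `m`/`M` is a simplification. The relaxed variant shows one way the check could accept every integer width: test with `is_integer_dtype` instead of an int64-only comparison.

```python
import numpy as np
from pandas.api.types import is_float_dtype, is_integer_dtype
from pandas.errors import MergeError


def old_tolerance_branch(dtype) -> str:
    """Hypothetical, simplified mirror of the int64-only tolerance check."""
    dtype = np.dtype(dtype)
    if dtype.kind in "mM":  # stand-in for the datetimelike branch
        return "datetimelike"
    if dtype == np.dtype("int64"):  # stand-in for is_int64_dtype(lt)
        return "int"
    if is_float_dtype(dtype):
        return "float"
    # int32, int16, uint8, ... all end up here
    raise MergeError("key must be integer, timestamp or float")


def relaxed_tolerance_branch(dtype) -> str:
    """Hypothetical variant that accepts any integer width."""
    dtype = np.dtype(dtype)
    if dtype.kind in "mM":
        return "datetimelike"
    if is_integer_dtype(dtype):  # True for int8..int64 and uint8..uint64
        return "int"
    if is_float_dtype(dtype):
        return "float"
    raise MergeError("key must be integer, timestamp or float")


def describe(check, dtype) -> str:
    """Return the branch name, or the MergeError message if the check fails."""
    try:
        return check(dtype)
    except MergeError as exc:
        return f"MergeError: {exc}"


for dt in ["int64", "int32", "uint16", "float32"]:
    print(f"{dt:>7} | old: {describe(old_tolerance_branch, dt)}"
          f" | relaxed: {describe(relaxed_tolerance_branch, dt)}")
```

With the int64-only comparison, int32 and uint16 reach the final `raise`; the relaxed variant classifies them as integers.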

Expected Output

There are two reasonable outcomes. If merging on any integer dtype should be supported, the expected output would be:

   ts_int  left_val  right_val
0       0         1        NaN
1     100         2        1.0
2     200         3        2.0

If it shouldn't be supported, I'd expect the raised MergeError and/or the documentation to state that an integer merge key must have dtype int64.
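Until the check accepts other integer widths, one workaround is to upcast the merge key to int64 before calling `merge_asof()`. This is a sketch that assumes the int64-only tolerance check is the sole obstacle. The keys are cast to `'int32'` explicitly so the example reproduces the reporter's dtype on any platform, rather than relying on the platform-dependent `astype(int)`.

```python
import pandas as pd

left = pd.DataFrame({'ts_int': [0, 100, 200], 'left_val': [1, 2, 3]})
right = pd.DataFrame({'ts_int': [50, 150, 250], 'right_val': [1, 2, 3]})

# Reproduce the reported dtype explicitly instead of via astype(int)
left['ts_int'] = left['ts_int'].astype('int32')
right['ts_int'] = right['ts_int'].astype('int32')

# Workaround: upcast the key column to int64 on both sides before merging
result = pd.merge_asof(
    left.astype({'ts_int': 'int64'}),
    right.astype({'ts_int': 'int64'}),
    on='ts_int',
    tolerance=100,
)
print(result)
```

Upcasting from int32 to int64 is lossless, so the asof semantics are unchanged and the result should match the expected output table above (with `right_val` as float64 because of the NaN).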

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit : None
python : 3.7.4.final.0
python-bits : 64
OS : Windows
OS-release : 10
machine : AMD64
processor : Intel64 Family 6 Model 158 Stepping 10, GenuineIntel
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : None.None

pandas : 0.25.1
numpy : 1.17.2
pytz : 2019.2
dateutil : 2.8.0
pip : 19.2.3
setuptools : 41.2.0
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : None
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : None
matplotlib : 3.1.1
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
s3fs : None
scipy : 1.3.1
sqlalchemy : None
tables : None
xarray : None
xlrd : None
xlwt : None
xlsxwriter : None