Merge_asof() Requires specific int type, not reflected in error or documentation · Issue #28870 · pandas-dev/pandas
Code Sample, a copy-pastable example if possible
import pandas as pd

left = pd.DataFrame({'ts_int': [0, 100, 200], 'left_val': [1, 2, 3]})
right = pd.DataFrame({'ts_int': [50, 150, 250], 'right_val': [1, 2, 3]})
left['ts_int'] = left['ts_int'].astype(int)
right['ts_int'] = right['ts_int'].astype(int)

pd.merge_asof(left, right, on='ts_int', tolerance=100)
pandas.errors.MergeError: key must be integer, timestamp or float
print(left['ts_int'].dtype)
int32
print(right['ts_int'].dtype)
int32
Problem description
merge_asof() raises MergeError: key must be integer, timestamp or float even when the column used as the merge key has a valid integer type.
This happens with every integer type except int64, as a result of the check performed when a tolerance is passed (pandas/core/reshape/merge.py:1641). That check uses is_int64_dtype(lt), which of course returns False for every integer type that isn't int64. However, the requirement that the on key be int64 is not documented, nor is it reflected in the raised error.
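The difference between "is an integer" and "is exactly int64" can be sketched as follows. This is an illustration, not the code in merge.py: describe_key_dtype is a hypothetical helper, and the comparison dtype == np.int64 stands in for is_int64_dtype, on the assumption that the two agree for plain NumPy integer columns.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_integer_dtype


def describe_key_dtype(series):
    """Hypothetical helper: return (is_any_integer, is_exactly_int64)."""
    # is_integer_dtype accepts every integer width (int8 ... int64).
    # The equality test only accepts int64, mirroring the stricter check
    # described in this report.
    return is_integer_dtype(series.dtype), series.dtype == np.int64


keys32 = pd.Series([0, 100, 200], dtype="int32")
keys64 = pd.Series([0, 100, 200], dtype="int64")

print(describe_key_dtype(keys32))  # (True, False)
print(describe_key_dtype(keys64))  # (True, True)
```

An int32 key is a valid integer key by the first test but fails the second, which would explain why the error message ("key must be integer") looks wrong to the user.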
Expected Output
The expected output can go either of two ways. If it should be possible to perform the merge on every integer type, the expected output would be:
ts_int left_val right_val
0 0 1 NaN
1 100 2 1.0
2 200 3 2.0
If it shouldn't be possible, I'd expect the raised MergeError and/or the docs to state that the integer key to merge on must be of int64 type.
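Until one of those outcomes lands, a possible workaround is to cast the key to int64 on both sides before merging. The sketch below assumes the cast is acceptable for the data at hand. It also forces int32 explicitly, because astype(int) only yielded int32 on the Windows setup reported here and may give int64 elsewhere.

```python
import pandas as pd

left = pd.DataFrame({"ts_int": [0, 100, 200], "left_val": [1, 2, 3]})
right = pd.DataFrame({"ts_int": [50, 150, 250], "right_val": [1, 2, 3]})

# Force int32 explicitly so the setup matches the report on any platform.
left["ts_int"] = left["ts_int"].astype("int32")
right["ts_int"] = right["ts_int"].astype("int32")

# Workaround: cast the key column to int64 on both frames before merging.
left64 = left.astype({"ts_int": "int64"})
right64 = right.astype({"ts_int": "int64"})

# Backward as-of merge: each left row takes the last right row whose key
# is <= the left key and within 100 of it.
result = pd.merge_asof(left64, right64, on="ts_int", tolerance=100)
print(result)
```

With int64 keys the merge should produce the table shown under Expected Output: the first row has no match (NaN), and the rows at 100 and 200 pick up right_val 1.0 and 2.0.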
Output of pd.show_versions()
INSTALLED VERSIONS
------------------
commit : None
python : 3.7.4.final.0
python-bits : 64
OS : Windows
OS-release : 10
machine : AMD64
processor : Intel64 Family 6 Model 158 Stepping 10, GenuineIntel
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : None.None
pandas : 0.25.1
numpy : 1.17.2
pytz : 2019.2
dateutil : 2.8.0
pip : 19.2.3
setuptools : 41.2.0
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : None
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : None
matplotlib : 3.1.1
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
s3fs : None
scipy : 1.3.1
sqlalchemy : None
tables : None
xarray : None
xlrd : None
xlwt : None
xlsxwriter : None