Merge error on Categorical Interval columns · Issue #28668 · pandas-dev/pandas

Merging on a Categorical column whose categories are intervals fails.
For instance, the following raises TypeError: data type not understood:

import numpy as np
import pandas as pd

bins = np.arange(0, 91, 30)
df1 = pd.DataFrame(np.array([[1, 22], [2, 35], [3, 82]]), columns=['Id', 'Dist']).set_index('Id')

# pd.cut yields a Categorical column whose categories are intervals
df1['DistGroup'] = pd.cut(df1['Dist'], bins)

idx = pd.IntervalIndex.from_breaks(bins)
df2 = pd.DataFrame(np.array(['g1', 'g2', 'g3']), columns=['GroupId'], index=idx)
df2.index.name = 'DistGroup'

# This call raises the TypeError
res = pd.merge(df1, df2, left_on='DistGroup', right_index=True).reset_index()

Expected Output

   Dist DistGroup GroupId
0    22   (0, 30]      g1
1    35  (30, 60]      g2
2    82  (60, 90]      g3
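
A possible workaround, offered only as a sketch and not as a maintainer-endorsed fix: skip the merge on the Categorical column and look up each Dist value directly in the IntervalIndex with IntervalIndex.get_indexer, then pick GroupId by position. It assumes the intervals do not overlap and that every Dist value falls inside one of them. The names pos and res are illustrative only.

```python
import numpy as np
import pandas as pd

bins = np.arange(0, 91, 30)
df1 = pd.DataFrame({"Dist": [22, 35, 82]}, index=pd.Index([1, 2, 3], name="Id"))

idx = pd.IntervalIndex.from_breaks(bins)
df2 = pd.DataFrame({"GroupId": ["g1", "g2", "g3"]}, index=idx)
df2.index.name = "DistGroup"

# Position of the interval containing each Dist value (-1 means no match).
pos = idx.get_indexer(df1["Dist"])
if (pos < 0).any():
    raise ValueError("some Dist values fall outside every interval")

# Attach the interval label and the matching GroupId without calling merge.
res = df1.assign(
    DistGroup=pd.cut(df1["Dist"], bins),
    GroupId=df2["GroupId"].to_numpy()[pos],
).reset_index()

print(res)
```

Unlike the merge above, this keeps the Id column in the result; drop it if only Dist, DistGroup and GroupId are wanted.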


Output of pd.show_versions()


INSTALLED VERSIONS

commit : None
python : 3.6.9.final.0
python-bits : 64
OS : Windows
OS-release : 10
machine : AMD64
processor : Intel64 Family 6 Model 60 Stepping 3, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : None.None

pandas : 0.25.1
numpy : 1.16.5
pytz : 2019.2
dateutil : 2.8.0
pip : 19.2.2
setuptools : 41.0.1
Cython : 0.29.13
pytest : 5.0.1
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.3.3
html5lib : 1.0.1
pymysql : None
psycopg2 : 2.7.6.1 (dt dec pq3 ext lo64)
jinja2 : 2.10.1
IPython : 7.8.0
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : 4.3.3
matplotlib : 3.1.1
numexpr : 2.7.0
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
s3fs : None
scipy : 1.3.1
sqlalchemy : 1.3.7
tables : 3.5.2
xarray : None
xlrd : None
xlwt : None
xlsxwriter : None