Merge vs. Join on a Multi-Index with a Categorical · Issue #16627 · pandas-dev/pandas

Hello,

I know there's been a lot of work on Categoricals, but I ran into this on 0.20.2 and on the current master branch on GitHub.

```python
import pandas as pd

a = {'Cat1': pd.Categorical(['a', 'b', 'a', 'c', 'a', 'b'], ['a', 'b', 'c']),
     'Bool1': [False, True, False, True, False, False]}
a = pd.DataFrame(a)

b = {'Cat': pd.Categorical(['a', 'b', 'c', 'a', 'b', 'c'], ['a', 'b', 'c']),
     'Bool': [False, False, False, True, True, True],
     'Factor': [1.1, 1.2, 1.3, 1.4, 1.5, 1.6]}

# b is a Series indexed by a MultiIndex whose first level is categorical
b = pd.DataFrame(b).set_index(['Cat', 'Bool'])['Factor']
```

This works!

```python
pd.merge(a, b.reset_index(), left_on=['Cat1', 'Bool1'], right_on=['Cat', 'Bool'], how='left')
```

This crashes with `AttributeError: 'CategoricalIndex' object has no attribute 'is_dtype_equal'`:

```python
a.join(b, on=['Cat1', 'Bool1'])
```

Problem description

When I join a DataFrame on a categorical column against a Series whose MultiIndex has a categorical level, I get this error. As a workaround, you can reset the index and use `merge` instead.
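A minimal sketch of that workaround, written against current pandas `merge` semantics. `join_via_merge` is a hypothetical helper, not a pandas API: it flattens the right-hand Series' MultiIndex with `reset_index`, does a left merge, reattaches the left frame's index, and drops the duplicated key columns so the result has the same columns `join` would produce. It assumes the right-hand index is unique and that its level names do not clash with the left frame's column names.

```python
import pandas as pd


def join_via_merge(left, right, on):
    """Hypothetical helper emulating left.join(right, on=on) with pd.merge.

    Assumes `right` is a named Series with a unique MultiIndex whose levels
    line up with the columns listed in `on`.
    """
    right_keys = list(right.index.names)
    # Move the MultiIndex levels (including the categorical one) into columns.
    right_flat = right.reset_index()
    merged = pd.merge(left, right_flat, left_on=on, right_on=right_keys, how="left")
    # With unique right-hand keys, a left merge keeps left's row order and
    # row count, so the original index can be reattached positionally.
    merged.index = left.index
    # join() would not return the right-hand key columns, so drop them.
    return merged.drop(columns=right_keys)


a = pd.DataFrame({
    "Cat1": pd.Categorical(["a", "b", "a", "c", "a", "b"], ["a", "b", "c"]),
    "Bool1": [False, True, False, True, False, False],
})
b = pd.DataFrame({
    "Cat": pd.Categorical(["a", "b", "c", "a", "b", "c"], ["a", "b", "c"]),
    "Bool": [False, False, False, True, True, True],
    "Factor": [1.1, 1.2, 1.3, 1.4, 1.5, 1.6],
}).set_index(["Cat", "Bool"])["Factor"]

result = join_via_merge(a, b, on=["Cat1", "Bool1"])
print(result)
```

Reattaching the index by position is only safe because the right-hand keys are unique; if `b` had duplicate keys the merge would add rows and the index assignment would raise a length-mismatch error.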

Expected Output

I would expect both methods to work and to give equivalent output.
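To make the expected behavior concrete, the check below runs both calls on the same frames and compares the `Factor` column they attach to each row of `a`. It assumes a pandas release in which this `join` bug is fixed; on the affected versions the `a.join(...)` line raises the `AttributeError` quoted above instead.

```python
import pandas as pd

a = pd.DataFrame({
    "Cat1": pd.Categorical(["a", "b", "a", "c", "a", "b"], ["a", "b", "c"]),
    "Bool1": [False, True, False, True, False, False],
})
b = pd.DataFrame({
    "Cat": pd.Categorical(["a", "b", "c", "a", "b", "c"], ["a", "b", "c"]),
    "Bool": [False, False, False, True, True, True],
    "Factor": [1.1, 1.2, 1.3, 1.4, 1.5, 1.6],
}).set_index(["Cat", "Bool"])["Factor"]

on = ["Cat1", "Bool1"]
# join looks the keys up in b's (categorical, bool) MultiIndex.
joined = a.join(b, on=on)
# merge matches the same keys after moving the index levels into columns.
merged = pd.merge(a, b.reset_index(), left_on=on, right_on=["Cat", "Bool"], how="left")

# Both paths should attach the same Factor value to every row of `a`.
pd.testing.assert_series_equal(joined["Factor"], merged["Factor"])
print(joined)
```

The two results still differ in shape: `merge` also keeps the right-hand `Cat` and `Bool` key columns, while `join` adds only `Factor`.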

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.1.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 60 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.21.0.dev+136.g10c17d4
pytest: None
pip: 9.0.1
setuptools: 27.2.0
Cython: 0.25.2
numpy: 1.12.1
scipy: None
xarray: None
IPython: 6.1.0
sphinx: None
patsy: None
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 0.999
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: None