Merge vs. Join on a Multi-Index with a Categorical · Issue #16627 · pandas-dev/pandas
Hello,
I know there's been a lot of work on Categoricals, but I ran into this on 0.20.2 and on GitHub's master.
import pandas as pd
a = {'Cat1': pd.Categorical(['a', 'b', 'a', 'c', 'a', 'b'], ['a', 'b', 'c']), 'Bool1': [False, True, False, True, False, False]}
a = pd.DataFrame(a)
b = {'Cat': pd.Categorical(['a', 'b', 'c', 'a', 'b', 'c'], ['a', 'b', 'c']), 'Bool': [False, False, False, True, True, True], 'Factor': [1.1, 1.2, 1.3, 1.4, 1.5, 1.6]}
b = pd.DataFrame(b).set_index(['Cat', 'Bool'])['Factor']
This works!
pd.merge(a, b.reset_index(), left_on=['Cat1', 'Bool1'], right_on=['Cat', 'Bool'], how='left')
This crashes with AttributeError: 'CategoricalIndex' object has no attribute 'is_dtype_equal'
a.join(b, on=['Cat1', 'Bool1'])
Problem description
When I try to join on a categorical column against a Multi-Index, I get this error. As a workaround, you can merge instead.
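For anyone who wants the workaround in a reusable form, here is a minimal sketch. The helper name join_via_merge is hypothetical (it is not part of pandas), and it assumes the right-hand index is unique so that each left row matches at most one right row. The validate argument used to check that assumption was added in pandas 0.21, so drop it on older versions.

```python
import pandas as pd

a = pd.DataFrame({
    'Cat1': pd.Categorical(['a', 'b', 'a', 'c', 'a', 'b'], ['a', 'b', 'c']),
    'Bool1': [False, True, False, True, False, False],
})
b = pd.DataFrame({
    'Cat': pd.Categorical(['a', 'b', 'c', 'a', 'b', 'c'], ['a', 'b', 'c']),
    'Bool': [False, False, False, True, True, True],
    'Factor': [1.1, 1.2, 1.3, 1.4, 1.5, 1.6],
}).set_index(['Cat', 'Bool'])['Factor']


def join_via_merge(left, right, on):
    """Hypothetical helper: emulate left.join(right, on=on) using pd.merge."""
    right_keys = list(right.index.names)
    merged = pd.merge(
        left,
        right.reset_index(),
        left_on=on,
        right_on=right_keys,
        how='left',
        validate='many_to_one',  # raises if the right-hand keys are not unique
    )
    # A many-to-one left merge returns one row per left row, in left order,
    # so the original left index can be restored directly.
    merged.index = left.index
    # Drop the right-hand key columns that merge adds next to the left keys.
    return merged.drop(columns=[k for k in right_keys if k not in on])


result = join_via_merge(a, b, on=['Cat1', 'Bool1'])
print(result)
```

The result keeps the columns of a and adds Factor, which is the shape a.join(b, on=[...]) would be expected to return.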
Expected Output
I would expect that both methods should work and give equivalent output.
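To make the expected values concrete, the sketch below derives the Factor column with a plain dictionary lookup on the (Cat, Bool) pairs, without calling join or merge, and then checks that the merge workaround agrees. The names lookup and expected are only illustrative.

```python
import pandas as pd

a = pd.DataFrame({
    'Cat1': pd.Categorical(['a', 'b', 'a', 'c', 'a', 'b'], ['a', 'b', 'c']),
    'Bool1': [False, True, False, True, False, False],
})
b = pd.DataFrame({
    'Cat': pd.Categorical(['a', 'b', 'c', 'a', 'b', 'c'], ['a', 'b', 'c']),
    'Bool': [False, False, False, True, True, True],
    'Factor': [1.1, 1.2, 1.3, 1.4, 1.5, 1.6],
}).set_index(['Cat', 'Bool'])['Factor']

# Map each (Cat, Bool) index tuple of b to its Factor value.
lookup = dict(b.items())

# Look up every (Cat1, Bool1) row of a, in row order.
expected = [lookup[key] for key in zip(a['Cat1'], a['Bool1'])]

# The merge workaround should produce the same Factor column.
merged = pd.merge(a, b.reset_index(),
                  left_on=['Cat1', 'Bool1'], right_on=['Cat', 'Bool'],
                  how='left')
assert merged['Factor'].tolist() == expected
print(expected)
```

A working a.join(b, on=['Cat1', 'Bool1']) should yield the same Factor values in the same row order.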
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.6.1.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 60 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None
pandas: 0.21.0.dev+136.g10c17d4
pytest: None
pip: 9.0.1
setuptools: 27.2.0
Cython: 0.25.2
numpy: 1.12.1
scipy: None
xarray: None
IPython: 6.1.0
sphinx: None
patsy: None
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 0.999
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: None