BUG: pd.merge throws TypeError with categoricals · Issue #16900 · pandas-dev/pandas
Code Sample
from datetime import date

import pandas as pd

df = pd.DataFrame(
    [[date(2001, 1, 1), 1.1], [date(2001, 1, 2), 1.3]],
    columns=['date', 'num2']
)
df['date'] = df['date'].astype('category')

df2 = pd.DataFrame(
    [[date(2001, 1, 1), 1.3], [date(2001, 1, 3), 1.4]],
    columns=['date', 'num4']
)
df2['date'] = df2['date'].astype('category')

result = pd.merge(
    df, df2, how='outer', on=['date']
)

print(result)
Problem description
If you run the example above, you will get the following output:
Traceback (most recent call last):
  File "blah.py", line 20, in <module>
    df, df2, how='outer', on=['date']
  File "/Users/dave/code/pandas/pandas/core/reshape/merge.py", line 57, in merge
    return op.get_result()
  File "/Users/dave/code/pandas/pandas/core/reshape/merge.py", line 604, in get_result
    self._maybe_add_join_keys(result, left_indexer, right_indexer)
  File "/Users/dave/code/pandas/pandas/core/reshape/merge.py", line 714, in _maybe_add_join_keys
    key_col = Index(lvals).where(~mask, rvals)
  File "/Users/dave/code/pandas/pandas/core/indexes/base.py", line 613, in where
    values = np.where(cond, self.values, other)
TypeError: invalid type promotion
This occurs when all of the following are true:
- Both columns to merge on are categorical dates
- The categoricals have the same dtype, but the values are different
- The merge is 'outer'
If you change the merge to 'inner', or change the date values to be the same, then the code works as expected.
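The conditions above can be checked with a small probe. This is only a sketch: `try_merge` and its parameters are hypothetical names, not part of pandas, and the `keys_as_object` option is an assumption on my part (merging on plain object keys should avoid the categorical handling entirely), not something confirmed in this report. On a pandas version that does not have the bug, every combination is expected to return None.

```python
from datetime import date

import pandas as pd


def try_merge(how, same_dates, keys_as_object=False):
    """Run the merge from the report; return the exception name, or None.

    `try_merge` and its parameters are hypothetical names for this sketch.
    """
    left_dates = [date(2001, 1, 1), date(2001, 1, 2)]
    # Either reuse the left dates or use the differing dates from the report.
    right_dates = left_dates if same_dates else [date(2001, 1, 1), date(2001, 1, 3)]

    df = pd.DataFrame({'date': left_dates, 'num2': [1.1, 1.3]})
    df2 = pd.DataFrame({'date': right_dates, 'num4': [1.3, 1.4]})
    df['date'] = df['date'].astype('category')
    df2['date'] = df2['date'].astype('category')

    if keys_as_object:
        # Assumption, not from the report: cast the keys back to object
        # so the merge never sees categorical columns.
        df['date'] = df['date'].astype(object)
        df2['date'] = df2['date'].astype(object)

    try:
        pd.merge(df, df2, how=how, on=['date'])
    except TypeError as exc:
        return type(exc).__name__
    return None


# Walk through the combinations described in the list above.
for how in ('inner', 'outer'):
    for same_dates in (True, False):
        print(how, same_dates, try_merge(how, same_dates))
```

If the object cast turns out to work on an affected version, the key column can be converted back with `.astype('category')` after the merge.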
Output of pd.show_versions()
In [3]: pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.6.1.final.0
python-bits: 64
OS: Darwin
OS-release: 16.6.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_GB.UTF-8
LOCALE: en_GB.UTF-8
pandas: 0.20.2
pytest: 3.1.2
pip: 9.0.1
setuptools: 27.2.0
Cython: 0.25.2
numpy: 1.12.1
scipy: 0.19.0
xarray: None
IPython: 6.0.0
sphinx: None
patsy: None
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.0.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 0.999
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: None