BUG: Merge error when merge col is int64 and Int64 · Issue #46178 · pandas-dev/pandas

Pandas version checks

Reproducible Example

import pandas as pd

# First DF
df1 = pd.DataFrame({'col1': [1,1,2,2,3,pd.NA]})
df1['col1'] = df1['col1'].astype('Int64')

# Second DF
df2 = pd.DataFrame({'col1': [1,2,3], 'col2': list('abc')}).set_index('col1')

print(df1.dtypes)
print(df2.dtypes)
print(df1.merge(df2, left_on='col1', right_index=True))

Issue Description

Merging two DataFrames on a column (col1) that is Int64 in the first DataFrame and int64 in the second causes the following confusing error message:

Traceback (most recent call last):
  File "/Users/anto/xxx/src/bugs/pandas_join.py", line 30, in <module>
    print(df1.merge(df2, on='col1', how='left'))
  File "/Users/anto/miniconda3/conda-envs/py310/lib/python3.10/site-packages/pandas/core/frame.py", line 9339, in merge
    return merge(
  File "/Users/anto/miniconda3/conda-envs/py310/lib/python3.10/site-packages/pandas/core/reshape/merge.py", line 122, in merge
    return op.get_result()
  File "/Users/anto/miniconda3/conda-envs/py310/lib/python3.10/site-packages/pandas/core/reshape/merge.py", line 716, in get_result
    join_index, left_indexer, right_indexer = self._get_join_info()
  File "/Users/anto/miniconda3/conda-envs/py310/lib/python3.10/site-packages/pandas/core/reshape/merge.py", line 967, in _get_join_info
    (left_indexer, right_indexer) = self._get_join_indexers()
  File "/Users/anto/miniconda3/conda-envs/py310/lib/python3.10/site-packages/pandas/core/reshape/merge.py", line 941, in _get_join_indexers
    return get_join_indexers(
  File "/Users/anto/miniconda3/conda-envs/py310/lib/python3.10/site-packages/pandas/core/reshape/merge.py", line 1484, in get_join_indexers
    zipped = zip(*mapped)
  File "/Users/anto/miniconda3/conda-envs/py310/lib/python3.10/site-packages/pandas/core/reshape/merge.py", line 1481, in <genexpr>
    _factorize_keys(left_keys[n], right_keys[n], sort=sort, how=how)
  File "/Users/anto/miniconda3/conda-envs/py310/lib/python3.10/site-packages/pandas/core/reshape/merge.py", line 2164, in _factorize_keys
    lk = ensure_int64(np.asarray(lk))
  File "pandas/_libs/algos_common_helper.pxi", line 81, in pandas._libs.algos.ensure_int64
TypeError: int() argument must be a string, a bytes-like object or a real number, not 'NAType'
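The last frames of the traceback point at `ensure_int64(np.asarray(lk))`. The sketch below tries to reproduce that step in isolation, outside of any merge. It assumes that a plain NumPy `astype("int64")` cast is a fair stand-in for the private `ensure_int64` helper, which is not confirmed here.

```python
import numpy as np
import pandas as pd

# Nullable Int64 keys containing a missing value, as in df1['col1'].
keys = pd.array([1, 2, pd.NA], dtype="Int64")

# Converting the masked array to NumPy without a target dtype gives an
# object array that still holds pd.NA.
as_numpy = np.asarray(keys)
print(as_numpy.dtype)

# Assumption: this cast stands in for the private ensure_int64 helper.
# pd.NA has no integer representation, so the cast is expected to fail.
try:
    as_numpy.astype("int64")
    cast_error = None
except (TypeError, ValueError) as exc:
    cast_error = exc
print(type(cast_error).__name__, cast_error)
```

If this reading is right, the error comes from the missing value in the Int64 column rather than from the dtype names themselves, which would explain why the message never mentions a dtype mismatch.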

Expected Behavior

If the types of the "merge columns" are compatible, the merge should complete with no errors. Alternatively, the error message should warn the user about the type mismatch and the need to cast one of the columns.
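To illustrate the kind of message that would be easier to act on, here is a minimal user-side sketch of a pre-merge check. The function `check_merge_key_dtypes` is hypothetical and not part of pandas. It also treats any dtype difference as an error, which is stricter than the "compatible types" behaviour requested above.

```python
import pandas as pd


def check_merge_key_dtypes(left_keys: pd.Series, right_keys: pd.Index) -> None:
    """Hypothetical pre-merge check (not a pandas API)."""
    if left_keys.dtype != right_keys.dtype:
        raise TypeError(
            f"Merge keys have different dtypes: left is {left_keys.dtype}, "
            f"right is {right_keys.dtype}. Cast one side before merging."
        )


df1 = pd.DataFrame({"col1": pd.array([1, 1, 2, 2, 3, pd.NA], dtype="Int64")})
df2 = pd.DataFrame({"col1": [1, 2, 3], "col2": list("abc")}).set_index("col1")

# Run the check on the same keys the failing merge would use.
try:
    check_merge_key_dtypes(df1["col1"], df2.index)
    message = ""
except TypeError as exc:
    message = str(exc)
print(message)
```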

For example, casting col1 in df2 from int64 to Int64 results in the correct merge:

import pandas as pd

# First DF
df1 = pd.DataFrame({'col1': [1,1,2,2,3,pd.NA]})
df1['col1'] = df1['col1'].astype('Int64')

# Second DF
df2 = pd.DataFrame({'col1': [1,2,3], 'col2': list('abc')}).set_index('col1')

# These two lines are required for the merge to work
df2 = df2.reset_index()
df2['col1'] = df2['col1'].astype('Int64')

print(df1.dtypes)
print(df2.dtypes)
print(df1.merge(df2, on='col1', how='left'))

However, this workaround looks a bit too convoluted to me.
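For what it is worth, the same cast can be written as one chained expression, which at least keeps df2 itself untouched. This is only a more compact sketch of the workaround above, not a fix for the underlying problem; the name `right` is introduced here for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({"col1": pd.array([1, 1, 2, 2, 3, pd.NA], dtype="Int64")})
df2 = pd.DataFrame({"col1": [1, 2, 3], "col2": list("abc")}).set_index("col1")

# Move the index back to a column and cast it to the nullable dtype in one go.
right = df2.reset_index().astype({"col1": "Int64"})

# Left merge keeps every row of df1; the pd.NA key finds no match in df2.
merged = df1.merge(right, on="col1", how="left")
print(merged)
```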

Installed Versions

1.4.1