BUG: Merge error when merge col is int64 and Int64 · Issue #46178 · pandas-dev/pandas
Pandas version checks
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import pandas as pd

# First DF
df1 = pd.DataFrame({'col1': [1,1,2,2,3,pd.NA]})
df1['col1'] = df1['col1'].astype('Int64')

# Second DF
df2 = pd.DataFrame({'col1': [1,2,3], 'col2': list('abc')}).set_index('col1')

print(df1.dtypes)
print(df2.dtypes)
print(df1.merge(df2, left_on='col1', right_index=True))
Issue Description
Merging two DataFrames on a column (col1) that is Int64 in the first and int64 in the second DF causes the following confusing error message:
Traceback (most recent call last):
  File "/Users/anto/xxx/src/bugs/pandas_join.py", line 30, in <module>
    print(df1.merge(df2, on='col1', how='left'))
  File "/Users/anto/miniconda3/conda-envs/py310/lib/python3.10/site-packages/pandas/core/frame.py", line 9339, in merge
    return merge(
  File "/Users/anto/miniconda3/conda-envs/py310/lib/python3.10/site-packages/pandas/core/reshape/merge.py", line 122, in merge
    return op.get_result()
  File "/Users/anto/miniconda3/conda-envs/py310/lib/python3.10/site-packages/pandas/core/reshape/merge.py", line 716, in get_result
    join_index, left_indexer, right_indexer = self._get_join_info()
  File "/Users/anto/miniconda3/conda-envs/py310/lib/python3.10/site-packages/pandas/core/reshape/merge.py", line 967, in _get_join_info
    (left_indexer, right_indexer) = self._get_join_indexers()
  File "/Users/anto/miniconda3/conda-envs/py310/lib/python3.10/site-packages/pandas/core/reshape/merge.py", line 941, in _get_join_indexers
    return get_join_indexers(
  File "/Users/anto/miniconda3/conda-envs/py310/lib/python3.10/site-packages/pandas/core/reshape/merge.py", line 1484, in get_join_indexers
    zipped = zip(*mapped)
  File "/Users/anto/miniconda3/conda-envs/py310/lib/python3.10/site-packages/pandas/core/reshape/merge.py", line 1481, in <genexpr>
    _factorize_keys(left_keys[n], right_keys[n], sort=sort, how=how)
  File "/Users/anto/miniconda3/conda-envs/py310/lib/python3.10/site-packages/pandas/core/reshape/merge.py", line 2164, in _factorize_keys
    lk = ensure_int64(np.asarray(lk))
  File "pandas/_libs/algos_common_helper.pxi", line 81, in pandas._libs.algos.ensure_int64
TypeError: int() argument must be a string, a bytes-like object or a real number, not 'NAType'
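The last frames of the traceback suggest the failure comes from converting the Int64 key to a NumPy array and then forcing it to int64. A minimal sketch of that step outside of merge, assuming a plain NumPy `astype("int64")` behaves like the private `ensure_int64` helper for this purpose (it is a stand-in, not the pandas internals themselves):

```python
import numpy as np
import pandas as pd

keys = pd.array([1, 1, 2, 2, 3, pd.NA], dtype="Int64")

# With a missing value present, the masked array converts to an object
# ndarray that still contains pd.NA.
as_numpy = np.asarray(keys)
print(as_numpy.dtype)

# Stand-in for the ensure_int64 call seen in the traceback (assumption):
# a plain cast to int64 has no way to represent pd.NA, so it raises.
error = None
try:
    as_numpy.astype("int64")
except (TypeError, ValueError) as exc:
    error = exc
print(type(error).__name__, error)
```

If this reading is right, the error message talks about `NAType` because the cast is attempted element by element on `pd.NA`, and nothing at that level knows the two key dtypes differ.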
Expected Behavior
If the dtypes of the merge columns are compatible, the merge should complete without errors. Alternatively, the error message should warn the user about the dtype mismatch and the need to cast one of the columns.
For example, casting col1 in df2 from int64 to Int64 results in the correct merge:
import pandas as pd

# First DF
df1 = pd.DataFrame({'col1': [1,1,2,2,3,pd.NA]})
df1['col1'] = df1['col1'].astype('Int64')

# Second DF
df2 = pd.DataFrame({'col1': [1,2,3], 'col2': list('abc')}).set_index('col1')

# These two lines are required for the merge to work
df2 = df2.reset_index()
df2['col1'] = df2['col1'].astype('Int64')

print(df1.dtypes)
print(df2.dtypes)
print(df1.merge(df2, on='col1', how='left'))
However, this workaround seems too convoluted to me.
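For reference, the same cast can be written as a single chain that takes the target dtype from the left key instead of hard-coding it. This is only a sketch of the workaround, not a fix for the underlying problem; the variable names `right` and `merged` are introduced here for illustration:

```python
import pandas as pd

df1 = pd.DataFrame({"col1": pd.array([1, 1, 2, 2, 3, pd.NA], dtype="Int64")})
df2 = pd.DataFrame({"col1": [1, 2, 3], "col2": list("abc")}).set_index("col1")

# Move the index back into a column and cast it to the left key's dtype,
# so both merge keys are Int64 before merging.
right = df2.reset_index().astype({"col1": df1["col1"].dtype})

merged = df1.merge(right, on="col1", how="left")
print(merged)
```

The row whose key is missing in df1 is kept by the left merge and gets a missing value in col2.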
Installed Versions
1.4.1