Merge Using Index Name Produces Incorrect Result · Issue #24212 · pandas-dev/pandas

In [1]: import pandas as pd

In [2]: pd.__version__
Out[2]: '0.23.4'

In [3]: df_left_data = [
   ...:     {
   ...:         'id': 11,
   ...:         'left_data': 'left_11'
   ...:     },
   ...:     {
   ...:         'id': 22,
   ...:         'left_data': 'left_22'
   ...:     },
   ...:     {
   ...:         'id': 33,
   ...:         'left_data': 'left_33'
   ...:     }
   ...: ]
   ...: 
   ...: df_left = pd.DataFrame(df_left_data)
   ...: df_left.set_index('id', drop=False, inplace=True)
   ...: df_left
   ...: 
   ...: 
Out[3]: 
    id left_data
id              
11  11   left_11
22  22   left_22
33  33   left_33

In [4]: df_right_data = [
   ...:     {
   ...:         'id': 22,
   ...:         'right_data': 'right_22'
   ...:     },
   ...:     {
   ...:         'id': 33,
   ...:         'right_data': 'right_33'
   ...:     }
   ...: ]
   ...: 
   ...: df_right = pd.DataFrame(df_right_data)
   ...: df_right.set_index('id', drop=True, inplace=True)
   ...: df_right
   ...: 
   ...: 
Out[4]: 
   right_data
id           
22   right_22
33   right_33

In [5]: pd.merge(left=df_left, right=df_right, left_index=True, right_on='id', how='left')
Out[5]: 
    id left_data right_data
id                         
33  11   left_11        NaN
22  22   left_22   right_22
33  33   left_33   right_33

Problem description

The index of the first row should be 11, but the merge produces 33 instead. The affected row is the one with no match in df_right (its right_data is NaN).
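For anyone who wants to check this outside IPython, here is a minimal standalone sketch of the same call. The frames are rebuilt from column dicts rather than the list of records above, which I assume is equivalent, and `index_preserved` is just an illustrative name. The outcome depends on the installed pandas version, so the script prints what it sees instead of assuming a result.

```python
import pandas as pd

# Same data as in the report; "id" is both the index and a column on the left.
df_left = pd.DataFrame(
    {"id": [11, 22, 33], "left_data": ["left_11", "left_22", "left_33"]}
).set_index("id", drop=False)

# On the right, "id" exists only as the index (drop=True is the default).
df_right = pd.DataFrame(
    {"id": [22, 33], "right_data": ["right_22", "right_33"]}
).set_index("id")

# The call from the report: left keyed by its index, right keyed by the
# index level named "id".
merged = pd.merge(df_left, df_right, left_index=True, right_on="id", how="left")

# Expectation stated in the report: a left join keeps the left index labels.
index_preserved = list(merged.index) == list(df_left.index)

print("pandas", pd.__version__)
print("left index preserved:", index_preserved)
print(merged)
```

On 0.23.4 this should print the same frame as Out[5] above, with `left index preserved: False`.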

The following two parameter combinations produce the correct result:

Expected Output

In [6]: pd.merge(left=df_left, right=df_right, left_index=True, right_index=True, how='left')
Out[6]: 
    id left_data right_data
id                         
11  11   left_11        NaN
22  22   left_22   right_22
33  33   left_33   right_33

In [7]: pd.merge(left=df_left, right=df_right, left_on='id', right_index=True, how='left')
Out[7]: 
    id left_data right_data
id                         
11  11   left_11        NaN
22  22   left_22   right_22
33  33   left_33   right_33
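As a possible workaround until this is resolved, the index-on-index form from In [6] can be used, or `DataFrame.join`, which aligns on the index by default. The sketch below rebuilds the frames from column dicts (assumed equivalent to the list of records above). It deliberately avoids the `left_on='id'` form from In [7]: `id` is both an index level and a column in df_left, and I believe later pandas releases may treat that reference as ambiguous.

```python
import pandas as pd

df_left = pd.DataFrame(
    {"id": [11, 22, 33], "left_data": ["left_11", "left_22", "left_33"]}
).set_index("id", drop=False)

df_right = pd.DataFrame(
    {"id": [22, 33], "right_data": ["right_22", "right_33"]}
).set_index("id")

# Index-on-index left join, as in In [6]: both sides are keyed by their index.
merged = pd.merge(df_left, df_right, left_index=True, right_index=True, how="left")

# DataFrame.join joins on the index of both frames by default.
joined = df_left.join(df_right, how="left")

print(merged)
print(list(merged.index))
```

Both forms keep the left index labels, including the label of the unmatched row.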

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.5.final.0
python-bits: 64
OS: Linux
OS-release: 3.10.0-514.6.1.el7.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.23.4
pytest: None
pip: 10.0.1
setuptools: 39.1.0
Cython: None
numpy: 1.15.1
scipy: 1.1.0
pyarrow: None
xarray: None
IPython: 6.5.0
sphinx: None
patsy: None
dateutil: 2.7.3
pytz: 2018.5
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.2.3
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 1.0.1
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None