Merge Using Index Name Produces Incorrect Result · Issue #24212 · pandas-dev/pandas
In [1]: import pandas as pd
In [2]: pd.__version__
Out[2]: '0.23.4'
In [3]: df_left_data = [
...: {
...: 'id': 11,
...: 'left_data': 'left_11'
...: },
...: {
...: 'id': 22,
...: 'left_data': 'left_22'
...: },
...: {
...: 'id': 33,
...: 'left_data': 'left_33'
...: }
...: ]
...:
...: df_left = pd.DataFrame(df_left_data)
...: df_left.set_index('id', drop=False, inplace=True)
...: df_left
...:
...:
Out[3]:
id left_data
id
11 11 left_11
22 22 left_22
33 33 left_33
In [4]: df_right_data = [
...: {
...: 'id': 22,
...: 'right_data': 'right_22'
...: },
...: {
...: 'id': 33,
...: 'right_data': 'right_33'
...: }
...: ]
...:
...: df_right = pd.DataFrame(df_right_data)
...: df_right.set_index('id', drop=True, inplace=True)
...: df_right
...:
...:
Out[4]:
right_data
id
22 right_22
33 right_33
In [5]: pd.merge(left=df_left, right=df_right, left_index=True, right_on='id', how='left')
Out[5]:
id left_data right_data
id
33 11 left_11 NaN
22 22 left_22 right_22
33 33 left_33 right_33
Problem description
The index label of the first row should be 11, but the merge produces 33 instead.
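The wrong labels in Out[5] (33, 22, 33) match what you get from a positional "take" on df_right's index (22, 33) when -1, meaning "no match", is treated as an ordinary position and so selects the last element. The plain-Python sketch below only illustrates that hypothesis. It is an assumption about the mechanism, not a confirmed reading of the pandas source, and the names right_index_labels, left_keys and right_indexer are hypothetical.

```python
# Hypothesis only: the result index is built by taking labels from
# df_right's index by position, with -1 marking "no match".
right_index_labels = [22, 33]   # labels of df_right.index
left_keys = [11, 22, 33]        # labels of df_left.index

# Position of each left key in the right index; -1 means "no match".
position = {label: i for i, label in enumerate(right_index_labels)}
right_indexer = [position.get(key, -1) for key in left_keys]
print(right_indexer)  # [-1, 0, 1]

# A naive positional take reads -1 as "last element" (Python semantics),
# which reproduces the index of Out[5].
naive = [right_index_labels[i] for i in right_indexer]
print(naive)  # [33, 22, 33]

# For how='left', the index this report expects is df_left's own index.
expected = list(left_keys)
print(expected)  # [11, 22, 33]
```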
Expected Output
The following two parameter combinations produce the correct result:
In [6]: pd.merge(left=df_left, right=df_right, left_index=True, right_index=True, how='left')
Out[6]:
id left_data right_data
id
11 11 left_11 NaN
22 22 left_22 right_22
33 33 left_33 right_33
In [7]: pd.merge(left=df_left, right=df_right, left_on='id', right_index=True, how='left')
Out[7]:
id left_data right_data
id
11 11 left_11 NaN
22 22 left_22 right_22
33 33 left_33 right_33
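As a workaround, the index-on-index form of In [6] keeps df_left's index. The following is a minimal, self-contained sketch that assumes a recent pandas install. It avoids the In [7] form because df_left has `id` both as a column and as its index name, and newer pandas versions may reject that key as ambiguous.

```python
import pandas as pd

# Rebuild the two frames from the report.
df_left = pd.DataFrame(
    [{"id": 11, "left_data": "left_11"},
     {"id": 22, "left_data": "left_22"},
     {"id": 33, "left_data": "left_33"}]
).set_index("id", drop=False)
df_right = pd.DataFrame(
    [{"id": 22, "right_data": "right_22"},
     {"id": 33, "right_data": "right_33"}]
).set_index("id")

# Left merge on both indexes (the In [6] form): rows and index labels
# come from df_left, and unmatched rows get NaN in right_data.
result = pd.merge(df_left, df_right, left_index=True, right_index=True, how="left")
print(result)
```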
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.6.5.final.0
python-bits: 64
OS: Linux
OS-release: 3.10.0-514.6.1.el7.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
pandas: 0.23.4
pytest: None
pip: 10.0.1
setuptools: 39.1.0
Cython: None
numpy: 1.15.1
scipy: 1.1.0
pyarrow: None
xarray: None
IPython: 6.5.0
sphinx: None
patsy: None
dateutil: 2.7.3
pytz: 2018.5
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.2.3
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 1.0.1
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None