BUG: pandas merge suffixes accepting set can interchange right and left suffix order · Issue #33740 · pandas-dev/pandas


import pandas as pd
df_b = pd.DataFrame(dict(p=[1, 2, 3], q=[2, 3, 4]))
df_a = pd.DataFrame(dict(p=[2, 3, 4], q=[1, 2, 3]))
pd.merge(left=df_b, right=df_a, on=['p'], how='outer', suffixes={'_b', '_a'})

Current output:

|   | p | q_a | q_b |
| --- | --- | --- | --- |
| 0 | 1 | 2.0 | NaN |
| 1 | 2 | 3.0 | 1.0 |
| 2 | 3 | 4.0 | 2.0 |
| 3 | 4 | NaN | 3.0 |

Expected Output:

|   | p | q_b | q_a |
| --- | --- | --- | --- |
| 0 | 1 | 2.0 | NaN |
| 1 | 2 | 3.0 | 1.0 |
| 2 | 3 | 4.0 | 2.0 |
| 3 | 4 | NaN | 3.0 |
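A possible workaround, sketched here rather than taken from the pandas source, is to pass the suffixes as a tuple so that the written order is preserved. The variable name `merged` is only for illustration.

```python
import pandas as pd

df_b = pd.DataFrame(dict(p=[1, 2, 3], q=[2, 3, 4]))
df_a = pd.DataFrame(dict(p=[2, 3, 4], q=[1, 2, 3]))

# A tuple is ordered: the first suffix goes to the left frame (df_b),
# the second to the right frame (df_a).
merged = pd.merge(left=df_b, right=df_a, on=["p"], how="outer",
                  suffixes=("_b", "_a"))

print(list(merged.columns))
```

With a tuple, the left frame's `q` column should come out as `q_b` and the right frame's as `q_a`, matching the expected output above.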

Problem description

This line is causing the issue:

lsuf, rsuf = self.suffixes

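Unpacking a set yields its elements in the set's internal iteration order, not the order in which they were written. For small integers this happens to look sorted; for strings such as '_b' and '_a' the order is arbitrary and can change between interpreter runs: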

l, r = (2, 1); print(l, r) # l=2, r=1
l, r = [2, 1]; print(l, r) # l=2, r=1
l, r = {2, 1}; print(l, r) # l=1, r=2
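One way to guard against this, shown as a minimal sketch and not as what pandas actually does, is to reject unordered containers before unpacking. The helper name `ordered_suffixes` is hypothetical.

```python
def ordered_suffixes(suffixes):
    """Hypothetical helper: return (left, right) suffixes.

    Sets and frozensets are rejected because their iteration order
    is not the order in which the elements were written.
    """
    if isinstance(suffixes, (set, frozenset)):
        raise TypeError(
            "suffixes must be an ordered pair such as a tuple or list, "
            f"not {type(suffixes).__name__}"
        )
    lsuf, rsuf = suffixes  # raises ValueError unless exactly two items
    return lsuf, rsuf


print(ordered_suffixes(("_b", "_a")))  # tuple order is preserved

try:
    ordered_suffixes({"_b", "_a"})
except TypeError as exc:
    print("rejected:", exc)
```

Raising is one design choice; emitting a warning instead would keep existing callers working while still flagging the ambiguity.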

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit : None
python : 3.8.2.final.0
python-bits : 64
OS : Linux
OS-release : 4.14.165-133.209.amzn2.x86_64
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.0.3
numpy : 1.18.3
pytz : 2019.3
dateutil : 2.8.1
pip : 20.0.2
setuptools : 46.1.3.post20200330
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : 7.13.0
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : None
matplotlib : 3.2.1
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
pytest : None
pyxlsb : None
s3fs : 0.4.2
scipy : 1.4.1
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
xlsxwriter : None
numba : None