DEPR: Merging on different number of levels · Issue #34862 · pandas-dev/pandas

Two DataFrames with MultiIndex columns that have different numbers of levels.

import pandas as pd

# df1: columns with three levels
df1_columns = pd.MultiIndex.from_tuples([("A0", "B0", "C0"), ("A1", "B1", "C1")])
df1 = pd.DataFrame([[1, 2], [10, 20]], columns=df1_columns)

# df2: columns with two levels
df2_columns = pd.MultiIndex.from_tuples([("X0", "Y0"), ("X1", "Y1")])
df2 = pd.DataFrame([[1, 200], [10, 200]], columns=df2_columns)

Case 1

pd.merge(df1, df2, left_on=[("A0", "B0", "C0")], right_on=[("X0", "Y0")])

returns

   (A0, B0, C0)  (A1, B1, C1)  (X0, Y0)  (X1, Y1)
0             1             2         1       200
1            10            20        10       200

This is fine. The columns keep their structure from both DataFrames. They are flattened to tuples, but that can be fixed (for example, with something like this).
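One possible way to turn such flat tuple labels back into a MultiIndex is to pad the shorter tuples to a common length. This is only a minimal sketch and not necessarily the approach linked above: the helper name `pad_to_multiindex` and the empty-string filler are assumptions made for illustration, and the labels are copied from the Case 1 output rather than produced by a merge.

```python
import pandas as pd

# Flat column labels as they appear in the Case 1 output:
# tuples of unequal length (three levels from df1, two from df2).
flat_columns = [("A0", "B0", "C0"), ("A1", "B1", "C1"), ("X0", "Y0"), ("X1", "Y1")]


def pad_to_multiindex(labels, fill=""):
    """Hypothetical helper: pad every tuple to the longest length, then build a MultiIndex."""
    depth = max(len(label) for label in labels)
    padded = [tuple(label) + (fill,) * (depth - len(label)) for label in labels]
    return pd.MultiIndex.from_tuples(padded)


columns = pad_to_multiindex(flat_columns)
```

Assigning the returned index to the `columns` attribute of the merged result should restore a three-level header, with the filler in the last level for the columns that came from df2.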

Case 2

pd.merge(df2, df1, left_on=[("X0", "Y0")], right_on=[("A0", "B0", "C0")])

returns

   X0   X1  A0  A1
   Y0   Y1  B0  B1
0   1  200   1   2
1  10  200  10  20

When the left DataFrame has fewer column levels, the merge result is questionable: the lowest level of the right DataFrame's columns is dropped (C0 and C1 no longer appear in the result).

Is this a bug or a feature (that is, must the result have the same number of column levels as the left DataFrame)?
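Whatever the answer, one way to sidestep the ambiguity is to give both frames the same number of column levels before merging, so that no level can be dropped regardless of which frame is on the left. The following is a sketch under assumptions: `pad_column_levels` is a hypothetical helper, not a pandas function, and the empty-string filler is an arbitrary choice.

```python
import pandas as pd

df1 = pd.DataFrame(
    [[1, 2], [10, 20]],
    columns=pd.MultiIndex.from_tuples([("A0", "B0", "C0"), ("A1", "B1", "C1")]),
)
df2 = pd.DataFrame(
    [[1, 200], [10, 200]],
    columns=pd.MultiIndex.from_tuples([("X0", "Y0"), ("X1", "Y1")]),
)


def pad_column_levels(df, nlevels, fill=""):
    """Hypothetical helper: return a copy whose column MultiIndex has `nlevels` levels."""
    padded = [tuple(col) + (fill,) * (nlevels - len(col)) for col in df.columns]
    out = df.copy()
    out.columns = pd.MultiIndex.from_tuples(padded)
    return out


# Bring both frames to the deeper of the two column hierarchies.
depth = max(df1.columns.nlevels, df2.columns.nlevels)
left = pad_column_levels(df2, depth)   # the two-level frame gains a filler level
right = pad_column_levels(df1, depth)  # already three levels, labels unchanged

# Same join as Case 2, but the left key now carries the filler level.
merged = pd.merge(left, right, left_on=[("X0", "Y0", "")], right_on=[("A0", "B0", "C0")])
```

With equal level counts on both sides, the merge keys are ordinary full-depth labels, and the result keeps a three-level column index in either merge order.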

Notes:

  1. In both cases the same warning is emitted: UserWarning: merging between different levels can give an unintended result ....
  2. Checked with pandas 0.25.1 and 1.0.4.
  3. See notebook.