ENH: Unhelpful output from assert_frame_equal when indexes differ and check_like=True
Problem:
Calling testing.assert_frame_equal with mismatched indexes and check_like=True generates unhelpful output.
If you run:
import pandas as pd

df1 = pd.DataFrame({"A": [1.0, 2.0, 3.0], "B": [4.0, 5.0, 6.0]}, index=["a", "b", "c"])
df2 = pd.DataFrame({"A": [1.0, 2.0, 3.0], "B": [4.0, 5.0, 6.0]}, index=["a", "b", "d"])
pd.testing.assert_frame_equal(df1, df2, check_like=True)
The output will be:
AssertionError: DataFrame.iloc[:, 0] (column name="A") are different
DataFrame.iloc[:, 0] (column name="A") values are different (33.33333 %)
[index]: [a, b, d]
[left]: [1.0, 2.0, nan]
[right]: [1.0, 2.0, 3.0]
The data in the input DataFrames is not actually different (neither frame contains a NaN). When check_like=True, however, the code calls left.reindex_like(right) before comparing indexes (and columns), to ensure that both frames are ordered the same way.
If the indexes contain different values (rather than the same values in a different order),
reindex_like fills the data (row or column) for the mismatched index entries with NaN.
As a result, the subsequent index checks pass, but assert_frame_equal fails
with a data-not-equal error (as above).
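The effect of the reindexing step can be seen in isolation. The following is a minimal sketch that only calls reindex_like on the frames from the example above; it does not reproduce the internals of assert_frame_equal.

```python
import pandas as pd

df1 = pd.DataFrame({"A": [1.0, 2.0, 3.0], "B": [4.0, 5.0, 6.0]}, index=["a", "b", "c"])
df2 = pd.DataFrame({"A": [1.0, 2.0, 3.0], "B": [4.0, 5.0, 6.0]}, index=["a", "b", "d"])

# Conform df1 to df2's index and columns. Label "d" does not exist in df1,
# so its row is filled with NaN; label "c" is dropped.
reindexed = df1.reindex_like(df2)

# The indexes now compare equal, which hides the original mismatch.
print(reindexed.index.equals(df2.index))

# The row for "d" contains only NaN, which is what the data check then reports.
print(reindexed.loc["d"])
```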
Even more confusingly, if the values being compared are not floats (integers, for example), you get a dtype-not-equal error instead:
AssertionError: Attributes of DataFrame.iloc[:, 0] (column name="A") are different
Attribute "dtype" are different
[left]: float64
[right]: int64
These messages are quite unhelpful, as the mismatch is in the index, and the error should logically be the same as you would get if you ran with check_like=False.
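The dtype error has the same cause: an integer column cannot hold NaN, so reindexing against a missing label upcasts it to float64. A small sketch, assuming integer columns explicitly created as int64:

```python
import pandas as pd

left = pd.DataFrame({"A": [1, 2, 3]}, index=["a", "b", "c"], dtype="int64")
right = pd.DataFrame({"A": [1, 2, 3]}, index=["a", "b", "d"], dtype="int64")

# Label "d" is missing from `left`, so a NaN is introduced and the
# int64 column is upcast to float64.
reindexed = left.reindex_like(right)

print(left["A"].dtype)       # dtype before reindexing
print(reindexed["A"].dtype)  # dtype after reindexing
```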
Applies to:
The code above was run against the latest code from master.
print(pd.__version__)
1.2.0.dev0+950.gd321be6
Solution:
The message for the above assertion failure should be something like:
AssertionError: DataFrame.index are different
DataFrame.index values are different (33.33333 %)
[left]: Index(['a', 'b', 'c'], dtype='object')
[right]: Index(['a', 'b', 'd'], dtype='object')
This is what you get if you run with check_like=False.
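Until the message is improved, one possible workaround is to compare the index and column labels explicitly before delegating to assert_frame_equal. The sketch below is an illustration only: assert_frame_equal_like is a hypothetical helper name, not part of pandas, and its error message is not the one pandas produces. It also assumes the labels are unique.

```python
import pandas as pd


def assert_frame_equal_like(left, right, **kwargs):
    """Hypothetical helper: fail on label mismatches before reordering."""
    # Labels present in one frame but not the other, ignoring order.
    index_diff = left.index.symmetric_difference(right.index)
    if len(index_diff) > 0:
        raise AssertionError(
            "DataFrame.index are different\n"
            f"[left]:  {list(left.index)}\n"
            f"[right]: {list(right.index)}"
        )
    columns_diff = left.columns.symmetric_difference(right.columns)
    if len(columns_diff) > 0:
        raise AssertionError(
            "DataFrame.columns are different\n"
            f"[left]:  {list(left.columns)}\n"
            f"[right]: {list(right.columns)}"
        )
    # The labels match as sets, so reordering cannot introduce NaN.
    pd.testing.assert_frame_equal(left, right, check_like=True, **kwargs)


df1 = pd.DataFrame({"A": [1.0, 2.0, 3.0], "B": [4.0, 5.0, 6.0]}, index=["a", "b", "c"])
df2 = pd.DataFrame({"A": [1.0, 2.0, 3.0], "B": [4.0, 5.0, 6.0]}, index=["a", "b", "d"])

try:
    assert_frame_equal_like(df1, df2)
except AssertionError as exc:
    print(exc)  # reports the index mismatch rather than NaN data
```

Frames that differ only in row or column order still pass, because the label check ignores order and the final call uses check_like=True.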