ENH: Allowing more control over the assertion printing format · Issue #47910 · pandas-dev/pandas
Within my company we use pandas quite extensively, including its testing module for our unit tests, specifically assert_frame_equal.
However, for the kind of data frames we use, the output of a failing assertion is completely unreadable.
In [1]: import pandas as pd
   ...: from pandas.testing import assert_frame_equal
   ...: from datetime import datetime
   ...: import numpy as np
   ...: df1 = pd.DataFrame(np.ones((365,3)),index=pd.date_range(datetime(2022,1,1),datetime(2022,12,31)),columns=['a','b','c'])
   ...: df2 = df1.copy(deep=True)
   ...: df2.iloc[-1,0] = 0
   ...: assert_frame_equal(df1,df2)
AssertionError: DataFrame.iloc[:, 0] (column name="a") are different
DataFrame.iloc[:, 0] (column name="a") values are different (0.27397 %) [index]: [2022-01-01T00:00:00.000000000, 2022-01-02T00:00:00.000000000, 2022-01-03T00:00:00.000000000, 2022-01-04T00:00:00.000000000, 2022-01-05T00:00:00.000000000, 2022-01-06T00:00:00.000000000, 2022-01-07T00:00:00.000000000, 2022-01-08T00:00:00.000000000, 2022-01-09T00:00:00.000000000, 2022-01-10T00:00:00.000000000, 2022-01-11T00:00:00.000000000, 2022-01-12T00:00:00.000000000, 2022-01-13T00:00:00.000000000, 2022-01-14T00:00:00.000000000, 2022-01-15T00:00:00.000000000, 2022-01-16T00:00:00.000000000, 2022-01-17T00:00:00.000000000, 2022-01-18T00:00:00.000000000, 2022-01-19T00:00:00.000000000, 2022-01-20T00:00:00.000000000, 2022-01-21T00:00:00.000000000, 2022-01-22T00:00:00.000000000, 2022-01-23T00:00:00.000000000, 2022-01-24T00:00:00.000000000, 2022-01-25T00:00:00.000000000, 2022-01-26T00:00:00.000000000, 2022-01-27T00:00:00.000000000, 2022-01-28T00:00:00.000000000, 2022-01-29T00:00:00.000000000, 2022-01-30T00:00:00.000000000, 2022-01-31T00:00:00.000000000, 2022-02-01T00:00:00.000000000, 2022-02-02T00:00:00.000000000, 2022-02-03T00:00:00.000000000, 2022-02-04T00:00:00.000000000, 2022-02-05T00:00:00.000000000, 2022-02-06T00:00:00.000000000, 2022-02-07T00:00:00.000000000, 2022-02-08T00:00:00.000000000, 2022-02-09T00:00:00.000000000, 2022-02-10T00:00:00.000000000, 2022-02-11T00:00:00.000000000, 2022-02-12T00:00:00.000000000, 2022-02-13T00:00:00.000000000, 2022-02-14T00:00:00.000000000, 2022-02-15T00:00:00.000000000, 2022-02-16T00:00:00.000000000, 2022-02-17T00:00:00.000000000, 2022-02-18T00:00:00.000000000, 2022-02-19T00:00:00.000000000, 2022-02-20T00:00:00.000000000, 2022-02-21T00:00:00.000000000, 2022-02-22T00:00:00.000000000, 2022-02-23T00:00:00.000000000, 2022-02-24T00:00:00.000000000, 2022-02-25T00:00:00.000000000, 2022-02-26T00:00:00.000000000, 2022-02-27T00:00:00.000000000, 2022-02-28T00:00:00.000000000, 2022-03-01T00:00:00.000000000, 2022-03-02T00:00:00.000000000, 
2022-03-03T00:00:00.000000000, 2022-03-04T00:00:00.000000000, 2022-03-05T00:00:00.000000000, 2022-03-06T00:00:00.000000000, 2022-03-07T00:00:00.000000000, 2022-03-08T00:00:00.000000000, 2022-03-09T00:00:00.000000000, 2022-03-10T00:00:00.000000000, 2022-03-11T00:00:00.000000000, 2022-03-12T00:00:00.000000000, 2022-03-13T00:00:00.000000000, 2022-03-14T00:00:00.000000000, 2022-03-15T00:00:00.000000000, 2022-03-16T00:00:00.000000000, 2022-03-17T00:00:00.000000000, 2022-03-18T00:00:00.000000000, 2022-03-19T00:00:00.000000000, 2022-03-20T00:00:00.000000000, 2022-03-21T00:00:00.000000000, 2022-03-22T00:00:00.000000000, 2022-03-23T00:00:00.000000000, 2022-03-24T00:00:00.000000000, 2022-03-25T00:00:00.000000000, 2022-03-26T00:00:00.000000000, 2022-03-27T00:00:00.000000000, 2022-03-28T00:00:00.000000000, 2022-03-29T00:00:00.000000000, 2022-03-30T00:00:00.000000000, 2022-03-31T00:00:00.000000000, 2022-04-01T00:00:00.000000000, 2022-04-02T00:00:00.000000000, 2022-04-03T00:00:00.000000000, 2022-04-04T00:00:00.000000000, 2022-04-05T00:00:00.000000000, 2022-04-06T00:00:00.000000000, 2022-04-07T00:00:00.000000000, 2022-04-08T00:00:00.000000000, 2022-04-09T00:00:00.000000000, 2022-04-10T00:00:00.000000000, ...] [left]: [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, ...] 
[right]: [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, ...]
There are three things I can think of that could make this problem more manageable, in order of my preference:
1. add a parameter that only displays the differences.
assert_frame_equal(df1,df2,display_diff_only=True)
index_diff: ['2022-12-31T00:00:00.000000000']
left: [1.0]
right: [0.0]
I couldn't find where pandas builds this message, so I can't provide actual code, but it shouldn't involve much more than passing only the differing elements of the Series/DataFrames, rather than the full objects, to the function that formats the assertion message.
2. add a parameter that prints the output in column format rather than row.
assert_frame_equal(df1,df2,display_columnar=True)
index                          a    b
2022-01-01T00:00:00.000000000  1.0  1.0
2022-01-02T00:00:00.000000000  1.0  1.0
...
2022-12-31T00:00:00.000000000  0.0  1.0
3. add a parameter that controls how a datetime index is formatted.
assert_frame_equal(df1,df2,strftime="%Y-%m-%d")
index_diff: ['2022-01-01','2022-01-02',...]
left: [1.0,1.0,...]
right: [1.0,1.0,...]
As mentioned, I couldn't find where this lives in the code base, so I can't provide implementation examples, but the options are simple and similar enough to existing functionality that I hope my usage examples are clear.
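To make the requested output more concrete, here is a rough user-side sketch of what options 1 and 3 might print. `diff_only_report` and its `date_format` argument are hypothetical names invented for illustration; they are not part of pandas, and the sketch assumes both frames share the same index and that the compared column contains no NaN values.

```python
import numpy as np
import pandas as pd


def diff_only_report(left, right, column, date_format=None):
    """Hypothetical helper (not a pandas API): report only the differing rows.

    Assumes `left` and `right` share the same index and that `column`
    holds no NaN values (NaN != NaN would be reported as a difference).
    """
    left_col, right_col = left[column], right[column]
    mask = (left_col != right_col).to_numpy()  # True where the values differ
    index = left.index[mask]
    if date_format is not None:
        # DatetimeIndex.strftime returns an Index of formatted strings.
        index = index.strftime(date_format)
    return "\n".join(
        [
            f"index_diff: {index.tolist()}",
            f"left:  {left_col[mask].tolist()}",
            f"right: {right_col[mask].tolist()}",
        ]
    )


# Same frames as in the report above: one value differs in the last row.
df1 = pd.DataFrame(
    np.ones((365, 3)),
    index=pd.date_range("2022-01-01", "2022-12-31"),
    columns=["a", "b", "c"],
)
df2 = df1.copy(deep=True)
df2.iloc[-1, 0] = 0

report = diff_only_report(df1, df2, "a", date_format="%Y-%m-%d")
print(report)
```

With the frames from the example, the report is three short lines naming only the 2022-12-31 row, instead of several hundred index entries and values.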
The only other solution I'm aware of is something akin to this answer: https://stackoverflow.com/a/72452894, which is to write manual loops that do the comparison yourself so you can output the diff.
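A possible workaround without manual loops, sketched under assumptions: wrap assert_frame_equal and, on failure, build the message from `DataFrame.compare` (available since pandas 1.1), which keeps only the rows and columns that differ. The wrapper name `assert_frame_equal_compact` is made up for this example and is not a pandas function; this is also not the approach from the linked answer.

```python
import numpy as np
import pandas as pd
from pandas.testing import assert_frame_equal


def assert_frame_equal_compact(left, right, **kwargs):
    """Hypothetical wrapper: same check as assert_frame_equal, shorter message."""
    try:
        assert_frame_equal(left, right, **kwargs)
    except AssertionError as err:
        same_labels = left.index.equals(right.index) and left.columns.equals(
            right.columns
        )
        if not same_labels:
            raise  # DataFrame.compare requires identically labelled frames
        diff = left.compare(right)  # only the differing rows and columns
        if diff.empty:
            raise  # values match; the failure is about something else, e.g. dtypes
        raise AssertionError(f"DataFrames differ:\n{diff}") from err


df1 = pd.DataFrame(
    np.ones((365, 3)),
    index=pd.date_range("2022-01-01", "2022-12-31"),
    columns=["a", "b", "c"],
)
df2 = df1.copy(deep=True)
df2.iloc[-1, 0] = 0

try:
    assert_frame_equal_compact(df1, df2)
except AssertionError as exc:
    message = str(exc)
    print(message)
```

The resulting message is already columnar (one row per differing index entry, with the left and right values side by side), so it also approximates option 2. The original assertion error stays attached as the exception's cause.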