pandas.DataFrame.compare — pandas 2.2.3 documentation (original) (raw)
DataFrame.compare(other, align_axis=1, keep_shape=False, keep_equal=False, result_names=('self', 'other'))[source]#
Compare to another DataFrame and show the differences.
Parameters:
otherDataFrame
Object to compare with.
align_axis{0 or ‘index’, 1 or ‘columns’}, default 1
Determine which axis to align the comparison on.
- 0, or ‘index’Resulting differences are stacked vertically
with rows drawn alternately from self and other. - 1, or ‘columns’Resulting differences are aligned horizontally
with columns drawn alternately from self and other.
keep_shapebool, default False
If true, all rows and columns are kept. Otherwise, only the ones with different values are kept.
keep_equalbool, default False
If true, the result keeps values that are equal. Otherwise, equal values are shown as NaNs.
result_namestuple, default (‘self’, ‘other’)
Set the dataframes names in the comparison.
Added in version 1.5.0.
Returns:
DataFrame
DataFrame that shows the differences stacked side by side.
The resulting index will be a MultiIndex with ‘self’ and ‘other’ stacked alternately at the inner level.
Raises:
ValueError
When the two DataFrames don’t have identical labels or shape.
Notes
Matching NaNs will not appear as a difference.
Can only compare identically-labeled (i.e. same shape, identical row and column labels) DataFrames
Examples
df = pd.DataFrame( ... { ... "col1": ["a", "a", "b", "b", "a"], ... "col2": [1.0, 2.0, 3.0, np.nan, 5.0], ... "col3": [1.0, 2.0, 3.0, 4.0, 5.0] ... }, ... columns=["col1", "col2", "col3"], ... ) df col1 col2 col3 0 a 1.0 1.0 1 a 2.0 2.0 2 b 3.0 3.0 3 b NaN 4.0 4 a 5.0 5.0
df2 = df.copy() df2.loc[0, 'col1'] = 'c' df2.loc[2, 'col3'] = 4.0 df2 col1 col2 col3 0 c 1.0 1.0 1 a 2.0 2.0 2 b 3.0 4.0 3 b NaN 4.0 4 a 5.0 5.0
Align the differences on columns
df.compare(df2) col1 col3 self other self other 0 a c NaN NaN 2 NaN NaN 3.0 4.0
Assign result_names
df.compare(df2, result_names=("left", "right")) col1 col3 left right left right 0 a c NaN NaN 2 NaN NaN 3.0 4.0
Stack the differences on rows
df.compare(df2, align_axis=0) col1 col3 0 self a NaN other c NaN 2 self NaN 3.0 other NaN 4.0
Keep the equal values
df.compare(df2, keep_equal=True) col1 col3 self other self other 0 a c 1.0 1.0 2 b b 3.0 4.0
Keep all original rows and columns
df.compare(df2, keep_shape=True) col1 col2 col3 self other self other self other 0 a c NaN NaN NaN NaN 1 NaN NaN NaN NaN NaN NaN 2 NaN NaN NaN NaN 3.0 4.0 3 NaN NaN NaN NaN NaN NaN 4 NaN NaN NaN NaN NaN NaN
Keep all original rows and columns and also all original values
df.compare(df2, keep_shape=True, keep_equal=True) col1 col2 col3 self other self other self other 0 a c 1.0 1.0 1.0 1.0 1 a a 2.0 2.0 2.0 2.0 2 b b 3.0 3.0 3.0 4.0 3 b b NaN NaN 4.0 4.0 4 a a 5.0 5.0 5.0 5.0