ENH: Add anti-joins to pandas.merge · Issue #42916 · pandas-dev/pandas (original) (raw)

Pandas does not allow anti-joins. It would be helpful to have these added
Pyspark has implemented this on its 'join' command.
https://spark.apache.org/docs/latest/sql-ref-syntax-qry-select-join.html

Describe the solution you'd like

Add anti-left and anti-right joins to 'how'
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.merge.html

API breaking implications

None

Describe alternatives you've considered

You can replicate an anti-join by doing an outer join, and filtering.

My code does work but sometimes gives too many columns - not sure why. Also my code relies on the column name not already containing '_drop' - which obviously could be an issue.

Additional context

[add any other context, code examples, or references to existing implementations about the feature request here]

import pandas as pd def left_anti_join(df1,df2,left_on,right_on): """ Perfoms left anti join of two data frames""" df=pd.merge(df1,df2,left_on=left_on,right_on=right_on,suffixes=('','_drop'), how='outer', indicator=True) df=df[df['_merge']=='left_only'] df.drop([col for col in df.columns if '_drop' in col],axis=1, inplace=True) df.drop(columns="_merge", inplace=True) return df