pyspark.sql.DataFrame.subtract - PySpark 4.1.0 documentation
DataFrame.subtract(other)
Return a new DataFrame containing rows in this DataFrame but not in another DataFrame.
New in version 1.3.0.
Changed in version 3.4.0: Supports Spark Connect.
Parameters
other : DataFrame
Another DataFrame that needs to be subtracted.
Returns
Subtracted DataFrame.
Notes
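This is equivalent to EXCEPT DISTINCT in SQL: the result contains each qualifying row at most once, even if that row appears several times in this DataFrame.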
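The EXCEPT DISTINCT behavior can be illustrated with a small plain-Python model. This is only a sketch of the semantics, not how Spark implements the operation: rows are modeled as tuples, and the helper names except_distinct and except_all are hypothetical. The second helper models EXCEPT ALL, which PySpark exposes as DataFrame.exceptAll, for contrast. The model keeps input order for readability, whereas Spark does not guarantee row order.

```python
from collections import Counter


def except_distinct(left, right):
    """Model of EXCEPT DISTINCT: distinct rows of `left` that never occur in `right`."""
    excluded = set(right)
    seen = set()
    result = []
    for row in left:
        # Skip rows present in `right`, and emit each remaining row only once.
        if row not in excluded and row not in seen:
            seen.add(row)
            result.append(row)
    return result


def except_all(left, right):
    """Model of EXCEPT ALL: each row of `right` cancels one matching row of `left`."""
    remaining = Counter(right)
    result = []
    for row in left:
        if remaining[row] > 0:
            remaining[row] -= 1  # this occurrence is cancelled by a row in `right`
        else:
            result.append(row)
    return result


# Hypothetical sample data: rows are tuples, so they are compared by position.
left = [("a", 1), ("a", 1), ("a", 1), ("b", 3), ("c", 4), ("c", 4)]
right = [("a", 1), ("b", 3)]

distinct_result = except_distinct(left, right)
all_result = except_all(left, right)

print(distinct_result)  # [('c', 4)]
print(all_result)       # [('a', 1), ('a', 1), ('c', 4), ('c', 4)]
```

Under the distinct semantics, a single ("a", 1) in the other DataFrame removes every ("a", 1) from the result, and the duplicated ("c", 4) is reported once. Under the ALL semantics, duplicates are counted, so two ("a", 1) rows and both ("c", 4) rows survive.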
Examples
Example 1: Subtracting two DataFrames with the same schema
>>> df1 = spark.createDataFrame([("a", 1), ("a", 1), ("b", 3), ("c", 4)], ["C1", "C2"])
>>> df2 = spark.createDataFrame([("a", 1), ("a", 1), ("b", 3)], ["C1", "C2"])
>>> result_df = df1.subtract(df2)
>>> result_df.show()
+---+---+
| C1| C2|
+---+---+
|  c|  4|
+---+---+
Example 2: Subtracting two DataFrames with partially overlapping rows
>>> df1 = spark.createDataFrame([(1, "A"), (2, "B")], ["id", "value"])
>>> df2 = spark.createDataFrame([(2, "B"), (3, "C")], ["id", "value"])
>>> result_df = df1.subtract(df2)
>>> result_df.show()
+---+-----+
| id|value|
+---+-----+
|  1|    A|
+---+-----+
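Example 3: Subtracting two DataFrames with mismatched column names
Columns are compared by position, not by name, so the row (1, 2) is removed even though the column names differ. The result keeps the column names of the first DataFrame.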
>>> df1 = spark.createDataFrame([(1, 2)], ["A", "B"])
>>> df2 = spark.createDataFrame([(1, 2)], ["C", "D"])
>>> result_df = df1.subtract(df2)
>>> result_df.show()
+---+---+
|  A|  B|
+---+---+
+---+---+