pyspark.sql.DataFrame.distinct - PySpark 3.5.5 documentation

DataFrame.distinct() → pyspark.sql.dataframe.DataFrame

Returns a new DataFrame containing the distinct rows in this DataFrame.
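Two rows count as duplicates only when every column matches, so `distinct()` gives the same result as `dropDuplicates()` called without a column subset. The snippet below is a plain-Python model of that whole-row comparison, not Spark code. Rows are assumed to be tuples, and `distinct_rows` is a hypothetical helper written for illustration. It keeps first-seen order only for readability; Spark does not guarantee the order of rows in the result.

```python
from typing import Hashable, Iterable, List, Tuple

# Assumption for this model: a row is a tuple of hashable column values.
Row = Tuple[Hashable, ...]


def distinct_rows(rows: Iterable[Row]) -> List[Row]:
    """Drop rows that exactly repeat an earlier row (all columns compared)."""
    seen = set()
    result: List[Row] = []
    for row in rows:
        # A row is a duplicate only if every column value matches a row seen before.
        if row not in seen:
            seen.add(row)
            result.append(row)
    return result


rows = [(14, "Tom"), (23, "Alice"), (23, "Alice")]
print(len(distinct_rows(rows)))  # 2
```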

New in version 1.3.0.

Changed in version 3.4.0: Supports Spark Connect.

Returns

DataFrame

DataFrame with distinct records.

Examples

>>> df = spark.createDataFrame(
...     [(14, "Tom"), (23, "Alice"), (23, "Alice")], ["age", "name"])

Return the number of distinct rows in the DataFrame.

>>> df.distinct().count()
2
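Because the comparison covers every column, the count depends on which columns the DataFrame holds. To count distinct values of only some columns, one option is to project first, for example `df.select("age").distinct().count()`. The plain-Python model below, using a hypothetical `select_distinct` helper and hypothetical data with an extra "Bob" row, sketches that project-then-deduplicate behaviour; it does not call Spark.

```python
from typing import Hashable, List, Sequence, Tuple

Row = Tuple[Hashable, ...]


def select_distinct(rows: Sequence[Row], columns: Sequence[str],
                    keep: Sequence[str]) -> List[Row]:
    """Model of df.select(*keep).distinct(): project rows, then drop exact duplicates."""
    positions = [columns.index(name) for name in keep]
    seen = set()
    result: List[Row] = []
    for row in rows:
        # Keep only the requested columns before comparing rows.
        projected = tuple(row[i] for i in positions)
        if projected not in seen:
            seen.add(projected)
            result.append(projected)
    return result


columns = ["age", "name"]
# Hypothetical data: the third row differs from the second only in "name".
rows = [(14, "Tom"), (23, "Alice"), (23, "Bob")]

print(len(select_distinct(rows, columns, ["age", "name"])))  # 3: every full row is unique
print(len(select_distinct(rows, columns, ["age"])))          # 2: only ages 14 and 23
```

If you need to keep whole rows while judging duplicates on a subset of columns, `DataFrame.dropDuplicates` accepts a list of column names instead; in that case Spark does not guarantee which row of each duplicate group is kept.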