pyspark.sql.DataFrame.corr — PySpark 3.5.5 documentation
Calculates the correlation of two columns of a DataFrame as a double value. Currently only the Pearson Correlation Coefficient is supported. DataFrame.corr() and DataFrameStatFunctions.corr() are aliases of each other.
New in version 1.4.0.
Changed in version 3.4.0: Supports Spark Connect.
Examples
>>> df = spark.createDataFrame([(1, 12), (10, 1), (19, 8)], ["c1", "c2"])
>>> df.corr("c1", "c2")
-0.3592106040535498
>>> df = spark.createDataFrame([(11, 12), (10, 11), (9, 10)], ["small", "bigger"])
>>> df.corr("small", "bigger")
1.0
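To see where the first result comes from, the Pearson coefficient r = Σ(x - x̄)(y - ȳ) / sqrt(Σ(x - x̄)² · Σ(y - ȳ)²) can be recomputed by hand on the same column values. The sketch below is plain Python and needs no SparkSession. The helper name pearson_corr is hypothetical and not part of PySpark, and the simple two-pass computation is an illustration, not Spark's internal implementation.

```python
import math


def pearson_corr(xs, ys):
    """Pearson correlation of two equal-length numeric sequences (illustrative sketch)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the means (the covariance numerator).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each column (the variance terms).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    # Note: a constant column gives var == 0 and this sketch raises ZeroDivisionError.
    return cov / math.sqrt(var_x * var_y)


# Same values as columns c1 and c2 in the first example.
print(pearson_corr([1, 10, 19], [12, 1, 8]))  # approximately -0.35921

# Same values as columns small and bigger in the second example.
print(pearson_corr([11, 10, 9], [12, 11, 10]))  # 1.0
```

For c1 and c2 the means are 10 and 7, so the numerator is -36 and the denominator is sqrt(162 · 62), giving about -0.3592, which agrees with the value df.corr("c1", "c2") reports.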