pyspark.sql.DataFrame.checkpoint - PySpark 3.5.5 documentation
DataFrame.checkpoint(eager: bool = True) → pyspark.sql.dataframe.DataFrame
Returns a checkpointed version of this DataFrame. Checkpointing can be used to truncate the logical plan of this DataFrame, which is especially useful in iterative algorithms where the plan may grow exponentially. The checkpointed data is saved to files inside the checkpoint directory set with SparkContext.setCheckpointDir().
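As a minimal sketch of the iterative case described above, the helper below checkpoints a DataFrame every few steps so that later steps build on the saved data rather than on an ever-longer plan. The names iterate_with_checkpoint, steps, every and main, and the column name "x", are hypothetical and chosen only for illustration; the sketch assumes a local SparkSession and a temporary checkpoint directory.

```python
import tempfile


def iterate_with_checkpoint(df, steps, every, eager=True):
    """Apply `steps` transformations to `df`, checkpointing every `every` steps.

    `df` is expected to be a pyspark.sql.DataFrame with a numeric column "x".
    This helper is illustrative and is not part of PySpark.
    """
    for i in range(1, steps + 1):
        # Each transformation adds another node to the logical plan.
        df = df.withColumn("x", df["x"] + 1)
        if i % every == 0:
            # Truncate the plan: later steps start from the checkpointed data.
            df = df.checkpoint(eager)
    return df


def main():
    # Imported lazily so the helper above can be read without Spark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.master("local[1]").getOrCreate()
    with tempfile.TemporaryDirectory() as checkpoint_dir:
        # A checkpoint directory must be set before calling checkpoint().
        spark.sparkContext.setCheckpointDir(checkpoint_dir)
        df = spark.createDataFrame([(0,), (10,)], ["x"])
        result = iterate_with_checkpoint(df, steps=20, every=5)
        # Collect while the checkpoint directory still exists.
        print(sorted(row["x"] for row in result.collect()))
    spark.stop()
```

Calling main() in an environment with PySpark installed runs the loop end to end. How often to checkpoint is a trade-off: each checkpoint writes the data to storage, so a larger `every` means fewer writes but a longer plan between checkpoints.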
New in version 2.1.0.
Parameters
eager : bool, optional, default True
Whether to checkpoint this DataFrame immediately.
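To illustrate the eager flag, here is a small sketch of the lazy form. With eager=False the call returns without computing the DataFrame, and, as far as we understand Spark's behavior, the checkpoint files are written when the first action runs on the result. The helper name checkpoint_lazily is hypothetical, and the sketch assumes a checkpoint directory has already been set with SparkContext.setCheckpointDir().

```python
def checkpoint_lazily(df):
    """Mark `df` for checkpointing, then force it with an action.

    `df` is expected to be a pyspark.sql.DataFrame. This helper is
    illustrative and is not part of PySpark.
    """
    # eager=False only marks the DataFrame; no job is expected to run here.
    marked = df.checkpoint(eager=False)
    # The first action computes the data, at which point the checkpoint
    # is expected to be materialized.
    row_count = marked.count()
    return marked, row_count
```

With the default eager=True, the explicit count() would not be needed to trigger the write, because the checkpoint is performed as part of the checkpoint() call itself.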
Returns
Checkpointed DataFrame.
Notes
This API is experimental.
Examples
>>> import tempfile
>>> df = spark.createDataFrame([
...     (14, "Tom"), (23, "Alice"), (16, "Bob")], ["age", "name"])
>>> with tempfile.TemporaryDirectory() as d:
...     spark.sparkContext.setCheckpointDir("/tmp/bb")
...     df.checkpoint(False)
DataFrame[age: bigint, name: string]