pyspark.sql.DataFrame.toArrow — PySpark 4.1.0 documentation

DataFrame.toArrow()

Returns the contents of this DataFrame as a PyArrow pyarrow.Table.

This method is only available if PyArrow is installed.
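Because toArrow() depends on PyArrow, code that may run in environments without it can check for the package first. This is a minimal sketch, assuming that returning None is an acceptable fallback for the caller. The helpers pyarrow_is_installed and to_arrow_or_none are hypothetical names, not PySpark APIs. importlib.util.find_spec locates the package without importing it. The check covers presence only, not whether the installed version meets PySpark's minimum requirement.

```python
import importlib.util
from typing import Any, Optional


def pyarrow_is_installed() -> bool:
    # find_spec searches the import path without importing the package,
    # so this check is cheap and has no side effects.
    return importlib.util.find_spec("pyarrow") is not None


def to_arrow_or_none(df: Any) -> Optional[Any]:
    # Hypothetical helper (not a PySpark API): call DataFrame.toArrow()
    # only when pyarrow is present; otherwise return None instead of failing.
    # Only presence is checked here, not the installed version.
    if not pyarrow_is_installed():
        return None
    return df.toArrow()
```

A caller would then write something like `table = to_arrow_or_none(df)` and branch on `table is None`.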

New in version 4.0.0.

Notes

This method should only be used if the resulting PyArrow pyarrow.Table is expected to be small, as all the data is loaded into the driver’s memory.
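When only a preview of a potentially large DataFrame is needed, one way to bound driver memory is to cap the row count with DataFrame.limit() before converting. This is a minimal sketch, assuming that truncating to the first max_rows rows is acceptable for the caller. The helper name to_arrow_capped and the default of 10,000 rows are hypothetical choices. Without an explicit ordering, Spark does not guarantee which rows limit() keeps.

```python
from typing import Any


def to_arrow_capped(df: Any, max_rows: int = 10_000) -> Any:
    # Hypothetical helper (not a PySpark API). DataFrame.limit() caps the
    # number of rows collected, so at most max_rows rows reach the driver.
    # limit() bounds rows, not bytes: very wide rows can still be large.
    if max_rows <= 0:
        raise ValueError("max_rows must be a positive integer")
    return df.limit(max_rows).toArrow()
```

Capping by rows is a simple design choice. It does not replace checking that the cap, multiplied by the typical row size, fits in driver memory.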

This API is a developer API.

Examples

>>> df = spark.createDataFrame([(2, "Alice"), (5, "Bob")], schema=["age", "name"])
>>> df.coalesce(1).toArrow()
pyarrow.Table
age: int64
name: string
----
age: [[2,5]]
name: [["Alice","Bob"]]