pyspark.sql.DataFrame.columns — PySpark 3.5.5 documentation (original) (raw)

property DataFrame. columns

Retrieves the names of all columns in the DataFrame as a list.

The order of the column names in the list reflects their order in the DataFrame.

New in version 1.3.0.

Changed in version 3.4.0: Supports Spark Connect.

Returns

list

List of column names in the DataFrame.

Examples

Example 1: Retrieve column names of a DataFrame

df = spark.createDataFrame( ... [(14, "Tom", "CA"), (23, "Alice", "NY"), (16, "Bob", "TX")], ... ["age", "name", "state"] ... ) df.columns ['age', 'name', 'state']

Example 2: Using column names to project specific columns

selected_cols = [col for col in df.columns if col != "age"] df.select(selected_cols).show() +-----+-----+ | name|state| +-----+-----+ | Tom| CA| |Alice| NY| | Bob| TX| +-----+-----+

Example 3: Checking if a specific column exists in a DataFrame

"state" in df.columns True "salary" in df.columns False

Example 4: Iterating over columns to apply a transformation

import pyspark.sql.functions as f for col_name in df.columns: ... df = df.withColumn(col_name, f.upper(f.col(col_name))) df.show() +---+-----+-----+ |age| name|state| +---+-----+-----+ | 14| TOM| CA| | 23|ALICE| NY| | 16| BOB| TX| +---+-----+-----+

Example 5: Renaming columns and checking the updated column names

df = df.withColumnRenamed("name", "first_name") df.columns ['age', 'first_name', 'state']

Example 6: Using the columns property to ensure two DataFrames have the same columns before a union

df2 = spark.createDataFrame( ... [(30, "Eve", "FL"), (40, "Sam", "WA")], ["age", "name", "location"]) df.columns == df2.columns False