pandas.read_iceberg - pandas 3.0.0rc0+33.g1fd184de2a documentation

pandas.read_iceberg(table_identifier, catalog_name=None, *, catalog_properties=None, row_filter=None, selected_fields=None, case_sensitive=True, snapshot_id=None, limit=None, scan_properties=None)

Read an Apache Iceberg table into a pandas DataFrame.

Added in version 3.0.0.

Warning

read_iceberg is experimental and may change without warning.

Parameters:

table_identifier : str

Identifier of the Iceberg table to read.

catalog_name : str, optional

The name of the catalog.

catalog_properties : dict of {str: str}, optional

Properties that are used in addition to the catalog configuration.

row_filter : str, optional

A filter expression, given as a string, that describes the rows to return.

selected_fields : tuple of str, optional

A tuple of strings giving the names of the columns to return in the output DataFrame.

case_sensitive : bool, default True

If True, column matching is case sensitive.

snapshot_id : int, optional

Snapshot ID to time travel to. By default, the table is scanned as of the current snapshot ID.

limit : int, optional

The maximum number of rows to return in the scan result. By default, all matching rows are fetched.

scan_properties : dict of {str: obj}, optional

Additional table properties, as a dictionary of string key-value pairs, to use for this scan.
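As a rough illustration of how the optional parameters above fit together, the sketch below collects them into a keyword dictionary. The helper name `build_read_iceberg_kwargs` is hypothetical and not part of pandas. It only assumes the parameter names and types listed above, and it leaves out any option set to None so that the defaults of read_iceberg apply.

```python
from typing import Any, Iterable, Optional


def build_read_iceberg_kwargs(
    table_identifier: str,
    catalog_name: Optional[str] = None,
    row_filter: Optional[str] = None,
    selected_fields: Optional[Iterable[str]] = None,
    snapshot_id: Optional[int] = None,
    limit: Optional[int] = None,
) -> dict[str, Any]:
    """Hypothetical helper: collect keyword arguments for pd.read_iceberg.

    Options left as None are omitted so the function's own defaults apply.
    """
    kwargs: dict[str, Any] = {"table_identifier": table_identifier}
    if catalog_name is not None:
        kwargs["catalog_name"] = catalog_name
    if row_filter is not None:
        kwargs["row_filter"] = row_filter
    if selected_fields is not None:
        # The documented type is a tuple of str, so lists are converted.
        kwargs["selected_fields"] = tuple(selected_fields)
    if snapshot_id is not None:
        kwargs["snapshot_id"] = snapshot_id
    if limit is not None:
        kwargs["limit"] = limit
    return kwargs


kwargs = build_read_iceberg_kwargs(
    "my_table",
    catalog_name="my_catalog",
    row_filter="trip_distance >= 10.0",
    selected_fields=["VendorID", "tpep_pickup_datetime"],
    limit=100,
)
# Assuming pandas >= 3.0 and a reachable catalog, the call would then be:
#     df = pd.read_iceberg(**kwargs)
```

The table, catalog, and column names are placeholders taken from the example in this page.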

Returns:

DataFrame

DataFrame based on the Iceberg table.

Examples

>>> df = pd.read_iceberg(
...     table_identifier="my_table",
...     catalog_name="my_catalog",
...     catalog_properties={"s3.secret-access-key": "my-secret"},
...     row_filter="trip_distance >= 10.0",
...     selected_fields=("VendorID", "tpep_pickup_datetime"),
... )
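The snapshot_id and limit parameters can be combined to preview a table as it existed at an earlier snapshot. The following is a minimal sketch under stated assumptions: `read_at_snapshot` and `fake_reader` are hypothetical names, the snapshot ID is a made-up placeholder, and the `reader` argument exists only so the wrapper can be exercised without a real catalog. When no reader is supplied, it falls back to pd.read_iceberg, which needs pandas 3.0 or later.

```python
from typing import Any, Callable, Optional


def read_at_snapshot(
    table_identifier: str,
    snapshot_id: int,
    *,
    catalog_name: Optional[str] = None,
    limit: Optional[int] = None,
    reader: Optional[Callable[..., Any]] = None,
) -> Any:
    """Hypothetical wrapper: read a table as of a given snapshot."""
    if reader is None:
        # Imported lazily so this sketch can be loaded without pandas.
        import pandas as pd

        reader = pd.read_iceberg
    return reader(
        table_identifier,
        catalog_name,
        snapshot_id=snapshot_id,
        limit=limit,
    )


def fake_reader(table_identifier, catalog_name=None, **options):
    # Stand-in for pd.read_iceberg that simply echoes its arguments.
    return {"table": table_identifier, "catalog": catalog_name, **options}


# 1234567890 is a placeholder, not a real snapshot ID.
preview = read_at_snapshot(
    "my_table",
    1234567890,
    catalog_name="my_catalog",
    limit=5,
    reader=fake_reader,
)
```

Dropping the `reader=fake_reader` argument makes the same call go through pd.read_iceberg and return a DataFrame, provided the catalog is configured.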