An open source dataframe library that works with any data system

Ibis: the portable Python dataframe library

Ibis offers a familiar local dataframe experience with outstanding performance, using DuckDB by default.

import ibis

ibis.options.interactive = True

t = ibis.read_parquet("penguins.parquet", table_name="penguins")
t.head(3)

1. Import Ibis.
2. Enable interactive mode for exploratory data analysis (EDA) or demos.
3. Read a Parquet file and specify a table name (optional).
4. Display the first few rows of the table.

┏━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ species ┃ island    ┃ bill_length_mm ┃ bill_depth_mm ┃ flipper_length_mm ┃ body_mass_g ┃ sex    ┃ year  ┃
┡━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ string  │ string    │ float64        │ float64       │ int64             │ int64       │ string │ int64 │
├─────────┼───────────┼────────────────┼───────────────┼───────────────────┼─────────────┼────────┼───────┤
│ Adelie  │ Torgersen │           39.1 │          18.7 │               181 │        3750 │ male   │  2007 │
│ Adelie  │ Torgersen │           39.5 │          17.4 │               186 │        3800 │ female │  2007 │
│ Adelie  │ Torgersen │           40.3 │          18.0 │               195 │        3250 │ female │  2007 │
└─────────┴───────────┴────────────────┴───────────────┴───────────────────┴─────────────┴────────┴───────┘

Iterate and explore data locally:

grouped = t.group_by("species", "island").agg(count=t.count()).order_by("count")
grouped

1. Transform the table.
2. Display the transformed table.

┏━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━┓
┃ species   ┃ island    ┃ count ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━┩
│ string    │ string    │ int64 │
├───────────┼───────────┼───────┤
│ Adelie    │ Biscoe    │    44 │
│ Adelie    │ Torgersen │    52 │
│ Adelie    │ Dream     │    56 │
│ Chinstrap │ Dream     │    68 │
│ Gentoo    │ Biscoe    │   124 │
└───────────┴───────────┴───────┘

One API for 20+ backends

Use the same dataframe API across 20+ backends. For example, with the DuckDB, Polars, DataFusion, and PySpark backends in turn:

con = ibis.connect("duckdb://")
t = con.read_parquet("penguins.parquet")
t.head(3)

┏━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ species ┃ island    ┃ bill_length_mm ┃ bill_depth_mm ┃ flipper_length_mm ┃ body_mass_g ┃ sex    ┃ year  ┃
┡━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ string  │ string    │ float64        │ float64       │ int64             │ int64       │ string │ int64 │
├─────────┼───────────┼────────────────┼───────────────┼───────────────────┼─────────────┼────────┼───────┤
│ Adelie  │ Torgersen │           39.1 │          18.7 │               181 │        3750 │ male   │  2007 │
│ Adelie  │ Torgersen │           39.5 │          17.4 │               186 │        3800 │ female │  2007 │
│ Adelie  │ Torgersen │           40.3 │          18.0 │               195 │        3250 │ female │  2007 │
└─────────┴───────────┴────────────────┴───────────────┴───────────────────┴─────────────┴────────┴───────┘

t.group_by("species", "island").agg(count=t.count()).order_by("count")

┏━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━┓
┃ species   ┃ island    ┃ count ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━┩
│ string    │ string    │ int64 │
├───────────┼───────────┼───────┤
│ Adelie    │ Biscoe    │    44 │
│ Adelie    │ Torgersen │    52 │
│ Adelie    │ Dream     │    56 │
│ Chinstrap │ Dream     │    68 │
│ Gentoo    │ Biscoe    │   124 │
└───────────┴───────────┴───────┘

con = ibis.connect("polars://")
t = con.read_parquet("penguins.parquet")
t.head(3)

┏━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ species ┃ island    ┃ bill_length_mm ┃ bill_depth_mm ┃ flipper_length_mm ┃ body_mass_g ┃ sex    ┃ year  ┃
┡━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ string  │ string    │ float64        │ float64       │ int64             │ int64       │ string │ int64 │
├─────────┼───────────┼────────────────┼───────────────┼───────────────────┼─────────────┼────────┼───────┤
│ Adelie  │ Torgersen │           39.1 │          18.7 │               181 │        3750 │ male   │  2007 │
│ Adelie  │ Torgersen │           39.5 │          17.4 │               186 │        3800 │ female │  2007 │
│ Adelie  │ Torgersen │           40.3 │          18.0 │               195 │        3250 │ female │  2007 │
└─────────┴───────────┴────────────────┴───────────────┴───────────────────┴─────────────┴────────┴───────┘

t.group_by("species", "island").agg(count=t.count()).order_by("count")

┏━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━┓
┃ species   ┃ island    ┃ count ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━┩
│ string    │ string    │ int64 │
├───────────┼───────────┼───────┤
│ Adelie    │ Biscoe    │    44 │
│ Adelie    │ Torgersen │    52 │
│ Adelie    │ Dream     │    56 │
│ Chinstrap │ Dream     │    68 │
│ Gentoo    │ Biscoe    │   124 │
└───────────┴───────────┴───────┘

con = ibis.connect("datafusion://")
t = con.read_parquet("penguins.parquet")
t.head(3)

┏━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ species ┃ island    ┃ bill_length_mm ┃ bill_depth_mm ┃ flipper_length_mm ┃ body_mass_g ┃ sex    ┃ year  ┃
┡━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ string  │ string    │ float64        │ float64       │ int64             │ int64       │ string │ int64 │
├─────────┼───────────┼────────────────┼───────────────┼───────────────────┼─────────────┼────────┼───────┤
│ Adelie  │ Torgersen │           39.1 │          18.7 │               181 │        3750 │ male   │  2007 │
│ Adelie  │ Torgersen │           39.5 │          17.4 │               186 │        3800 │ female │  2007 │
│ Adelie  │ Torgersen │           40.3 │          18.0 │               195 │        3250 │ female │  2007 │
└─────────┴───────────┴────────────────┴───────────────┴───────────────────┴─────────────┴────────┴───────┘

t.group_by("species", "island").agg(count=t.count()).order_by("count")

┏━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━┓
┃ species   ┃ island    ┃ count ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━┩
│ string    │ string    │ int64 │
├───────────┼───────────┼───────┤
│ Adelie    │ Biscoe    │    44 │
│ Adelie    │ Torgersen │    52 │
│ Adelie    │ Dream     │    56 │
│ Chinstrap │ Dream     │    68 │
│ Gentoo    │ Biscoe    │   124 │
└───────────┴───────────┴───────┘

con = ibis.connect("pyspark://")
t = con.read_parquet("penguins.parquet")
t.head(3)

┏━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ species ┃ island    ┃ bill_length_mm ┃ bill_depth_mm ┃ flipper_length_mm ┃ body_mass_g ┃ sex    ┃ year  ┃
┡━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ string  │ string    │ float64        │ float64       │ int64             │ int64       │ string │ int64 │
├─────────┼───────────┼────────────────┼───────────────┼───────────────────┼─────────────┼────────┼───────┤
│ Adelie  │ Torgersen │           39.1 │          18.7 │               181 │        3750 │ male   │  2007 │
│ Adelie  │ Torgersen │           39.5 │          17.4 │               186 │        3800 │ female │  2007 │
│ Adelie  │ Torgersen │           40.3 │          18.0 │               195 │        3250 │ female │  2007 │
└─────────┴───────────┴────────────────┴───────────────┴───────────────────┴─────────────┴────────┴───────┘

t.group_by("species", "island").agg(count=t.count()).order_by("count")

┏━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━┓
┃ species   ┃ island    ┃ count ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━┩
│ string    │ string    │ int64 │
├───────────┼───────────┼───────┤
│ Adelie    │ Biscoe    │    44 │
│ Adelie    │ Torgersen │    52 │
│ Adelie    │ Dream     │    56 │
│ Chinstrap │ Dream     │    68 │
│ Gentoo    │ Biscoe    │   124 │
└───────────┴───────────┴───────┘

This allows you to iterate locally and deploy remotely by changing a single line of code. For instance, develop locally with DuckDB and deploy remotely to BigQuery, or use any combination of backends that meets your requirements.
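To make the single-line swap concrete, here is a toy sketch in plain Python. It is not Ibis code and says nothing about how Ibis is implemented: the `connect` helper, the `memory://` and `sqlite://` URL schemes, both backend classes, and the six sample rows are hypothetical stand-ins invented for illustration. What it shows is the pattern itself: the analysis function depends only on a shared interface, so choosing a different engine touches one line.

```python
import sqlite3
from collections import Counter

# Made-up sample rows for illustration only (not the real penguins data).
COLUMNS = ("species", "island")
ROWS = [
    ("Adelie", "Torgersen"),
    ("Adelie", "Biscoe"),
    ("Adelie", "Biscoe"),
    ("Gentoo", "Biscoe"),
    ("Gentoo", "Biscoe"),
    ("Gentoo", "Biscoe"),
]


def _check(keys):
    """Reject column names that are not part of the toy schema."""
    unknown = [k for k in keys if k not in COLUMNS]
    if unknown:
        raise KeyError(f"unknown columns: {unknown}")


class MemoryBackend:
    """Hypothetical local engine: counts groups directly in Python."""

    def __init__(self, rows):
        self._rows = list(rows)

    def group_count(self, *keys):
        _check(keys)
        idx = [COLUMNS.index(k) for k in keys]
        counts = Counter(tuple(row[i] for i in idx) for row in self._rows)
        # Sort by count, then by the grouping keys, to match the SQL backend.
        return sorted(
            ((*k, n) for k, n in counts.items()),
            key=lambda r: (r[-1], r[:-1]),
        )


class SQLiteBackend:
    """Hypothetical SQL engine: delegates the same operation to SQLite."""

    def __init__(self, rows):
        self._con = sqlite3.connect(":memory:")
        self._con.execute("CREATE TABLE penguins (species TEXT, island TEXT)")
        self._con.executemany("INSERT INTO penguins VALUES (?, ?)", rows)

    def group_count(self, *keys):
        _check(keys)
        cols = ", ".join(f'"{k}"' for k in keys)
        sql = (
            f'SELECT {cols}, COUNT(*) AS "count" FROM penguins '
            f'GROUP BY {cols} ORDER BY "count", {cols}'
        )
        return [tuple(r) for r in self._con.execute(sql)]


def connect(url, rows=ROWS):
    """Pick a backend from the URL scheme (hypothetical schemes)."""
    scheme = url.split("://", 1)[0]
    backends = {"memory": MemoryBackend, "sqlite": SQLiteBackend}
    return backends[scheme](rows)


def species_island_counts(con):
    """Backend-agnostic analysis code: knows nothing about the engine."""
    return con.group_count("species", "island")


con = connect("memory://")  # the only line that changes per backend
local_result = species_island_counts(con)
remote_result = species_island_counts(connect("sqlite://"))
```

Both calls return the same rows, which is the property that makes it safe to develop against one engine and deploy against another.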

Python + SQL: better together

Ibis works by decoupling the dataframe API from the backend execution. Most backends support a SQL dialect, and Ibis uses SQLGlot to compile its expressions into that dialect. You can inspect the SQL that Ibis generates for any SQL backend:

ibis.to_sql(grouped)

1. Display the SQL generated from the table expression.

SELECT
  *
FROM (
  SELECT
    "t0"."species",
    "t0"."island",
    COUNT(*) AS "count"
  FROM "penguins" AS "t0"
  GROUP BY
    1,
    2
) AS "t1"
ORDER BY
  "t1"."count" ASC
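The idea of decoupling the API from execution can be sketched with a toy model in plain Python. This is not Ibis's implementation and it does not use SQLGlot: the `Table`, `GroupCount`, and `OrderBy` classes, the `to_sql` function below, and the three sample rows are hypothetical, invented for illustration. Method calls only build a small expression tree, a separate function turns that tree into a SQL string you can inspect, and nothing runs until the string is handed to an engine (here, the standard library's `sqlite3`).

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class Table:
    """Leaf node: a named table. Holds no data."""
    name: str

    def group_count(self, *keys):
        return GroupCount(self, keys)


@dataclass(frozen=True)
class GroupCount:
    """Node: group by `keys` and count the rows in each group."""
    source: Table
    keys: tuple

    def order_by(self, column):
        return OrderBy(self, column)


@dataclass(frozen=True)
class OrderBy:
    """Node: sort the result of `source` by one column, ascending."""
    source: GroupCount
    column: str


def quote(name):
    """Quote an identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def to_sql(expr):
    """Compile the toy expression tree to a SQL string (hypothetical helper)."""
    if isinstance(expr, Table):
        return f"SELECT * FROM {quote(expr.name)}"
    if isinstance(expr, GroupCount):
        cols = ", ".join(quote(k) for k in expr.keys)
        table = quote(expr.source.name)
        return f'SELECT {cols}, COUNT(*) AS "count" FROM {table} GROUP BY {cols}'
    if isinstance(expr, OrderBy):
        inner = to_sql(expr.source)
        return f"SELECT * FROM ({inner}) AS t1 ORDER BY {quote(expr.column)} ASC"
    raise TypeError(f"cannot compile {type(expr).__name__}")


# Building the expression touches no data and runs no query.
expr = Table("penguins").group_count("species", "island").order_by("count")
sql = to_sql(expr)

# Execution is a separate step, delegated to whichever engine receives the SQL.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE penguins (species TEXT, island TEXT)")
con.executemany(
    "INSERT INTO penguins VALUES (?, ?)",
    [("Adelie", "Dream"), ("Gentoo", "Biscoe"), ("Gentoo", "Biscoe")],  # made-up rows
)
result = con.execute(sql).fetchall()
```

Because `sql` is an ordinary string, it can be printed, logged, or reviewed before anything is executed.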

You can also use SQL strings directly, mixing and matching them with Python dataframe code:

t.sql(
    "SELECT species, island, COUNT(*) AS count FROM penguins GROUP BY species, island"
).order_by("count")

1. Transform the table using SQL.
2. Then, transform the table using Python dataframe code.

┏━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━┓
┃ species   ┃ island    ┃ count ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━┩
│ string    │ string    │ int64 │
├───────────┼───────────┼───────┤
│ Adelie    │ Biscoe    │    44 │
│ Adelie    │ Torgersen │    52 │
│ Adelie    │ Dream     │    56 │
│ Chinstrap │ Dream     │    68 │
│ Gentoo    │ Biscoe    │   124 │
└───────────┴───────────┴───────┘

This allows you to combine the flexibility of Python with the scale and performance of modern SQL.

Users say…

“Ibis is amazing, there is so much bikeshedding out there that this library improves upon. I love that now we can empower any visualization with nearly any dataset! Big thanks to those who have contributed!”

“I now have Ibis code that runs PySpark in my Databricks environment and Polars on my laptop which is pretty slick 🔥”

“I love that with Ibis, I can use SQL for the heavy lifting or aggregations and then switch to a dataframe-like API for the type of dynamic transformations that would otherwise be tedious to do in pure SQL.”

Get started with Ibis