GitHub - KellerKev/duckicelake: Iceberg v3 REST Catalog proxy on top of DuckLake (original) (raw)

duckicelake — ducks around an iceberg in a lake

An Iceberg REST Catalog proxy on top of DuckLake, with MinIO-backed object storage and real STS credential vending. Materialises DuckLake's snapshot / schema / stats state into Iceberg-spec manifests on demand, so standard Iceberg clients (PyIceberg, DuckDB's icebergextension, Trino, Spark) read rows directly from S3 — and write back via register-in-place commits that DuckLake atomically records.

⭐ Hybrid write model — write via either path, read from anywhere

The defining feature: clients can write via the Iceberg REST path_or_ via DuckLake-direct, and Iceberg readers see both within ~1s. No "sync job", no manual rewrite step, no second-class write path.

Requirements

**PostgreSQL is required as the DuckLake metastore.**The proxy talks to DuckLake's catalog tables through a psycopg pool (this is true regardless of the eager hook). The hybrid write model additionally relies on PostgreSQL-specific machinery —LISTEN / NOTIFY, an AFTER INSERT trigger onducklake_snapshot, and pg_try_advisory_lock for single-listener election. DuckLake itself supports other backends (SQLite, MySQL, DuckDB), but those would forgo the eager path: writes are still visible through the lazy LoadTable materialisation, just without the ~1s warm-S3 guarantee. Switching backends is not currently wired through the proxy's config.

Demos

Three short real-terminal recordings (vhs / ttyd, no animation, no mocks):

🎯 Architectural proof — Iceberg view on top of DuckLake (with code on screen)

Every step shows its source first, then runs it. Same Parquet on S3, same Postgres rows; two extension paths (Iceberg REST via PyIceberg + DuckDB iceberg-ext, and DuckLake direct via DuckDB ducklake-ext) see exactly the same data. A row written via DuckLake direct appears in both Iceberg readers automatically. Ends with the snapshot-id identity check: DuckLake HEAD == Iceberg current-snapshot-id == ducklake.snapshot-id property.

duckicelake architectural proof demo

📥 Full quality:duckicelake-demo_with_code.mp4 (2:52 · 1700×1080 · 3.5 MB)

Companion: lakesh

lakesh is a small DuckDB-powered SQL shell for Iceberg REST catalogs (and DuckLake direct). Profile-based connection management, an interactive REPL with psql-style meta-commands, one-shot exec mode for scripts, and an MCP server so LLM agents can query your catalogs through the same plumbing. It pairs naturally with duckicelake — just point it at the proxy. Source + docs atgithub.com/KellerKev/lakesh.

lakesh companion demo

📥 Full quality:lakesh-companion-demo.mp4 (0:55 · 1700×1000 · 1.3 MB)


  Iceberg REST client (PyIceberg, DuckDB iceberg ext, Trino, Spark, …)
              │  HTTP (Iceberg OpenAPI v3)
              ▼
       FastAPI proxy (duckicelake.server)  ──▶ Prometheus /metrics
       │     │     │                       ──▶ /healthz /readyz
       │     │     │                       ──▶ JSON logs
       │     │     │
       │     │     │  STS AssumeRole (per-table session policy)
       │     │     ▼
       │     │   MinIO STS  ──▶ vended creds (s3.access-key-id, …)
       │     │
       │     │  SQL via DuckDB+ducklake (write conn + read pool)
       │     ▼
       │  Postgres (psycopg pool)
       │     ├── ducklake_*       — schemas, tables, columns, snapshots, stats, deletes
       │     └── duckicelake_*    — properties, tags, branches, partition-spec sidecar,
       │                            nan_value_count cache, format-version override
       │
       │  S3 / MinIO direct (object I/O)
       ▼
   data/<ns>/<tbl>/                       ── Parquet data files (DuckLake)
   data/<ns>/<tbl>/                       ── Parquet position-delete files (v2)
   data/<ns>/<tbl>/                       ── eq-delete & v3 Puffin DV (.puffin)
   data/<ns>/<tbl>/metadata/
        ├── vN.metadata.json              ── TableMetadata, versioned per commit
        ├── version-hint.text             ── Hive-style pointer to vN
        ├── snap-<id>-<uuid>.avro         ── manifest list (one per snapshot)
        ├── <id>-<uuid>-m0-data.avro      ── data manifest (stats + row_id + key_metadata)
        └── <id>-<uuid>-m1-deletes.avro   ── delete / DV manifest (when applicable)

Everything runs out of a single pixi environment — no Docker.

What's in the box

Iceberg REST surface

Snapshot chain + per-file metadata

Partition pruning end-to-end

Per-file partition values land in the manifest Avro with Iceberg-correct semantics:

Verified: PyIceberg pushdown prunes country='US' from 3 files → 2 read; ts >= 2026-04-22 UTC from 3 files → 1 read.

Iceberg v3 writes (via the shim)

PyIceberg 0.11.1 still raises Cannot write manifest list for table version: 3. The fix PR upstream (iceberg-python#3070) stalled in March 2026.

pyiceberg_v3.install() vendors that PR's essentials as a monkey-patch: ManifestWriterV3 /ManifestListWriterV3 subclasses, write_manifest /write_manifest_list factory dispatch, SUPPORTED_TABLE_FORMAT_VERSIONbumped to 3 (in both pyiceberg.table.metadata andpyiceberg.table.update), DataFile.from_args rewired so default arg resolves dynamically (else V2-shape records flow into V3 writers and IndexError), client-side gates patched inTransaction.upgrade_table_version + _apply_table_update forUpgradeFormatVersionUpdate (seeds next_row_id=0) +AddSnapshotUpdate (synthesises first_row_id).

Call once before any RestCatalog operation:

from duckicelake.pyiceberg_v3 import install install()

The same shim also adds the v3 primitive types (variant, geometry,geography) that PyIceberg's pydantic validator otherwise rejects.

The proxy itself accepts upgrade-format-version to 3 and re-materialises manifests + manifest-list in V3 Avro shape (withfirst_row_id field) when the table's format-version is 3.

v3 Puffin deletion vectors

For format-version 3 tables, the proxy rewrites position-delete Parquets into a single Puffin file per snapshot containing onedeletion-vector-v1 blob per affected data file:

V2 tables keep the legacy Parquet position-delete shape — readers that only understand v2 still work.

v3 type wiring

Iceberg DuckDB
variant VARIANT
geometry / geography GEOMETRY (via the spatial extension)
timestamp_ns / timestamptz_ns TIMESTAMP_NS
decimal(p, s), uuid, date, time, boolean, all numerics direct

PyIceberg and the demo show v3 types loading + reading round-trip; DuckDB's iceberg ext currently surfaces variant / geometry asUNKNOWN (upstream gap in duckdb-iceberg 1.5.x).

OAuth2 + RBAC

STS credential vending

X-Iceberg-Access-Delegation: vended-credentials triggers real MinIOAssumeRole with a session policy scoped to the table's data-file keys + its metadata/* prefix. Returnss3.access-key-id / s3.secret-access-key / s3.session-token /s3.credentials-expiration in the LoadTable config map.

Throughput / scale

Measured: ~349 req/s on cache-hit LoadTable at concurrency 32 on one machine.

Observability

Admin

Tests + CI

Architectural decisions

See ARCHITECTURE.md for full rationale — short version:

Quickstart

pixi install pixi run backends-up # Postgres + MinIO pixi run ducklake-init # creates bucket + default namespace pixi run serve # Iceberg REST catalog on :8181 (single worker, --reload)

OR

pixi run serve-hi # 4 uvicorn workers, no reload — closer to prod shape

In another terminal:

pixi run smoke # catalog-only smoke pixi run duckdb-client # the full demo (20+ assertion blocks) pixi run test # pytest integration suite

Teardown: pixi run backends-down.

Endpoint summary

Method Path Notes
GET /v1/config catalog prefix + endpoint allowlist
GET /healthz, /readyz, /metrics ops endpoints (auth-exempt)
POST /v1/oauth/tokens OAuth2 client-credentials token endpoint
GET / POST / DELETE /v1/{prefix}/namespaces[/{ns}] schema CRUD
GET / HEAD /v1/{prefix}/namespaces/{ns} exists / load
GET /v1/{prefix}/namespaces/{ns}/tables list
POST /v1/{prefix}/namespaces/{ns}/tables create with Iceberg schema
GET / HEAD /v1/{prefix}/namespaces/{ns}/tables/{tbl} LoadTable; ?snapshot-id=N for time-travel
DELETE /v1/{prefix}/namespaces/{ns}/tables/{tbl} DROP TABLE; ?purgeRequested=true to clean S3
POST /v1/{prefix}/namespaces/{ns}/tables/{tbl} commit (full action set above)
POST /v1/{prefix}/tables/rename same-namespace rename
GET / POST / DELETE /v1/{prefix}/namespaces/{ns}/views[/{v}] view CRUD (SQL stored in DuckLake)
POST /v1/{prefix}/admin/namespaces/{ns}/tables/{tbl}/compact DuckLake compaction + file cleanup

Layout

duckicelake/
├── pixi.toml                     # one-env stack (Postgres, MinIO, Python, deps)
├── pyproject.toml
├── README.md / ARCHITECTURE.md / OPERATIONS.md / MISSING.md
├── .github/workflows/ci.yml      # CI: backends-up + pytest + demo
├── scripts/
│   ├── pg.sh                     # Postgres lifecycle
│   └── minio.sh                  # MinIO lifecycle
├── src/duckicelake/
│   ├── config.py                 # env-driven settings
│   ├── auth.py                   # OAuth2 + JWT + scope grammar
│   ├── catalog.py                # DuckLake wrapper: PG pool + DuckDB read/write split
│   │                             #   + sidecar tables + LRU metadata cache + S3 client
│   ├── types.py                  # Iceberg ↔ DuckDB ↔ DuckLake type translation
│   ├── bounds.py                 # Iceberg binary bound encoders
│   ├── iceberg.py                # TableMetadata scaffold
│   ├── manifest.py               # Iceberg v2/v3 Avro writers (data + delete + DV manifests)
│   ├── partition_sort.py         # Iceberg ↔ DuckLake partition / sort translation
│   ├── iceberg_transforms.py     # day/month/year/hour/bucket/truncate value computation
│   ├── puffin.py                 # v3 Puffin writer for deletion-vector-v1 blobs
│   ├── pyiceberg_v3.py           # client-side shim: v3 types + v3 manifest writers
│   ├── materialize.py            # full snapshot-chain materialiser (lazy + cached)
│   ├── read_manifest.py          # parses client-supplied manifest chains on commit
│   ├── sts.py                    # MinIO STS AssumeRole + session policies
│   ├── observability.py          # Prometheus metrics + JSON logging
│   ├── models.py                 # Pydantic REST request/response models
│   ├── server.py                 # FastAPI app: endpoints + middleware + handlers
│   ├── bootstrap.py              # `pixi run ducklake-init`
│   ├── smoke.py                  # catalog-only smoke
│   └── duckdb_client.py          # full demo with assertions across all features
└── tests/
    ├── conftest.py               # session-scoped uvicorn + clean-state fixtures
    ├── test_catalog_surface.py   # REST surface smoke
    ├── test_cache_and_observability.py
    └── test_puffin.py            # byte-level Puffin writer tests

Configuration

Var Default Purpose
DUCKICELAKE_PG_HOST /.pgsock Postgres host (pixi-managed local socket by default)
DUCKICELAKE_PG_PORT 55432
DUCKICELAKE_PG_USER ducklake
DUCKICELAKE_PG_DATABASE ducklake
DUCKICELAKE_CATALOG lake DuckLake catalog name (used as REST prefix)
DUCKICELAKE_S3_ENDPOINT http://127.0.0.1:9000
DUCKICELAKE_S3_REGION us-east-1
DUCKICELAKE_S3_BUCKET lakehouse
DUCKICELAKE_S3_ROOT_KEY / _ROOT_SECRET minioadmin dev defaults; production: IAM role / IRSA / Vault
DUCKICELAKE_S3_PREFIX data/
DUCKICELAKE_DEFAULT_FORMAT_VERSION 2 flip to 3 once your writer ecosystem supports it
DUCKICELAKE_CACHE_MAX 1024 LRU cap for in-process metadata cache
DUCKICELAKE_LOG_FORMAT json json for prod, text for dev
DUCKICELAKE_LOG_LEVEL INFO
DUCKICELAKE_REQUIRE_AUTH (unset) 1 → fail boot if no OAuth clients configured
DUCKICELAKE_OAUTH_CLIENTS (empty → auth disabled) id1:secret1|scope1,id2:secret2
DUCKICELAKE_OAUTH_CLIENTS_FILE (empty) JSON file alternative
DUCKICELAKE_OAUTH_JWT_SECRET required when clients are configured HMAC key
DUCKICELAKE_OAUTH_TTL_SECONDS 3600
DUCKICELAKE_OAUTH_ISSUER duckicelake

DuckDB iceberg extension: configuration notes

Three things worth knowing (all handled automatically byduckdb_client.py::_iceberg_client_con):

What's left out

See MISSING.md for the punch list. The Iceberg spec surface is effectively complete; remaining gaps are: