feat: partition-aware planner pruning in SQL/Cypher + partitioning integrity guardrails

Context

PartitionedBucketSelectionStrategy is shipped and works on the write path: rows route to bucket = hash(properties) % bucketCount automatically when a type's strategy is partitioned. The strategy is also persisted in the schema and surfaced via getBucketSelectionStrategy().

But every read-path query ignores the partition strategy:

So the partition strategy gives users zero query-time benefit today: every SELECT FROM Doc WHERE tenant_id = X, every MATCH (n:Doc {tenant_id: 'X'}), and every vector top-K with a tenant filter scans all buckets. This is a real and broad missing optimization.

Scope

Part A: Partition-aware bucket pruning

Add a planner rule, on each query engine, that:

  1. Detects whether the type has a PartitionedBucketSelectionStrategy.
  2. Checks whether the query's filter contains an equality (or IN) predicate on a property that's part of the partition key.
  3. If yes, computes hash(value) % bucketCount for each constraint value and restricts the scan / index access / vector function to those bucket ids only.
  4. If no, falls back to the existing fan-out (no regression).
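The four steps above can be sketched roughly as follows. This is a minimal illustration, not the engine's planner API: the class and method names are hypothetical, the partition key is assumed to be a single property, and the hash (Objects.hashCode folded with Math.floorMod so the result is never negative) is a stand-in for whatever PartitionedBucketSelectionStrategy actually uses.

```java
import java.util.List;
import java.util.Objects;
import java.util.SortedSet;
import java.util.TreeSet;

/** Hypothetical stand-in for the pruning rule; not the engine's real planner API. */
final class PartitionPruningSketch {

  /** Assumed hash: Objects.hashCode folded into [0, bucketCount). The real strategy may differ. */
  static int bucketFor(Object value, int bucketCount) {
    return Math.floorMod(Objects.hashCode(value), bucketCount);
  }

  /**
   * @param partitioned step 1: whether the type uses a partitioned strategy
   * @param keyValues   step 2: values taken from an equality (one value) or IN (several values)
   *                    predicate on the partition key; empty when no such predicate exists
   * @return step 3: the pruned bucket ids, or step 4: every bucket id (plain fan-out)
   */
  static SortedSet<Integer> bucketsToScan(boolean partitioned, List<?> keyValues, int bucketCount) {
    SortedSet<Integer> buckets = new TreeSet<>();
    if (partitioned && !keyValues.isEmpty()) {
      // Step 3: one target bucket per constraint value; duplicates collapse in the set.
      for (Object value : keyValues)
        buckets.add(bucketFor(value, bucketCount));
      return buckets;
    }
    // Step 4: no usable predicate, so scan everything exactly as today.
    for (int i = 0; i < bucketCount; i++)
      buckets.add(i);
    return buckets;
  }
}
```

With an equality predicate the result holds a single bucket id; with an IN predicate it holds at most one id per listed value, so pruning degrades gracefully toward fan-out as the IN list grows.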

The rule applies broadly:

Part B: Partitioning integrity via a needsRepartition flag

Today nothing prevents a user from breaking partitioning consistency after the fact:

The cleanest design is a persistent needsRepartition flag on the type that captures "is this type's partitioning currently trustworthy?" The planner treats the flag as a hard gate: if true, the partition-pruning rule from Part A is skipped and queries fan out across all buckets - no wrong results possible, just no optimization until the user reconciles. This avoids both the "block the DDL" and "throw on mutate" footguns: the DDL is always fast, queries are always correct, and the only cost is "no pruning until rebuild."

B.1 Schema state

Add boolean needsRepartition to LocalDocumentType (and equivalents). Persisted in schema.json. Replicated via the standard schema-replication path so followers see the same value. Set by exactly three paths:

A type created fresh as partitioned (no records yet) starts at false because the partition mapping is trivially correct over zero rows.

B.2 Planner contract

The partition-pruning rule from Part A checks type.needsRepartition() before firing. If true, the rule does nothing and the query falls back to today's fan-out across all buckets. Pruning resumes automatically on the next query after the rebuild clears the flag.
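As an illustration of this contract, the gate could look something like the sketch below. TypeState and mayPrune are hypothetical names used only for this example; the only identifier taken from the proposal is needsRepartition.

```java
/** Hypothetical sketch of the planner-side gate; only needsRepartition comes from the proposal. */
final class PruningGateSketch {

  /** Minimal stand-in for the per-type schema state the planner would read. */
  static final class TypeState {
    final boolean partitioned;
    volatile boolean needsRepartition;

    TypeState(boolean partitioned, boolean needsRepartition) {
      this.partitioned = partitioned;
      this.needsRepartition = needsRepartition;
    }
  }

  /** Returns true only when pruning is both applicable and currently trustworthy. */
  static boolean mayPrune(TypeState type, boolean hasPartitionKeyPredicate) {
    if (!type.partitioned)
      return false;            // nothing to prune on
    if (type.needsRepartition)
      return false;            // hard gate: fall back to fan-out, results stay correct
    return hasPartitionKeyPredicate;
  }
}
```

Because the flag is read on every planning pass, clearing it at the end of a rebuild is enough for the next query to prune again; no cache invalidation step is assumed here.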

B.3 DDL ergonomics: two modes

-- Default: DDL is fast and non-blocking. Flag is set to true. WARNING is logged
-- and surfaced in Studio. Queries stay correct, just lose partition pruning until
-- rebuild.
ALTER TYPE Doc ADD BUCKET;
-- WARNING: type 'Doc' uses PartitionedBucketSelectionStrategy on property 'tenant_id'.
-- Adding a bucket has invalidated the partition mapping; run
-- REBUILD TYPE Doc WITH repartition = true when convenient to restore pruning.
-- Until then queries fan out across all buckets.

-- Atomic: DDL + rebuild as one operation. Blocks until rebuild completes; flag
-- never goes to true because the rebuild covers the whole type before the DDL
-- returns.
ALTER TYPE Doc ADD BUCKET WITH repartition = true;

B.4 Rebuild command

Extend the existing REBUILD TYPE rather than introduce a new top-level statement:

REBUILD TYPE <Type> [POLYMORPHIC] WITH repartition = true [, batchSize = N]

The existing rebuild already walks every record via db.scanType to apply schema-layout changes; the new setting extends the per-record handler to also recompute the target bucket via type.getBucketSelectionStrategy().getBucketIdByRecord(...) and move the record (delete from current bucket, insert into target) when they differ. batchSize and POLYMORPHIC apply unchanged. Linear in row count, never silently triggered. The existing RebuildTypeStatement parser/AST/executor takes minimal changes - one new setting key and one branch in the per-record path. On full success the flag is cleared; on failure the flag stays true so a retry remains correct.
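The per-record branch described above can be modelled in memory as in the sketch below. Everything here is a simplification for illustration: buckets are plain lists of partition-key values, targetBucket is an assumed hash standing in for getBucketIdByRecord(...), and batching, transactions, and POLYMORPHIC handling are left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

/** Hypothetical in-memory model of REBUILD TYPE ... WITH repartition = true. */
final class RepartitionSketch {
  /** buckets.get(i) holds the partition-key values of the records stored in bucket i. */
  final List<List<String>> buckets = new ArrayList<>();
  boolean needsRepartition = true;

  RepartitionSketch(int bucketCount) {
    for (int i = 0; i < bucketCount; i++)
      buckets.add(new ArrayList<>());
  }

  /** Assumed stand-in for the strategy's bucket computation. */
  static int targetBucket(String key, int bucketCount) {
    return Math.floorMod(Objects.hashCode(key), bucketCount);
  }

  /** Walks every record once and moves the misplaced ones; returns how many moved. */
  int rebuildWithRepartition() {
    int bucketCount = buckets.size();
    // Snapshot first so a record moved into a later bucket is not visited twice.
    List<List<String>> snapshot = new ArrayList<>();
    for (List<String> bucket : buckets)
      snapshot.add(new ArrayList<>(bucket));

    int moved = 0;
    for (int current = 0; current < bucketCount; current++) {
      for (String key : snapshot.get(current)) {
        int target = targetBucket(key, bucketCount);
        if (target != current) {
          buckets.get(current).remove(key); // delete from current bucket
          buckets.get(target).add(key);     // insert into target bucket
          moved++;
        }
      }
    }
    // Cleared only after the full walk; an exception above leaves the flag true.
    needsRepartition = false;
    return moved;
  }
}
```

Running the rebuild a second time moves nothing, which matches the retry behavior the proposal relies on when a previous attempt failed partway.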

B.5 Query-time WARNING with throttle

When a query is executed against a type whose needsRepartition is true, the engine emits a Level.WARNING log so operators see the cost of the pending rebuild. Throttle to at most one message per minute per type (a 60-second window matching the existing QueryEngineManager and SparseVectorScoringPool saturation throttles: same pattern, same operator mental model). Implementation: an AtomicLong lastNeedsRepartitionWarnMs per LocalDocumentType; the planner's pre-firing check on the partition-pruning rule updates it to the current time via compareAndSet when the window has elapsed. Spam is bounded; one entry per type per minute is enough to make the pending rebuild noticeable in any reasonable monitoring setup without drowning the log on a hot type.

Sample log line:

WARNING - type 'Doc' has needsRepartition=true; partition-aware bucket pruning is
disabled until `REBUILD TYPE Doc WITH repartition = true` runs. Queries continue
to return correct results but fan out across all 16 buckets.
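The throttle in B.5 can be sketched with a single AtomicLong, as below. The class name, the shouldWarn method, and the "never warned" sentinel are assumptions made for this example; the field name and the 60-second window come from the description above. The clock is passed in so the logic can be exercised without waiting.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical per-type throttle: at most one warning per 60-second window. */
final class RepartitionWarnThrottle {
  static final long WINDOW_MS = 60_000L;
  private static final long NEVER = Long.MIN_VALUE; // sentinel: no warning emitted yet

  private final AtomicLong lastNeedsRepartitionWarnMs = new AtomicLong(NEVER);

  /**
   * Returns true when the caller should emit the WARNING. Under contention only the
   * thread whose compareAndSet succeeds gets true, so each window logs once.
   */
  boolean shouldWarn(long nowMs) {
    long last = lastNeedsRepartitionWarnMs.get();
    if (last != NEVER && nowMs - last < WINDOW_MS)
      return false; // still inside the current window
    return lastNeedsRepartitionWarnMs.compareAndSet(last, nowMs);
  }
}
```

A caller would typically wrap the log statement as `if (throttle.shouldWarn(System.currentTimeMillis())) { ... }`, keeping the hot path to one volatile read when the window has not elapsed.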

B.6 Visibility

B.7 Edge cases the flag handles

The principle stays: "if you opted into partitioning, the engine treats it as a load-bearing invariant; you cannot accidentally produce wrong results by mutating the schema." The flag is the mechanism that enforces it without throwing.

Part C: Documentation and Studio

  1. New docs page: "Schema design 101 - choosing a bucket strategy" with a 3-question decision tree, concrete CREATE TYPE Doc BUCKETS 16 examples, anti-patterns (low-cardinality / skewed / mutable partition keys), and a "how to verify it's working" section showing EXPLAIN-style query plans with bucket pruning visible.
  2. Add a paragraph to the type-creation reference and the vector-index reference linking to the design page.
  3. Promote the partition-aware-vector-filter pattern as the answer to the filterable-HNSW question, alongside the future ACORN integration follow-up.
  4. Studio: when creating a type, surface a hint near the bucket-strategy dropdown ("If your data is scoped by tenant, customer, region, or another high-cardinality identifier, partition by that property for query-time pruning") linking to the design page. Half-day of frontend work; high leverage because schema choices are made once.

Acceptance criteria

Out of scope

Why now

Comes out of the post-#4068 roadmap discussion: with sparse-vector scaling shipped, the next-most-visible production gap is filter-aware vector retrieval, and the cheapest credible answer for the multi-tenant SaaS case is partition-aware planner pruning - which the engine already has the data for, just never used. The same rule then benefits every non-vector query that filters on a partition key, so the work is broadly reusable rather than vector-specific.