feat: groupBy/groupSize options on vector.neighbors for diversified retrieval

Feature Request: groupBy Option on Vector Neighbors

Originated from discussion #4044 (Qdrant to ArcadeDB migration).

Overview

Add groupBy and groupSize options to vector.neighbors (and the future vector.sparseNeighbors) so vector retrieval can be diversified at search time by a payload field, instead of forcing callers to over-fetch and post-partition with ROW_NUMBER() OVER (PARTITION BY ...).

This mirrors Qdrant's query_points_groups and is generally useful for any diversification scenario:

Design

API

SELECT * FROM vector.neighbors(
  'Doc[embedding]', $queryVec, 10,
  { groupBy: 'source_file', groupSize: 1, filter: "(status = 'active')" }
)

Options:

When groupBy is absent, behavior is identical to today (flat top-K, no breaking change).
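To make the difference from flat top-K concrete, here is a minimal Python sketch of the intended result shape, written as the client-side post-partitioning that callers have to do today. It is an illustration only, not ArcadeDB code. The helper names and the sample rows are hypothetical, and it assumes (mirroring Qdrant's query_points_groups) that the limit counts groups rather than rows; the issue text does not state this explicitly.

```python
# Hypothetical sample hits, as if returned by an over-fetched flat search.
HITS = [
    {"id": "a1", "source_file": "a.md", "distance": 0.10},
    {"id": "a2", "source_file": "a.md", "distance": 0.12},
    {"id": "a3", "source_file": "a.md", "distance": 0.15},
    {"id": "b1", "source_file": "b.md", "distance": 0.20},
    {"id": "c1", "source_file": "c.md", "distance": 0.30},
]


def flat_top_k(hits, k):
    """Behavior without groupBy: the k nearest hits, regardless of group."""
    return sorted(hits, key=lambda h: h["distance"])[:k]


def grouped_top_k(hits, k, group_by, group_size=1):
    """Keep at most group_size hits per group and at most k groups.

    Assumption: k limits the number of groups, as in Qdrant.
    """
    counts = {}  # group key -> number of hits kept so far
    kept = []
    for hit in sorted(hits, key=lambda h: h["distance"]):
        key = hit[group_by]
        if key not in counts:
            if len(counts) == k:
                continue  # k groups already open; ignore new groups
            counts[key] = 0
        if counts[key] < group_size:
            counts[key] += 1
            kept.append(hit)
    return kept


flat_ids = [h["id"] for h in flat_top_k(HITS, 3)]
grouped_ids = [h["id"] for h in grouped_top_k(HITS, 3, "source_file")]
```

With the sample rows, the flat search spends all three slots on a.md, while the grouped variant returns one hit from each of the three files. The workaround only gives this result if the over-fetch happened to reach b.md and c.md, which is the cost the groupBy option is meant to remove.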

Implementation

Best-per-group is enforced during HNSW traversal, not as a post-filter:
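As a rough picture of the bookkeeping this implies, the sketch below keeps a bounded heap per group while candidates arrive in arbitrary order, as they would during a graph traversal. The GroupedCollector class and its methods are hypothetical names, not part of ArcadeDB. The sketch covers only result collection; how the traversal decides when to stop once groups are involved is not addressed here.

```python
import heapq


class GroupedCollector:
    """Keeps the best group_size candidates per group as candidates stream in."""

    def __init__(self, group_size):
        self.group_size = group_size
        # group key -> heap of (-distance, node_id); heap[0] is the worst kept
        self._heaps = {}

    def offer(self, node_id, group_key, distance):
        """Consider one visited candidate; return True if it was kept."""
        heap = self._heaps.setdefault(group_key, [])
        entry = (-distance, node_id)
        if len(heap) < self.group_size:
            heapq.heappush(heap, entry)
            return True
        if entry > heap[0]:  # closer than the worst candidate kept so far
            heapq.heapreplace(heap, entry)
            return True
        return False

    def results(self, k):
        """Return the k best groups, ordered by each group's nearest member."""
        groups = []
        for key, heap in self._heaps.items():
            members = sorted((-neg, node_id) for neg, node_id in heap)
            groups.append((members[0][0], key, members))
        groups.sort()
        out = []
        for _, key, members in groups[:k]:
            out.extend((node_id, key, dist) for dist, node_id in members)
        return out


collector = GroupedCollector(group_size=1)
kept_flags = [
    collector.offer("a2", "a.md", 0.12),
    collector.offer("b1", "b.md", 0.20),
    collector.offer("a1", "a.md", 0.10),  # replaces a2 as best of a.md
    collector.offer("a3", "a.md", 0.15),  # rejected: a.md already has a closer hit
    collector.offer("c1", "c.md", 0.30),
]
top_groups = collector.results(2)
```

Because each group holds at most groupSize entries, a candidate from an already saturated group is rejected on arrival unless it is closer than the worst one kept, so no over-fetch buffer is needed.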

Composition

Acceptance criteria

Out of scope (future work)

cc @astarso