feat: per-segment parallel top-K scoring for LSM_SPARSE_VECTOR (Step 5 follow-up to #4068)
Follow-up to #4068. Tracks the deferred Step 5 (per-segment parallel scoring) so it does not get lost.
Context
The v2 sparse-vector backend ships serial BMW DAAT (BmwScorer.topK) over a merged per-dim cursor stack. At 10M docs / 30k-dim SPLADE-shape this lands at 1.20 ms/query with size-tiered compaction in place. That is well inside the noise of network round-trip + JSON serialization for any real-world workload, which is why parallel dispatch was deferred during the v2 land.
What is already in place in the tree (no code to write for plumbing, just the hot-path dispatch):
- SparseVectorScoringPool - JVM-wide dedicated executor, daemon threads, bounded queue, caller-runs fallback with throttled saturation WARNING.
- arcadedb.sparseVectorScoringPoolThreads / arcadedb.sparseVectorScoringQueueSize global config knobs (currently emit a startup warning that the topK path is still serial).
- PoolMetrics MeterBinder with Micrometer gauges for size, active, queue depth, queue capacity, completed tasks, caller-run fallbacks - tagged pool=sparse_vector.
- Studio "Executor Pools" card under the Server tab renders the live numbers.
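For readers unfamiliar with the pool shape described above, here is a minimal sketch of an executor with daemon threads, a bounded queue, and a counted caller-runs fallback. It is not the actual SparseVectorScoringPool code: the class name ScoringPoolSketch, the newPool method, and the callerRunFallbacks counter are hypothetical, and the throttled saturation warning is left out.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.RejectedExecutionHandler;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical illustration only; not the real SparseVectorScoringPool.
public final class ScoringPoolSketch {
  // Counts tasks that ran on the submitting thread because the pool was saturated.
  public static final AtomicLong callerRunFallbacks = new AtomicLong();

  public static ThreadPoolExecutor newPool(final int threads, final int queueSize) {
    // Daemon threads so an idle pool never keeps the JVM alive.
    final ThreadFactory daemonFactory = runnable -> {
      final Thread t = new Thread(runnable, "sparse-vector-scoring-sketch");
      t.setDaemon(true);
      return t;
    };
    // When all workers are busy and the queue is full, run the task on the caller
    // thread and count the event (a real pool might also log a throttled warning).
    final RejectedExecutionHandler callerRuns = (task, executor) -> {
      if (!executor.isShutdown()) {
        callerRunFallbacks.incrementAndGet();
        task.run();
      }
    };
    // Fixed-size pool: core == max, so the bounded queue is the only buffer.
    return new ThreadPoolExecutor(threads, threads, 0L, TimeUnit.MILLISECONDS,
        new ArrayBlockingQueue<>(queueSize), daemonFactory, callerRuns);
  }
}
```

The caller-runs fallback means a saturated pool degrades to the current serial behaviour on the query thread instead of rejecting work.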
Scope
- Wire PaginatedSparseVectorEngine.topK to fan out to SparseVectorScoringPool when segments.length >= threshold and pool.getMaxParallelism() > 1.
- RID-range partitioning that preserves the newest-source-wins merge across cross-segment dim contributions. Each partition produces its own top-K, then a final merge step.
- Ensure correctness: BmwScorerCorrectnessTest-shape coverage at multi-segment scale (parallel result must match serial result bit-for-bit modulo quantization noise).
- Benchmark against the existing LSMSparseVectorIndexLargeBenchmark 10M run; expected gain is ~3-4x on a 4-core machine, capped by the segment count.
- Drop the "currently the topK path is serial" wording from the two SPARSE_VECTOR_SCORING_* config descriptions and from the SparseVectorScoringPool startup warning once the dispatch is live.
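The dispatch described in the scope list could look roughly like the sketch below. Everything in it is an assumption made for illustration: Hit, RangeScorer, and the topK signature are hypothetical names, RIDs are flattened to a long, and RangeScorer stands in for a serial scorer (such as BmwScorer.topK) restricted to one RID range. Because each RID falls in exactly one partition, every document is scored in full by a single task, so the final step only has to keep the K best hits across the partition results.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

// Hypothetical illustration only; names and signatures are not from the codebase.
public final class ParallelTopKSketch {
  public static final class Hit {
    public final long rid;
    public final float score;
    public Hit(final long rid, final float score) { this.rid = rid; this.score = score; }
  }

  // Stand-in for a serial top-K scorer limited to the RID range [fromRid, toRid).
  public interface RangeScorer {
    List<Hit> topK(long fromRid, long toRid, int k);
  }

  // Higher score first; ties broken by lower RID so results are deterministic.
  public static final Comparator<Hit> BEST_FIRST = (a, b) -> {
    final int c = Float.compare(b.score, a.score);
    return c != 0 ? c : Long.compare(a.rid, b.rid);
  };

  // bounds holds ascending RID boundaries; partition i covers [bounds[i], bounds[i + 1]).
  public static List<Hit> topK(final RangeScorer scorer, final long[] bounds, final int k,
      final int segmentCount, final int threshold, final int maxParallelism,
      final ExecutorService pool) {
    // Gate: small or non-parallel cases stay on the caller thread.
    if (segmentCount < threshold || maxParallelism <= 1 || bounds.length <= 2)
      return scorer.topK(bounds[0], bounds[bounds.length - 1], k);

    final List<Callable<List<Hit>>> tasks = new ArrayList<>();
    for (int i = 0; i + 1 < bounds.length; i++) {
      final long from = bounds[i];
      final long to = bounds[i + 1];
      tasks.add(() -> scorer.topK(from, to, k));
    }

    // Final merge: a heap with the worst retained hit at its head, capped at k entries.
    final PriorityQueue<Hit> heap = new PriorityQueue<>(BEST_FIRST.reversed());
    try {
      for (final Future<List<Hit>> future : pool.invokeAll(tasks))
        for (final Hit hit : future.get()) {
          heap.add(hit);
          if (heap.size() > k)
            heap.poll();
        }
    } catch (final InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while scoring", e);
    } catch (final ExecutionException e) {
      throw new IllegalStateException("partition scoring failed", e);
    }
    final List<Hit> merged = new ArrayList<>(heap);
    merged.sort(BEST_FIRST);
    return merged;
  }
}
```

The sketch does not address how a RangeScorer applies newest-source-wins inside its range; it only shows the threshold gate, the fan-out, and the final K-way merge.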
Acceptance criteria
- BmwScorer.topK callable through a parallel dispatcher in PaginatedSparseVectorEngine.topK, gated by a configurable threshold so single-segment queries stay on the caller thread.
- RID-range partition correctness proven by a multi-segment correctness test.
- LSMSparseVectorIndexLargeBenchmark shows measurable speedup at 10M (target: under 0.5 ms/query at K=10 on a 4-core box).
- Studio "Executor Pools" card surfaces non-zero active / completedTasks / occasional caller-run fallbacks under load.
- Startup warning + "currently serial" config wording removed.
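One possible shape for the comparison inside the multi-segment correctness test is sketched below. The helper name sameTopK and the epsilon parameter are hypothetical, and the strict rank-by-rank RID check assumes both paths break score ties deterministically in the same way. If quantization noise can reorder near-equal scores, the check would need to be relaxed.

```java
// Hypothetical illustration only; not part of the existing test suite.
public final class TopKEquivalenceSketch {
  // True when both rankings have the same length, the same RID at every rank,
  // and scores that differ by at most epsilon at every rank.
  public static boolean sameTopK(final long[] serialRids, final float[] serialScores,
      final long[] parallelRids, final float[] parallelScores, final float epsilon) {
    if (serialRids.length != parallelRids.length
        || serialScores.length != serialRids.length
        || parallelScores.length != parallelRids.length)
      return false;
    for (int i = 0; i < serialRids.length; i++) {
      if (serialRids[i] != parallelRids[i])
        return false;
      if (Math.abs(serialScores[i] - parallelScores[i]) > epsilon)
        return false;
    }
    return true;
  }
}
```

Passing an epsilon of 0 turns this into an exact score comparison, which is the natural starting point if the parallel path accumulates per-document scores in the same order as the serial one.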
Out of scope
- Cross-query parallelism (already scales linearly via independent queries on the engine's thread-safe state).
- Cross-dim parallelism within a single segment (BMW pruning is inherently serial across dims; the parallel axis is segments, not dims).
Also still tracked
The other deferred item from #4068 - one-shot backward-compat rebuild for #4065 MVP-era postings on first open - is not part of this issue. The v2 backend keeps the old LSM-Tree shell readable (so old schemas still open) but does not auto-rebuild MVP postings into segments; pre-v2 datasets need re-insertion. If a user reports it in the field, file a separate issue for that migration path.