feat: per-segment parallel top-K scoring for LSM_SPARSE_VECTOR (Step 5 follow-up to #4068)

Follow-up to #4068. Tracks the deferred Step 5 (per-segment parallel scoring) so it does not get lost.

Context

The v2 sparse-vector backend ships serial BMW DAAT (BmwScorer.topK) over a merged per-dim cursor stack. At 10M docs / 30k-dim SPLADE-shape this lands at 1.20 ms/query with size-tiered compaction in place. That is well inside the noise of network round-trip plus JSON serialization for any real-world workload, which is why parallel dispatch was deferred when v2 landed.

What is already in place in the tree (no code to write for plumbing, just the hot-path dispatch):

Scope

  1. Wire PaginatedSparseVectorEngine.topK to fan out to SparseVectorScoringPool when segments.length >= threshold and pool.getMaxParallelism() > 1.
  2. Implement RID-range partitioning that preserves the newest-source-wins merge across cross-segment dim contributions. Each partition produces its own top-K, and a final merge step combines them into the global top-K.
  3. Ensure correctness: add BmwScorerCorrectnessTest-shape coverage at multi-segment scale (the parallel result must match the serial result bit-for-bit, modulo quantization noise).
  4. Benchmark against the existing LSMSparseVectorIndexLargeBenchmark 10M run; expected gain is ~3-4x on a 4-core machine, capped by the segment count.
  5. Drop the "currently the topK path is serial" wording from the two SPARSE_VECTOR_SCORING_* config descriptions and from the SparseVectorScoringPool startup warning once the dispatch is live.
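
A minimal sketch of the dispatch and merge shape described in items 1 and 2, not the actual engine code. Every name in it (ParallelTopKSketch, Hit, RangeScorer, keepBest, and the threshold and parallelism parameters) is hypothetical, RIDs are modelled as plain longs, and a throwaway ExecutorService stands in for SparseVectorScoringPool. It assumes each RID belongs to exactly one partition, so the newest-source-wins merge for a document happens entirely inside the per-range scorer and is not modelled here.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ParallelTopKSketch {

  /** A scored document. The rid is a plain long here, not the real RID type. */
  static final class Hit {
    final long rid;
    final float score;
    Hit(long rid, float score) { this.rid = rid; this.score = score; }
  }

  /** Hypothetical stand-in for the serial scorer restricted to one RID range. */
  interface RangeScorer {
    List<Hit> topK(long ridFromInclusive, long ridToExclusive, int k);
  }

  // Higher score first; ties go to the lower rid so serial and parallel runs agree.
  static final Comparator<Hit> BEST_FIRST = (a, b) -> {
    int c = Float.compare(b.score, a.score);
    return c != 0 ? c : Long.compare(a.rid, b.rid);
  };

  /** Keeps the k best hits using a bounded heap whose head is the current worst. */
  static List<Hit> keepBest(Iterable<Hit> hits, int k) {
    PriorityQueue<Hit> heap = new PriorityQueue<>(BEST_FIRST.reversed());
    for (Hit h : hits) {
      heap.add(h);
      if (heap.size() > k) heap.poll(); // evict the worst candidate
    }
    List<Hit> out = new ArrayList<>(heap);
    out.sort(BEST_FIRST);
    return out;
  }

  /** Serial below the threshold, otherwise one task per RID range plus a final merge. */
  static List<Hit> topK(RangeScorer scorer, long maxRid, int k,
                        int segmentCount, int threshold, int parallelism) {
    if (segmentCount < threshold || parallelism <= 1)
      return scorer.topK(0, maxRid, k); // serial path, unchanged

    ExecutorService pool = Executors.newFixedThreadPool(parallelism);
    try {
      long step = (maxRid + parallelism - 1) / parallelism; // ceil(maxRid / parallelism)
      List<Future<List<Hit>>> futures = new ArrayList<>();
      for (long from = 0; from < maxRid; from += step) {
        final long lo = from;
        final long hi = Math.min(maxRid, from + step);
        Callable<List<Hit>> task = () -> scorer.topK(lo, hi, k);
        futures.add(pool.submit(task));
      }
      // Each partition contributes at most k candidates; merge them into the global top-K.
      List<Hit> candidates = new ArrayList<>();
      for (Future<List<Hit>> f : futures) candidates.addAll(f.get());
      return keepBest(candidates, k);
    } catch (Exception e) {
      throw new IllegalStateException("parallel top-K failed", e);
    } finally {
      pool.shutdown();
    }
  }
}
```

Because the partitions are disjoint RID ranges, the global top-K is always contained in the union of the per-partition top-K lists, which is why the final merge only has to look at up to k candidates per partition. The deterministic tie-break on rid is one way to keep the parallel and serial orderings comparable in a correctness test.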

Acceptance criteria

Out of scope

Also still tracked

The other deferred item from #4068 - one-shot backward-compat rebuild for #4065 MVP-era postings on first open - is not part of this issue. The v2 backend keeps the old LSM-Tree shell readable (so old schemas still open) but does not auto-rebuild MVP postings into segments; pre-v2 datasets need re-insertion. If a user reports it in the field, file a separate issue for that migration path.