[KVConnector][Core] Support cross-layer KV blocks by orozery · Pull Request #27743 · vllm-project/vllm
requested review from ApostaC, LucasWilkinson, NickLucche, WoosukKwon, alexm-redhat, comaniac, mgoin, njhill, pavanimajety, youkaichao and zhuohan123 as code owners
orozery changed the title from "GPUModelRunner: Support contiguous KV data across layers" to "GPUModelRunner: Support cross-layer KV blocks"
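The retitle summarizes the idea: instead of each layer owning a separate KV tensor, one KV block can hold the data for all layers contiguously. The following is a minimal sketch, not vLLM's code, of why that matters when moving a block. The buffer shapes (`[num_blocks, bytes_per_layer_block]` per layer versus a single `[num_blocks, num_layers, bytes_per_layer_block]` buffer) and both function names are hypothetical.

```python
from typing import List, Tuple

# (buffer index, (byte offset, byte length)) describing one region to copy.
Region = Tuple[int, Tuple[int, int]]


def layer_separate_regions(block_id: int, num_layers: int,
                           bytes_per_layer_block: int) -> List[Region]:
    """Hypothetical per-layer layout: buffer L is [num_blocks, bytes_per_layer_block].

    Copying one block needs one region from every layer's buffer.
    """
    offset = block_id * bytes_per_layer_block
    return [(layer, (offset, bytes_per_layer_block)) for layer in range(num_layers)]


def cross_layer_regions(block_id: int, num_layers: int,
                        bytes_per_layer_block: int) -> List[Region]:
    """Hypothetical cross-layer layout: one buffer [num_blocks, num_layers, bytes_per_layer_block].

    All layers of a block are adjacent, so one block is one contiguous region.
    """
    block_bytes = num_layers * bytes_per_layer_block
    return [(0, (block_id * block_bytes, block_bytes))]


if __name__ == "__main__":
    sep = layer_separate_regions(3, num_layers=32, bytes_per_layer_block=4096)
    cross = cross_layer_regions(3, num_layers=32, bytes_per_layer_block=4096)
    print(len(sep), len(cross))  # 32 1
    # Same total bytes either way; only the number of copies differs.
    print(sum(n for _, (_, n) in sep) == sum(n for _, (_, n) in cross))  # True
```

Under these assumed shapes, moving one block of a 32-layer model goes from 32 small copies to one copy of the same total size. Fewer, larger copies are generally cheaper, which is the intuition behind the layout (an assumption here, not a measured result).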
devpatelio pushed a commit to SumanthRH/vllm that referenced this pull request
Signed-off-by: Or Ozeri <oro@il.ibm.com>
kitaekatt pushed a commit to kitaekatt/vllm that referenced this pull request
Signed-off-by: Or Ozeri <oro@il.ibm.com>
This was referenced Dec 2, 2025
nv-yna pushed a commit to ai-dynamo/dynamo that referenced this pull request
…che support
Enables KVBM to correctly detect and configure FullyContiguous layout when vLLM uses cross-layer KV cache blocks (vllm-project/vllm#27743).
Changes:
- Add LayoutType::auto_detect() to detect FullyContiguous vs LayerSeparate based on tensor count and shape pattern
- Update worker auto-detection to use new auto_detect() function
- Export PyLayoutType enum to Python for explicit layout configuration
- Add layout detection in Python wrapper to pass layout type explicitly
Previously, KVBM always auto-detected LayerSeparate even when vLLM provided FullyContiguous tensors, causing incorrect memory access patterns during block transfers. This fix ensures proper layout configuration for accurate performance benchmarking of the 4x transfer speedup improvement.
Related: vllm-project/vllm#27742, vllm-project/vllm#27743
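The detection rule described in the commit (tensor count plus shape pattern) can be sketched as follows. This is a hypothetical Python analogue, not Dynamo's Rust `LayoutType::auto_detect()`: the enum values mirror the names in the commit message, but the specific rules (a single tensor with a `num_layers`-sized dimension means FullyContiguous; `num_layers` identically shaped tensors means LayerSeparate) and the example shapes are assumptions.

```python
from enum import Enum
from typing import Sequence, Tuple


class LayoutType(Enum):
    FULLY_CONTIGUOUS = "FullyContiguous"  # one buffer, blocks span all layers
    LAYER_SEPARATE = "LayerSeparate"      # one buffer per layer


def auto_detect(shapes: Sequence[Tuple[int, ...]], num_layers: int) -> LayoutType:
    """Hypothetical heuristic: classify KV tensors by count and shape pattern."""
    if not shapes:
        raise ValueError("no KV cache tensors registered")
    # A single tensor with a dimension of size num_layers is treated as cross-layer.
    if len(shapes) == 1 and num_layers > 1 and num_layers in shapes[0]:
        return LayoutType.FULLY_CONTIGUOUS
    # num_layers tensors that all share one shape are treated as per-layer buffers.
    if len(shapes) == num_layers and all(s == shapes[0] for s in shapes):
        return LayoutType.LAYER_SEPARATE
    # Fail loudly instead of silently falling back to LayerSeparate.
    raise ValueError(f"unrecognized layout: {len(shapes)} tensors for {num_layers} layers")


if __name__ == "__main__":
    print(auto_detect([(1000, 32, 2, 16, 8, 128)], num_layers=32))  # LayoutType.FULLY_CONTIGUOUS
    print(auto_detect([(2, 1000, 16, 8, 128)] * 32, num_layers=32))  # LayoutType.LAYER_SEPARATE
```

Raising on unrecognized inputs, rather than defaulting to LayerSeparate, is a deliberate choice in this sketch: it avoids the silent misdetection the commit message describes.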
wangxiyuan pushed a commit to vllm-project/vllm-ascend that referenced this pull request
What this PR does / why we need it?
Last month the interface of OffloadingSpec changed (vllm-project/vllm#27743). This PR fixes the resulting breakage and adds an e2e test for CPU offloading.
Does this PR introduce any user-facing change?
None
How was this patch tested?
CI passed with the newly added test.
- vLLM version: release/v0.13.0
- vLLM main: vllm-project/vllm@ad32e3e
Signed-off-by: whx-sjtu <2952154980@qq.com>
This was referenced Jan 30, 2026
ZRJ026 pushed a commit to ZRJ026/vllm-ascend that referenced this pull request
What this PR does / why we need it?
Last month the interface of OffloadingSpec changed (vllm-project/vllm#27743). This PR fixes the resulting breakage and adds an e2e test for CPU offloading.
Does this PR introduce any user-facing change?
None
How was this patch tested?
CI passed with the newly added test.
- vLLM version: release/v0.13.0
- vLLM main: vllm-project/vllm@ad32e3e
Signed-off-by: whx-sjtu <2952154980@qq.com>
Signed-off-by: zrj026 <zhangrunjiang026@gmail.com>
maoxx241 pushed a commit to maoxx241/vllm-ascend that referenced this pull request
What this PR does / why we need it?
Last month the interface of OffloadingSpec changed (vllm-project/vllm#27743). This PR fixes the resulting breakage and adds an e2e test for CPU offloading.
Does this PR introduce any user-facing change?
None
How was this patch tested?
CI passed with the newly added test.
- vLLM version: release/v0.13.0
- vLLM main: vllm-project/vllm@ad32e3e
Signed-off-by: whx-sjtu <2952154980@qq.com>
yangzhe-2026 pushed a commit to yangzhe-2026/vllm-ascend that referenced this pull request
What this PR does / why we need it?
Last month the interface of OffloadingSpec changed (vllm-project/vllm#27743). This PR fixes the resulting breakage and adds an e2e test for CPU offloading.
Does this PR introduce any user-facing change?
None
How was this patch tested?
CI passed with the newly added test.
- vLLM version: release/v0.13.0
- vLLM main: vllm-project/vllm@ad32e3e
Signed-off-by: whx-sjtu <2952154980@qq.com>
mystous pushed a commit to mystous/vllm_hybrid that referenced this pull request
Signed-off-by: Or Ozeri <oro@il.ibm.com>
nanxingMy pushed a commit to nanxingMy/vllm-ascend that referenced this pull request
What this PR does / why we need it?
Last month the interface of OffloadingSpec changed (vllm-project/vllm#27743). This PR fixes the resulting breakage and adds an e2e test for CPU offloading.
Does this PR introduce any user-facing change?
None
How was this patch tested?
CI passed with the newly added test.
- vLLM version: release/v0.13.0
- vLLM main: vllm-project/vllm@ad32e3e
Signed-off-by: whx-sjtu <2952154980@qq.com>
Signed-off-by: nanxing <1014662416@qq.com>
my-other-github-account pushed a commit to my-other-github-account/vllm that referenced this pull request
Signed-off-by: Or Ozeri <oro@il.ibm.com>