[KVConnector][Core] Support cross-layer KV blocks by orozery · Pull Request #27743 · vllm-project/vllm

requested review from ApostaC, LucasWilkinson, NickLucche, WoosukKwon, alexm-redhat, comaniac, mgoin, njhill, pavanimajety, youkaichao and zhuohan123 as code owners

October 29, 2025 12:27

orozery changed the title ~~GPUModelRunner: Support contiguous KV data across layers~~ GPUModelRunner: Support cross-layer KV blocks

Oct 29, 2025

devpatelio pushed a commit to SumanthRH/vllm that referenced this pull request

Nov 29, 2025

Signed-off-by: Or Ozeri oro@il.ibm.com

kitaekatt pushed a commit to kitaekatt/vllm that referenced this pull request

Dec 1, 2025

Signed-off-by: Or Ozeri oro@il.ibm.com

This was referenced

Dec 2, 2025

nv-yna pushed a commit to ai-dynamo/dynamo that referenced this pull request

Dec 23, 2025

…che support

Enables KVBM to correctly detect and configure FullyContiguous layout when vLLM uses cross-layer KV cache blocks (vllm-project/vllm#27743).

Changes:

- Add LayoutType::auto_detect() to detect FullyContiguous vs LayerSeparate based on tensor count and shape pattern
- Update worker auto-detection to use new auto_detect() function
- Export PyLayoutType enum to Python for explicit layout configuration
- Add layout detection in Python wrapper to pass layout type explicitly

Previously, KVBM always auto-detected LayerSeparate even when vLLM provided FullyContiguous tensors, causing incorrect memory access patterns during block transfers. This fix ensures proper layout configuration for accurate performance benchmarking of the 4x transfer speedup improvement.
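The detection described above ("based on tensor count and shape pattern") can be illustrated with a minimal Python sketch. This is not the KVBM implementation: the function signature, the `num_layers` parameter, and the exact rule (one tensor carrying a layer-sized dimension means FullyContiguous, one identically shaped tensor per layer means LayerSeparate) are assumptions made for illustration only.

```python
from enum import Enum
from typing import Sequence, Tuple

Shape = Tuple[int, ...]


class LayoutType(Enum):
    # Names mirror the layouts mentioned in the commit message.
    FULLY_CONTIGUOUS = "FullyContiguous"
    LAYER_SEPARATE = "LayerSeparate"


def auto_detect(shapes: Sequence[Shape], num_layers: int) -> LayoutType:
    """Hypothetical heuristic: guess the layout from tensor count and shapes.

    `shapes` holds the shape of every KV cache tensor handed to the worker.
    """
    if not shapes:
        raise ValueError("no KV cache tensors provided")

    # Assumption: a cross-layer allocation arrives as a single tensor that
    # has one dimension equal to the number of layers.
    if len(shapes) == 1 and num_layers > 1 and num_layers in shapes[0]:
        return LayoutType.FULLY_CONTIGUOUS

    # Assumption: a per-layer allocation arrives as one tensor per layer,
    # all with the same shape.
    if len(shapes) == num_layers and all(s == shapes[0] for s in shapes):
        return LayoutType.LAYER_SEPARATE

    raise ValueError("cannot infer layout; pass the layout type explicitly")


# Illustrative shapes only; real vLLM KV cache shapes depend on the backend.
cross_layer = [(1024, 32, 2, 16, 8, 128)]   # one tensor, layer dim = 32
per_layer = [(2, 1024, 16, 8, 128)] * 32    # 32 tensors, one per layer
print(auto_detect(cross_layer, num_layers=32).value)
print(auto_detect(per_layer, num_layers=32).value)
```

Raising on an ambiguous pattern, rather than defaulting to one layout, reflects the failure mode described above: a silent default to LayerSeparate is what produced the incorrect memory access patterns. The explicit layout option exported to Python covers the cases a heuristic cannot decide.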

wangxiyuan pushed a commit to vllm-project/vllm-ascend that referenced this pull request

Dec 27, 2025

What this PR does / why we need it?

Last month the interface of OffloadingSpec changed (vllm-project/vllm#27743), which broke CPU offloading here. This PR fixes the resulting bug and adds an e2e test for CPU offloading.

Does this PR introduce any user-facing change?

None

How was this patch tested?

CI passed with the newly added test.

vLLM version: release/v0.13.0
vLLM main: vllm-project/vllm@ad32e3e

Signed-off-by: whx-sjtu 2952154980@qq.com

This was referenced

Jan 30, 2026

ZRJ026 pushed a commit to ZRJ026/vllm-ascend that referenced this pull request

Feb 28, 2026

Signed-off-by: whx-sjtu 2952154980@qq.com Signed-off-by: zrj026 zhangrunjiang026@gmail.com

maoxx241 pushed a commit to maoxx241/vllm-ascend that referenced this pull request

Mar 2, 2026

Signed-off-by: whx-sjtu 2952154980@qq.com

ZRJ026 pushed a commit to ZRJ026/vllm-ascend that referenced this pull request

Mar 4, 2026

Signed-off-by: whx-sjtu 2952154980@qq.com Signed-off-by: zrj026 zhangrunjiang026@gmail.com

yangzhe-2026 pushed a commit to yangzhe-2026/vllm-ascend that referenced this pull request

May 6, 2026

Signed-off-by: whx-sjtu 2952154980@qq.com

mystous pushed a commit to mystous/vllm_hybrid that referenced this pull request

May 10, 2026

Signed-off-by: Or Ozeri oro@il.ibm.com

nanxingMy pushed a commit to nanxingMy/vllm-ascend that referenced this pull request

May 15, 2026

Signed-off-by: whx-sjtu 2952154980@qq.com Signed-off-by: nanxing 1014662416@qq.com

my-other-github-account pushed a commit to my-other-github-account/vllm that referenced this pull request

May 15, 2026

Signed-off-by: Or Ozeri oro@il.ibm.com
