[KVConnector][Core] Support cross-layer KV blocks by orozery · Pull Request #27743 · vllm-project/vllm

@orozery requested review from ApostaC, LucasWilkinson, NickLucche, WoosukKwon, alexm-redhat, comaniac, mgoin, njhill, pavanimajety, youkaichao and zhuohan123 as code owners

October 29, 2025 12:27

@orozery changed the title from "GPUModelRunner: Support contiguous KV data across layers" to "GPUModelRunner: Support cross-layer KV blocks"

Oct 29, 2025

devpatelio pushed a commit to SumanthRH/vllm that referenced this pull request

Nov 29, 2025

@orozery @devpatelio

Signed-off-by: Or Ozeri <oro@il.ibm.com>

kitaekatt pushed a commit to kitaekatt/vllm that referenced this pull request

Dec 1, 2025

@orozery @kitaekatt

Signed-off-by: Or Ozeri <oro@il.ibm.com>

This was referenced

Dec 2, 2025

nv-yna pushed a commit to ai-dynamo/dynamo that referenced this pull request

Dec 23, 2025

…che support

Enables KVBM to correctly detect and configure FullyContiguous layout when vLLM uses cross-layer KV cache blocks (vllm-project/vllm#27743).

Changes:

Previously, KVBM always auto-detected LayerSeparate even when vLLM provided FullyContiguous tensors, causing incorrect memory access patterns during block transfers. This fix ensures proper layout configuration for accurate performance benchmarking of the 4x transfer speedup improvement.
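To make the layout distinction concrete, here is a minimal sketch of how the byte ranges for one KV block might differ between the two layouts. It is not KVBM or vLLM code: the function names, the `Segment` tuple, and the block-major ordering assumed for FullyContiguous (all layers of a block stored back to back in a single buffer) are illustrative assumptions.

```python
from typing import List, Tuple

# Hypothetical descriptor: (buffer_index, byte_offset, byte_length)
Segment = Tuple[int, int, int]


def layer_separate_segments(
    block: int, num_layers: int, bytes_per_layer_block: int
) -> List[Segment]:
    """Assumed LayerSeparate layout: one buffer per layer.

    A single block is scattered across num_layers buffers, so a
    transfer needs one copy per layer.
    """
    offset = block * bytes_per_layer_block
    return [(layer, offset, bytes_per_layer_block) for layer in range(num_layers)]


def fully_contiguous_segments(
    block: int, num_layers: int, bytes_per_layer_block: int
) -> List[Segment]:
    """Assumed FullyContiguous layout: a single block-major buffer.

    All layers of a block are adjacent, so a transfer is one copy.
    """
    block_bytes = num_layers * bytes_per_layer_block
    return [(0, block * block_bytes, block_bytes)]


if __name__ == "__main__":
    print(layer_separate_segments(2, 4, 1024))
    print(fully_contiguous_segments(2, 4, 1024))
```

Under these assumptions, both layouts move the same number of bytes per block, but they address them differently: applying the LayerSeparate offsets to a FullyContiguous buffer would read data belonging to other blocks, which is the kind of incorrect access pattern the commit message describes.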

Related: vllm-project/vllm#27742, vllm-project/vllm#27743

wangxiyuan pushed a commit to vllm-project/vllm-ascend that referenced this pull request

Dec 27, 2025

@whx-sjtu

What this PR does / why we need it?

Last month the interface of OffloadingSpec changed (vllm-project/vllm#27743). This PR fixes the resulting bug and adds an e2e test for CPU offloading.

Does this PR introduce any user-facing change?

None

How was this patch tested?

CI passed with the newly added test.


Signed-off-by: whx-sjtu <2952154980@qq.com>

This was referenced

Jan 30, 2026

ZRJ026 pushed a commit to ZRJ026/vllm-ascend that referenced this pull request

Feb 28, 2026

@whx-sjtu

Signed-off-by: whx-sjtu <2952154980@qq.com>
Signed-off-by: zrj026 <zhangrunjiang026@gmail.com>

maoxx241 pushed a commit to maoxx241/vllm-ascend that referenced this pull request

Mar 2, 2026

@whx-sjtu @maoxx241

Signed-off-by: whx-sjtu <2952154980@qq.com>

ZRJ026 pushed a commit to ZRJ026/vllm-ascend that referenced this pull request

Mar 4, 2026

@whx-sjtu

Signed-off-by: whx-sjtu <2952154980@qq.com>
Signed-off-by: zrj026 <zhangrunjiang026@gmail.com>

yangzhe-2026 pushed a commit to yangzhe-2026/vllm-ascend that referenced this pull request

May 6, 2026

@whx-sjtu

Signed-off-by: whx-sjtu <2952154980@qq.com>

mystous pushed a commit to mystous/vllm_hybrid that referenced this pull request

May 10, 2026

@orozery

Signed-off-by: Or Ozeri <oro@il.ibm.com>

nanxingMy pushed a commit to nanxingMy/vllm-ascend that referenced this pull request

May 15, 2026

@whx-sjtu

Signed-off-by: whx-sjtu <2952154980@qq.com>
Signed-off-by: nanxing <1014662416@qq.com>

my-other-github-account pushed a commit to my-other-github-account/vllm that referenced this pull request

May 15, 2026

@orozery

Signed-off-by: Or Ozeri <oro@il.ibm.com>
