Support compress-tensors with nvfp4 or fp8 weights and modelopt with nvfp4 weights on Turing by ir1ka · Pull Request #33076 · vllm-project/vllm

[gemini-code-assist[bot]](/apps/gemini-code-assist)

yewentao256

@yewentao256 added the ready label (ONLY add when PR is ready to merge/full CI is needed) on Jan 26, 2026

@ir1ka

Signed-off-by: IriKa Qiu <qiujie.jq@gmail.com>

@ir1ka

FlashInfer uses CUTLASS, cuDNN, or TRT-LLM for mm_fp4, and those backends only support sm100 or newer. On older GPUs, fall back to selecting the Marlin backend.

Signed-off-by: IriKa Qiu <qiujie.jq@gmail.com>
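
The fallback described in this commit message can be sketched roughly as below. This is an illustrative sketch only, not the code from this PR: the function name `select_nvfp4_backend`, both constants, and the `flashinfer_available` flag are hypothetical. It assumes compute capability is encoded as `major * 10 + minor` (so Turing is 75), takes the sm100 floor for FlashInfer's mm_fp4 from the commit message, and assumes a Marlin floor of sm75 based only on the PR title.

```python
# Hypothetical constants; neither name comes from vLLM or FlashInfer.
# sm100 floor for FlashInfer's mm_fp4 is taken from the commit message.
FLASHINFER_FP4_MIN_CAPABILITY = 100
# Assumption: Marlin is usable from Turing (sm75) up, inferred from the PR title.
MARLIN_MIN_CAPABILITY = 75


def select_nvfp4_backend(capability: int, flashinfer_available: bool = True) -> str:
    """Pick a backend name for NVFP4 matmul on a GPU.

    `capability` is the compute capability encoded as major * 10 + minor
    (for example 75 for Turing, 100 for sm100).
    """
    # Prefer FlashInfer only where its mm_fp4 backends can actually run.
    if flashinfer_available and capability >= FLASHINFER_FP4_MIN_CAPABILITY:
        return "flashinfer"
    # Otherwise fall back to Marlin, if the GPU meets the assumed floor.
    if capability >= MARLIN_MIN_CAPABILITY:
        return "marlin"
    raise ValueError(f"no NVFP4 backend for compute capability {capability}")


if __name__ == "__main__":
    for cap in (75, 89, 100):
        print(cap, select_nvfp4_backend(cap))
```

The point of the ordering is that an installed FlashInfer is not enough on its own to select it: the capability check comes first, so a Turing card ends up on the Marlin path even when FlashInfer is importable.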

@ir1ka deleted the turing-merlin branch on January 28, 2026 04:53

apd10 pushed a commit to apd10/vllm that referenced this pull request on Jan 31, 2026

@ir1ka

…nvfp4 weights on Turing (vllm-project#33076)

Signed-off-by: IriKa Qiu <qiujie.jq@gmail.com>

mystous pushed a commit to mystous/vllm_hybrid that referenced this pull request on May 10, 2026

@ir1ka

…nvfp4 weights on Turing (vllm-project#33076)

Signed-off-by: IriKa Qiu <qiujie.jq@gmail.com>

0826joyce pushed a commit to 0826joyce/vllm-serving-optimization that referenced this pull request on May 19, 2026

@ir1ka

…nvfp4 weights on Turing (vllm-project#33076)

Signed-off-by: IriKa Qiu <qiujie.jq@gmail.com>
