[Multimodal] Expose mm_processor_kwargs for DummyInputsBuilder by Isotr0py · Pull Request #34330 · vllm-project/vllm
Signed-off-by: Isotr0py mozf@mail2.sysu.edu.cn
Bot reviewed Feb 11, 2026
Signed-off-by: Isotr0py mozf@mail2.sysu.edu.cn
Isotr0py marked this pull request as ready for review
Signed-off-by: Isotr0py mozf@mail2.sysu.edu.cn
ONLY add when PR is ready to merge/full CI is needed
label
Signed-off-by: Isotr0py mozf@mail2.sysu.edu.cn
jxmorris12 added a commit to jxmorris12/vllm that referenced this pull request
- Implement zero-copy GQA for multimodal and CPU (#33732)
Signed-off-by: Taeksang Kim ts.kim@hyperaccel.ai
- [Bugfix] Support `RotaryEmbeddingCustomOp` for gpt-oss (#33800)
Signed-off-by: simondanielsson simon.danielsson99@hotmail.com
- [Model] Add transcription support for Qwen3-Omni (#29828)
Signed-off-by: Muhammad Hashmi mhashmi@berkeley.edu Signed-off-by: NickLucche nlucches@redhat.com Co-authored-by: NickLucche nlucches@redhat.com
- Revert "[torch.compile] Significantly speed up cold start times" (#33820)
Signed-off-by: Richard Zou zou3519@gmail.com
- Change the type signature of MixtureOfExperts.expert_weights to MutableSequence[Sequence[Tensor]] (#33573)
Signed-off-by: Sage Moore sagmoore@redhat.com Co-authored-by: Robert Shaw 114415538+robertgshaw2-redhat@users.noreply.github.com
- [Core] Don't schedule spec tokens with prefill chunks (#33652)
Signed-off-by: Nick Hill nickhill123@gmail.com
- feat: Add ColBERT late interaction model support (#33686)
Signed-off-by: Ilya Boytsov ilyaboytsov1805@gmail.com Signed-off-by: Ilya Boytsov boytsovpanamera@mail.ru Co-authored-by: Cyrus Leung cyrus.tl.leung@gmail.com Co-authored-by: wang.yuqi yuqi.wang@daocloud.io
- [CI][torch.compile] Reduce e2e fusion test time (#33293)
Signed-off-by: Luka Govedič lgovedic@redhat.com Signed-off-by: ProExpertProg luka.govedic@gmail.com Signed-off-by: Luka Govedič ProExpertProg@users.noreply.github.com
- [Bugfix] Disable TRTLLM attention when KV transfer is enabled (#33192)
Signed-off-by: Zhanqiu Hu zh338@cornell.edu
- [Bugfix] fix DeepSeek R1 with CUTLASS MLA Broken on B200 (#33637)
Signed-off-by: chaunceyjiang chaunceyjiang@gmail.com
- [release] Minor fixes to release annotation (#33849)
Signed-off-by: Kevin H. Luu khluu000@gmail.com
- [CI][Bugfix]: return McpCall for built-in MCP tools in non-streaming mode (#32762)
Signed-off-by: Andreas Karatzas akaratza@amd.com
- Revert "[Attention][FA3] Update FA3 to include new swizzle optimization" (#33841)
- [Minor] Include `StreamingInput` in inputs package (#33856)
Signed-off-by: Nick Hill nickhill123@gmail.com
- [docs] fix unintentional misspellings (#33863)
Signed-off-by: rinbaro ilgomishra@gmail.com
- [CI][AMD][BugFix] Ensure VLLM_ROCM_USE_AITER is set so test_rocm_aiter_topk.py can run correctly (#33840)
Signed-off-by: Randall Smith Randall.Smith@amd.com
- [2/N] move responses/serving _make_response_output_items logic to parser (#33281)
Signed-off-by: Andrew Xia axia@fb.com Signed-off-by: Andrew Xia axia@meta.com Co-authored-by: Andrew Xia axia@fb.com
- [CI/Build] Parallelize CPU CI tests (#33778)
Signed-off-by: jiang1.li jiang1.li@intel.com
- [Bugfix] Fix ScoreMultiModalParam multi-document scoring returning single result (#33837)
Signed-off-by: Andreas Karatzas akaratza@amd.com Signed-off-by: wang.yuqi yuqi.wang@daocloud.io Co-authored-by: wang.yuqi yuqi.wang@daocloud.io
- [CPU][BugFix] Allow w8a8 oneDNN quantized matmul to support 3D inputs (#33727)
Signed-off-by: Fadi Arafeh fadi.arafeh@arm.com
- [CI/Build] Fix CPU CI test case title (#33870)
Signed-off-by: jiang1.li jiang1.li@intel.com
- [Perf] Optimize the performance of structured output + reasoning (#33557)
Signed-off-by: chaunceyjiang chaunceyjiang@gmail.com
- [KV Connector][Metrics] Do not count local prefix cache hits in connector queries (#30522)
Signed-off-by: Mark McLoughlin markmc@redhat.com
- [Bugfix] Kimi-K2 grouped_topk usage for Flashinfer monolithic kernels. (#33858)
Signed-off-by: Pavani Majety pmajety@nvidia.com
- [Refactor] Move `task` outside of `PoolingParams.verify` (#33796)
Signed-off-by: DarkLight1337 tlleungac@connect.ust.hk Signed-off-by: wang.yuqi yuqi.wang@daocloud.io Co-authored-by: wang.yuqi yuqi.wang@daocloud.io
- [ROCm][Bugfix][CI] Fix hybrid models and their tests (Mamba/Jamba/Bamba) (#32710)
Signed-off-by: Andreas Karatzas akaratza@amd.com Signed-off-by: Matthew Wong Matthew.Wong2@amd.com Co-authored-by: Matthew Wong Matthew.Wong2@amd.com
- Enable Cross layers KV cache layout at NIXL Connector V2 (#33339)
Signed-off-by: Liran Schour lirans@il.ibm.com Signed-off-by: liranschour liranschour@users.noreply.github.com Co-authored-by: Or Ozeri or@ozery.com Co-authored-by: Nicolò Lucchesi nicolo.lucchesi@gmail.com Co-authored-by: Nicolò Lucchesi nlucches@redhat.com
- [perf] Integrate flashinfer concat_mla_k (#31171)
- [Bugfix] Fix Kimi-K2.5 NVFP4 checkpoints weight loading (#33876)
Signed-off-by: Isotr0py mozf@mail2.sysu.edu.cn
- [Refactor] Clean up input preprocessing (#33687)
Signed-off-by: DarkLight1337 tlleungac@connect.ust.hk
- [Bugfix] Fix corner case of sparse embedding (#33886)
Signed-off-by: wang.yuqi yuqi.wang@daocloud.io
- [Docs] Add bart-plugin to docs (#33905)
Signed-off-by: NickLucche nlucches@redhat.com
- [Bugfix] Fix step3p5 parser when using mtp (#33690)
Signed-off-by: mariohong mariohong128@gmail.com
- [Feat][RL][1/2] Native Weight Syncing API: NCCL (#31943)
Signed-off-by: ahao-anyscale ahao@anyscale.com Signed-off-by: Aaron Hao ahao@anyscale.com Co-authored-by: SumanthRH sumanthrh99@gmail.com
- [BugFix] Fix LoRA Fp8 (#33879)
Signed-off-by: Daniel Serebrenik daserebrenik@nvidia.com
- [Spec Decode] Unified Parallel Drafting (#32887)
Signed-off-by: Benjamin Chislett bchislett@nvidia.com
- [Misc] Add debug logs (#33931)
Signed-off-by: NickLucche nlucches@redhat.com
- [Bugfix] Fix swapped engine_ids in NIXL Llama 4 local attention path (#33795)
Signed-off-by: Yoray Zack yorayz@nvidia.com
- [Moe Refactor] Make Inplace Flag for FusedMoEModularKernel part of the constructor (#33375)
Signed-off-by: Bill Nell bnell@redhat.com Co-authored-by: Robert Shaw 114415538+robertgshaw2-redhat@users.noreply.github.com
- [Models] Consolidate Deepseek-OCR2 processor (#33909)
Signed-off-by: Isotr0py mozf@mail2.sysu.edu.cn
- [Bugfix] Suppress non-TTY color output on the process name part of the log (#29714)
Signed-off-by: Tsukasa OI floss_llm@irq.a4lg.com
- Fix tokenizer test for renamed attr on Transformers v5 (#33902)
Signed-off-by: Harry Mellor 19981378+hmellor@users.noreply.github.com
- [Misc] Rename `translations` to `speech_to_text` for OAI serving component (#33904)
Signed-off-by: NickLucche nlucches@redhat.com
- [Bugfix] Fix DSV3.2 NVFP4 (#33932)
Signed-off-by: Matthew Bonanni mbonanni@redhat.com
- [Bugfix] Make MM batching more robust (#33817)
Signed-off-by: DarkLight1337 tlleungac@connect.ust.hk
- [Minor] Sort safetensors files to ensure deterministic loading order (#33491)
Signed-off-by: Lihao Ran imlihao.ran@gmail.com Signed-off-by: mgoin mgoin64@gmail.com Co-authored-by: mgoin mgoin64@gmail.com
- Adds padding and perf improvements to wvSplitK_fp8 (#33527)
Signed-off-by: Hashem Hashemi hashem.hashemi@amd.com
- [Bugfix] Fix DeepSeek v3.2 tokenizer outputting None issue (#33832)
Signed-off-by: wzhao18 wzhao18.sz@gmail.com
- [Feature] OTEL tracing during loading (#31162)
- [Perf] Disable clean_logits in deepgemm fp8_mqa_logits kernel (#33568)
- [Docs] Add reo analytics (#33957)
Signed-off-by: simon-mo simon.mo@hey.com
- fix(ROCm): Make flash_attn import optional in MLA attention (#33511)
Signed-off-by: rabi ramishra@redhat.com
- feat(frontend): early-fail tokenization guard for user requests (#31366)
Signed-off-by: limingliang limingliang@stepfun.com Signed-off-by: DarkLight1337 tlleungac@connect.ust.hk Co-authored-by: limingliang limingliang@stepfun.com Co-authored-by: DarkLight1337 tlleungac@connect.ust.hk
- [Misc] Update code for encoder-decoder models (#33900)
Signed-off-by: DarkLight1337 tlleungac@connect.ust.hk
- [CPU] Add BF16 Kernel type for s390x (#33788)
Signed-off-by: Rehan Khan Rehan.Khan7@ibm.com
- [XPU][4/N] add mxfp4 moe model support (#33679)
Signed-off-by: Kunshang Ji kunshang.ji@intel.com
- [XPU]Replace pip in docker.xpu with uv pip (#31112)
Signed-off-by: sihao.li sihao.li@intel.com
- Onboard voyage-4-nano (#33720)
Signed-off-by: Chengcheng Pei chengchengpei@outlook.com Signed-off-by: chengchengpei 5881383+chengchengpei@users.noreply.github.com Co-authored-by: chengchengpei 5881383+chengchengpei@users.noreply.github.com Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
- [cpu][performance] CPU Paged Attention NEON BFMMLA BF16 Implementation (#32263)
Signed-off-by: Gassan gassan.salama@arm.com
- Fix `main` pre-commit (#33975)
Signed-off-by: Harry Mellor 19981378+hmellor@users.noreply.github.com
- support view_from_cpu_tensor on XPU (#33868)
Signed-off-by: Xinyu Chen xinyu1.chen@intel.com
- Consolidate and fix forbidden import `pre-commit` checks (#33982)
Signed-off-by: Harry Mellor 19981378+hmellor@users.noreply.github.com
- [PaddleOCR-VL] Add BC for transformers 5.0 config (#33976)
Signed-off-by: zhangyue66 zhangyue66@baidu.com
- Bump HF Hub client to get bug fix (#33984)
Signed-off-by: Harry Mellor 19981378+hmellor@users.noreply.github.com
- [CPU][BugFix] Fix loading of w8a8int models with bias (#33582)
Signed-off-by: Fadi Arafeh fadi.arafeh@arm.com
- [torch.compile] Reorganize vllm/compilation and tests/compile (0/N for vLLM IR) (#33731)
Signed-off-by: Luka Govedič lgovedic@redhat.com Signed-off-by: ProExpertProg luka.govedic@gmail.com Signed-off-by: Luka Govedič ProExpertProg@users.noreply.github.com
- [Bugfix][Model] Support LoRA on Qwen3 Output Embedding (#29816)
Signed-off-by: kurt kurt@thinkingmachines.ai
- [Docs] Improve documentation (#33799)
Co-authored-by: Soren Dreano soren@numind.ai Co-authored-by: Wentao Ye 44945378+yewentao256@users.noreply.github.com
- Update `WeightTransferConfig` to be more standard like the others (#33989)
Signed-off-by: Harry Mellor 19981378+hmellor@users.noreply.github.com
- [Bugfix] Fix models and tests for transformers v5 (#33977)
Signed-off-by: raushan raushan@huggingface.co Signed-off-by: Raushan Turganbay raushan.turganbay@alumni.nu.edu.kz Signed-off-by: Harry Mellor 19981378+hmellor@users.noreply.github.com Co-authored-by: Cyrus Leung cyrus.tl.leung@gmail.com Co-authored-by: Harry Mellor 19981378+hmellor@users.noreply.github.com
- [FIX] guidance: use max(vocab_size, len(tokenizer)) for n_vocab (#33509)
Signed-off-by: Frederic Odermatt frederic.odermatt@44ai.ch
- [ROCm][AITER] Fix AITER import regression for explicit backend selection (#33749)
Signed-off-by: Andreas Karatzas akaratza@amd.com
- [Docs] Add sections on process architecture and minimum CPU resources (#33940)
It seems users can be confused about vLLM's performance when running with very few CPU cores available. We are missing a clear overview of vLLM's process architecture, so I added one, along with some diagrams, in arch_overview.md, and included a section on CPU resource recommendations in optimization.md.
Signed-off-by: mgoin mgoin64@gmail.com
- [Model] Support MiniCPM-o 4.5 (#33431)
Signed-off-by: caitianchi caitianchi@modelbest.cn Signed-off-by: tc-mb caitianchi@modelbest.cn Co-authored-by: mslv mslv@baai.ac.cn
- [Refactor] Consolidate sequence normalization and enc-dec parsing (#33928)
Signed-off-by: DarkLight1337 tlleungac@connect.ust.hk
- [XPU][5/N] add wna16 xpu kernel (#33973)
Signed-off-by: Zhu, Zufang zufang.zhu@intel.com
- [Docs] Update link to Benchmark CLI documentation (#33254)
Signed-off-by: Eldar Kurtić 8884008+eldarkurtic@users.noreply.github.com
- [Bugfix] Fix the issue where tool calling does not work when using fast detokenization with dsv32 (#33964)
Signed-off-by: chaunceyjiang chaunceyjiang@gmail.com
- [Log] Optimize duplicate startup log (#33944)
Signed-off-by: yewentao256 zhyanwentao@126.com
- [KV Connector] Add missing method overrides to MultiConnector (#33292)
Signed-off-by: Seiji Eicher seiji@anyscale.com
- [DOC] [ROCm] Update docker deployment doc (#33971)
Signed-off-by: vllmellm vllm.ellm@embeddedllm.com Signed-off-by: Harry Mellor 19981378+hmellor@users.noreply.github.com Co-authored-by: TJian tunjian.tan@embeddedllm.com Co-authored-by: Harry Mellor 19981378+hmellor@users.noreply.github.com
- [Model Runner V2] support apply penalty for spec decode (#33251)
Signed-off-by: zhuhaoran zhuhaoran.zhr@alibaba-inc.com
- [Refactor] Remove align block size logic in `moe_permute` (#33449)
Signed-off-by: yewentao256 zhyanwentao@126.com
- [Rocm][Bugfix] Fix dtype not same for gemm_a4w4 op (#33734)
Signed-off-by: charlifu charlifu@amd.com
- [Bugfix] Fix no attribute error of SharedFusedMoE (DeepSeek-V3.1 as test model) (#33993)
Signed-off-by: xuebwang-amd xuebwang@amd.com
- [Fix] Fix `logprobs=0` handling for `/inference/v1/generate` endpoint (#34010)
Signed-off-by: SumanthRH sumanthrh99@gmail.com
- Fix RoutingMethodType logic (#33919)
Signed-off-by: Dimitrios Bariamis 12195802+dbari@users.noreply.github.com Signed-off-by: mgoin mgoin64@gmail.com Co-authored-by: Dimitrios Bariamis 12195802+dbari@users.noreply.github.com Co-authored-by: mgoin mgoin64@gmail.com
- [bugfix] [ROCm] Fix premature CUDA initialization in platform detection (#33941)
Signed-off-by: Kourosh Hakhamaneshi kourosh@anyscale.com
- [Feat][RL] Pause and Resume with keep requests for single engine (#32351)
Signed-off-by: ahao-anyscale ahao@anyscale.com Signed-off-by: Aaron Hao ahao@anyscale.com Co-authored-by: Robert Shaw 114415538+robertgshaw2-redhat@users.noreply.github.com
- [Bugfix] Fix QK Norm+RoPE fusion pattern matching on B200+FP8 (#33967)
Signed-off-by: Ikenna ikennachifo@gmail.com Co-authored-by: Luka Govedič ProExpertProg@users.noreply.github.com
- [Bugfix] Fix Whisper tokenization (#34011)
Signed-off-by: NickLucche nlucches@redhat.com
- [CI][AMD]Bugfix] Check that model_config is not None in enable_norm_pad_fusion (#34007)
Signed-off-by: Randall Smith Randall.Smith@amd.com
- [Bugfix] Fix _fused_moe_lora_expand signature mismatch (#33821)
Signed-off-by: Xin Yang xyangx@amazon.com
- [Misc] Add backward-compatible import aliases for renamed translations module (#34015)
Signed-off-by: Kourosh Hakhamaneshi kourosh@anyscale.com Co-authored-by: Cursor cursoragent@cursor.com
- [ModelRunner V2] Revert token rank comparison difference for now (#34017)
Signed-off-by: Nick Hill nickhill123@gmail.com
- fix description in plugin_system.md (#33999)
- [Revert] Add util `handle_deprecated` back (#33998)
Signed-off-by: yewentao256 zhyanwentao@126.com
- [Kernel] Add enable_sm120_or_later for SM121 (DGX Spark) CUTLASS support (#33517)
Signed-off-by: code4me2 velvetmoon222999@gmail.com
- [Misc] Make `PlaceholderRange.get_num_embeds` a method (#34035)
Signed-off-by: DarkLight1337 tlleungac@connect.ust.hk
- [ROCm][CI] Pinning lm-eval version to resolve multi-modal small eval bug (#34038)
Signed-off-by: Andreas Karatzas akaratza@amd.com
- Fix spelling errors (#33978)
- [Misc] Simplify `get_max_tokens` (#34036)
Signed-off-by: DarkLight1337 tlleungac@connect.ust.hk
- [CI][Build] Pin grpcio-tools==1.78.0 (#34048)
Signed-off-by: wang.yuqi noooop@126.com Co-authored-by: Cyrus Leung cyrus.tl.leung@gmail.com
- [Renderer] Define `render_cmpl` and `render_chat` (#34039)
Signed-off-by: DarkLight1337 tlleungac@connect.ust.hk
- [Kernel] Add KernelConfig flag to enable/disable FlashInfer autotune (#34006)
Signed-off-by: Mohammad Miadh Angkad 176301910+mmangkad@users.noreply.github.com Signed-off-by: Luka Govedič ProExpertProg@users.noreply.github.com Co-authored-by: Luka Govedič ProExpertProg@users.noreply.github.com
- [torch.compile] Stop compiling identical artifacts (#34003)
Signed-off-by: Richard Zou zou3519@gmail.com
- Enable Eagle3 speculative decoding for Mistral3ForConditionalGeneration to support eagle3 (#33939)
Signed-off-by: Akintunde Oladipo akintunde.oladipo@servicenow.com Signed-off-by: TundeAtSN akintunde.oladipo@servicenow.com Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
- [Frontend]Add support for transcriptions and translations to run_batch (#33934)
Signed-off-by: Pooya Davoodi pooya.davoodi@parasail.io Signed-off-by: Cyrus Leung cyrus.tl.leung@gmail.com Co-authored-by: Cyrus Leung cyrus.tl.leung@gmail.com
- [Model] Enable Step3p5ForCausalLM testing (#33755)
Signed-off-by: Jee Jee Li pandaleefree@gmail.com
- [PluggableLayer][3/N] Apply PluggableLayer to mamba layers. (#33660)
Signed-off-by: whx-sjtu 2952154980@qq.com
- move checks out of `unified_kv_cache_update` custom op (#33943)
Signed-off-by: Rohan138 rohanpotdar138@gmail.com
- Update DeepGEMM version pin in Dockerfile to match #32479 (#33935)
Signed-off-by: Zifei Tong zifeitong@gmail.com Signed-off-by: Wentao Ye 44945378+yewentao256@users.noreply.github.com Co-authored-by: Wentao Ye 44945378+yewentao256@users.noreply.github.com
- Make directory exist ok for ray spinning up multiple replicas on a single instance (#33604)
Signed-off-by: Jiang Wu jwu@cclgroup.com
- Perf tuning and expansion of cases covered for wvSplitKrc (#33493)
Signed-off-by: Hashem Hashemi hashem.hashemi@amd.com
- [Doc] Fix run_batch docs (#34056)
Signed-off-by: DarkLight1337 tlleungac@connect.ust.hk
- [CI/Build] Skip GCS test (#34057)
Signed-off-by: DarkLight1337 tlleungac@connect.ust.hk
- [ROCm][Bugfix] fix act_quant_fusion module import error (#34069)
Signed-off-by: Andreas Karatzas akaratza@amd.com
- [Perf] Simplify DeepseekV32 tokenizer, ensure fast detokenization used (#33855)
Signed-off-by: Nick Hill nickhill123@gmail.com
- [ROCm] [CI] Reduce Resource of two test groups (#34059)
Signed-off-by: tjtanaa tunjian.tan@embeddedllm.com
- Add embedding input functionality for disabled modalities [remake] (#32493)
Signed-off-by: Reagan Lee <“reaganjlee@gmail.com”> Signed-off-by: Reagan Lee reaganjlee@gmail.com Signed-off-by: Reagan Lee 96998476+reaganjlee@users.noreply.github.com Co-authored-by: Reagan Lee <“reaganjlee@gmail.com”> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
- [Revert] Fix performance regression for GLM-4.7-GPTQ decode and MTP acceptance rate (#33771)
Signed-off-by: aabbccddwasd aabbccddwasd@qq.com
- [BugFix] Change support no act and mul for marlin (#34088)
Signed-off-by: Tomer Natan tbarnatan@computelab-frontend-8.nvidia.com Co-authored-by: Tomer Natan tbarnatan@computelab-frontend-8.nvidia.com
- [torch.compile] Add an option to force-enable the MOE cold start optimization (#33735)
Signed-off-by: Richard Zou zou3519@gmail.com
- glm 4.6 fused tuned inference config for B200 (#32958)
- Add support for ModelOpt MXFP8 dense models (#33786)
Signed-off-by: Daniel Serebrenik daserebrenik@nvidia.com
- [Release 2.10] Update to Torch 2.10 - final release (#30525)
- [bug-fix] supported_tasks is breaking backward compatibility at init_app_state (#34027)
Signed-off-by: Kourosh Hakhamaneshi kourosh@anyscale.com Signed-off-by: kourosh hakhamaneshi 31483498+kouroshHakha@users.noreply.github.com Co-authored-by: Cyrus Leung cyrus.tl.leung@gmail.com
- [Tiny] Rename encoder budget file to more specific name (#34103)
Signed-off-by: Reagan Lee <“reaganjlee@gmail.com”> Co-authored-by: Reagan Lee <“reaganjlee@gmail.com”>
- [Frontend][last/5] Make pooling entrypoints request schema consensus. (#31127)
Signed-off-by: wang.yuqi yuqi.wang@daocloud.io
- [BugFix] Fix `fastsafetensors` TP all procs using all GPUs (#34070)
Signed-off-by: Nick Hill nickhill123@gmail.com Co-authored-by: Cyrus Leung tlleungac@connect.ust.hk
- fix(cpu): fix mla_decode compilation on x86 without AVX512 (#34052)
Signed-off-by: ihb2032 hebome@foxmail.com Co-authored-by: root root@LAPTOP-FKNHV411.localdomain
- [Model] GLM adaptation (#34124)
- [CI] Remove empty image_size_factors for fuyu, glm4_1v, glm_ocr (#34107)
Signed-off-by: Andreas Karatzas akaratza@amd.com
- [ASR] Fix audio benchmark and add RTFx metric (#32300)
Signed-off-by: Ekagra Ranjan 3116519+ekagra-ranjan@users.noreply.github.com Co-authored-by: Nicolò Lucchesi nicolo.lucchesi@gmail.com
- [Fix] [CPU Backend] : Prepack weights for w8a8 oneDNN matmul (#33901)
Signed-off-by: nikhil-arm nikhil.gupta2@arm.com
- [XPU][6/N] add xpu scaled_mm kernel (#34117)
Signed-off-by: Zhu, Zufang zufang.zhu@intel.com
- [MODEL] Adding Support for Qwen3.5 Models (#34110)
Signed-off-by: JJJYmmm 1650675829@qq.com Signed-off-by: JJJYmmm 92386084+JJJYmmm@users.noreply.github.com Signed-off-by: Roger Wang hey@rogerw.io Co-authored-by: wulipc wulipc@users.noreply.github.com Co-authored-by: ywang96 ywang96@users.noreply.github.com Co-authored-by: Isotr0py Isotr0py@users.noreply.github.com Co-authored-by: Isotr0py 2037008807@qq.com Co-authored-by: Roger Wang hey@rogerw.io
- [Misc] Fix up attention benchmarks (#33810)
Signed-off-by: Lucas Wilkinson lwilkins@redhat.com Signed-off-by: Matthew Bonanni mbonanni@redhat.com Co-authored-by: Matthew Bonanni mbonanni@redhat.com
- [UX] Add `--language-model-only` for hybrid models (#34120)
Signed-off-by: Roger Wang hey@rogerw.io
- [CI][torch.compile] Fix incorrect filtering for E2E fusion tests on B200 (#34031)
Signed-off-by: Luka Govedič lgovedic@redhat.com
- Add NUMA Core binding in nixl_connector for CPU xPyD (#32365)
Signed-off-by: Hongming Zheng hongming.zheng@intel.com Signed-off-by: ZhengHongming888 hongming.zheng@intel.com Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
- [Kernel] FlashInfer: switch allreduce fusion to unified API (#33985)
Signed-off-by: Mohammad Miadh Angkad 176301910+mmangkad@users.noreply.github.com
- [Bugfix] Fix shared expert input for latent MoE in EP+DP (Nemotron-H) (#34087)
Signed-off-by: Tomer Natan tbarnatan@nvidia.com Co-authored-by: Cursor cursoragent@cursor.com
- [Kernel] use flashinfer for gdn prefill (#32846)
Signed-off-by: zjy0516 riverclouds.zhu@qq.com
- [Bugfix] Avoid duplicate k-proj weight emission in helper (#34142)
Signed-off-by: Artus KG artuskg@gmail.com
- [Bugfix] Voxtral prompt/audio placeholder alignment (#34140)
Signed-off-by: Artus KG artuskg@gmail.com
- [ROCm] update triton branch to support gpt-oss models for gfx11xx devices (#34032)
Signed-off-by: Hongxia Yang hongxia.yang@amd.com
- [torch.compile][Fusion] Fix attention fusion pass removing kv_udpate op. (#33945)
Signed-off-by: charlifu charlifu@amd.com
- [ModelRunner V2][BugFix] Fix `max_query_len` calculation (#34167)
Signed-off-by: Nick Hill nickhill123@gmail.com
- [Doc] Add DCP support to attention backend doc (#33936)
- [Bugfix][ROCm][GPT-OSS] Use old triton_kernels implementation on ROCm if the new API is not available (#34153)
Signed-off-by: Gregory Shtrasberg Gregory.Shtrasberg@amd.com
- [structured output] validate unsupported json features first (#33233)
Signed-off-by: Andy Xie andy.xning@gmail.com Co-authored-by: Chauncey chaunceyjiang@gmail.com Co-authored-by: Russell Bryant rbryant@redhat.com
- [LMCache] Token Base IPC API (#34175)
Signed-off-by: Oasis-Git ayw.sirius19@gmail.com
- [Bugfix] Adopt `ChunkGatedDeltaRule` for Qwen3.5 (#34198)
Signed-off-by: Roger Wang hey@rogerw.io
- [ROCm][Bugfix] Resolve Dynamo tracing crash from amdsmi calls in on_gfx* arch detection (#34108)
Signed-off-by: Andreas Karatzas akaratza@amd.com
- [Bugfix][Core] Fix CPU memory leak from Request reference cycle in prefix caching (#34183)
Signed-off-by: Roger Wang hey@rogerw.io
- [Doc] Update usage of `--limit-mm-per-prompt` (#34148)
Signed-off-by: DarkLight1337 tlleungac@connect.ust.hk
- [CI/Build] Relax `test_mcp_tool_call` (#34204)
Signed-off-by: DarkLight1337 tlleungac@connect.ust.hk
- [Bugfix] Fix DP Attention Padding in Dummy Run (#34187)
Signed-off-by: Benjamin Chislett bchislett@nvidia.com Signed-off-by: Lucas Wilkinson lwilkins@redhat.com Co-authored-by: Benjamin Chislett bchislett@nvidia.com
- [Bugfix] Add `--trust-remote-code` to dataset bench args (#34208)
Signed-off-by: DarkLight1337 tlleungac@connect.ust.hk
- [responsesAPI] fix simpleContext streaming output_messages (#34188)
Signed-off-by: Andrew Xia axia@meta.com Signed-off-by: Andrew Xia axia@fb.com Co-authored-by: Andrew Xia axia@fb.com
- [Bugfix] Sort hf_weights_files in fastsafetensors_weights_iterator to match #33491 (#34190)
Signed-off-by: Balaxxe 136368465+jaim12005@users.noreply.github.com
- [Frontend][CI] Consolidate instrumentator entrypoints (#34123)
Signed-off-by: wang.yuqi yuqi.wang@daocloud.io
- [BugFix] Avoid prefix cache hit in the same schedule step for mamba layers (#29387)
Signed-off-by: Chen Zhang zhangch99@outlook.com
- [Perf] Optimize detokenizer python logic (#32975)
Signed-off-by: yewentao256 zhyanwentao@126.com Signed-off-by: Wentao Ye 44945378+yewentao256@users.noreply.github.com Co-authored-by: Nick Hill nhill@redhat.com
- Revert #34208 (#34216)
- [Bugfix] Fix memory inconsistency in cross-process shared memory (#32022)
Signed-off-by: Zetong Li slippersss@126.com
- [Bugfix] Fix `--trust-remote-code` conflict (#34218)
Signed-off-by: DarkLight1337 tlleungac@connect.ust.hk
- [Docs] Fix format error in KV load failure recovery doc (#34137)
Signed-off-by: Jaebok Lee jaebok9541@naver.com
- [Bugfix] Fix FI kernel `chunk_gated_delta_rule` output shape for Qwen3.5 (#34219)
Signed-off-by: Roger Wang hey@rogerw.io
- Add flagos in MiniCPM-o (#34126)
Signed-off-by: tc-mb caitianchi@modelbest.cn Signed-off-by: Vincent-Xiao vincent.xiao.me@gmail.com Co-authored-by: Vincent-Xiao vincent.xiao.me@gmail.com
- [Misc] allow specify is_mm_prefix_lm in hf_config (#34215)
- Stop testing for slow tokenizers as they will not exist soon (#34235)
Signed-off-by: Harry Mellor 19981378+hmellor@users.noreply.github.com
- [V1][BugFix] Fix EAGLE3 encoder cache miss with disable_chunked_mm_input (#34220)
Signed-off-by: KrxGu krishom70@gmail.com
- Bump `mamba-ssm` version in CI for Transformers v5 compatibility (#34233)
Signed-off-by: Harry Mellor 19981378+hmellor@users.noreply.github.com
- add --insecure arg to the vllm bench to skip TLS (#34026)
Signed-off-by: Fan Yang yan9fan@meta.com Co-authored-by: Fan Yang yan9fan@meta.com
- Support benchmarking of Geospatial models (#33922)
Signed-off-by: Michele Gazzetti michele.gazzetti1@ibm.com
- [ROCm][Quantization] GPT_OSS in amd-quark format model loading and emulations (#29008)
Signed-off-by: xuebwang-amd xuebwang@amd.com Signed-off-by: Robert Shaw robshaw@redhat.com Co-authored-by: Robert Shaw robshaw@redhat.com Co-authored-by: Robert Shaw 114415538+robertgshaw2-redhat@users.noreply.github.com
- [compile] Enable AOT compile with 2.10 in trunk. (#34155)
Signed-off-by: Zhengxu Chen zhxchen17@meta.com
- [Perf][Kernel] Add faster topKperRow decode kernel for DeepSeek-V3.2 sparse attention (#33680)
Signed-off-by: LopezCastroRoberto rocastro@redhat.com Signed-off-by: Roberto L. Castro 38211239+LopezCastroRoberto@users.noreply.github.com Co-authored-by: Claude Sonnet 4.5 noreply@anthropic.com
- [Core][BugFix] Fix PP KV cache sharding memory validation (#33698)
Signed-off-by: junuxyz 216036880+junuxyz@users.noreply.github.com
- [BUGFIX] Fix accuracy bugs in Qwen3-Next MTP (#34077)
Signed-off-by: Vadim Gimpelson vadim.gimpelson@gmail.com
- [Docs] Speed up build environment set-up (#34240)
Signed-off-by: Harry Mellor 19981378+hmellor@users.noreply.github.com
- [Model Runner V2] Use pinned memory for write_contents (#34222)
Signed-off-by: Woosuk Kwon woosuk@inferact.ai
- Minor cleanup for Voxtral (#34247)
Signed-off-by: Andy Lo andy@mistral.ai
- [UX nit] Fix non-default api_server_count message (#34152)
Signed-off-by: mgoin mgoin64@gmail.com
- [Misc] Introduce ec_both role EC (encoder cache) connector (#34182)
Signed-off-by: Qi Wang qiwa@nvidia.com
- Convert online APIs to use Renderer (#34084)
Signed-off-by: Reagan Lee <“reaganjlee@gmail.com”> Co-authored-by: Reagan Lee <“reaganjlee@gmail.com”>
- [Bugfix] Fix weights offloading for sleep mode (#32947)
Signed-off-by: Jarno Seppänen jseppanen@nvidia.com Co-authored-by: Tyler Michael Smith tyler@neuralmagic.com
- [Benchmarks] Fix attention benchmark smoke test (#34269)
Signed-off-by: Matthew Bonanni mbonanni@redhat.com
- [Bugfix] Fix mamba cache dtype for Qwen3.5 (#34200)
Signed-off-by: Roger Wang hey@rogerw.io
- [SM100] Resubmit FMHA FP8 prefill for MLA (#31195)
Signed-off-by: Pavani Majety pmajety@nvidia.com
- [Feature] Warn about unrecognized environment variables (#33581)
Signed-off-by: Gregory Shtrasberg Gregory.Shtrasberg@amd.com
- [Perf] Move eplb rebalance algo to async thread (#30888)
Signed-off-by: ilmarkov markovilya197@gmail.com Signed-off-by: Tyler Michael Smith tlrmchlsmth@gmail.com Co-authored-by: Tyler Michael Smith tyler@neuralmagic.com Co-authored-by: Tyler Michael Smith tlrmchlsmth@gmail.com
- [BugFix] Fix async EPLB hang with DeepEP LL all2all backend (#32860)
Signed-off-by: ilmarkov markovilya197@gmail.com
- [Misc][Spec Decode] support different load config for draft model (#34022)
Signed-off-by: zzhengkai zzhengkai@devgpu049.ldc1.facebook.com Co-authored-by: zzhengkai zzhengkai@devgpu049.ldc1.facebook.com
- [torch.compile] Disable recursive pre_grad_passes (#34092)
Signed-off-by: Richard Zou zou3519@gmail.com
- [Misc] Add pre-commit hook to catch boolean ops in with-statements (#34271)
Signed-off-by: Tyler Michael Smith tlrmchlsmth@gmail.com Co-authored-by: Claude Opus 4.6 noreply@anthropic.com
- [CI] Add pip caching to cleanup_pr_body workflow (#32979)
Signed-off-by: 7. Sun jhao.sun@gmail.com
- [MoE Refactor] Introduce MoERunner abstraction and move execution logic from FusedMoE to DefaultMoERunner (#32344)
Signed-off-by: Bill Nell bnell@redhat.com
- [ROCm][CI] Fix test_sequence_parallel.py location in AMD CI pipeline (#34280)
Signed-off-by: Micah Williamson micah.williamson@amd.com
- [Misc] Add run one batch script that supports profiling (#32968)
Signed-off-by: Lucas Wilkinson lwilkins@redhat.com
- [Bugfix] Fix Worker.load_model context-manager composition for sleep mode (#34021)
Signed-off-by: tianshu.yu tianshuyu.formal@gmail.com
- [Redo] Add `--trust-remote-code` to dataset bench args (#34251)
Signed-off-by: DarkLight1337 tlleungac@connect.ust.hk
- [torch.compile] Stop doing unnecessary FakeTensorProp in PiecewiseCompileInterpreter (#34093)
Signed-off-by: Richard Zou zou3519@gmail.com
- [WideEP] Fix nvfp4 DeepEP High Throughput All2All backend (#33738)
Signed-off-by: Tyler Michael Smith tlrmchlsmth@gmail.com Co-authored-by: Robert Shaw 114415538+robertgshaw2-redhat@users.noreply.github.com
- [Misc] Clean up validation logic in input processor (#34144)
Signed-off-by: DarkLight1337 tlleungac@connect.ust.hk
- [Bugfix][DeepSeek-V3.2] fix fp8 kvcache type cast (#33884)
Signed-off-by: Kebe mail@kebe7jun.com
- [Kernel] Apply 256bit LDG/STG To Activation Kernels (#33022)
Signed-off-by: Dzerzhinsky 256908701+AstroVoyager7@users.noreply.github.com Signed-off-by: Дзержи́нский 256908701+AstroVoyager7@users.noreply.github.com Co-authored-by: Wentao Ye 44945378+yewentao256@users.noreply.github.com
- [XPU][7/N] enable xpu fp8 moe (#34202)
Signed-off-by: Zhu, Zufang zufang.zhu@intel.com
- [Plugin] Simplify IO Processor Plugin interface (#34236)
Signed-off-by: DarkLight1337 tlleungac@connect.ust.hk
- [Bugfix] Fix benchmark_moe.py inplace assertion with torch >= 2.9 (#34149)
Signed-off-by: Matthias Gehre matthias.gehre@amd.com
- Threshold fix wvSplitk for occasional CI fails (#34013)
Signed-off-by: Hashem Hashemi hashem.hashemi@amd.com
- [ModelBash][DSR1 NVFp4] Removed Bf16 Bias Cast (#34298)
Signed-off-by: Robert Shaw robshaw@redhat.com Co-authored-by: Robert Shaw robshaw@redhat.com
- [Bugfix] Fix fused MoE IMA (sans chunking) by using int64 for strides (#34279)
Co-authored-by: Claude Opus 4.6 noreply@anthropic.com
- [Bugfix] Fix weight naming in Qwen3.5 (#34313)
Signed-off-by: Roger Wang hey@rogerw.io
- [CPU] Enable FP16 (Half dtype) support for s390x (#34116)
Signed-off-by: Rehan Khan Rehan.Khan7@ibm.com
- [model] support FunASR model (#33247)
Signed-off-by: zixiao shunli.dsl@alibaba-inc.com Co-authored-by: zixiao shunli.dsl@alibaba-inc.com
- [XPU][9/N] clean up existing ipex code/doc (#34111)
Signed-off-by: Kunshang Ji kunshang.ji@intel.com
- [Chore] Move BaseRenderer to base.py (#34308)
Signed-off-by: DarkLight1337 tlleungac@connect.ust.hk
- [torch.compile] Enable AR+rms fusion by default available for -O2 (#34299)
Signed-off-by: Luka Govedič lgovedic@redhat.com
- [Misc] Bump fastsafetensors version for latest fixes (#34273)
Signed-off-by: Nick Hill nickhill123@gmail.com
- [Doc] Update Marlin support matrix for Turing (#34319)
Signed-off-by: Tianqi Ren tianqi.r@outlook.com
- [Frontend] Exploit tokenizers "new stream" in FastIncrementalDetokenizer (#34217)
Signed-off-by: Nick Hill nickhill123@gmail.com
- Patch protobuf for CVE-2026-0994 (#34253)
Signed-off-by: Seiji Eicher seiji@anyscale.com Co-authored-by: Kevin H. Luu khluu000@gmail.com
- [Docs] Reduce time spent generating API docs (#34255)
Signed-off-by: Harry Mellor 19981378+hmellor@users.noreply.github.com
- [Bugfix][CPU] Fix llama4 inference on CPU (#34321)
Signed-off-by: jiang1.li jiang1.li@intel.com
- Make Qwen3VL compatible with Transformers v5 (#34262)
Signed-off-by: Harry Mellor 19981378+hmellor@users.noreply.github.com Signed-off-by: Roger Wang hey@rogerw.io Co-authored-by: Roger Wang hey@rogerw.io
- Make JAIS compatible with Transformers v5 (#34264)
Signed-off-by: Harry Mellor 19981378+hmellor@users.noreply.github.com
- [NVIDIA][test] Tests for flashinfer TRTLLM BF16 MoE (#33715)
Signed-off-by: Linda-Stadter 57756729+Linda-Stadter@users.noreply.github.com Co-authored-by: Pavani Majety pmajety@nvidia.com
- Responses harmony system message structured (#34268)
Signed-off-by: Adam Binford adamq43@gmail.com
- Reapply [Attention][FA3] Update FA3 to include new swizzle optimization (#34043)
Signed-off-by: Lucas Wilkinson lwilkins@redhat.com
- Don't try and run GLM-ASR with remote code (#34352)
Signed-off-by: Harry Mellor 19981378+hmellor@users.noreply.github.com
- [Bugfix]: Fix ROCm fusion attn test; use AttentionBackend utils to create kv cache (#33948)
Signed-off-by: Rohan138 rohanpotdar138@gmail.com
- [ROCm] [aiter] Split KV cache update for AiterFlashAttention (#33681)
Signed-off-by: kliuae kuanfu.liu@embeddedllm.com
- [Docs] Fix typo ("defult") and double spacing (#34348)
Signed-off-by: SorenDreano 71752785+SorenDreano@users.noreply.github.com Co-authored-by: Soren Dreano soren@numind.ai Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
- [CI][BugFix] Fix silent failure in shellcheck hook and baseline exist… (#32458)
Signed-off-by: junuxyz 216036880+junuxyz@users.noreply.github.com
- [Model Runner V2] Init cuda graph pool when necessary (#33217)
Signed-off-by: Xinyu Chen xinyu1.chen@intel.com
- [Multimodal] Expose mm_processor_kwargs for DummyInputsBuilder (#34330)
Signed-off-by: Isotr0py mozf@mail2.sysu.edu.cn
- [Bugfix] fix default is_neox_style is True for deepseek (#34353)
Signed-off-by: dongxinyu03 dongxinyu03@baidu.com
- [Bugfix] Enable attn quantization of Llama-4 by correctly permuting scales for rope (int8, fp8) (#34243)
Signed-off-by: Your Name you@example.com Co-authored-by: Your Name you@example.com
- [ROCm] [CI] fix test_unrecognized_env (#34350)
Signed-off-by: tjtanaa tunjian.tan@embeddedllm.com
- [GPT-OSS] Remove unnecessary contiguous (#34337)
Signed-off-by: elvischenv 219235043+elvischenv@users.noreply.github.com
- Add cartridge (prefix) benchmark configs to CI workflows
The prefix_latency and prefix_throughput configs existed but weren't being run by any workflow. Each benchmark workflow now runs both the base and cartridge configs using the shared server support.
Co-Authored-By: Claude Opus 4.6 (1M context) noreply@anthropic.com
update flashinfer
update wheel
update cuda and flashinfer
downgrade
update tests
Signed-off-by: Taeksang Kim ts.kim@hyperaccel.ai Signed-off-by: simondanielsson simon.danielsson99@hotmail.com Signed-off-by: Muhammad Hashmi mhashmi@berkeley.edu Signed-off-by: NickLucche nlucches@redhat.com Signed-off-by: Richard Zou zou3519@gmail.com Signed-off-by: Sage Moore sagmoore@redhat.com Signed-off-by: Nick Hill nickhill123@gmail.com Signed-off-by: Ilya Boytsov ilyaboytsov1805@gmail.com Signed-off-by: Ilya Boytsov boytsovpanamera@mail.ru Signed-off-by: Luka Govedič lgovedic@redhat.com Signed-off-by: ProExpertProg luka.govedic@gmail.com Signed-off-by: Luka Govedič ProExpertProg@users.noreply.github.com Signed-off-by: Zhanqiu Hu zh338@cornell.edu Signed-off-by: chaunceyjiang chaunceyjiang@gmail.com Signed-off-by: Kevin H. Luu khluu000@gmail.com Signed-off-by: Andreas Karatzas akaratza@amd.com Signed-off-by: rinbaro ilgomishra@gmail.com Signed-off-by: Randall Smith Randall.Smith@amd.com Signed-off-by: Andrew Xia axia@fb.com Signed-off-by: Andrew Xia axia@meta.com Signed-off-by: jiang1.li jiang1.li@intel.com Signed-off-by: wang.yuqi yuqi.wang@daocloud.io Signed-off-by: Fadi Arafeh fadi.arafeh@arm.com Signed-off-by: Mark McLoughlin markmc@redhat.com Signed-off-by: Pavani Majety pmajety@nvidia.com Signed-off-by: DarkLight1337 tlleungac@connect.ust.hk Signed-off-by: Matthew Wong Matthew.Wong2@amd.com Signed-off-by: Liran Schour lirans@il.ibm.com Signed-off-by: liranschour liranschour@users.noreply.github.com Signed-off-by: Isotr0py mozf@mail2.sysu.edu.cn Signed-off-by: mariohong mariohong128@gmail.com Signed-off-by: ahao-anyscale ahao@anyscale.com Signed-off-by: Aaron Hao ahao@anyscale.com Signed-off-by: Daniel Serebrenik daserebrenik@nvidia.com Signed-off-by: Benjamin Chislett bchislett@nvidia.com Signed-off-by: Yoray Zack yorayz@nvidia.com Signed-off-by: Bill Nell bnell@redhat.com Signed-off-by: Tsukasa OI floss_llm@irq.a4lg.com Signed-off-by: Harry Mellor 19981378+hmellor@users.noreply.github.com Signed-off-by: Matthew Bonanni mbonanni@redhat.com 
Signed-off-by: Lihao Ran imlihao.ran@gmail.com Signed-off-by: mgoin mgoin64@gmail.com Signed-off-by: Hashem Hashemi hashem.hashemi@amd.com Signed-off-by: wzhao18 wzhao18.sz@gmail.com Signed-off-by: simon-mo simon.mo@hey.com Signed-off-by: rabi ramishra@redhat.com Signed-off-by: limingliang limingliang@stepfun.com Signed-off-by: Rehan Khan Rehan.Khan7@ibm.com Signed-off-by: Kunshang Ji kunshang.ji@intel.com Signed-off-by: sihao.li sihao.li@intel.com Signed-off-by: Chengcheng Pei chengchengpei@outlook.com Signed-off-by: chengchengpei 5881383+chengchengpei@users.noreply.github.com Signed-off-by: Gassan gassan.salama@arm.com Signed-off-by: Xinyu Chen xinyu1.chen@intel.com Signed-off-by: zhangyue66 zhangyue66@baidu.com Signed-off-by: kurt kurt@thinkingmachines.ai Signed-off-by: raushan raushan@huggingface.co Signed-off-by: Raushan Turganbay raushan.turganbay@alumni.nu.edu.kz Signed-off-by: Frederic Odermatt frederic.odermatt@44ai.ch Signed-off-by: caitianchi caitianchi@modelbest.cn Signed-off-by: tc-mb caitianchi@modelbest.cn Signed-off-by: Zhu, Zufang zufang.zhu@intel.com Signed-off-by: Eldar Kurtić 8884008+eldarkurtic@users.noreply.github.com Signed-off-by: yewentao256 zhyanwentao@126.com Signed-off-by: Seiji Eicher seiji@anyscale.com Signed-off-by: vllmellm vllm.ellm@embeddedllm.com Signed-off-by: zhuhaoran zhuhaoran.zhr@alibaba-inc.com Signed-off-by: charlifu charlifu@amd.com Signed-off-by: xuebwang-amd xuebwang@amd.com Signed-off-by: SumanthRH sumanthrh99@gmail.com Signed-off-by: Dimitrios Bariamis 12195802+dbari@users.noreply.github.com Signed-off-by: Kourosh Hakhamaneshi kourosh@anyscale.com Signed-off-by: Ikenna ikennachifo@gmail.com Signed-off-by: Xin Yang xyangx@amazon.com Signed-off-by: code4me2 velvetmoon222999@gmail.com Signed-off-by: wang.yuqi noooop@126.com Signed-off-by: Mohammad Miadh Angkad 176301910+mmangkad@users.noreply.github.com Signed-off-by: Akintunde Oladipo akintunde.oladipo@servicenow.com Signed-off-by: TundeAtSN 
akintunde.oladipo@servicenow.com Signed-off-by: Pooya Davoodi pooya.davoodi@parasail.io Signed-off-by: Cyrus Leung cyrus.tl.leung@gmail.com Signed-off-by: Jee Jee Li pandaleefree@gmail.com Signed-off-by: whx-sjtu 2952154980@qq.com Signed-off-by: Rohan138 rohanpotdar138@gmail.com Signed-off-by: Zifei Tong zifeitong@gmail.com Signed-off-by: Wentao Ye 44945378+yewentao256@users.noreply.github.com Signed-off-by: Jiang Wu jwu@cclgroup.com Signed-off-by: tjtanaa tunjian.tan@embeddedllm.com Signed-off-by: Reagan Lee <“reaganjlee@gmail.com”> Signed-off-by: Reagan Lee reaganjlee@gmail.com Signed-off-by: Reagan Lee 96998476+reaganjlee@users.noreply.github.com Signed-off-by: aabbccddwasd aabbccddwasd@qq.com Signed-off-by: Tomer Natan tbarnatan@computelab-frontend-8.nvidia.com Signed-off-by: kourosh hakhamaneshi 31483498+kouroshHakha@users.noreply.github.com Signed-off-by: ihb2032 hebome@foxmail.com Signed-off-by: Ekagra Ranjan 3116519+ekagra-ranjan@users.noreply.github.com Signed-off-by: nikhil-arm nikhil.gupta2@arm.com Signed-off-by: JJJYmmm 1650675829@qq.com Signed-off-by: JJJYmmm 92386084+JJJYmmm@users.noreply.github.com Signed-off-by: Roger Wang hey@rogerw.io Signed-off-by: Lucas Wilkinson lwilkins@redhat.com Signed-off-by: Hongming Zheng hongming.zheng@intel.com Signed-off-by: ZhengHongming888 hongming.zheng@intel.com Signed-off-by: Tomer Natan tbarnatan@nvidia.com Signed-off-by: zjy0516 riverclouds.zhu@qq.com Signed-off-by: Artus KG artuskg@gmail.com Signed-off-by: Hongxia Yang hongxia.yang@amd.com Signed-off-by: Gregory Shtrasberg Gregory.Shtrasberg@amd.com Signed-off-by: Andy Xie andy.xning@gmail.com Signed-off-by: Oasis-Git ayw.sirius19@gmail.com Signed-off-by: Balaxxe 136368465+jaim12005@users.noreply.github.com Signed-off-by: Chen Zhang zhangch99@outlook.com Signed-off-by: Zetong Li slippersss@126.com Signed-off-by: Jaebok Lee jaebok9541@naver.com Signed-off-by: Vincent-Xiao vincent.xiao.me@gmail.com Signed-off-by: KrxGu krishom70@gmail.com Signed-off-by: Fan Yang 
yan9fan@meta.com Signed-off-by: Michele Gazzetti michele.gazzetti1@ibm.com Signed-off-by: Robert Shaw robshaw@redhat.com Signed-off-by: Zhengxu Chen zhxchen17@meta.com Signed-off-by: LopezCastroRoberto rocastro@redhat.com Signed-off-by: Roberto L. Castro 38211239+LopezCastroRoberto@users.noreply.github.com Signed-off-by: junuxyz 216036880+junuxyz@users.noreply.github.com Signed-off-by: Vadim Gimpelson vadim.gimpelson@gmail.com Signed-off-by: Woosuk Kwon woosuk@inferact.ai Signed-off-by: Andy Lo andy@mistral.ai Signed-off-by: Qi Wang qiwa@nvidia.com Signed-off-by: Jarno Seppänen jseppanen@nvidia.com Signed-off-by: ilmarkov markovilya197@gmail.com Signed-off-by: Tyler Michael Smith tlrmchlsmth@gmail.com Signed-off-by: zzhengkai zzhengkai@devgpu049.ldc1.facebook.com Signed-off-by: 7. Sun jhao.sun@gmail.com Signed-off-by: Micah Williamson micah.williamson@amd.com Signed-off-by: tianshu.yu tianshuyu.formal@gmail.com Signed-off-by: Kebe mail@kebe7jun.com Signed-off-by: Dzerzhinsky 256908701+AstroVoyager7@users.noreply.github.com Signed-off-by: Дзержи́нский 256908701+AstroVoyager7@users.noreply.github.com Signed-off-by: Matthias Gehre matthias.gehre@amd.com Signed-off-by: zixiao shunli.dsl@alibaba-inc.com Signed-off-by: Tianqi Ren tianqi.r@outlook.com Signed-off-by: Linda-Stadter 57756729+Linda-Stadter@users.noreply.github.com Signed-off-by: Adam Binford adamq43@gmail.com Signed-off-by: kliuae kuanfu.liu@embeddedllm.com Signed-off-by: SorenDreano 71752785+SorenDreano@users.noreply.github.com Signed-off-by: dongxinyu03 dongxinyu03@baidu.com Signed-off-by: Your Name you@example.com Signed-off-by: elvischenv 219235043+elvischenv@users.noreply.github.com Co-authored-by: Taeksang Kim voidbag@gmail.com Co-authored-by: Simon Danielsson 70206058+simondanielsson@users.noreply.github.com Co-authored-by: Muhammad Hashmi 105992724+mu-hashmi@users.noreply.github.com Co-authored-by: NickLucche nlucches@redhat.com Co-authored-by: Richard Zou zou3519@users.noreply.github.com 
Co-authored-by: Sage Moore sagmoore@redhat.com Co-authored-by: Robert Shaw 114415538+robertgshaw2-redhat@users.noreply.github.com Co-authored-by: Nick Hill nickhill123@gmail.com Co-authored-by: Ilya Boytsov boytsovpanamera@mail.ru Co-authored-by: Cyrus Leung cyrus.tl.leung@gmail.com Co-authored-by: wang.yuqi yuqi.wang@daocloud.io Co-authored-by: Luka Govedič ProExpertProg@users.noreply.github.com Co-authored-by: zhanqiuhu 49648934+ZhanqiuHu@users.noreply.github.com Co-authored-by: Chauncey chaunceyjiang@gmail.com Co-authored-by: Kevin H. Luu khluu000@gmail.com Co-authored-by: Andreas Karatzas akaratza@amd.com Co-authored-by: Nick Hill nhill@redhat.com Co-authored-by: rinbaro ilgomishra@gmail.com Co-authored-by: rasmith Randall.Smith@amd.com Co-authored-by: Andrew Xia axia@meta.com Co-authored-by: Andrew Xia axia@fb.com Co-authored-by: Li, Jiang jiang1.li@intel.com Co-authored-by: Fadi Arafeh 115173828+fadara01@users.noreply.github.com Co-authored-by: Mark McLoughlin markmc@redhat.com Co-authored-by: Pavani Majety pmajety@nvidia.com Co-authored-by: Cyrus Leung tlleungac@connect.ust.hk Co-authored-by: Matthew Wong Matthew.Wong2@amd.com Co-authored-by: liranschour liranschour@users.noreply.github.com Co-authored-by: Or Ozeri or@ozery.com Co-authored-by: Nicolò Lucchesi nicolo.lucchesi@gmail.com Co-authored-by: jiahanc 173873397+jiahanc@users.noreply.github.com Co-authored-by: Isotr0py mozf@mail2.sysu.edu.cn Co-authored-by: Mario Hong 86880754+mariohong128@users.noreply.github.com Co-authored-by: Aaron Hao ahao@anyscale.com Co-authored-by: SumanthRH sumanthrh99@gmail.com Co-authored-by: danisereb daserebrenik@nvidia.com Co-authored-by: Benjamin Chislett bchislett@nvidia.com Co-authored-by: zackyoray yorayz@nvidia.com Co-authored-by: bnellnm 49004751+bnellnm@users.noreply.github.com Co-authored-by: Tsukasa OI floss_llm@irq.a4lg.com Co-authored-by: Harry Mellor 19981378+hmellor@users.noreply.github.com Co-authored-by: Matthew Bonanni mbonanni@redhat.com Co-authored-by: 
Lumosis 30372757+Lumosis@users.noreply.github.com Co-authored-by: mgoin mgoin64@gmail.com Co-authored-by: Hashem Hashemi 159079214+amd-hhashemi@users.noreply.github.com Co-authored-by: Wei Zhao 51183510+wzhao18@users.noreply.github.com Co-authored-by: emricksini-h emrick.birivoutin@hcompany.ai Co-authored-by: Xin Yang 105740670+xyang16@users.noreply.github.com Co-authored-by: Simon Mo simon.mo@hey.com Co-authored-by: Rabi Mishra ramishra@redhat.com Co-authored-by: Mingliang Li limingliang0527@gmail.com Co-authored-by: limingliang limingliang@stepfun.com Co-authored-by: R3hankhan Rehan.Khan7@ibm.com Co-authored-by: Kunshang Ji kunshang.ji@intel.com Co-authored-by: sihao_li 165983188+1643661061leo@users.noreply.github.com Co-authored-by: chengchengpei 5881383+chengchengpei@users.noreply.github.com Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Gassan Salama gassan.salama@arm.com Co-authored-by: Xinyu Chen xinyu1.chen@intel.com Co-authored-by: zhang-prog 69562787+zhang-prog@users.noreply.github.com Co-authored-by: Kurt Shuster shuster.kurt@gmail.com Co-authored-by: SorenDreano 71752785+SorenDreano@users.noreply.github.com Co-authored-by: Soren Dreano soren@numind.ai Co-authored-by: Wentao Ye 44945378+yewentao256@users.noreply.github.com Co-authored-by: Raushan Turganbay raushan@huggingface.co Co-authored-by: FredericOdermatt 50372080+FredericOdermatt@users.noreply.github.com Co-authored-by: tc-mb 157115220+tc-mb@users.noreply.github.com Co-authored-by: mslv mslv@baai.ac.cn Co-authored-by: zofia 110436990+zufangzhu@users.noreply.github.com Co-authored-by: Eldar Kurtić 8884008+eldarkurtic@users.noreply.github.com Co-authored-by: Seiji Eicher 58963096+eicherseiji@users.noreply.github.com Co-authored-by: vllmellm vllm.ellm@embeddedllm.com Co-authored-by: TJian tunjian.tan@embeddedllm.com Co-authored-by: zhrrr 43847754+izhuhaoran@users.noreply.github.com Co-authored-by: Charlie Fu charlifu@amd.com 
Co-authored-by: xuebwang-amd xuebwang@amd.com Co-authored-by: Sumanth R Hegde 39546518+SumanthRH@users.noreply.github.com Co-authored-by: Dimitrios Bariamis dbari@users.noreply.github.com Co-authored-by: Dimitrios Bariamis 12195802+dbari@users.noreply.github.com Co-authored-by: kourosh hakhamaneshi 31483498+kouroshHakha@users.noreply.github.com Co-authored-by: Ikenna ikennachifo@gmail.com Co-authored-by: Cursor cursoragent@cursor.com Co-authored-by: 果冻虾仁 guodong@apache.org Co-authored-by: Vel 110626982+Code4me2@users.noreply.github.com Co-authored-by: lukec 118525388+sleepcoo@users.noreply.github.com Co-authored-by: Mohammad Miadh Angkad 176301910+mmangkad@users.noreply.github.com Co-authored-by: TundeAtSN akintunde.oladipo@servicenow.com Co-authored-by: Pooya Davoodi pooya.davoodi@parasail.io Co-authored-by: Jee Jee Li pandaleefree@gmail.com Co-authored-by: whx 56632993+whx-sjtu@users.noreply.github.com Co-authored-by: Rohan Potdar 66227218+Rohan138@users.noreply.github.com Co-authored-by: zifeitong zifeitong@gmail.com Co-authored-by: Jiang Wu jwu@cclgroup.com Co-authored-by: Reagan Lee 96998476+reaganjlee@users.noreply.github.com Co-authored-by: Reagan Lee <“reaganjlee@gmail.com”> Co-authored-by: aabbccddwasd 140953076+aabbccddwasd@users.noreply.github.com Co-authored-by: TomerBN-Nvidia tbarnatan@nvidia.com Co-authored-by: Tomer Natan tbarnatan@computelab-frontend-8.nvidia.com Co-authored-by: navmarri14 nmarri@roblox.com Co-authored-by: Andrey Talman atalman@fb.com Co-authored-by: ihb2032 40718643+ihb2032@users.noreply.github.com Co-authored-by: root root@LAPTOP-FKNHV411.localdomain Co-authored-by: Ekagra Ranjan 3116519+ekagra-ranjan@users.noreply.github.com Co-authored-by: Nikhil Gupta nikhil.gupta2@arm.com Co-authored-by: JJJYmmm 92386084+JJJYmmm@users.noreply.github.com Co-authored-by: wulipc wulipc@users.noreply.github.com Co-authored-by: ywang96 ywang96@users.noreply.github.com Co-authored-by: Isotr0py Isotr0py@users.noreply.github.com Co-authored-by: 
Isotr0py 2037008807@qq.com Co-authored-by: Roger Wang hey@rogerw.io Co-authored-by: Lucas Wilkinson LucasWilkinson@users.noreply.github.com Co-authored-by: ZhengHongming888 hongming.zheng@intel.com Co-authored-by: Jiangyun Zhu riverclouds.zhu@qq.com Co-authored-by: Artus Krohn-Grimberghe artuskg@users.noreply.github.com Co-authored-by: Hongxia Yang 62075498+hongxiayang@users.noreply.github.com Co-authored-by: Gregory Shtrasberg 156009573+gshtras@users.noreply.github.com Co-authored-by: Ning Xie andy.xning@gmail.com Co-authored-by: Russell Bryant rbryant@redhat.com Co-authored-by: Yuwei An ayw.sirius19@gmail.com Co-authored-by: Balaxxe 136368465+jaim12005@users.noreply.github.com Co-authored-by: Chen Zhang zhangch99@outlook.com Co-authored-by: Zetong Li 48438720+slippersss@users.noreply.github.com Co-authored-by: zzaebok 44357534+zzaebok@users.noreply.github.com Co-authored-by: Vincent-Xiao vincent.xiao.me@gmail.com Co-authored-by: Phúc H. Lê Khắc lkhphuc@pm.me Co-authored-by: Krish Gupta krishom70@gmail.com Co-authored-by: Fan Yang fanyang.real@gmail.com Co-authored-by: Fan Yang yan9fan@meta.com Co-authored-by: mgazz michele.gazzetti1@ibm.com Co-authored-by: Robert Shaw robshaw@redhat.com Co-authored-by: Zhengxu Chen zhxchen17@meta.com Co-authored-by: Roberto L. 
Castro 38211239+LopezCastroRoberto@users.noreply.github.com Co-authored-by: Claude Sonnet 4.5 noreply@anthropic.com Co-authored-by: junuxyz 216036880+junuxyz@users.noreply.github.com Co-authored-by: Vadim Gimpelson 156319763+vadiklyutiy@users.noreply.github.com Co-authored-by: Woosuk Kwon woosuk.kwon@berkeley.edu Co-authored-by: Andy Lo andy@mistral.ai Co-authored-by: Qi Wang wqstu1@gmail.com Co-authored-by: J Seppänen 83203+jseppanen@users.noreply.github.com Co-authored-by: Tyler Michael Smith tyler@neuralmagic.com Co-authored-by: Ilya Markov markovilya197@gmail.com Co-authored-by: Tyler Michael Smith tlrmchlsmth@gmail.com Co-authored-by: Zhengkai Zhang 33679250+ZhengkaiZ@users.noreply.github.com Co-authored-by: zzhengkai zzhengkai@devgpu049.ldc1.facebook.com Co-authored-by: 7. Sun jhao.sun@gmail.com Co-authored-by: Micah Williamson micah.williamson@amd.com Co-authored-by: tianshu-Michael-yu 101950379+tianshu-Michael-yu@users.noreply.github.com Co-authored-by: Kebe mail@kebe7jun.com Co-authored-by: Дзержи́нский 256908701+AstroVoyager7@users.noreply.github.com Co-authored-by: Matthias Gehre matthias.gehre@amd.com Co-authored-by: AllenDou allen.dou@hotmail.com Co-authored-by: zixiao shunli.dsl@alibaba-inc.com Co-authored-by: Tianqi Ren tianqi.r@outlook.com Co-authored-by: Linda 57756729+Linda-Stadter@users.noreply.github.com Co-authored-by: Adam Binford adamq43@gmail.com Co-authored-by: kliuae 17350011+kliuae@users.noreply.github.com Co-authored-by: Xinyu Dong dongxinyu03@baidu.com Co-authored-by: Your Name you@example.com Co-authored-by: elvischenv 219235043+elvischenv@users.noreply.github.com
jxmorris12 added a commit to jxmorris12/vllm that referenced this pull request
- [Bugfix] fix DeepSeek R1 with CUTLASS MLA Broken on B200 (#33637)
Signed-off-by: chaunceyjiang chaunceyjiang@gmail.com
- [release] Minor fixes to release annotation (#33849)
Signed-off-by: Kevin H. Luu khluu000@gmail.com
- [CI][Bugfix]: return McpCall for built-in MCP tools in non-streaming mode (#32762)
Signed-off-by: Andreas Karatzas akaratza@amd.com
- Revert "[Attention][FA3] Update FA3 to include new swizzle optimization" (#33841)
- [Minor] Include StreamingInput in inputs package (#33856)
Signed-off-by: Nick Hill nickhill123@gmail.com
- [docs] fix unintentional misspellings (#33863)
Signed-off-by: rinbaro ilgomishra@gmail.com
- [CI][AMD][BugFix] Ensure VLLM_ROCM_USE_AITER is set so test_rocm_aiter_topk.py can run correctly (#33840)
Signed-off-by: Randall Smith Randall.Smith@amd.com
- [2/N] move responses/serving _make_response_output_items logic to parser (#33281)
Signed-off-by: Andrew Xia axia@fb.com Signed-off-by: Andrew Xia axia@meta.com Co-authored-by: Andrew Xia axia@fb.com
- [CI/Build] Parallelize CPU CI tests (#33778)
Signed-off-by: jiang1.li jiang1.li@intel.com
- [Bugfix] Fix ScoreMultiModalParam multi-document scoring returning single result (#33837)
Signed-off-by: Andreas Karatzas akaratza@amd.com Signed-off-by: wang.yuqi yuqi.wang@daocloud.io Co-authored-by: wang.yuqi yuqi.wang@daocloud.io
- [CPU][BugFix] Allow w8a8 oneDNN quantized matmul to support 3D inputs (#33727)
Signed-off-by: Fadi Arafeh fadi.arafeh@arm.com
- [CI/Build] Fix CPU CI test case title (#33870)
Signed-off-by: jiang1.li jiang1.li@intel.com
- [Perf] Optimize the performance of structured output + reasoning (#33557)
Signed-off-by: chaunceyjiang chaunceyjiang@gmail.com
- [KV Connector][Metrics] Do not count local prefix cache hits in connector queries (#30522)
Signed-off-by: Mark McLoughlin markmc@redhat.com
- [Bugfix] Kimi-K2 grouped_topk usage for Flashinfer monolithic kernels. (#33858)
Signed-off-by: Pavani Majety pmajety@nvidia.com
- [Refactor] Move task outside of PoolingParams.verify (#33796)
Signed-off-by: DarkLight1337 tlleungac@connect.ust.hk Signed-off-by: wang.yuqi yuqi.wang@daocloud.io Co-authored-by: wang.yuqi yuqi.wang@daocloud.io
- [ROCm][Bugfix][CI] Fix hybrid models and their tests (Mamba/Jamba/Bamba) (#32710)
Signed-off-by: Andreas Karatzas akaratza@amd.com Signed-off-by: Matthew Wong Matthew.Wong2@amd.com Co-authored-by: Matthew Wong Matthew.Wong2@amd.com
- Enable Cross layers KV cache layout at NIXL Connector V2 (#33339)
Signed-off-by: Liran Schour lirans@il.ibm.com Signed-off-by: liranschour liranschour@users.noreply.github.com Co-authored-by: Or Ozeri or@ozery.com Co-authored-by: Nicolò Lucchesi nicolo.lucchesi@gmail.com Co-authored-by: Nicolò Lucchesi nlucches@redhat.com
- [perf] Integrate flashinfer concat_mla_k (#31171)
- [Bugfix] Fix Kimi-K2.5 NVFP4 checkpoints weight loading (#33876)
Signed-off-by: Isotr0py mozf@mail2.sysu.edu.cn
- [Refactor] Clean up input preprocessing (#33687)
Signed-off-by: DarkLight1337 tlleungac@connect.ust.hk
- [Bugfix] Fix corner case of sparse embedding (#33886)
Signed-off-by: wang.yuqi yuqi.wang@daocloud.io
- [Docs] Add bart-plugin to docs (#33905)
Signed-off-by: NickLucche nlucches@redhat.com
- [Bugfix] Fix step3p5 parser when using mtp (#33690)
Signed-off-by: mariohong mariohong128@gmail.com
- [Feat][RL][1/2] Native Weight Syncing API: NCCL (#31943)
Signed-off-by: ahao-anyscale ahao@anyscale.com Signed-off-by: Aaron Hao ahao@anyscale.com Co-authored-by: SumanthRH sumanthrh99@gmail.com
- [BugFix] Fix LoRA Fp8 (#33879)
Signed-off-by: Daniel Serebrenik daserebrenik@nvidia.com
- [Spec Decode] Unified Parallel Drafting (#32887)
Signed-off-by: Benjamin Chislett bchislett@nvidia.com
- [Misc] Add debug logs (#33931)
Signed-off-by: NickLucche nlucches@redhat.com
- [Bugfix] Fix swapped engine_ids in NIXL Llama 4 local attention path (#33795)
Signed-off-by: Yoray Zack yorayz@nvidia.com
- [Moe Refactor] Make Inplace Flag for FusedMoEModularKernel part of the constructor (#33375)
Signed-off-by: Bill Nell bnell@redhat.com Co-authored-by: Robert Shaw 114415538+robertgshaw2-redhat@users.noreply.github.com
- [Models] Consolidate Deepseek-OCR2 processor (#33909)
Signed-off-by: Isotr0py mozf@mail2.sysu.edu.cn
- [Bugfix] Suppress non-TTY color output on the process name part of the log (#29714)
Signed-off-by: Tsukasa OI floss_llm@irq.a4lg.com
- Fix tokenizer test for renamed attr on Transformers v5 (#33902)
Signed-off-by: Harry Mellor 19981378+hmellor@users.noreply.github.com
- [Misc] Rename translations to speech_to_text for OAI serving component (#33904)
Signed-off-by: NickLucche nlucches@redhat.com
- [Bugfix] Fix DSV3.2 NVFP4 (#33932)
Signed-off-by: Matthew Bonanni mbonanni@redhat.com
- [Bugfix] Make MM batching more robust (#33817)
Signed-off-by: DarkLight1337 tlleungac@connect.ust.hk
- [Minor] Sort safetensors files to ensure deterministic loading order (#33491)
Signed-off-by: Lihao Ran imlihao.ran@gmail.com Signed-off-by: mgoin mgoin64@gmail.com Co-authored-by: mgoin mgoin64@gmail.com
- Adds padding and perf improvements to wvSplitK_fp8 (#33527)
Signed-off-by: Hashem Hashemi hashem.hashemi@amd.com
- [Bugfix] Fix DeepSeek v3.2 tokenizer outputting None issue (#33832)
Signed-off-by: wzhao18 wzhao18.sz@gmail.com
- [Feature] OTEL tracing during loading (#31162)
- [Perf] Disable clean_logits in deepgemm fp8_mqa_logits kernel (#33568)
- [Docs] Add reo analytics (#33957)
Signed-off-by: simon-mo simon.mo@hey.com
- fix(ROCm): Make flash_attn import optional in MLA attention (#33511)
Signed-off-by: rabi ramishra@redhat.com
- feat(frontend): early-fail tokenization guard for user requests (#31366)
Signed-off-by: limingliang limingliang@stepfun.com Signed-off-by: DarkLight1337 tlleungac@connect.ust.hk Co-authored-by: limingliang limingliang@stepfun.com Co-authored-by: DarkLight1337 tlleungac@connect.ust.hk
- [Misc] Update code for encoder-decoder models (#33900)
Signed-off-by: DarkLight1337 tlleungac@connect.ust.hk
- [CPU] Add BF16 Kernel type for s390x (#33788)
Signed-off-by: Rehan Khan Rehan.Khan7@ibm.com
- [XPU][4/N] add mxfp4 moe model support (#33679)
Signed-off-by: Kunshang Ji kunshang.ji@intel.com
- [XPU]Replace pip in docker.xpu with uv pip (#31112)
Signed-off-by: sihao.li sihao.li@intel.com
- Onboard voyage-4-nano (#33720)
Signed-off-by: Chengcheng Pei chengchengpei@outlook.com Signed-off-by: chengchengpei 5881383+chengchengpei@users.noreply.github.com Co-authored-by: chengchengpei 5881383+chengchengpei@users.noreply.github.com Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
- [cpu][performance] CPU Paged Attention NEON BFMMLA BF16 Implementation (#32263)
Signed-off-by: Gassan gassan.salama@arm.com
- Fix main pre-commit (#33975)
Signed-off-by: Harry Mellor 19981378+hmellor@users.noreply.github.com
- support view_from_cpu_tensor on XPU (#33868)
Signed-off-by: Xinyu Chen xinyu1.chen@intel.com
- Consolidate and fix forbidden import pre-commit checks (#33982)
Signed-off-by: Harry Mellor 19981378+hmellor@users.noreply.github.com
- [PaddleOCR-VL] Add BC for transformers 5.0 config (#33976)
Signed-off-by: zhangyue66 zhangyue66@baidu.com
- Bump HF Hub client to get bug fix (#33984)
Signed-off-by: Harry Mellor 19981378+hmellor@users.noreply.github.com
- [CPU][BugFix] Fix loading of w8a8int models with bias (#33582)
Signed-off-by: Fadi Arafeh fadi.arafeh@arm.com
- [torch.compile] Reorganize vllm/compilation and tests/compile (0/N for vLLM IR) (#33731)
Signed-off-by: Luka Govedič lgovedic@redhat.com Signed-off-by: ProExpertProg luka.govedic@gmail.com Signed-off-by: Luka Govedič ProExpertProg@users.noreply.github.com
- [Bugfix][Model] Support LoRA on Qwen3 Output Embedding (#29816)
Signed-off-by: kurt kurt@thinkingmachines.ai
- [Docs] Improve documentation (#33799)
Co-authored-by: Soren Dreano soren@numind.ai Co-authored-by: Wentao Ye 44945378+yewentao256@users.noreply.github.com
- Update WeightTransferConfig to be more standard like the others (#33989)
Signed-off-by: Harry Mellor 19981378+hmellor@users.noreply.github.com
- [Bugfix] Fix models and tests for transformers v5 (#33977)
Signed-off-by: raushan raushan@huggingface.co Signed-off-by: Raushan Turganbay raushan.turganbay@alumni.nu.edu.kz Signed-off-by: Harry Mellor 19981378+hmellor@users.noreply.github.com Co-authored-by: Cyrus Leung cyrus.tl.leung@gmail.com Co-authored-by: Harry Mellor 19981378+hmellor@users.noreply.github.com
- [FIX] guidance: use max(vocab_size, len(tokenizer)) for n_vocab (#33509)
Signed-off-by: Frederic Odermatt frederic.odermatt@44ai.ch
- [ROCm][AITER] Fix AITER import regression for explicit backend selection (#33749)
Signed-off-by: Andreas Karatzas akaratza@amd.com
- [Docs] Add sections on process architecture and minimum CPU resources (#33940)
Users can be confused about vLLM's performance when very few CPU cores are available. We were missing a clear overview of vLLM's process architecture, so I added one, with diagrams, to arch_overview.md, and included a section on CPU resource recommendations in optimization.md.
Signed-off-by: mgoin mgoin64@gmail.com
- [Model] Support MiniCPM-o 4.5 (#33431)
Signed-off-by: caitianchi caitianchi@modelbest.cn Signed-off-by: tc-mb caitianchi@modelbest.cn Co-authored-by: mslv mslv@baai.ac.cn
- [Refactor] Consolidate sequence normalization and enc-dec parsing (#33928)
Signed-off-by: DarkLight1337 tlleungac@connect.ust.hk
- [XPU][5/N] add wna16 xpu kernel (#33973)
Signed-off-by: Zhu, Zufang zufang.zhu@intel.com
- [Docs] Update link to Benchmark CLI documentation (#33254)
Signed-off-by: Eldar Kurtić 8884008+eldarkurtic@users.noreply.github.com
- [Bugfix] Fix the issue where tool calling does not work when using fast detokenization with dsv32 (#33964)
Signed-off-by: chaunceyjiang chaunceyjiang@gmail.com
- [Log] Optimize duplicate startup log (#33944)
Signed-off-by: yewentao256 zhyanwentao@126.com
- [KV Connector] Add missing method overrides to MultiConnector (#33292)
Signed-off-by: Seiji Eicher seiji@anyscale.com
- [DOC] [ROCm] Update docker deployment doc (#33971)
Signed-off-by: vllmellm vllm.ellm@embeddedllm.com Signed-off-by: Harry Mellor 19981378+hmellor@users.noreply.github.com Co-authored-by: TJian tunjian.tan@embeddedllm.com Co-authored-by: Harry Mellor 19981378+hmellor@users.noreply.github.com
- [Model Runner V2] support apply penalty for spec decode (#33251)
Signed-off-by: zhuhaoran zhuhaoran.zhr@alibaba-inc.com
- [Refactor] Remove align block size logic in moe_permute (#33449)
Signed-off-by: yewentao256 zhyanwentao@126.com
- [Rocm][Bugfix] Fix dtype not same for gemm_a4w4 op (#33734)
Signed-off-by: charlifu charlifu@amd.com
- [Bugfix] Fix no attribute error of SharedFusedMoE (DeepSeek-V3.1 as test model) (#33993)
Signed-off-by: xuebwang-amd xuebwang@amd.com
- [Fix] Fix logprobs=0 handling for /inference/v1/generate endpoint (#34010)
Signed-off-by: SumanthRH sumanthrh99@gmail.com
- Fix RoutingMethodType logic (#33919)
Signed-off-by: Dimitrios Bariamis 12195802+dbari@users.noreply.github.com Signed-off-by: mgoin mgoin64@gmail.com Co-authored-by: Dimitrios Bariamis 12195802+dbari@users.noreply.github.com Co-authored-by: mgoin mgoin64@gmail.com
- [bugfix] [ROCm] Fix premature CUDA initialization in platform detection (#33941)
Signed-off-by: Kourosh Hakhamaneshi kourosh@anyscale.com
- [Feat][RL] Pause and Resume with keep requests for single engine (#32351)
Signed-off-by: ahao-anyscale ahao@anyscale.com Signed-off-by: Aaron Hao ahao@anyscale.com Co-authored-by: Robert Shaw 114415538+robertgshaw2-redhat@users.noreply.github.com
- [Bugfix] Fix QK Norm+RoPE fusion pattern matching on B200+FP8 (#33967)
Signed-off-by: Ikenna ikennachifo@gmail.com Co-authored-by: Luka Govedič ProExpertProg@users.noreply.github.com
- [Bugfix] Fix Whisper tokenization (#34011)
Signed-off-by: NickLucche nlucches@redhat.com
- [CI][AMD]Bugfix] Check that model_config is not None in enable_norm_pad_fusion (#34007)
Signed-off-by: Randall Smith Randall.Smith@amd.com
- [Bugfix] Fix _fused_moe_lora_expand signature mismatch (#33821)
Signed-off-by: Xin Yang xyangx@amazon.com
- [Misc] Add backward-compatible import aliases for renamed translations module (#34015)
Signed-off-by: Kourosh Hakhamaneshi kourosh@anyscale.com Co-authored-by: Cursor cursoragent@cursor.com
- [ModelRunner V2] Revert token rank comparison difference for now (#34017)
Signed-off-by: Nick Hill nickhill123@gmail.com
- fix description in plugin_system.md (#33999)
- [Revert] Add util `handle_deprecated` back (#33998)
Signed-off-by: yewentao256 zhyanwentao@126.com
- [Kernel] Add enable_sm120_or_later for SM121 (DGX Spark) CUTLASS support (#33517)
Signed-off-by: code4me2 velvetmoon222999@gmail.com
- [Misc] Make `PlaceholderRange.get_num_embeds` a method (#34035)
Signed-off-by: DarkLight1337 tlleungac@connect.ust.hk
- [ROCm][CI] Pinning lm-eval version to resolve multi-modal small eval bug (#34038)
Signed-off-by: Andreas Karatzas akaratza@amd.com
- Fix spelling errors (#33978)
- [Misc] Simplify `get_max_tokens` (#34036)
Signed-off-by: DarkLight1337 tlleungac@connect.ust.hk
- [CI][Build] Pin grpcio-tools==1.78.0 (#34048)
Signed-off-by: wang.yuqi noooop@126.com Co-authored-by: Cyrus Leung cyrus.tl.leung@gmail.com
- [Renderer] Define `render_cmpl` and `render_chat` (#34039)
Signed-off-by: DarkLight1337 tlleungac@connect.ust.hk
- [Kernel] Add KernelConfig flag to enable/disable FlashInfer autotune (#34006)
Signed-off-by: Mohammad Miadh Angkad 176301910+mmangkad@users.noreply.github.com Signed-off-by: Luka Govedič ProExpertProg@users.noreply.github.com Co-authored-by: Luka Govedič ProExpertProg@users.noreply.github.com
- [torch.compile] Stop compiling identical artifacts (#34003)
Signed-off-by: Richard Zou zou3519@gmail.com
- Enable Eagle3 speculative decoding for Mistral3ForConditionalGeneration to support eagle3 (#33939)
Signed-off-by: Akintunde Oladipo akintunde.oladipo@servicenow.com Signed-off-by: TundeAtSN akintunde.oladipo@servicenow.com Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
- [Frontend]Add support for transcriptions and translations to run_batch (#33934)
Signed-off-by: Pooya Davoodi pooya.davoodi@parasail.io Signed-off-by: Cyrus Leung cyrus.tl.leung@gmail.com Co-authored-by: Cyrus Leung cyrus.tl.leung@gmail.com
- [Model] Enable Step3p5ForCausalLM testing (#33755)
Signed-off-by: Jee Jee Li pandaleefree@gmail.com
- [PluggableLayer][3/N] Apply PluggableLayer to mamba layers. (#33660)
Signed-off-by: whx-sjtu 2952154980@qq.com
- move checks out of `unified_kv_cache_update` custom op (#33943)
Signed-off-by: Rohan138 rohanpotdar138@gmail.com
- Update DeepGEMM version pin in Dockerfile to match #32479 (#33935)
Signed-off-by: Zifei Tong zifeitong@gmail.com Signed-off-by: Wentao Ye 44945378+yewentao256@users.noreply.github.com Co-authored-by: Wentao Ye 44945378+yewentao256@users.noreply.github.com
- Make directory exist ok for ray spinning up multiple replicas on a single instance (#33604)
Signed-off-by: Jiang Wu jwu@cclgroup.com
- Perf tuning and expansion of cases covered for wvSplitKrc (#33493)
Signed-off-by: Hashem Hashemi hashem.hashemi@amd.com
- [Doc] Fix run_batch docs (#34056)
Signed-off-by: DarkLight1337 tlleungac@connect.ust.hk
- [CI/Build] Skip GCS test (#34057)
Signed-off-by: DarkLight1337 tlleungac@connect.ust.hk
- [ROCm][Bugfix] fix act_quant_fusion module import error (#34069)
Signed-off-by: Andreas Karatzas akaratza@amd.com
- [Perf] Simplify DeepseekV32 tokenizer, ensure fast detokenization used (#33855)
Signed-off-by: Nick Hill nickhill123@gmail.com
- [ROCm] [CI] Reduce Resource of two test groups (#34059)
Signed-off-by: tjtanaa tunjian.tan@embeddedllm.com
- Add embedding input functionality for disabled modalities [remake] (#32493)
Signed-off-by: Reagan Lee <“reaganjlee@gmail.com”> Signed-off-by: Reagan Lee reaganjlee@gmail.com Signed-off-by: Reagan Lee 96998476+reaganjlee@users.noreply.github.com Co-authored-by: Reagan Lee <“reaganjlee@gmail.com”> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
- [Revert] Fix performance regression for GLM-4.7-GPTQ decode and MTP acceptance rate (#33771)
Signed-off-by: aabbccddwasd aabbccddwasd@qq.com
- [BugFix] Change support no act and mul for marlin (#34088)
Signed-off-by: Tomer Natan tbarnatan@computelab-frontend-8.nvidia.com Co-authored-by: Tomer Natan tbarnatan@computelab-frontend-8.nvidia.com
- [torch.compile] Add an option to force-enable the MOE cold start optimization (#33735)
Signed-off-by: Richard Zou zou3519@gmail.com
- glm 4.6 fused tuned inference config for B200 (#32958)
- Add support for ModelOpt MXFP8 dense models (#33786)
Signed-off-by: Daniel Serebrenik daserebrenik@nvidia.com
- [Release 2.10] Update to Torch 2.10 - final release (#30525)
- [bug-fix] supported_tasks is breaking backward compatibility at init_app_state (#34027)
Signed-off-by: Kourosh Hakhamaneshi kourosh@anyscale.com Signed-off-by: kourosh hakhamaneshi 31483498+kouroshHakha@users.noreply.github.com Co-authored-by: Cyrus Leung cyrus.tl.leung@gmail.com
- [Tiny] Rename encoder budget file to more specific name (#34103)
Signed-off-by: Reagan Lee <“reaganjlee@gmail.com”> Co-authored-by: Reagan Lee <“reaganjlee@gmail.com”>
- [Frontend][last/5] Make pooling entrypoints request schema consensus. (#31127)
Signed-off-by: wang.yuqi yuqi.wang@daocloud.io
- [BugFix] Fix `fastsafetensors` TP all procs using all GPUs (#34070)
Signed-off-by: Nick Hill nickhill123@gmail.com Co-authored-by: Cyrus Leung tlleungac@connect.ust.hk
- fix(cpu): fix mla_decode compilation on x86 without AVX512 (#34052)
Signed-off-by: ihb2032 hebome@foxmail.com Co-authored-by: root root@LAPTOP-FKNHV411.localdomain
- [Model] GLM adaptation (#34124)
- [CI] Remove empty image_size_factors for fuyu, glm4_1v, glm_ocr (#34107)
Signed-off-by: Andreas Karatzas akaratza@amd.com
- [ASR] Fix audio benchmark and add RTFx metric (#32300)
Signed-off-by: Ekagra Ranjan 3116519+ekagra-ranjan@users.noreply.github.com Co-authored-by: Nicolò Lucchesi nicolo.lucchesi@gmail.com
- [Fix] [CPU Backend] : Prepack weights for w8a8 oneDNN matmul (#33901)
Signed-off-by: nikhil-arm nikhil.gupta2@arm.com
- [XPU][6/N] add xpu scaled_mm kernel (#34117)
Signed-off-by: Zhu, Zufang zufang.zhu@intel.com
- [MODEL] Adding Support for Qwen3.5 Models (#34110)
Signed-off-by: JJJYmmm 1650675829@qq.com Signed-off-by: JJJYmmm 92386084+JJJYmmm@users.noreply.github.com Signed-off-by: Roger Wang hey@rogerw.io Co-authored-by: wulipc wulipc@users.noreply.github.com Co-authored-by: ywang96 ywang96@users.noreply.github.com Co-authored-by: Isotr0py Isotr0py@users.noreply.github.com Co-authored-by: Isotr0py 2037008807@qq.com Co-authored-by: Roger Wang hey@rogerw.io
- [Misc] Fix up attention benchmarks (#33810)
Signed-off-by: Lucas Wilkinson lwilkins@redhat.com Signed-off-by: Matthew Bonanni mbonanni@redhat.com Co-authored-by: Matthew Bonanni mbonanni@redhat.com
- [UX] Add `--language-model-only` for hybrid models (#34120)
Signed-off-by: Roger Wang hey@rogerw.io
- [CI][torch.compile] Fix incorrect filtering for E2E fusion tests on B200 (#34031)
Signed-off-by: Luka Govedič lgovedic@redhat.com
- Add NUMA Core binding in nixl_connector for CPU xPyD (#32365)
Signed-off-by: Hongming Zheng hongming.zheng@intel.com Signed-off-by: ZhengHongming888 hongming.zheng@intel.com Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
- [Kernel] FlashInfer: switch allreduce fusion to unified API (#33985)
Signed-off-by: Mohammad Miadh Angkad 176301910+mmangkad@users.noreply.github.com
- [Bugfix] Fix shared expert input for latent MoE in EP+DP (Nemotron-H) (#34087)
Signed-off-by: Tomer Natan tbarnatan@nvidia.com Co-authored-by: Cursor cursoragent@cursor.com
- [Kernel] use flashinfer for gdn prefill (#32846)
Signed-off-by: zjy0516 riverclouds.zhu@qq.com
- [Bugfix] Avoid duplicate k-proj weight emission in helper (#34142)
Signed-off-by: Artus KG artuskg@gmail.com
- [Bugfix] Voxtral prompt/audio placeholder alignment (#34140)
Signed-off-by: Artus KG artuskg@gmail.com
- [ROCm] update triton branch to support gpt-oss models for gfx11xx devices (#34032)
Signed-off-by: Hongxia Yang hongxia.yang@amd.com
- [torch.compile][Fusion] Fix attention fusion pass removing kv_udpate op. (#33945)
Signed-off-by: charlifu charlifu@amd.com
- [ModelRunner V2][BugFix] Fix `max_query_len` calculation (#34167)
Signed-off-by: Nick Hill nickhill123@gmail.com
- [Doc] Add DCP support to attention backend doc (#33936)
- [Bugfix][ROCm][GPT-OSS] Use old triton_kernels implementation on ROCm if the new API is not available (#34153)
Signed-off-by: Gregory Shtrasberg Gregory.Shtrasberg@amd.com
- [structured output] validate unsupported json features first (#33233)
Signed-off-by: Andy Xie andy.xning@gmail.com Co-authored-by: Chauncey chaunceyjiang@gmail.com Co-authored-by: Russell Bryant rbryant@redhat.com
- [LMCache] Token Base IPC API (#34175)
Signed-off-by: Oasis-Git ayw.sirius19@gmail.com
- [Bugfix] Adopt `ChunkGatedDeltaRule` for Qwen3.5 (#34198)
Signed-off-by: Roger Wang hey@rogerw.io
- [ROCm][Bugfix] Resolve Dynamo tracing crash from amdsmi calls in on_gfx* arch detection (#34108)
Signed-off-by: Andreas Karatzas akaratza@amd.com
- [Bugfix][Core] Fix CPU memory leak from Request reference cycle in prefix caching (#34183)
Signed-off-by: Roger Wang hey@rogerw.io
- [Doc] Update usage of `--limit-mm-per-prompt` (#34148)
Signed-off-by: DarkLight1337 tlleungac@connect.ust.hk
- [CI/Build] Relax `test_mcp_tool_call` (#34204)
Signed-off-by: DarkLight1337 tlleungac@connect.ust.hk
- [Bugfix] Fix DP Attention Padding in Dummy Run (#34187)
Signed-off-by: Benjamin Chislett bchislett@nvidia.com Signed-off-by: Lucas Wilkinson lwilkins@redhat.com Co-authored-by: Benjamin Chislett bchislett@nvidia.com
- [Bugfix] Add `--trust-remote-code` to dataset bench args (#34208)
Signed-off-by: DarkLight1337 tlleungac@connect.ust.hk
- [responsesAPI] fix simpleContext streaming output_messages (#34188)
Signed-off-by: Andrew Xia axia@meta.com Signed-off-by: Andrew Xia axia@fb.com Co-authored-by: Andrew Xia axia@fb.com
- [Bugfix] Sort hf_weights_files in fastsafetensors_weights_iterator to match #33491 (#34190)
Signed-off-by: Balaxxe 136368465+jaim12005@users.noreply.github.com
- [Frontend][CI] Consolidate instrumentator entrypoints (#34123)
Signed-off-by: wang.yuqi yuqi.wang@daocloud.io
- [BugFix] Avoid prefix cache hit in the same schedule step for mamba layers (#29387)
Signed-off-by: Chen Zhang zhangch99@outlook.com
- [Perf] Optimize detokenizer python logic (#32975)
Signed-off-by: yewentao256 zhyanwentao@126.com Signed-off-by: Wentao Ye 44945378+yewentao256@users.noreply.github.com Co-authored-by: Nick Hill nhill@redhat.com
- Revert #34208 (#34216)
- [Bugfix] Fix memory inconsistency in cross-process shared memory (#32022)
Signed-off-by: Zetong Li slippersss@126.com
- [Bugfix] Fix `--trust-remote-code` conflict (#34218)
Signed-off-by: DarkLight1337 tlleungac@connect.ust.hk
- [Docs] Fix format error in KV load failure recovery doc (#34137)
Signed-off-by: Jaebok Lee jaebok9541@naver.com
- [Bugfix] Fix FI kernel `chunk_gated_delta_rule` output shape for Qwen3.5 (#34219)
Signed-off-by: Roger Wang hey@rogerw.io
- Add flagos in MiniCPM-o (#34126)
Signed-off-by: tc-mb caitianchi@modelbest.cn Signed-off-by: Vincent-Xiao vincent.xiao.me@gmail.com Co-authored-by: Vincent-Xiao vincent.xiao.me@gmail.com
- [Misc] allow specify is_mm_prefix_lm in hf_config (#34215)
- Stop testing for slow tokenizers as they will not exist soon (#34235)
Signed-off-by: Harry Mellor 19981378+hmellor@users.noreply.github.com
- [V1][BugFix] Fix EAGLE3 encoder cache miss with disable_chunked_mm_input (#34220)
Signed-off-by: KrxGu krishom70@gmail.com
- Bump `mamba-ssm` version in CI for Transformers v5 compatibility (#34233)
Signed-off-by: Harry Mellor 19981378+hmellor@users.noreply.github.com
- add --insecure arg to the vllm bench to skip TLS (#34026)
Signed-off-by: Fan Yang yan9fan@meta.com Co-authored-by: Fan Yang yan9fan@meta.com
- Support benchmarking of Geospatial models (#33922)
Signed-off-by: Michele Gazzetti michele.gazzetti1@ibm.com
- [ROCm][Quantization] GPT_OSS in amd-quark format model loading and emulations (#29008)
Signed-off-by: xuebwang-amd xuebwang@amd.com Signed-off-by: Robert Shaw robshaw@redhat.com Co-authored-by: Robert Shaw robshaw@redhat.com Co-authored-by: Robert Shaw 114415538+robertgshaw2-redhat@users.noreply.github.com
- [compile] Enable AOT compile with 2.10 in trunk. (#34155)
Signed-off-by: Zhengxu Chen zhxchen17@meta.com
- [Perf][Kernel] Add faster topKperRow decode kernel for DeepSeek-V3.2 sparse attention (#33680)
Signed-off-by: LopezCastroRoberto rocastro@redhat.com Signed-off-by: Roberto L. Castro 38211239+LopezCastroRoberto@users.noreply.github.com Co-authored-by: Claude Sonnet 4.5 noreply@anthropic.com
- [Core][BugFix] Fix PP KV cache sharding memory validation (#33698)
Signed-off-by: junuxyz 216036880+junuxyz@users.noreply.github.com
- [BUGFIX] Fix accuracy bugs in Qwen3-Next MTP (#34077)
Signed-off-by: Vadim Gimpelson vadim.gimpelson@gmail.com
- [Docs] Speed up build environment set-up (#34240)
Signed-off-by: Harry Mellor 19981378+hmellor@users.noreply.github.com
- [Model Runner V2] Use pinned memory for write_contents (#34222)
Signed-off-by: Woosuk Kwon woosuk@inferact.ai
- Minor cleanup for Voxtral (#34247)
Signed-off-by: Andy Lo andy@mistral.ai
- [UX nit] Fix non-default api_server_count message (#34152)
Signed-off-by: mgoin mgoin64@gmail.com
- [Misc] Introduce ec_both role EC (encoder cache) connector (#34182)
Signed-off-by: Qi Wang qiwa@nvidia.com
- Convert online APIs to use Renderer (#34084)
Signed-off-by: Reagan Lee <“reaganjlee@gmail.com”> Co-authored-by: Reagan Lee <“reaganjlee@gmail.com”>
- [Bugfix] Fix weights offloading for sleep mode (#32947)
Signed-off-by: Jarno Seppänen jseppanen@nvidia.com Co-authored-by: Tyler Michael Smith tyler@neuralmagic.com
- [Benchmarks] Fix attention benchmark smoke test (#34269)
Signed-off-by: Matthew Bonanni mbonanni@redhat.com
- [Bugfix] Fix mamba cache dtype for Qwen3.5 (#34200)
Signed-off-by: Roger Wang hey@rogerw.io
- [SM100] Resubmit FMHA FP8 prefill for MLA (#31195)
Signed-off-by: Pavani Majety pmajety@nvidia.com
- [Feature] Warn about unrecognized environment variables (#33581)
Signed-off-by: Gregory Shtrasberg Gregory.Shtrasberg@amd.com
- [Perf] Move eplb rebalance algo to async thread (#30888)
Signed-off-by: ilmarkov markovilya197@gmail.com Signed-off-by: Tyler Michael Smith tlrmchlsmth@gmail.com Co-authored-by: Tyler Michael Smith tyler@neuralmagic.com Co-authored-by: Tyler Michael Smith tlrmchlsmth@gmail.com
- [BugFix] Fix async EPLB hang with DeepEP LL all2all backend (#32860)
Signed-off-by: ilmarkov markovilya197@gmail.com
- [Misc][Spec Decode] support different load config for draft model (#34022)
Signed-off-by: zzhengkai zzhengkai@devgpu049.ldc1.facebook.com Co-authored-by: zzhengkai zzhengkai@devgpu049.ldc1.facebook.com
- [torch.compile] Disable recursive pre_grad_passes (#34092)
Signed-off-by: Richard Zou zou3519@gmail.com
- [Misc] Add pre-commit hook to catch boolean ops in with-statements (#34271)
Signed-off-by: Tyler Michael Smith tlrmchlsmth@gmail.com Co-authored-by: Claude Opus 4.6 noreply@anthropic.com
- [CI] Add pip caching to cleanup_pr_body workflow (#32979)
Signed-off-by: 7. Sun jhao.sun@gmail.com
- [MoE Refactor] Introduce MoERunner abstraction and move execution logic from FusedMoE to DefaultMoERunner (#32344)
Signed-off-by: Bill Nell bnell@redhat.com
- [ROCm][CI] Fix test_sequence_parallel.py location in AMD CI pipeline (#34280)
Signed-off-by: Micah Williamson micah.williamson@amd.com
- [Misc] Add run one batch script that supports profiling (#32968)
Signed-off-by: Lucas Wilkinson lwilkins@redhat.com
- [Bugfix] Fix Worker.load_model context-manager composition for sleep mode (#34021)
Signed-off-by: tianshu.yu tianshuyu.formal@gmail.com
- [Redo] Add `--trust-remote-code` to dataset bench args (#34251)
Signed-off-by: DarkLight1337 tlleungac@connect.ust.hk
- [torch.compile] Stop doing unnecessary FakeTensorProp in PiecewiseCompileInterpreter (#34093)
Signed-off-by: Richard Zou zou3519@gmail.com
- [WideEP] Fix nvfp4 DeepEP High Throughput All2All backend (#33738)
Signed-off-by: Tyler Michael Smith tlrmchlsmth@gmail.com Co-authored-by: Robert Shaw 114415538+robertgshaw2-redhat@users.noreply.github.com
- [Misc] Clean up validation logic in input processor (#34144)
Signed-off-by: DarkLight1337 tlleungac@connect.ust.hk
- [Bugfix][DeepSeek-V3.2] fix fp8 kvcache type cast (#33884)
Signed-off-by: Kebe mail@kebe7jun.com
- [Kernel] Apply 256bit LDG/STG To Activation Kernels (#33022)
Signed-off-by: Dzerzhinsky 256908701+AstroVoyager7@users.noreply.github.com Signed-off-by: Дзержи́нский 256908701+AstroVoyager7@users.noreply.github.com Co-authored-by: Wentao Ye 44945378+yewentao256@users.noreply.github.com
- [XPU][7/N] enable xpu fp8 moe (#34202)
Signed-off-by: Zhu, Zufang zufang.zhu@intel.com
- [Plugin] Simplify IO Processor Plugin interface (#34236)
Signed-off-by: DarkLight1337 tlleungac@connect.ust.hk
- [Bugfix] Fix benchmark_moe.py inplace assertion with torch >= 2.9 (#34149)
Signed-off-by: Matthias Gehre matthias.gehre@amd.com
- Threshold fix wvSplitk for occasional CI fails (#34013)
Signed-off-by: Hashem Hashemi hashem.hashemi@amd.com
- [ModelBash][DSR1 NVFp4] Removed Bf16 Bias Cast (#34298)
Signed-off-by: Robert Shaw robshaw@redhat.com Co-authored-by: Robert Shaw robshaw@redhat.com
- [Bugfix] Fix fused MoE IMA (sans chunking) by using int64 for strides (#34279)
Co-authored-by: Claude Opus 4.6 noreply@anthropic.com
- [Bugfix] Fix weight naming in Qwen3.5 (#34313)
Signed-off-by: Roger Wang hey@rogerw.io
- [CPU] Enable FP16 (Half dtype) support for s390x (#34116)
Signed-off-by: Rehan Khan Rehan.Khan7@ibm.com
- [model] support FunASR model (#33247)
Signed-off-by: zixiao shunli.dsl@alibaba-inc.com Co-authored-by: zixiao shunli.dsl@alibaba-inc.com
- [XPU][9/N] clean up existing ipex code/doc (#34111)
Signed-off-by: Kunshang Ji kunshang.ji@intel.com
- [Chore] Move `BaseRenderer` to `base.py` (#34308)
Signed-off-by: DarkLight1337 tlleungac@connect.ust.hk
- [torch.compile] Enable AR+rms fusion by default available for `-O2` (#34299)
Signed-off-by: Luka Govedič lgovedic@redhat.com
- [Misc] Bump `fastsafetensors` version for latest fixes (#34273)
Signed-off-by: Nick Hill nickhill123@gmail.com
- [Doc] Update Marlin support matrix for Turing (#34319)
Signed-off-by: Tianqi Ren tianqi.r@outlook.com
- [Frontend] Exploit tokenizers "new stream" in FastIncrementalDetokenizer (#34217)
Signed-off-by: Nick Hill nickhill123@gmail.com
- Patch protobuf for CVE-2026-0994 (#34253)
Signed-off-by: Seiji Eicher seiji@anyscale.com Co-authored-by: Kevin H. Luu khluu000@gmail.com
- [Docs] Reduce time spent generating API docs (#34255)
Signed-off-by: Harry Mellor 19981378+hmellor@users.noreply.github.com
- [Bugfix][CPU] Fix llama4 inference on CPU (#34321)
Signed-off-by: jiang1.li jiang1.li@intel.com
- Make Qwen3VL compatible with Transformers v5 (#34262)
Signed-off-by: Harry Mellor 19981378+hmellor@users.noreply.github.com Signed-off-by: Roger Wang hey@rogerw.io Co-authored-by: Roger Wang hey@rogerw.io
- Make JAIS compatible with Transformers v5 (#34264)
Signed-off-by: Harry Mellor 19981378+hmellor@users.noreply.github.com
- [NVIDIA][test] Tests for flashinfer TRTLLM BF16 MoE (#33715)
Signed-off-by: Linda-Stadter 57756729+Linda-Stadter@users.noreply.github.com Co-authored-by: Pavani Majety pmajety@nvidia.com
- Responses harmony system message structured (#34268)
Signed-off-by: Adam Binford adamq43@gmail.com
- Reapply [Attention][FA3] Update FA3 to include new swizzle optimization (#34043)
Signed-off-by: Lucas Wilkinson lwilkins@redhat.com
- Don't try and run GLM-ASR with remote code (#34352)
Signed-off-by: Harry Mellor 19981378+hmellor@users.noreply.github.com
- [Bugfix]: Fix ROCm fusion attn test; use AttentionBackend utils to create kv cache (#33948)
Signed-off-by: Rohan138 rohanpotdar138@gmail.com
- [ROCm] [aiter] Split KV cache update for AiterFlashAttention (#33681)
Signed-off-by: kliuae kuanfu.liu@embeddedllm.com
- [Docs] Fix typo ("defult") and double spacing (#34348)
Signed-off-by: SorenDreano 71752785+SorenDreano@users.noreply.github.com Co-authored-by: Soren Dreano soren@numind.ai Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
- [CI][BugFix] Fix silent failure in shellcheck hook and baseline exist… (#32458)
Signed-off-by: junuxyz 216036880+junuxyz@users.noreply.github.com
- [Model Runner V2] Init cuda graph pool when necessary (#33217)
Signed-off-by: Xinyu Chen xinyu1.chen@intel.com
- [Multimodal] Expose `mm_processor_kwargs` for `DummyInputsBuilder` (#34330)
Signed-off-by: Isotr0py mozf@mail2.sysu.edu.cn
- [Bugfix] fix default is_neox_style is True for deepseek (#34353)
Signed-off-by: dongxinyu03 dongxinyu03@baidu.com
- [Bugfix] Enable attn quantization of Llama-4 by correctly permuting scales for rope (int8, fp8) (#34243)
Signed-off-by: Your Name you@example.com Co-authored-by: Your Name you@example.com
- [ROCm] [CI] fix test_unrecognized_env (#34350)
Signed-off-by: tjtanaa tunjian.tan@embeddedllm.com
- [GPT-OSS] Remove unnecessary contiguous (#34337)
Signed-off-by: elvischenv 219235043+elvischenv@users.noreply.github.com
- Add cartridge (prefix) benchmark configs to CI workflows
The prefix_latency and prefix_throughput configs existed but weren't being run by any workflow. Each benchmark workflow now runs both the base and cartridge configs using the shared server support.
Co-Authored-By: Claude Opus 4.6 (1M context) noreply@anthropic.com
- update flashinfer
- update wheel
- update cuda and flashinfer
- downgrade
- update tests
- experimental: implement pipelining
- add pipeline test
- configure PR to actually run
- bugfix
- loosen TPOT threshold for catridge latency
- improve pipelining
- simplify pipelining impl
Signed-off-by: chaunceyjiang chaunceyjiang@gmail.com Signed-off-by: Kevin H. Luu khluu000@gmail.com Signed-off-by: Andreas Karatzas akaratza@amd.com Signed-off-by: Nick Hill nickhill123@gmail.com Signed-off-by: rinbaro ilgomishra@gmail.com Signed-off-by: Randall Smith Randall.Smith@amd.com Signed-off-by: Andrew Xia axia@fb.com Signed-off-by: Andrew Xia axia@meta.com Signed-off-by: jiang1.li jiang1.li@intel.com Signed-off-by: wang.yuqi yuqi.wang@daocloud.io Signed-off-by: Fadi Arafeh fadi.arafeh@arm.com Signed-off-by: Mark McLoughlin markmc@redhat.com Signed-off-by: Pavani Majety pmajety@nvidia.com Signed-off-by: DarkLight1337 tlleungac@connect.ust.hk Signed-off-by: Matthew Wong Matthew.Wong2@amd.com Signed-off-by: Liran Schour lirans@il.ibm.com Signed-off-by: liranschour liranschour@users.noreply.github.com Signed-off-by: Isotr0py mozf@mail2.sysu.edu.cn Signed-off-by: NickLucche nlucches@redhat.com Signed-off-by: mariohong mariohong128@gmail.com Signed-off-by: ahao-anyscale ahao@anyscale.com Signed-off-by: Aaron Hao ahao@anyscale.com Signed-off-by: Daniel Serebrenik daserebrenik@nvidia.com Signed-off-by: Benjamin Chislett bchislett@nvidia.com Signed-off-by: Yoray Zack yorayz@nvidia.com Signed-off-by: Bill Nell bnell@redhat.com Signed-off-by: Tsukasa OI floss_llm@irq.a4lg.com Signed-off-by: Harry Mellor 19981378+hmellor@users.noreply.github.com Signed-off-by: Matthew Bonanni mbonanni@redhat.com Signed-off-by: Lihao Ran imlihao.ran@gmail.com Signed-off-by: mgoin mgoin64@gmail.com Signed-off-by: Hashem Hashemi hashem.hashemi@amd.com Signed-off-by: wzhao18 wzhao18.sz@gmail.com Signed-off-by: simon-mo simon.mo@hey.com Signed-off-by: rabi ramishra@redhat.com Signed-off-by: limingliang limingliang@stepfun.com Signed-off-by: Rehan Khan Rehan.Khan7@ibm.com Signed-off-by: Kunshang Ji kunshang.ji@intel.com Signed-off-by: sihao.li sihao.li@intel.com Signed-off-by: Chengcheng Pei chengchengpei@outlook.com Signed-off-by: chengchengpei 
5881383+chengchengpei@users.noreply.github.com Signed-off-by: Gassan gassan.salama@arm.com Signed-off-by: Xinyu Chen xinyu1.chen@intel.com Signed-off-by: zhangyue66 zhangyue66@baidu.com Signed-off-by: Luka Govedič lgovedic@redhat.com Signed-off-by: ProExpertProg luka.govedic@gmail.com Signed-off-by: Luka Govedič ProExpertProg@users.noreply.github.com Signed-off-by: kurt kurt@thinkingmachines.ai Signed-off-by: raushan raushan@huggingface.co Signed-off-by: Raushan Turganbay raushan.turganbay@alumni.nu.edu.kz Signed-off-by: Frederic Odermatt frederic.odermatt@44ai.ch Signed-off-by: caitianchi caitianchi@modelbest.cn Signed-off-by: tc-mb caitianchi@modelbest.cn Signed-off-by: Zhu, Zufang zufang.zhu@intel.com Signed-off-by: Eldar Kurtić 8884008+eldarkurtic@users.noreply.github.com Signed-off-by: yewentao256 zhyanwentao@126.com Signed-off-by: Seiji Eicher seiji@anyscale.com Signed-off-by: vllmellm vllm.ellm@embeddedllm.com Signed-off-by: zhuhaoran zhuhaoran.zhr@alibaba-inc.com Signed-off-by: charlifu charlifu@amd.com Signed-off-by: xuebwang-amd xuebwang@amd.com Signed-off-by: SumanthRH sumanthrh99@gmail.com Signed-off-by: Dimitrios Bariamis 12195802+dbari@users.noreply.github.com Signed-off-by: Kourosh Hakhamaneshi kourosh@anyscale.com Signed-off-by: Ikenna ikennachifo@gmail.com Signed-off-by: Xin Yang xyangx@amazon.com Signed-off-by: code4me2 velvetmoon222999@gmail.com Signed-off-by: wang.yuqi noooop@126.com Signed-off-by: Mohammad Miadh Angkad 176301910+mmangkad@users.noreply.github.com Signed-off-by: Richard Zou zou3519@gmail.com Signed-off-by: Akintunde Oladipo akintunde.oladipo@servicenow.com Signed-off-by: TundeAtSN akintunde.oladipo@servicenow.com Signed-off-by: Pooya Davoodi pooya.davoodi@parasail.io Signed-off-by: Cyrus Leung cyrus.tl.leung@gmail.com Signed-off-by: Jee Jee Li pandaleefree@gmail.com Signed-off-by: whx-sjtu 2952154980@qq.com Signed-off-by: Rohan138 rohanpotdar138@gmail.com Signed-off-by: Zifei Tong zifeitong@gmail.com Signed-off-by: Wentao Ye 
44945378+yewentao256@users.noreply.github.com Signed-off-by: Jiang Wu jwu@cclgroup.com Signed-off-by: tjtanaa tunjian.tan@embeddedllm.com Signed-off-by: Reagan Lee <“reaganjlee@gmail.com”> Signed-off-by: Reagan Lee reaganjlee@gmail.com Signed-off-by: Reagan Lee 96998476+reaganjlee@users.noreply.github.com Signed-off-by: aabbccddwasd aabbccddwasd@qq.com Signed-off-by: Tomer Natan tbarnatan@computelab-frontend-8.nvidia.com Signed-off-by: kourosh hakhamaneshi 31483498+kouroshHakha@users.noreply.github.com Signed-off-by: ihb2032 hebome@foxmail.com Signed-off-by: Ekagra Ranjan 3116519+ekagra-ranjan@users.noreply.github.com Signed-off-by: nikhil-arm nikhil.gupta2@arm.com Signed-off-by: JJJYmmm 1650675829@qq.com Signed-off-by: JJJYmmm 92386084+JJJYmmm@users.noreply.github.com Signed-off-by: Roger Wang hey@rogerw.io Signed-off-by: Lucas Wilkinson lwilkins@redhat.com Signed-off-by: Hongming Zheng hongming.zheng@intel.com Signed-off-by: ZhengHongming888 hongming.zheng@intel.com Signed-off-by: Tomer Natan tbarnatan@nvidia.com Signed-off-by: zjy0516 riverclouds.zhu@qq.com Signed-off-by: Artus KG artuskg@gmail.com Signed-off-by: Hongxia Yang hongxia.yang@amd.com Signed-off-by: Gregory Shtrasberg Gregory.Shtrasberg@amd.com Signed-off-by: Andy Xie andy.xning@gmail.com Signed-off-by: Oasis-Git ayw.sirius19@gmail.com Signed-off-by: Balaxxe 136368465+jaim12005@users.noreply.github.com Signed-off-by: Chen Zhang zhangch99@outlook.com Signed-off-by: Zetong Li slippersss@126.com Signed-off-by: Jaebok Lee jaebok9541@naver.com Signed-off-by: Vincent-Xiao vincent.xiao.me@gmail.com Signed-off-by: KrxGu krishom70@gmail.com Signed-off-by: Fan Yang yan9fan@meta.com Signed-off-by: Michele Gazzetti michele.gazzetti1@ibm.com Signed-off-by: Robert Shaw robshaw@redhat.com Signed-off-by: Zhengxu Chen zhxchen17@meta.com Signed-off-by: LopezCastroRoberto rocastro@redhat.com Signed-off-by: Roberto L. 
Castro 38211239+LopezCastroRoberto@users.noreply.github.com Signed-off-by: junuxyz 216036880+junuxyz@users.noreply.github.com Signed-off-by: Vadim Gimpelson vadim.gimpelson@gmail.com Signed-off-by: Woosuk Kwon woosuk@inferact.ai Signed-off-by: Andy Lo andy@mistral.ai Signed-off-by: Qi Wang qiwa@nvidia.com Signed-off-by: Jarno Seppänen jseppanen@nvidia.com Signed-off-by: ilmarkov markovilya197@gmail.com Signed-off-by: Tyler Michael Smith tlrmchlsmth@gmail.com Signed-off-by: zzhengkai zzhengkai@devgpu049.ldc1.facebook.com Signed-off-by: 7. Sun jhao.sun@gmail.com Signed-off-by: Micah Williamson micah.williamson@amd.com Signed-off-by: tianshu.yu tianshuyu.formal@gmail.com Signed-off-by: Kebe mail@kebe7jun.com Signed-off-by: Dzerzhinsky 256908701+AstroVoyager7@users.noreply.github.com Signed-off-by: Дзержи́нский 256908701+AstroVoyager7@users.noreply.github.com Signed-off-by: Matthias Gehre matthias.gehre@amd.com Signed-off-by: zixiao shunli.dsl@alibaba-inc.com Signed-off-by: Tianqi Ren tianqi.r@outlook.com Signed-off-by: Linda-Stadter 57756729+Linda-Stadter@users.noreply.github.com Signed-off-by: Adam Binford adamq43@gmail.com Signed-off-by: kliuae kuanfu.liu@embeddedllm.com Signed-off-by: SorenDreano 71752785+SorenDreano@users.noreply.github.com Signed-off-by: dongxinyu03 dongxinyu03@baidu.com Signed-off-by: Your Name you@example.com Signed-off-by: elvischenv 219235043+elvischenv@users.noreply.github.com Co-authored-by: Chauncey chaunceyjiang@gmail.com Co-authored-by: Kevin H. 
Luu khluu000@gmail.com Co-authored-by: Andreas Karatzas akaratza@amd.com Co-authored-by: Luka Govedič ProExpertProg@users.noreply.github.com Co-authored-by: Nick Hill nhill@redhat.com Co-authored-by: rinbaro ilgomishra@gmail.com Co-authored-by: rasmith Randall.Smith@amd.com Co-authored-by: Andrew Xia axia@meta.com Co-authored-by: Andrew Xia axia@fb.com Co-authored-by: Li, Jiang jiang1.li@intel.com Co-authored-by: wang.yuqi yuqi.wang@daocloud.io Co-authored-by: Fadi Arafeh 115173828+fadara01@users.noreply.github.com Co-authored-by: Mark McLoughlin markmc@redhat.com Co-authored-by: Pavani Majety pmajety@nvidia.com Co-authored-by: Cyrus Leung tlleungac@connect.ust.hk Co-authored-by: Matthew Wong Matthew.Wong2@amd.com Co-authored-by: liranschour liranschour@users.noreply.github.com Co-authored-by: Or Ozeri or@ozery.com Co-authored-by: Nicolò Lucchesi nicolo.lucchesi@gmail.com Co-authored-by: Nicolò Lucchesi nlucches@redhat.com Co-authored-by: jiahanc 173873397+jiahanc@users.noreply.github.com Co-authored-by: Isotr0py mozf@mail2.sysu.edu.cn Co-authored-by: Mario Hong 86880754+mariohong128@users.noreply.github.com Co-authored-by: Aaron Hao ahao@anyscale.com Co-authored-by: SumanthRH sumanthrh99@gmail.com Co-authored-by: danisereb daserebrenik@nvidia.com Co-authored-by: Benjamin Chislett bchislett@nvidia.com Co-authored-by: zackyoray yorayz@nvidia.com Co-authored-by: bnellnm 49004751+bnellnm@users.noreply.github.com Co-authored-by: Robert Shaw 114415538+robertgshaw2-redhat@users.noreply.github.com Co-authored-by: Tsukasa OI floss_llm@irq.a4lg.com Co-authored-by: Harry Mellor 19981378+hmellor@users.noreply.github.com Co-authored-by: Matthew Bonanni mbonanni@redhat.com Co-authored-by: Lumosis 30372757+Lumosis@users.noreply.github.com Co-authored-by: mgoin mgoin64@gmail.com Co-authored-by: Hashem Hashemi 159079214+amd-hhashemi@users.noreply.github.com Co-authored-by: Wei Zhao 51183510+wzhao18@users.noreply.github.com Co-authored-by: emricksini-h emrick.birivoutin@hcompany.ai 
Co-authored-by: Xin Yang 105740670+xyang16@users.noreply.github.com Co-authored-by: Simon Mo simon.mo@hey.com Co-authored-by: Rabi Mishra ramishra@redhat.com Co-authored-by: Mingliang Li limingliang0527@gmail.com Co-authored-by: limingliang limingliang@stepfun.com Co-authored-by: R3hankhan Rehan.Khan7@ibm.com Co-authored-by: Kunshang Ji kunshang.ji@intel.com Co-authored-by: sihao_li 165983188+1643661061leo@users.noreply.github.com Co-authored-by: chengchengpei 5881383+chengchengpei@users.noreply.github.com Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Gassan Salama gassan.salama@arm.com Co-authored-by: Xinyu Chen xinyu1.chen@intel.com Co-authored-by: zhang-prog 69562787+zhang-prog@users.noreply.github.com Co-authored-by: Kurt Shuster shuster.kurt@gmail.com Co-authored-by: SorenDreano 71752785+SorenDreano@users.noreply.github.com Co-authored-by: Soren Dreano soren@numind.ai Co-authored-by: Wentao Ye 44945378+yewentao256@users.noreply.github.com Co-authored-by: Raushan Turganbay raushan@huggingface.co Co-authored-by: Cyrus Leung cyrus.tl.leung@gmail.com Co-authored-by: FredericOdermatt 50372080+FredericOdermatt@users.noreply.github.com Co-authored-by: tc-mb 157115220+tc-mb@users.noreply.github.com Co-authored-by: mslv mslv@baai.ac.cn Co-authored-by: zofia 110436990+zufangzhu@users.noreply.github.com Co-authored-by: Eldar Kurtić 8884008+eldarkurtic@users.noreply.github.com Co-authored-by: Seiji Eicher 58963096+eicherseiji@users.noreply.github.com Co-authored-by: vllmellm vllm.ellm@embeddedllm.com Co-authored-by: TJian tunjian.tan@embeddedllm.com Co-authored-by: zhrrr 43847754+izhuhaoran@users.noreply.github.com Co-authored-by: Charlie Fu charlifu@amd.com Co-authored-by: xuebwang-amd xuebwang@amd.com Co-authored-by: Sumanth R Hegde 39546518+SumanthRH@users.noreply.github.com Co-authored-by: Dimitrios Bariamis dbari@users.noreply.github.com Co-authored-by: Dimitrios Bariamis 
12195802+dbari@users.noreply.github.com Co-authored-by: kourosh hakhamaneshi 31483498+kouroshHakha@users.noreply.github.com Co-authored-by: Ikenna ikennachifo@gmail.com Co-authored-by: Cursor cursoragent@cursor.com Co-authored-by: 果冻虾仁 guodong@apache.org Co-authored-by: Vel 110626982+Code4me2@users.noreply.github.com Co-authored-by: lukec 118525388+sleepcoo@users.noreply.github.com Co-authored-by: Mohammad Miadh Angkad 176301910+mmangkad@users.noreply.github.com Co-authored-by: Richard Zou zou3519@users.noreply.github.com Co-authored-by: TundeAtSN akintunde.oladipo@servicenow.com Co-authored-by: Pooya Davoodi pooya.davoodi@parasail.io Co-authored-by: Jee Jee Li pandaleefree@gmail.com Co-authored-by: whx 56632993+whx-sjtu@users.noreply.github.com Co-authored-by: Rohan Potdar 66227218+Rohan138@users.noreply.github.com Co-authored-by: zifeitong zifeitong@gmail.com Co-authored-by: Jiang Wu jwu@cclgroup.com Co-authored-by: Reagan Lee 96998476+reaganjlee@users.noreply.github.com Co-authored-by: Reagan Lee <“reaganjlee@gmail.com”> Co-authored-by: aabbccddwasd 140953076+aabbccddwasd@users.noreply.github.com Co-authored-by: TomerBN-Nvidia tbarnatan@nvidia.com Co-authored-by: Tomer Natan tbarnatan@computelab-frontend-8.nvidia.com Co-authored-by: navmarri14 nmarri@roblox.com Co-authored-by: Andrey Talman atalman@fb.com Co-authored-by: ihb2032 40718643+ihb2032@users.noreply.github.com Co-authored-by: root root@LAPTOP-FKNHV411.localdomain Co-authored-by: Ekagra Ranjan 3116519+ekagra-ranjan@users.noreply.github.com Co-authored-by: Nikhil Gupta nikhil.gupta2@arm.com Co-authored-by: JJJYmmm 92386084+JJJYmmm@users.noreply.github.com Co-authored-by: wulipc wulipc@users.noreply.github.com Co-authored-by: ywang96 ywang96@users.noreply.github.com Co-authored-by: Isotr0py Isotr0py@users.noreply.github.com Co-authored-by: Isotr0py 2037008807@qq.com Co-authored-by: Roger Wang hey@rogerw.io Co-authored-by: Lucas Wilkinson LucasWilkinson@users.noreply.github.com Co-authored-by: 
ZhengHongming888 hongming.zheng@intel.com Co-authored-by: Jiangyun Zhu riverclouds.zhu@qq.com Co-authored-by: Artus Krohn-Grimberghe artuskg@users.noreply.github.com Co-authored-by: Hongxia Yang 62075498+hongxiayang@users.noreply.github.com Co-authored-by: Gregory Shtrasberg 156009573+gshtras@users.noreply.github.com Co-authored-by: Ning Xie andy.xning@gmail.com Co-authored-by: Russell Bryant rbryant@redhat.com Co-authored-by: Yuwei An ayw.sirius19@gmail.com Co-authored-by: Balaxxe 136368465+jaim12005@users.noreply.github.com Co-authored-by: Chen Zhang zhangch99@outlook.com Co-authored-by: Zetong Li 48438720+slippersss@users.noreply.github.com Co-authored-by: zzaebok 44357534+zzaebok@users.noreply.github.com Co-authored-by: Vincent-Xiao vincent.xiao.me@gmail.com Co-authored-by: Phúc H. Lê Khắc lkhphuc@pm.me Co-authored-by: Krish Gupta krishom70@gmail.com Co-authored-by: Fan Yang fanyang.real@gmail.com Co-authored-by: Fan Yang yan9fan@meta.com Co-authored-by: mgazz michele.gazzetti1@ibm.com Co-authored-by: Robert Shaw robshaw@redhat.com Co-authored-by: Zhengxu Chen zhxchen17@meta.com Co-authored-by: Roberto L. Castro 38211239+LopezCastroRoberto@users.noreply.github.com Co-authored-by: Claude Sonnet 4.5 noreply@anthropic.com Co-authored-by: junuxyz 216036880+junuxyz@users.noreply.github.com Co-authored-by: Vadim Gimpelson 156319763+vadiklyutiy@users.noreply.github.com Co-authored-by: Woosuk Kwon woosuk.kwon@berkeley.edu Co-authored-by: Andy Lo andy@mistral.ai Co-authored-by: Qi Wang wqstu1@gmail.com Co-authored-by: J Seppänen 83203+jseppanen@users.noreply.github.com Co-authored-by: Tyler Michael Smith tyler@neuralmagic.com Co-authored-by: Ilya Markov markovilya197@gmail.com Co-authored-by: Tyler Michael Smith tlrmchlsmth@gmail.com Co-authored-by: Zhengkai Zhang 33679250+ZhengkaiZ@users.noreply.github.com Co-authored-by: zzhengkai zzhengkai@devgpu049.ldc1.facebook.com Co-authored-by: 7. 
Sun jhao.sun@gmail.com Co-authored-by: Micah Williamson micah.williamson@amd.com Co-authored-by: tianshu-Michael-yu 101950379+tianshu-Michael-yu@users.noreply.github.com Co-authored-by: Kebe mail@kebe7jun.com Co-authored-by: Дзержи́нский 256908701+AstroVoyager7@users.noreply.github.com Co-authored-by: Matthias Gehre matthias.gehre@amd.com Co-authored-by: AllenDou allen.dou@hotmail.com Co-authored-by: zixiao shunli.dsl@alibaba-inc.com Co-authored-by: Tianqi Ren tianqi.r@outlook.com Co-authored-by: Linda 57756729+Linda-Stadter@users.noreply.github.com Co-authored-by: Adam Binford adamq43@gmail.com Co-authored-by: kliuae 17350011+kliuae@users.noreply.github.com Co-authored-by: Xinyu Dong dongxinyu03@baidu.com Co-authored-by: Your Name you@example.com Co-authored-by: elvischenv 219235043+elvischenv@users.noreply.github.com