Use `@property` and private field for `data_parallel_rank_local` by hmellor · Pull Request #17053 · vllm-project/vllm
Signed-off-by: Harry Mellor 19981378+hmellor@users.noreply.github.com
hmellor deleted the ensure-dp-rank-local-is-int branch
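For readers unfamiliar with the pattern named in the title, here is a minimal sketch of a private `Optional[int]` backing field exposed through a property that always returns `int`. It is not the code from this PR: the class name `ParallelSettings`, the `data_parallel_rank` field, and the fallback to the global rank are assumptions made purely for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ParallelSettings:  # hypothetical stand-in, not vLLM's real config class
    # Global data-parallel rank (assumed field, used here as a fallback).
    data_parallel_rank: int = 0
    # Private backing field: stays None until a local rank is assigned.
    _data_parallel_rank_local: Optional[int] = None

    @property
    def data_parallel_rank_local(self) -> int:
        # Assumed behaviour: fall back to the global rank when unset,
        # so callers always receive an int and never None.
        if self._data_parallel_rank_local is None:
            return self.data_parallel_rank
        return self._data_parallel_rank_local

    @data_parallel_rank_local.setter
    def data_parallel_rank_local(self, value: int) -> None:
        # Writes go to the private field; reads go through the getter.
        self._data_parallel_rank_local = value
```

With this shape, a static type checker such as mypy sees `data_parallel_rank_local` as `int` at every use site, so callers need no `None` checks or casts.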
gshtras added a commit to ROCm/vllm that referenced this pull request
- [BugFix] Remove default multiproc executor `collective_rpc` timeout (vllm-project#17000)
Signed-off-by: Nick Hill nhill@redhat.com
- [Core][V1][TPU] Enable structured decoding on TPU V1 (vllm-project#16499)
Signed-off-by: Chenyaaang chenyangli@google.com
- [Bugfix] validate urls object for multimodal content parts (vllm-project#16990)
Signed-off-by: Guillaume Calmettes gcalmettes@scaleway.com
- add Dockerfile build vllm against torch nightly (vllm-project#16936)
Signed-off-by: Yang Wang elainewy@meta.com
- [Kernel][ROCM] Upstream prefix prefill speed up for vLLM V1 (vllm-project#13305)
Signed-off-by: Sage Moore sage@neuralmagic.com Signed-off-by: root root@banff-cyxtera-s73-5.ctr.dcgpu Signed-off-by: Aleksandr Malyshev maleksan@amd.com Signed-off-by: root root@banff-cyxtera-s65-4.amd.com Signed-off-by: maleksan85 maleksan@amd.com Signed-off-by: <> Co-authored-by: Sage Moore sage@neuralmagic.com Co-authored-by: root root@banff-cyxtera-s73-5.ctr.dcgpu Co-authored-by: Aleksandr Malyshev maleksan@amd.com Co-authored-by: qli88 qiang.li2@amd.com Co-authored-by: root root@banff-cyxtera-s65-4.amd.com
- [V1][DP] More robust DP/EP dummy request coordination (vllm-project#16277)
Signed-off-by: Nick Hill nhill@redhat.com
- [BugFix] Revert ROCm Custom Paged Attention Env Flag Check (vllm-project#17022)
Signed-off-by: vllmellm vllm.ellm@embeddedllm.com
- Revert "[Misc] Add S3 environment variables for better support of MinIO." (vllm-project#17021)
- [misc] tune some env vars for GB200 (vllm-project#16992)
Signed-off-by: youkaichao youkaichao@gmail.com
- [INTEL-HPU][v0] Port delayed sampling to upstream (vllm-project#16949)
Signed-off-by: Michal Adamczyk michal.adamczyk@intel.com Signed-off-by: Chendi Xue chendi.xue@intel.com Co-authored-by: Michal Adamczyk madamczyk@habana.ai
- [doc] add download path tips (vllm-project#17013)
Signed-off-by: reidliu41 reid201711@gmail.com Co-authored-by: reidliu41 reid201711@gmail.com
- [Bugfix] Triton FA function takes no keyword arguments (vllm-project#16902)
Signed-off-by: vllmellm vllm.ellm@embeddedllm.com
- [V1] Avoid socket errors during shutdown when requests are in in-flight (vllm-project#16807)
Signed-off-by: Nick Hill nhill@redhat.com
- [BugFix] llama4 fa3 fix - RuntimeError: scheduler_metadata must have shape (metadata_size) (vllm-project#16998)
Signed-off-by: Lucas Wilkinson lwilkinson@neuralmagic.com
- [Misc] Improve readability of get_open_port function. (vllm-project#17024)
Signed-off-by: gitover22 qidizou88@gmail.com
- [Bugfix] Fix AssertionError: skip_special_tokens=False is not supported for Mistral tokenizers (vllm-project#16964)
Signed-off-by: chaunceyjiang chaunceyjiang@gmail.com
- [CI] Run v1/test_serial_utils.py in CI (vllm-project#16996)
Signed-off-by: Russell Bryant rbryant@redhat.com
- Mistral-format support for compressed-tensors (vllm-project#16803)
Signed-off-by: mgoin mgoin64@gmail.com
- Categorize `tests/kernels/` based on kernel type (vllm-project#16799)
Signed-off-by: mgoin mgoin64@gmail.com
- [Doc] Add top anchor and a note to quantization/bitblas.md (vllm-project#17042)
Signed-off-by: windsonsea haifeng.yao@daocloud.io
- Ensure that `pid` passed to `kill_process_tree` is `int` for `mypy` (vllm-project#17051)
Signed-off-by: Harry Mellor 19981378+hmellor@users.noreply.github.com
- [CI] Update structured-output label automation (vllm-project#17055)
Signed-off-by: Russell Bryant rbryant@redhat.com
- Improve Transformers backend model loading QoL (vllm-project#17039)
Signed-off-by: Harry Mellor 19981378+hmellor@users.noreply.github.com
- `CacheConfig.block_size` should always be `int` when used (vllm-project#17052)
Signed-off-by: Harry Mellor 19981378+hmellor@users.noreply.github.com
- Use `@property` and private field for `data_parallel_rank_local` (vllm-project#17053)
Signed-off-by: Harry Mellor 19981378+hmellor@users.noreply.github.com
- [Frontend] Support guidance:no-additional-properties for compatibility with xgrammar (vllm-project#15949)
Signed-off-by: Travis Johnson tsjohnso@us.ibm.com
- [BugFix][V1] Fix int32 token index overflow when preparing input ids (vllm-project#16806)
- [V1][Spec Decode] Always use argmax for sampling draft tokens (vllm-project#16899)
Signed-off-by: Woosuk Kwon woosuk.kwon@berkeley.edu
- [CI/Build] workaround for CI build failure (vllm-project#17070)
Signed-off-by: csy1204 josang1204@gmail.com Co-authored-by: Michael Goin mgoin64@gmail.com
- [Quantization]add prefix for commandA quantized model (vllm-project#17017)
- [Minor] Use larger batch sizes for A100/B100/B200/MI300x (vllm-project#17073)
Signed-off-by: Woosuk Kwon woosuk.kwon@berkeley.edu
- [Bugfix] Enable V1 usage stats (vllm-project#16986)
Signed-off-by: mgoin mgoin64@gmail.com Signed-off-by: Nick Hill nhill@redhat.com Co-authored-by: Nick Hill nhill@redhat.com
- More informative error when using Transformers backend (vllm-project#16988)
Signed-off-by: Harry Mellor 19981378+hmellor@users.noreply.github.com
- Addendum Fix to support FIPS enabled machines with MD5 hashing (vllm-project#17043)
Signed-off-by: sydarb areebsyed237@gmail.com
- [Bugfix][Core] add seq_id_to_seq_group clearing to avoid memory leak when s… (vllm-project#16472)
Signed-off-by: 开哲 kaizhe.zy@alibaba-inc.com Co-authored-by: 开哲 kaizhe.zy@alibaba-inc.com
- [V1] Update structured output (vllm-project#16812)
Signed-off-by: reidliu41 reid201711@gmail.com Co-authored-by: reidliu41 reid201711@gmail.com
- [doc] update to hyperlink (vllm-project#17096)
Signed-off-by: reidliu41 reid201711@gmail.com Co-authored-by: reidliu41 reid201711@gmail.com
- Add docs for runai_streamer_sharded (vllm-project#17093)
Signed-off-by: Omer Dayan (SW-GPU) omer@run.ai Co-authored-by: Cyrus Leung cyrus.tl.leung@gmail.com
- [Chore] Remove Sampler from Model Code (vllm-project#17084)
Signed-off-by: Woosuk Kwon woosuk.kwon@berkeley.edu
- Disable enforce_eager for V1 TPU sampler and structured output tests (vllm-project#17016)
Signed-off-by: mgoin mgoin64@gmail.com
- Simplify `TokenizerGroup` (vllm-project#16790)
Signed-off-by: Harry Mellor 19981378+hmellor@users.noreply.github.com
- Fix OOT registration test (vllm-project#17099)
Signed-off-by: Harry Mellor 19981378+hmellor@users.noreply.github.com
- [V1][PP] Optimization: continue scheduling prefill chunks (vllm-project#17080)
Signed-off-by: Rui Qiao ruisearch42@gmail.com
- [Misc] Remove OLMo2 config copy (vllm-project#17066)
Signed-off-by: Isotr0py 2037008807@qq.com
- Improve static type checking in `LoRAModelRunnerMixin` (vllm-project#17104)
Signed-off-by: Harry Mellor 19981378+hmellor@users.noreply.github.com
- [V1][Structured Output] Clear xgrammar compiler object when engine core shut down to avoid nanobind leaked warning (vllm-project#16954)
Signed-off-by: shen-shanshan 467638484@qq.com
- [Frontend] Using matryoshka_dimensions control the allowed output dimensions. (vllm-project#16970)
- Add missing rocm_skinny_gemms kernel test to CI (vllm-project#17060)
Signed-off-by: mgoin mgoin64@gmail.com
- [Misc] refactor example series - structured outputs (vllm-project#17040)
Signed-off-by: reidliu41 reid201711@gmail.com Co-authored-by: reidliu41 reid201711@gmail.com
- [V1][Spec Decoding] Add num_drafts and num_accepted_tokens_per_position metrics (vllm-project#16665)
Signed-off-by: Mark McLoughlin markmc@redhat.com
- [CI] Add automation for the `tool-calling` github label (vllm-project#17118)
Signed-off-by: Russell Bryant rbryant@redhat.com
- Updating builkite job for IBM Power (vllm-project#17111)
Signed-off-by: Aaruni Aggarwal aaruniagg@gmail.com
- existing torch installation pip command fix for docs (vllm-project#17059)
- Molmo Requirements (vllm-project#17026)
Signed-off-by: Eyshika Agarwal eyshikaengineer@gmail.com Signed-off-by: eyshika eyshikaengineer@gmail.com
- Add `:markdownhelp:` to `EngineArgs` docs so markdown docstrings render properly (vllm-project#17124)
Signed-off-by: Harry Mellor 19981378+hmellor@users.noreply.github.com
- Improve configs - `LoRAConfig` + `PromptAdapterConfig` (vllm-project#16980)
Signed-off-by: Harry Mellor 19981378+hmellor@users.noreply.github.com
- [Docs] Generate correct github links for decorated functions (vllm-project#17125)
Signed-off-by: Russell Bryant rbryant@redhat.com
- Add collective_rpc to llm engine (vllm-project#16999)
Signed-off-by: Yinghai Lu yinghai@thinkingmachines.ai
- Add chat template for Llama 4 models (vllm-project#16428)
Signed-off-by: Max de Bayser mbayser@br.ibm.com
- [Misc] Add example to run DeepSeek with Ray Serve LLM (vllm-project#17134)
Signed-off-by: Rui Qiao ruisearch42@gmail.com
- Better error message for missing mistral params.json (vllm-project#17132)
Signed-off-by: mgoin mgoin64@gmail.com
- Use custom address for listening socket (vllm-project#15988)
Signed-off-by: Jens Glaser glaserj@ornl.gov
- [FEAT] [ROCm]: AITER Fused MOE V1 Support (vllm-project#16752)
Signed-off-by: vllmellm vllm.ellm@embeddedllm.com Co-authored-by: tjtanaa tunjian.tan@embeddedllm.com
- [Attention] FA3 decode perf improvement - single mma warp group support for head dim 128 (vllm-project#16864)
Signed-off-by: Lucas Wilkinson lwilkinson@neuralmagic.com
- fix float16 support for kimi-vl (vllm-project#17156)
Co-authored-by: zhouzaida zhouzaida@msh.team
- [Doc] V1 : Update LoRA status (vllm-project#17133)
Signed-off-by: varun sundar rabindranath vsundarr@redhat.com Co-authored-by: varun sundar rabindranath vsundarr@redhat.com
- [Docs] Fix True->true in supported_models.md (vllm-project#17141)
- Move missed `SchedulerConfig` args into scheduler config group in `EngineArgs` (vllm-project#17131)
Signed-off-by: Harry Mellor 19981378+hmellor@users.noreply.github.com
- [Misc] Clean up redundant code in uniproc_executor.py (vllm-project#16762)
Signed-off-by: Lifu Huang lifu.hlf@gmail.com
- [Bugfix][Misc] Use TritonPlaceholderModule to defensively import triton (vllm-project#15099)
Signed-off-by: Mengqing Cao cmq0113@163.com
- [Misc] Benchmark Serving Script Support Appending Results (vllm-project#17028)
Signed-off-by: Lucas Wilkinson lwilkinson@neuralmagic.com
- [Perf]Optimize rotary_emb implementation to use Triton operator for improved inference performance (vllm-project#16457)
Signed-off-by: cynthieye yexin93@qq.com Co-authored-by: MagnetoWang magnetowang@outlook.com
- [Bugfix] remove fallback in guided_json (int range, patterns) (vllm-project#16725)
Signed-off-by: csy1204 josang1204@gmail.com Co-authored-by: 조상연[플레이스 AI] sang-yeon.cho@navercorp.com
- [Quantization][FP8] Add support for FP8 models with input_scale for output projection and QK quantization (vllm-project#15734)
Signed-off-by: Randall Smith Randall.Smith@amd.com Signed-off-by: Luka Govedič lgovedic@redhat.com Co-authored-by: Luka Govedič lgovedic@redhat.com
- [Doc] Add headings to improve gptqmodel.md (vllm-project#17164)
Signed-off-by: windsonsea haifeng.yao@daocloud.io
- Only turn on FastIncrementalDetokenizer when tokenizers >= 0.21.1 (vllm-project#17158)
- [Doc] Add two links to disagg_prefill.md (vllm-project#17168)
Signed-off-by: windsonsea haifeng.yao@daocloud.io
- [Doc] Move todo out of beam search docstring (vllm-project#17183)
Signed-off-by: Alex-Brooks Alex.Brooks@ibm.com
- [Bugfix] Fix mistral model tests (vllm-project#17181)
Signed-off-by: DarkLight1337 tlleungac@connect.ust.hk
- [Bugfix] Fix Mistral ChatCompletionRequest Body Exception (vllm-project#16769)
Signed-off-by: Jasmond Loh Jasmond.Loh@hotmail.com Co-authored-by: Cyrus Leung cyrus.tl.leung@gmail.com
- Fix API typo and remove FP8 on V1 restriction
Signed-off-by: Nick Hill nhill@redhat.com Signed-off-by: Chenyaaang chenyangli@google.com Signed-off-by: Guillaume Calmettes gcalmettes@scaleway.com Signed-off-by: Yang Wang elainewy@meta.com Signed-off-by: Sage Moore sage@neuralmagic.com Signed-off-by: root root@banff-cyxtera-s73-5.ctr.dcgpu Signed-off-by: Aleksandr Malyshev maleksan@amd.com Signed-off-by: root root@banff-cyxtera-s65-4.amd.com Signed-off-by: maleksan85 maleksan@amd.com Signed-off-by: <> Signed-off-by: vllmellm vllm.ellm@embeddedllm.com Signed-off-by: youkaichao youkaichao@gmail.com Signed-off-by: Michal Adamczyk michal.adamczyk@intel.com Signed-off-by: Chendi Xue chendi.xue@intel.com Signed-off-by: reidliu41 reid201711@gmail.com Signed-off-by: Lucas Wilkinson lwilkinson@neuralmagic.com Signed-off-by: gitover22 qidizou88@gmail.com Signed-off-by: chaunceyjiang chaunceyjiang@gmail.com Signed-off-by: Russell Bryant rbryant@redhat.com Signed-off-by: mgoin mgoin64@gmail.com Signed-off-by: windsonsea haifeng.yao@daocloud.io Signed-off-by: Harry Mellor 19981378+hmellor@users.noreply.github.com Signed-off-by: Travis Johnson tsjohnso@us.ibm.com Signed-off-by: Woosuk Kwon woosuk.kwon@berkeley.edu Signed-off-by: csy1204 josang1204@gmail.com Signed-off-by: sydarb areebsyed237@gmail.com Signed-off-by: 开哲 kaizhe.zy@alibaba-inc.com Signed-off-by: Omer Dayan (SW-GPU) omer@run.ai Signed-off-by: Rui Qiao ruisearch42@gmail.com Signed-off-by: Isotr0py 2037008807@qq.com Signed-off-by: shen-shanshan 467638484@qq.com Signed-off-by: Mark McLoughlin markmc@redhat.com Signed-off-by: Aaruni Aggarwal aaruniagg@gmail.com Signed-off-by: Eyshika Agarwal eyshikaengineer@gmail.com Signed-off-by: eyshika eyshikaengineer@gmail.com Signed-off-by: Yinghai Lu yinghai@thinkingmachines.ai Signed-off-by: Max de Bayser mbayser@br.ibm.com Signed-off-by: Jens Glaser glaserj@ornl.gov Signed-off-by: varun sundar rabindranath vsundarr@redhat.com Signed-off-by: Lifu Huang lifu.hlf@gmail.com Signed-off-by: Mengqing Cao cmq0113@163.com 
Signed-off-by: cynthieye yexin93@qq.com Signed-off-by: Randall Smith Randall.Smith@amd.com Signed-off-by: Luka Govedič lgovedic@redhat.com Signed-off-by: Alex-Brooks Alex.Brooks@ibm.com Signed-off-by: DarkLight1337 tlleungac@connect.ust.hk Signed-off-by: Jasmond Loh Jasmond.Loh@hotmail.com Co-authored-by: Nick Hill nhill@redhat.com Co-authored-by: Chenyaaang 42742451+Chenyaaang@users.noreply.github.com Co-authored-by: Guillaume Calmettes gcalmettes@scaleway.com Co-authored-by: Yang Wang elainewy@meta.com Co-authored-by: Aleksandr Malyshev 164964928+maleksan85@users.noreply.github.com Co-authored-by: Sage Moore sage@neuralmagic.com Co-authored-by: root root@banff-cyxtera-s73-5.ctr.dcgpu Co-authored-by: Aleksandr Malyshev maleksan@amd.com Co-authored-by: qli88 qiang.li2@amd.com Co-authored-by: root root@banff-cyxtera-s65-4.amd.com Co-authored-by: vllmellm vllm.ellm@embeddedllm.com Co-authored-by: Chauncey chaunceyjiang@gmail.com Co-authored-by: youkaichao youkaichao@gmail.com Co-authored-by: Chendi.Xue chendi.xue@intel.com Co-authored-by: Michal Adamczyk madamczyk@habana.ai Co-authored-by: Reid 61492567+reidliu41@users.noreply.github.com Co-authored-by: reidliu41 reid201711@gmail.com Co-authored-by: Lucas Wilkinson LucasWilkinson@users.noreply.github.com Co-authored-by: huafeng qidizou88@gmail.com Co-authored-by: Russell Bryant rbryant@redhat.com Co-authored-by: Michael Goin mgoin64@gmail.com Co-authored-by: Michael Yao haifeng.yao@daocloud.io Co-authored-by: Harry Mellor 19981378+hmellor@users.noreply.github.com Co-authored-by: Travis Johnson tsjohnso@us.ibm.com Co-authored-by: Yong Hoon Shin 48474650+sarckk@users.noreply.github.com Co-authored-by: Woosuk Kwon woosuk.kwon@berkeley.edu Co-authored-by: Sangyeon Cho josang1204@gmail.com Co-authored-by: Chen Xia cxia0209@gmail.com Co-authored-by: Areeb Syed areebsyed237@gmail.com Co-authored-by: 张宇 zhangyuygss@outlook.com Co-authored-by: 开哲 kaizhe.zy@alibaba-inc.com Co-authored-by: omer-dayan omdayan@nvidia.com 
Co-authored-by: Cyrus Leung cyrus.tl.leung@gmail.com Co-authored-by: Rui Qiao 161574667+ruisearch42@users.noreply.github.com Co-authored-by: Isotr0py mozf@mail2.sysu.edu.cn Co-authored-by: Shanshan Shen 467638484@qq.com Co-authored-by: wang.yuqi noooop@126.com Co-authored-by: Mark McLoughlin markmc@redhat.com Co-authored-by: Aaruni Aggarwal 47731267+AaruniAggarwal@users.noreply.github.com Co-authored-by: Atilla 48064466+atilla00@users.noreply.github.com Co-authored-by: Eyshika Agarwal eyshikaengineer@gmail.com Co-authored-by: Yinghai Lu yinghai@thinkingmachines.ai Co-authored-by: Maximilien de Bayser mbayser@br.ibm.com Co-authored-by: jglaser glaserj@ornl.gov Co-authored-by: tjtanaa tunjian.tan@embeddedllm.com Co-authored-by: Zaida Zhou 58739961+zhouzaida@users.noreply.github.com Co-authored-by: zhouzaida zhouzaida@msh.team Co-authored-by: Varun Sundar Rabindranath varunsundar08@gmail.com Co-authored-by: varun sundar rabindranath vsundarr@redhat.com Co-authored-by: Lifu Huang lifu.hlf@gmail.com Co-authored-by: Mengqing Cao cmq0113@163.com Co-authored-by: yexin(叶鑫) yexin93@qq.com Co-authored-by: MagnetoWang magnetowang@outlook.com Co-authored-by: 조상연[플레이스 AI] sang-yeon.cho@navercorp.com Co-authored-by: rasmith Randall.Smith@amd.com Co-authored-by: Luka Govedič lgovedic@redhat.com Co-authored-by: Lu Fang 30275821+houseroad@users.noreply.github.com Co-authored-by: Alex Brooks alex.brooks@ibm.com Co-authored-by: Cyrus Leung tlleungac@connect.ust.hk Co-authored-by: Jasmond L 120363110+JasmondL@users.noreply.github.com
jikunshang pushed a commit to jikunshang/vllm that referenced this pull request
Signed-off-by: Harry Mellor 19981378+hmellor@users.noreply.github.com
lk-chen pushed a commit to lk-chen/vllm that referenced this pull request
Signed-off-by: Harry Mellor 19981378+hmellor@users.noreply.github.com
adobrzyn pushed a commit to HabanaAI/vllm-fork that referenced this pull request
Signed-off-by: Harry Mellor 19981378+hmellor@users.noreply.github.com Signed-off-by: Agata Dobrzyniewicz adobrzyniewicz@habana.ai
RichardoMrMu pushed a commit to RichardoMrMu/vllm that referenced this pull request
Signed-off-by: Harry Mellor 19981378+hmellor@users.noreply.github.com Signed-off-by: Mu Huai tianbowen.tbw@antgroup.com
0826joyce pushed a commit to 0826joyce/vllm-serving-optimization that referenced this pull request
Signed-off-by: Harry Mellor 19981378+hmellor@users.noreply.github.com