Fix test_schedule_swapped_simple in test_scheduler.py by sroy745 · Pull Request #8780 · vllm-project/vllm

added 30 commits

May 28, 2024 20:39

sroy745 marked this pull request as ready for review

September 24, 2024 19:45

sroy745 changed the title ~~[WIP] Fix test_schedule_swapped_simple in test_scheduler.py~~ Fix test_schedule_swapped_simple in test_scheduler.py

Sep 24, 2024

comaniac added the ready label (ONLY add when PR is ready to merge/full CI is needed)

Sep 24, 2024

Manikandan-Thangaraj-ZS0321 added a commit to Manikandan-Thangaraj-ZS0321/vllm that referenced this pull request

Sep 25, 2024

[Kernel] Enable 8-bit weights in Fused Marlin MoE (vllm-project#8032)

Co-authored-by: Dipika dipikasikka1@gmail.com

[Frontend] Expose revision arg in OpenAI server (vllm-project#8501)
[BugFix] Fix clean shutdown issues (vllm-project#8492)
[Bugfix][Kernel] Fix build for sm_60 in GGUF kernel (vllm-project#8506)
[Kernel] AQ AZP 3/4: Asymmetric quantization kernels (vllm-project#7270)
[doc] update doc on testing and debugging (vllm-project#8514)
[Bugfix] Bind api server port before starting engine (vllm-project#8491)
[perf bench] set timeout to debug hanging (vllm-project#8516)
[misc] small qol fixes for release process (vllm-project#8517)
[Bugfix] Fix 3.12 builds on main (vllm-project#8510)

Signed-off-by: Joe Runde Joseph.Runde@ibm.com

[refactor] remove triton based sampler (vllm-project#8524)
[Frontend] Improve Nullable kv Arg Parsing (vllm-project#8525)

Signed-off-by: Alex-Brooks Alex.Brooks@ibm.com

[Misc][Bugfix] Disable guided decoding for mistral tokenizer (vllm-project#8521)
[torch.compile] register allreduce operations as custom ops (vllm-project#8526)
[Misc] Limit to ray[adag] 2.35 to avoid backward incompatible change (vllm-project#8509)

Signed-off-by: Rui Qiao ruisearch42@gmail.com

[Benchmark] Support sample from HF datasets and image input for benchmark_serving (vllm-project#8495)
[Encoder decoder] Add cuda graph support during decoding for encoder-decoder models (vllm-project#7631)
[Feature][kernel] tensor parallelism with bitsandbytes quantization (vllm-project#8434)
[Model] Add mistral function calling format to all models loaded with "mistral" format (vllm-project#8515)

Co-authored-by: Cyrus Leung cyrus.tl.leung@gmail.com

[Misc] Don't dump contents of kvcache tensors on errors (vllm-project#8527)
[Bugfix] Fix TP > 1 for new granite (vllm-project#8544)

Signed-off-by: Joe Runde Joseph.Runde@ibm.com

[doc] improve installation doc (vllm-project#8550)

Co-authored-by: Andy Dai 76841985+Imss27@users.noreply.github.com

[CI/Build] Excluding kernels/test_gguf.py from ROCm (vllm-project#8520)
[Kernel] Change interface to Mamba causal_conv1d_update for continuous batching (vllm-project#8012)
[CI/Build] fix Dockerfile.cpu on podman (vllm-project#8540)
[Misc] Add argument to disable FastAPI docs (vllm-project#8554)
[CI/Build] Avoid CUDA initialization (vllm-project#8534)
[CI/Build] Update Ruff version (vllm-project#8469)

Signed-off-by: Aaron Pham contact@aarnphm.xyz Co-authored-by: Cyrus Leung cyrus.tl.leung@gmail.com

[Core][Bugfix][Perf] Introduce MQLLMEngine to avoid asyncio OH (vllm-project#8157)

Co-authored-by: Nick Hill nickhill@us.ibm.com Co-authored-by: rshaw@neuralmagic.com rshaw@neuralmagic.com Co-authored-by: Robert Shaw 114415538+robertgshaw2-neuralmagic@users.noreply.github.com Co-authored-by: Simon Mo simon.mo@hey.com

[Core] Prompt logprobs support in Multi-step (vllm-project#8199)
[Core] zmq: bind only to 127.0.0.1 for local-only usage (vllm-project#8543)

Signed-off-by: Russell Bryant rbryant@redhat.com

[Model] Support Solar Model (vllm-project#8386)

Co-authored-by: Michael Goin michael@neuralmagic.com

[AMD][ROCm]Quantization methods on ROCm; Fix _scaled_mm call (vllm-project#8380)

Co-authored-by: Alexei-V-Ivanov-AMD 156011006+Alexei-V-Ivanov-AMD@users.noreply.github.com Co-authored-by: Michael Goin michael@neuralmagic.com

[Kernel] Change interface to Mamba selective_state_update for continuous batching (vllm-project#8039)
[BugFix] Nonzero exit code if MQLLMEngine startup fails (vllm-project#8572)
[Bugfix] add dead_error property to engine client (vllm-project#8574)

Signed-off-by: Joe Runde Joseph.Runde@ibm.com

[Kernel] Remove marlin moe templating on thread_m_blocks (vllm-project#8573)

Co-authored-by: lwilkinson@neuralmagic.com

[Bugfix] [Encoder-Decoder] Bugfix for encoder specific metadata construction during decode of encoder-decoder models. (vllm-project#8545)
Revert "[Misc][Bugfix] Disable guided decoding for mistral tokenizer" (vllm-project#8593)
[Bugfix] fixing sonnet benchmark bug in benchmark_serving.py (vllm-project#8616)
[MISC] remove engine_use_ray in benchmark_throughput.py (vllm-project#8615)
[Frontend] Use MQLLMEngine for embeddings models too (vllm-project#8584)
[Kernel][Amd] Add fp8 kv cache support for rocm custom paged attention (vllm-project#8577)
[Core] simplify logits resort in _apply_top_k_top_p (vllm-project#8619)
[Doc] Add documentation for GGUF quantization (vllm-project#8618)
Create SECURITY.md (vllm-project#8642)
[CI/Build] Re-enabling Entrypoints tests on ROCm, excluding ones that fail (vllm-project#8551)
[Misc] guard against change in cuda library name (vllm-project#8609)
[Bugfix] Fix Phi3.5 mini and MoE LoRA inference (vllm-project#8571)
[bugfix] [AMD] add multi-step advance_step to ROCmFlashAttentionMetadata (vllm-project#8474)
[Core] Support Lora lineage and base model metadata management (vllm-project#6315)
[Model] Add OLMoE (vllm-project#7922)
[CI/Build] Removing entrypoints/openai/test_embedding.py test from ROCm build (vllm-project#8670)
[Bugfix] Validate SamplingParam n is an int (vllm-project#8548)
[Misc] Show AMD GPU topology in collect_env.py (vllm-project#8649)
[Bugfix] Config got an unexpected keyword argument 'engine' (vllm-project#8556)
[Bugfix][Core] Fix tekken edge case for mistral tokenizer (vllm-project#8640)
[Doc] neuron documentation update (vllm-project#8671)

Signed-off-by: omrishiv 327609+omrishiv@users.noreply.github.com

[Hardware][AWS] update neuron to 2.20 (vllm-project#8676)

Signed-off-by: omrishiv 327609+omrishiv@users.noreply.github.com

[Bugfix] Fix incorrect llava next feature size calculation (vllm-project#8496)
[Core] Rename PromptInputs and inputs (vllm-project#8673)
[MISC] add support custom_op check (vllm-project#8557)

Co-authored-by: youkaichao youkaichao@126.com

[Core] Factor out common code in SequenceData and Sequence (vllm-project#8675)
[beam search] add output for manually checking the correctness (vllm-project#8684)
[Kernel] Build flash-attn from source (vllm-project#8245)
[VLM] Use SequenceData.from_token_counts to create dummy data (vllm-project#8687)
[Doc] Fix typo in AMD installation guide (vllm-project#8689)
[Kernel][Triton][AMD] Remove tl.atomic_add from awq_gemm_kernel, 2-5x speedup MI300, minor improvement for MI250 (vllm-project#8646)
[dbrx] refactor dbrx experts to extend FusedMoe class (vllm-project#8518)
[Kernel][Bugfix] Delete some more useless code in marlin_moe_ops.cu (vllm-project#8643)
[Bugfix] Refactor composite weight loading logic (vllm-project#8656)
[ci][build] fix vllm-flash-attn (vllm-project#8699)
[Model] Refactor BLIP/BLIP-2 to support composite model loading (vllm-project#8407)
[Misc] Use NamedTuple in Multi-image example (vllm-project#8705)

Signed-off-by: Alex-Brooks Alex.Brooks@ibm.com

[MISC] rename CudaMemoryProfiler to DeviceMemoryProfiler (vllm-project#8703)
[Model][VLM] Add LLaVA-Onevision model support (vllm-project#8486)

Co-authored-by: litianjian litianjian@bytedance.com Co-authored-by: Cyrus Leung cyrus.tl.leung@gmail.com Co-authored-by: Roger Wang ywang@roblox.com Co-authored-by: DarkLight1337 tlleungac@connect.ust.hk

[SpecDec][Misc] Cleanup, remove bonus token logic. (vllm-project#8701)
[build] enable existing pytorch (for GH200, aarch64, nightly) (vllm-project#8713)
[misc] upgrade mistral-common (vllm-project#8715)
[Bugfix] Avoid some bogus messages RE CUTLASS's revision when building (vllm-project#8702)
[Bugfix] Fix CPU CMake build (vllm-project#8723)

Co-authored-by: Yuan yuan.zhou@intel.com

[Bugfix] fix docker build for xpu (vllm-project#8652)
[Core][Frontend] Support Passing Multimodal Processor Kwargs (vllm-project#8657)

Signed-off-by: Alex-Brooks Alex.Brooks@ibm.com

[Hardware][CPU] Refactor CPU model runner (vllm-project#8729)
[Bugfix][CPU] fix missing input intermediate_tensors in the cpu_model_runner (vllm-project#8733)
[Model] Support pp for qwen2-vl (vllm-project#8696)
[VLM] Fix paligemma, fuyu and persimmon with transformers 4.45 : use config.text_config.vocab_size (vllm-project#8707)
[CI/Build] use setuptools-scm to set version (vllm-project#4738)

Co-authored-by: youkaichao youkaichao@126.com

[Kernel] (2/N) Machete - Integrate into CompressedTensorsWNA16 and GPTQMarlin (vllm-project#7701)

Co-authored-by: mgoin michael@neuralmagic.com Co-authored-by: Divakar Verma 137818590+divakar-amd@users.noreply.github.com Co-authored-by: Tyler Michael Smith tyler@neuralmagic.com

[Kernel][LoRA] Add assertion for punica sgmv kernels (vllm-project#7585)
[Core] Allow IPv6 in VLLM_HOST_IP with zmq (vllm-project#8575)

Signed-off-by: Russell Bryant rbryant@redhat.com

Fix typical acceptance sampler with correct recovered token ids (vllm-project#8562)
Add output streaming support to multi-step + async while ensuring RequestOutput obj reuse (vllm-project#8335)
[Hardware][AMD] ROCm6.2 upgrade (vllm-project#8674)
Fix tests in test_scheduler.py that fail with BlockManager V2 (vllm-project#8728)
re-implement beam search on top of vllm core (vllm-project#8726)

Co-authored-by: Brendan Wong bjwpokemon@gmail.com

Revert "[Core] Rename PromptInputs to PromptType, and inputs to prompt" (vllm-project#8750)
[MISC] Skip dumping inputs when unpicklable (vllm-project#8744)
[Core][Model] Support loading weights by ID within models (vllm-project#7931)
[Model] Expose Phi3v num_crops as a mm_processor_kwarg (vllm-project#8658)

Signed-off-by: Alex-Brooks Alex.Brooks@ibm.com Co-authored-by: Cyrus Leung cyrus.tl.leung@gmail.com Co-authored-by: DarkLight1337 tlleungac@connect.ust.hk

[Bugfix] Fix potentially unsafe custom allreduce synchronization (vllm-project#8558)
[Kernel] Split Marlin MoE kernels into multiple files (vllm-project#8661)

Co-authored-by: mgoin michael@neuralmagic.com

[Frontend] Batch inference for llm.chat() API (vllm-project#8648)

Co-authored-by: Cyrus Leung cyrus.tl.leung@gmail.com Co-authored-by: Cyrus Leung tlleungac@connect.ust.hk Co-authored-by: Roger Wang ywang@roblox.com Co-authored-by: Roger Wang 136131678+ywang96@users.noreply.github.com

[Bugfix] Fix torch dynamo fixes caused by replace_parameters (vllm-project#8748)
[CI/Build] fix setuptools-scm usage (vllm-project#8771)
[misc] soft drop beam search (vllm-project#8763)
[Misc] Upgrade bitsandbytes to the latest version 0.44.0 (vllm-project#8768)
[Core][Bugfix] Support prompt_logprobs returned with speculative decoding (vllm-project#8047)

Signed-off-by: Travis Johnson tsjohnso@us.ibm.com

[Core] Adding Priority Scheduling (vllm-project#5958)
[Bugfix] Use heartbeats instead of health checks (vllm-project#8583)
Fix test_schedule_swapped_simple in test_scheduler.py (vllm-project#8780)
[Bugfix][Kernel] Implement acquire/release polyfill for Pascal (vllm-project#8776)
Fix tests in test_chunked_prefill_scheduler which fail with BlockManager V2 (vllm-project#8752)
[BugFix] Propagate 'trust_remote_code' setting in internvl and minicpmv (vllm-project#8250)
[Hardware][CPU] Enable mrope and support Qwen2-VL on CPU backend (vllm-project#8770)
[Bugfix] load fc bias from config for eagle (vllm-project#8790)

Signed-off-by: Joe Runde Joseph.Runde@ibm.com Signed-off-by: Alex-Brooks Alex.Brooks@ibm.com Signed-off-by: Rui Qiao ruisearch42@gmail.com Signed-off-by: Aaron Pham contact@aarnphm.xyz Signed-off-by: Russell Bryant rbryant@redhat.com Signed-off-by: omrishiv 327609+omrishiv@users.noreply.github.com Signed-off-by: Travis Johnson tsjohnso@us.ibm.com Co-authored-by: ElizaWszola eliza@neuralmagic.com Co-authored-by: Dipika dipikasikka1@gmail.com Co-authored-by: lewtun lewis.c.tunstall@gmail.com Co-authored-by: Nick Hill nickhill@us.ibm.com Co-authored-by: sasha0552 admin@sasha0552.org Co-authored-by: Luka Govedič ProExpertProg@users.noreply.github.com Co-authored-by: youkaichao youkaichao@gmail.com Co-authored-by: Kevin Lin 42618777+kevin314@users.noreply.github.com Co-authored-by: Simon Mo simon.mo@hey.com Co-authored-by: Joe Runde Joseph.Runde@ibm.com Co-authored-by: Alex Brooks alex.brooks@ibm.com Co-authored-by: Roger Wang 136131678+ywang96@users.noreply.github.com Co-authored-by: Rui Qiao 161574667+ruisearch42@users.noreply.github.com Co-authored-by: Isotr0py 2037008807@qq.com Co-authored-by: sroy745 142070531+sroy745@users.noreply.github.com Co-authored-by: chenqianfzh 51831990+chenqianfzh@users.noreply.github.com Co-authored-by: Patrick von Platen patrick.v.platen@gmail.com Co-authored-by: Cyrus Leung cyrus.tl.leung@gmail.com Co-authored-by: Andy Dai 76841985+Imss27@users.noreply.github.com Co-authored-by: Alexey Kondratiev(AMD) 143633163+alexeykondrat@users.noreply.github.com Co-authored-by: Tyler Michael Smith tyler@neuralmagic.com Co-authored-by: Daniele 36171005+dtrifiro@users.noreply.github.com Co-authored-by: Jiaxin Shan seedjeffwan@gmail.com Co-authored-by: Cyrus Leung tlleungac@connect.ust.hk Co-authored-by: Aaron Pham contact@aarnphm.xyz Co-authored-by: Alexander Matveev 59768536+alexm-neuralmagic@users.noreply.github.com Co-authored-by: rshaw@neuralmagic.com rshaw@neuralmagic.com Co-authored-by: Robert Shaw 
114415538+robertgshaw2-neuralmagic@users.noreply.github.com Co-authored-by: afeldman-nm 156691304+afeldman-nm@users.noreply.github.com Co-authored-by: Russell Bryant rbryant@redhat.com Co-authored-by: Geun, Lim shing100@Naver.com Co-authored-by: Michael Goin michael@neuralmagic.com Co-authored-by: Gregory Shtrasberg 156009573+gshtras@users.noreply.github.com Co-authored-by: Alexei-V-Ivanov-AMD 156011006+Alexei-V-Ivanov-AMD@users.noreply.github.com Co-authored-by: Kuntai Du kuntai@uchicago.edu Co-authored-by: Kunshang Ji kunshang.ji@intel.com Co-authored-by: Charlie Fu charlifu@amd.com Co-authored-by: 盏一 w@hidva.com Co-authored-by: bnellnm 49004751+bnellnm@users.noreply.github.com Co-authored-by: Amit Garg mitgarg17495@gmail.com Co-authored-by: William Lin SolitaryThinker@users.noreply.github.com Co-authored-by: Niklas Muennighoff n.muennighoff@gmail.com Co-authored-by: saumya-saran saumya.saran@c3.ai Co-authored-by: Pastel！ 1627301104@qq.com Co-authored-by: omrishiv 327609+omrishiv@users.noreply.github.com Co-authored-by: zyddnys zyddnys@outlook.com Co-authored-by: youkaichao youkaichao@126.com Co-authored-by: rasmith Randall.Smith@amd.com Co-authored-by: Divakar Verma 137818590+divakar-amd@users.noreply.github.com Co-authored-by: Huazhong Ji hzji210@gmail.com Co-authored-by: litianjian 45817262+litianjian@users.noreply.github.com Co-authored-by: litianjian litianjian@bytedance.com Co-authored-by: Roger Wang ywang@roblox.com Co-authored-by: Lily Liu lilyliupku@gmail.com Co-authored-by: Yuan yuan.zhou@intel.com Co-authored-by: Yan Ma yan.ma@intel.com Co-authored-by: Li, Jiang jiang1.li@intel.com Co-authored-by: Yanyi Liu wolfsonliu@163.com Co-authored-by: Jani Monoses jani.monoses@gmail.com Co-authored-by: Lucas Wilkinson LucasWilkinson@users.noreply.github.com Co-authored-by: Jee Jee Li pandaleefree@gmail.com Co-authored-by: jiqing-feng 107918818+jiqing-feng@users.noreply.github.com Co-authored-by: Hongxia Yang 62075498+hongxiayang@users.noreply.github.com 
Co-authored-by: Brendan Wong bjwpokemon@gmail.com Co-authored-by: Cody Yu hao.yu.cody@gmail.com Co-authored-by: Peter Salas peter@fixie.ai Co-authored-by: Hanzhi Zhou hanzhi713@gmail.com Co-authored-by: Andy 37781802+aandyw@users.noreply.github.com Co-authored-by: Travis Johnson tsjohnso@us.ibm.com Co-authored-by: Archit Patke apatke@illinois.edu Co-authored-by: zifeitong zifeitong@gmail.com Co-authored-by: sohamparikh sohamparikh47@gmail.com

Alvant pushed a commit to compressa-ai/vllm that referenced this pull request

Oct 26, 2024

Signed-off-by: Alvant alvasian@yandex.ru

garg-amit pushed a commit to garg-amit/vllm that referenced this pull request

Oct 28, 2024

Signed-off-by: Amit Garg mitgarg17495@gmail.com

kwang1012 pushed a commit to kwang1012/vllm that referenced this pull request

Oct 28, 2024

sumitd2 pushed a commit to sumitd2/vllm that referenced this pull request

Nov 14, 2024

Signed-off-by: Sumit Dubey sumit.dubey2@ibm.com

LeiWang1999 pushed a commit to LeiWang1999/vllm-bitblas that referenced this pull request

Mar 26, 2025

Signed-off-by: LeiWang1999 leiwang1999@outlook.com
