[doc] update doc on testing and debugging by youkaichao · Pull Request #8514 · vllm-project/vllm

Manikandan-Thangaraj-ZS0321 added a commit to Manikandan-Thangaraj-ZS0321/vllm that referenced this pull request

[Kernel] Enable 8-bit weights in Fused Marlin MoE (vllm-project#8032)

Co-authored-by: Dipika dipikasikka1@gmail.com

[Frontend] Expose revision arg in OpenAI server (vllm-project#8501)
[BugFix] Fix clean shutdown issues (vllm-project#8492)
[Bugfix][Kernel] Fix build for sm_60 in GGUF kernel (vllm-project#8506)
[Kernel] AQ AZP 3/4: Asymmetric quantization kernels (vllm-project#7270)
[doc] update doc on testing and debugging (vllm-project#8514)
[Bugfix] Bind api server port before starting engine (vllm-project#8491)
[perf bench] set timeout to debug hanging (vllm-project#8516)
[misc] small qol fixes for release process (vllm-project#8517)
[Bugfix] Fix 3.12 builds on main (vllm-project#8510)

Signed-off-by: Joe Runde Joseph.Runde@ibm.com

[refactor] remove triton based sampler (vllm-project#8524)
[Frontend] Improve Nullable kv Arg Parsing (vllm-project#8525)

Signed-off-by: Alex-Brooks Alex.Brooks@ibm.com

[Misc][Bugfix] Disable guided decoding for mistral tokenizer (vllm-project#8521)
[torch.compile] register allreduce operations as custom ops (vllm-project#8526)
[Misc] Limit to ray[adag] 2.35 to avoid backward incompatible change (vllm-project#8509)

Signed-off-by: Rui Qiao ruisearch42@gmail.com

[Benchmark] Support sample from HF datasets and image input for benchmark_serving (vllm-project#8495)
[Encoder decoder] Add cuda graph support during decoding for encoder-decoder models (vllm-project#7631)
[Feature][kernel] tensor parallelism with bitsandbytes quantization (vllm-project#8434)
[Model] Add mistral function calling format to all models loaded with "mistral" format (vllm-project#8515)

Co-authored-by: Cyrus Leung cyrus.tl.leung@gmail.com

[Misc] Don't dump contents of kvcache tensors on errors (vllm-project#8527)
[Bugfix] Fix TP > 1 for new granite (vllm-project#8544)

Signed-off-by: Joe Runde Joseph.Runde@ibm.com

[doc] improve installation doc (vllm-project#8550)

Co-authored-by: Andy Dai 76841985+Imss27@users.noreply.github.com

[CI/Build] Excluding kernels/test_gguf.py from ROCm (vllm-project#8520)
[Kernel] Change interface to Mamba causal_conv1d_update for continuous batching (vllm-project#8012)
[CI/Build] fix Dockerfile.cpu on podman (vllm-project#8540)
[Misc] Add argument to disable FastAPI docs (vllm-project#8554)
[CI/Build] Avoid CUDA initialization (vllm-project#8534)
[CI/Build] Update Ruff version (vllm-project#8469)

Signed-off-by: Aaron Pham contact@aarnphm.xyz
Co-authored-by: Cyrus Leung cyrus.tl.leung@gmail.com

[Core][Bugfix][Perf] Introduce MQLLMEngine to avoid asyncio OH (vllm-project#8157)

Co-authored-by: Nick Hill nickhill@us.ibm.com
Co-authored-by: rshaw@neuralmagic.com rshaw@neuralmagic.com
Co-authored-by: Robert Shaw 114415538+robertgshaw2-neuralmagic@users.noreply.github.com
Co-authored-by: Simon Mo simon.mo@hey.com

[Core] Prompt logprobs support in Multi-step (vllm-project#8199)
[Core] zmq: bind only to 127.0.0.1 for local-only usage (vllm-project#8543)

Signed-off-by: Russell Bryant rbryant@redhat.com

[Model] Support Solar Model (vllm-project#8386)

Co-authored-by: Michael Goin michael@neuralmagic.com

[AMD][ROCm] Quantization methods on ROCm; Fix _scaled_mm call (vllm-project#8380)

Co-authored-by: Alexei-V-Ivanov-AMD 156011006+Alexei-V-Ivanov-AMD@users.noreply.github.com
Co-authored-by: Michael Goin michael@neuralmagic.com

[Kernel] Change interface to Mamba selective_state_update for continuous batching (vllm-project#8039)
[BugFix] Nonzero exit code if MQLLMEngine startup fails (vllm-project#8572)
[Bugfix] add dead_error property to engine client (vllm-project#8574)

Signed-off-by: Joe Runde Joseph.Runde@ibm.com

[Kernel] Remove marlin moe templating on thread_m_blocks (vllm-project#8573)

Co-authored-by: lwilkinson@neuralmagic.com

[Bugfix] [Encoder-Decoder] Bugfix for encoder specific metadata construction during decode of encoder-decoder models. (vllm-project#8545)
Revert "[Misc][Bugfix] Disable guided decoding for mistral tokenizer" (vllm-project#8593)
[Bugfix] fixing sonnet benchmark bug in benchmark_serving.py (vllm-project#8616)
[MISC] remove engine_use_ray in benchmark_throughput.py (vllm-project#8615)
[Frontend] Use MQLLMEngine for embeddings models too (vllm-project#8584)
[Kernel][Amd] Add fp8 kv cache support for rocm custom paged attention (vllm-project#8577)
[Core] simplify logits resort in _apply_top_k_top_p (vllm-project#8619)
[Doc] Add documentation for GGUF quantization (vllm-project#8618)
Create SECURITY.md (vllm-project#8642)
[CI/Build] Re-enabling Entrypoints tests on ROCm, excluding ones that fail (vllm-project#8551)
[Misc] guard against change in cuda library name (vllm-project#8609)
[Bugfix] Fix Phi3.5 mini and MoE LoRA inference (vllm-project#8571)
[bugfix] [AMD] add multi-step advance_step to ROCmFlashAttentionMetadata (vllm-project#8474)
[Core] Support Lora lineage and base model metadata management (vllm-project#6315)
[Model] Add OLMoE (vllm-project#7922)
[CI/Build] Removing entrypoints/openai/test_embedding.py test from ROCm build (vllm-project#8670)
[Bugfix] Validate SamplingParam n is an int (vllm-project#8548)
[Misc] Show AMD GPU topology in collect_env.py (vllm-project#8649)
[Bugfix] Config got an unexpected keyword argument 'engine' (vllm-project#8556)
[Bugfix][Core] Fix tekken edge case for mistral tokenizer (vllm-project#8640)
[Doc] neuron documentation update (vllm-project#8671)

Signed-off-by: omrishiv 327609+omrishiv@users.noreply.github.com

[Hardware][AWS] update neuron to 2.20 (vllm-project#8676)

Signed-off-by: omrishiv 327609+omrishiv@users.noreply.github.com

[Bugfix] Fix incorrect llava next feature size calculation (vllm-project#8496)
[Core] Rename PromptInputs and inputs (vllm-project#8673)
[MISC] add support custom_op check (vllm-project#8557)

Co-authored-by: youkaichao youkaichao@126.com

[Core] Factor out common code in SequenceData and Sequence (vllm-project#8675)
[beam search] add output for manually checking the correctness (vllm-project#8684)
[Kernel] Build flash-attn from source (vllm-project#8245)
[VLM] Use SequenceData.from_token_counts to create dummy data (vllm-project#8687)
[Doc] Fix typo in AMD installation guide (vllm-project#8689)
[Kernel][Triton][AMD] Remove tl.atomic_add from awq_gemm_kernel, 2-5x speedup MI300, minor improvement for MI250 (vllm-project#8646)
[dbrx] refactor dbrx experts to extend FusedMoe class (vllm-project#8518)
[Kernel][Bugfix] Delete some more useless code in marlin_moe_ops.cu (vllm-project#8643)
[Bugfix] Refactor composite weight loading logic (vllm-project#8656)
[ci][build] fix vllm-flash-attn (vllm-project#8699)
[Model] Refactor BLIP/BLIP-2 to support composite model loading (vllm-project#8407)
[Misc] Use NamedTuple in Multi-image example (vllm-project#8705)

Signed-off-by: Alex-Brooks Alex.Brooks@ibm.com

[MISC] rename CudaMemoryProfiler to DeviceMemoryProfiler (vllm-project#8703)
[Model][VLM] Add LLaVA-Onevision model support (vllm-project#8486)

Co-authored-by: litianjian litianjian@bytedance.com
Co-authored-by: Cyrus Leung cyrus.tl.leung@gmail.com
Co-authored-by: Roger Wang ywang@roblox.com
Co-authored-by: DarkLight1337 tlleungac@connect.ust.hk

[SpecDec][Misc] Cleanup, remove bonus token logic. (vllm-project#8701)
[build] enable existing pytorch (for GH200, aarch64, nightly) (vllm-project#8713)
[misc] upgrade mistral-common (vllm-project#8715)
[Bugfix] Avoid some bogus messages RE CUTLASS's revision when building (vllm-project#8702)
[Bugfix] Fix CPU CMake build (vllm-project#8723)

Co-authored-by: Yuan yuan.zhou@intel.com

[Bugfix] fix docker build for xpu (vllm-project#8652)
[Core][Frontend] Support Passing Multimodal Processor Kwargs (vllm-project#8657)

Signed-off-by: Alex-Brooks Alex.Brooks@ibm.com

[Hardware][CPU] Refactor CPU model runner (vllm-project#8729)
[Bugfix][CPU] fix missing input intermediate_tensors in the cpu_model_runner (vllm-project#8733)
[Model] Support pp for qwen2-vl (vllm-project#8696)
[VLM] Fix paligemma, fuyu and persimmon with transformers 4.45 : use config.text_config.vocab_size (vllm-project#8707)
[CI/Build] use setuptools-scm to set version (vllm-project#4738)

Co-authored-by: youkaichao youkaichao@126.com

[Kernel] (2/N) Machete - Integrate into CompressedTensorsWNA16 and GPTQMarlin (vllm-project#7701)

Co-authored-by: mgoin michael@neuralmagic.com
Co-authored-by: Divakar Verma 137818590+divakar-amd@users.noreply.github.com
Co-authored-by: Tyler Michael Smith tyler@neuralmagic.com

[Kernel][LoRA] Add assertion for punica sgmv kernels (vllm-project#7585)
[Core] Allow IPv6 in VLLM_HOST_IP with zmq (vllm-project#8575)

Signed-off-by: Russell Bryant rbryant@redhat.com

Fix typical acceptance sampler with correct recovered token ids (vllm-project#8562)
Add output streaming support to multi-step + async while ensuring RequestOutput obj reuse (vllm-project#8335)
[Hardware][AMD] ROCm6.2 upgrade (vllm-project#8674)
Fix tests in test_scheduler.py that fail with BlockManager V2 (vllm-project#8728)
re-implement beam search on top of vllm core (vllm-project#8726)

Co-authored-by: Brendan Wong bjwpokemon@gmail.com

Revert "[Core] Rename PromptInputs to PromptType, and inputs to prompt" (vllm-project#8750)
[MISC] Skip dumping inputs when unpicklable (vllm-project#8744)
[Core][Model] Support loading weights by ID within models (vllm-project#7931)
[Model] Expose Phi3v num_crops as a mm_processor_kwarg (vllm-project#8658)

Signed-off-by: Alex-Brooks Alex.Brooks@ibm.com
Co-authored-by: Cyrus Leung cyrus.tl.leung@gmail.com
Co-authored-by: DarkLight1337 tlleungac@connect.ust.hk

[Bugfix] Fix potentially unsafe custom allreduce synchronization (vllm-project#8558)
[Kernel] Split Marlin MoE kernels into multiple files (vllm-project#8661)

Co-authored-by: mgoin michael@neuralmagic.com

[Frontend] Batch inference for llm.chat() API (vllm-project#8648)

Co-authored-by: Cyrus Leung cyrus.tl.leung@gmail.com
Co-authored-by: Cyrus Leung tlleungac@connect.ust.hk
Co-authored-by: Roger Wang ywang@roblox.com
Co-authored-by: Roger Wang 136131678+ywang96@users.noreply.github.com

[Bugfix] Fix torch dynamo fixes caused by replace_parameters (vllm-project#8748)
[CI/Build] fix setuptools-scm usage (vllm-project#8771)
[misc] soft drop beam search (vllm-project#8763)
[Misc] Upgrade bitsandbytes to the latest version 0.44.0 (vllm-project#8768)
[Core][Bugfix] Support prompt_logprobs returned with speculative decoding (vllm-project#8047)

Signed-off-by: Travis Johnson tsjohnso@us.ibm.com

[Core] Adding Priority Scheduling (vllm-project#5958)
[Bugfix] Use heartbeats instead of health checks (vllm-project#8583)
Fix test_schedule_swapped_simple in test_scheduler.py (vllm-project#8780)
[Bugfix][Kernel] Implement acquire/release polyfill for Pascal (vllm-project#8776)
Fix tests in test_chunked_prefill_scheduler which fail with BlockManager V2 (vllm-project#8752)
[BugFix] Propagate 'trust_remote_code' setting in internvl and minicpmv (vllm-project#8250)
[Hardware][CPU] Enable mrope and support Qwen2-VL on CPU backend (vllm-project#8770)
[Bugfix] load fc bias from config for eagle (vllm-project#8790)

Signed-off-by: Joe Runde Joseph.Runde@ibm.com
Signed-off-by: Alex-Brooks Alex.Brooks@ibm.com
Signed-off-by: Rui Qiao ruisearch42@gmail.com
Signed-off-by: Aaron Pham contact@aarnphm.xyz
Signed-off-by: Russell Bryant rbryant@redhat.com
Signed-off-by: omrishiv 327609+omrishiv@users.noreply.github.com
Signed-off-by: Travis Johnson tsjohnso@us.ibm.com
Co-authored-by: ElizaWszola eliza@neuralmagic.com
Co-authored-by: Dipika dipikasikka1@gmail.com
Co-authored-by: lewtun lewis.c.tunstall@gmail.com
Co-authored-by: Nick Hill nickhill@us.ibm.com
Co-authored-by: sasha0552 admin@sasha0552.org
Co-authored-by: Luka Govedič ProExpertProg@users.noreply.github.com
Co-authored-by: youkaichao youkaichao@gmail.com
Co-authored-by: Kevin Lin 42618777+kevin314@users.noreply.github.com
Co-authored-by: Simon Mo simon.mo@hey.com
Co-authored-by: Joe Runde Joseph.Runde@ibm.com
Co-authored-by: Alex Brooks alex.brooks@ibm.com
Co-authored-by: Roger Wang 136131678+ywang96@users.noreply.github.com
Co-authored-by: Rui Qiao 161574667+ruisearch42@users.noreply.github.com
Co-authored-by: Isotr0py 2037008807@qq.com
Co-authored-by: sroy745 142070531+sroy745@users.noreply.github.com
Co-authored-by: chenqianfzh 51831990+chenqianfzh@users.noreply.github.com
Co-authored-by: Patrick von Platen patrick.v.platen@gmail.com
Co-authored-by: Cyrus Leung cyrus.tl.leung@gmail.com
Co-authored-by: Andy Dai 76841985+Imss27@users.noreply.github.com
Co-authored-by: Alexey Kondratiev(AMD) 143633163+alexeykondrat@users.noreply.github.com
Co-authored-by: Tyler Michael Smith tyler@neuralmagic.com
Co-authored-by: Daniele 36171005+dtrifiro@users.noreply.github.com
Co-authored-by: Jiaxin Shan seedjeffwan@gmail.com
Co-authored-by: Cyrus Leung tlleungac@connect.ust.hk
Co-authored-by: Aaron Pham contact@aarnphm.xyz
Co-authored-by: Alexander Matveev 59768536+alexm-neuralmagic@users.noreply.github.com
Co-authored-by: rshaw@neuralmagic.com rshaw@neuralmagic.com
Co-authored-by: Robert Shaw 114415538+robertgshaw2-neuralmagic@users.noreply.github.com
Co-authored-by: afeldman-nm 156691304+afeldman-nm@users.noreply.github.com
Co-authored-by: Russell Bryant rbryant@redhat.com
Co-authored-by: Geun, Lim shing100@Naver.com
Co-authored-by: Michael Goin michael@neuralmagic.com
Co-authored-by: Gregory Shtrasberg 156009573+gshtras@users.noreply.github.com
Co-authored-by: Alexei-V-Ivanov-AMD 156011006+Alexei-V-Ivanov-AMD@users.noreply.github.com
Co-authored-by: Kuntai Du kuntai@uchicago.edu
Co-authored-by: Kunshang Ji kunshang.ji@intel.com
Co-authored-by: Charlie Fu charlifu@amd.com
Co-authored-by: 盏一 w@hidva.com
Co-authored-by: bnellnm 49004751+bnellnm@users.noreply.github.com
Co-authored-by: Amit Garg mitgarg17495@gmail.com
Co-authored-by: William Lin SolitaryThinker@users.noreply.github.com
Co-authored-by: Niklas Muennighoff n.muennighoff@gmail.com
Co-authored-by: saumya-saran saumya.saran@c3.ai
Co-authored-by: Pastel！ 1627301104@qq.com
Co-authored-by: omrishiv 327609+omrishiv@users.noreply.github.com
Co-authored-by: zyddnys zyddnys@outlook.com
Co-authored-by: youkaichao youkaichao@126.com
Co-authored-by: rasmith Randall.Smith@amd.com
Co-authored-by: Divakar Verma 137818590+divakar-amd@users.noreply.github.com
Co-authored-by: Huazhong Ji hzji210@gmail.com
Co-authored-by: litianjian 45817262+litianjian@users.noreply.github.com
Co-authored-by: litianjian litianjian@bytedance.com
Co-authored-by: Roger Wang ywang@roblox.com
Co-authored-by: Lily Liu lilyliupku@gmail.com
Co-authored-by: Yuan yuan.zhou@intel.com
Co-authored-by: Yan Ma yan.ma@intel.com
Co-authored-by: Li, Jiang jiang1.li@intel.com
Co-authored-by: Yanyi Liu wolfsonliu@163.com
Co-authored-by: Jani Monoses jani.monoses@gmail.com
Co-authored-by: Lucas Wilkinson LucasWilkinson@users.noreply.github.com
Co-authored-by: Jee Jee Li pandaleefree@gmail.com
Co-authored-by: jiqing-feng 107918818+jiqing-feng@users.noreply.github.com
Co-authored-by: Hongxia Yang 62075498+hongxiayang@users.noreply.github.com
Co-authored-by: Brendan Wong bjwpokemon@gmail.com
Co-authored-by: Cody Yu hao.yu.cody@gmail.com
Co-authored-by: Peter Salas peter@fixie.ai
Co-authored-by: Hanzhi Zhou hanzhi713@gmail.com
Co-authored-by: Andy 37781802+aandyw@users.noreply.github.com
Co-authored-by: Travis Johnson tsjohnso@us.ibm.com
Co-authored-by: Archit Patke apatke@illinois.edu
Co-authored-by: zifeitong zifeitong@gmail.com
Co-authored-by: sohamparikh sohamparikh47@gmail.com