vLLM Blog
- Accelerating RLHF with vLLM, Best Practice from OpenRLHF (Apr 23, 2025)
- Transformers backend integration in vLLM (Apr 11, 2025)
- Llama 4 in vLLM (Apr 5, 2025)
- PTPC-FP8: Boosting vLLM Performance on AMD ROCm (Feb 24, 2025)
- Introducing AIBrix: A Scalable, Cost-Effective Control Plane for vLLM (Feb 21, 2025)
- Distributed Inference with vLLM (Feb 17, 2025)
- vLLM V1: A Major Upgrade to vLLM's Core Architecture (Jan 27, 2025)
- Introducing vLLM Inference Provider in Llama Stack (Jan 27, 2025)
- High Performance and Easy Deployment of vLLM in K8S with “vLLM production-stack” (Jan 21, 2025)
- Structured Decoding in vLLM: a gentle introduction (Jan 14, 2025)
- vLLM 2024 Retrospective and 2025 Vision (Jan 10, 2025)
- Installing and Developing vLLM with Ease (Jan 10, 2025)
- Serving LLMs on AMD MI300X: Best Practices (Oct 23, 2024)
- How Speculative Decoding Boosts vLLM Performance by up to 2.8x (Oct 17, 2024)
- vLLM v0.6.0: 2.7x Throughput Improvement and 5x Latency Reduction (Sep 5, 2024)
- vLLM’s Open Governance and Performance Roadmap (Jul 25, 2024)
- Announcing Llama 3.1 Support in vLLM (Jul 23, 2024)
- Notes on vLLM v.s. DeepSpeed-FastGen (Nov 14, 2023)
- vLLM: Easy, Fast, and Cheap LLM Serving with PagedAttention (Jun 20, 2023)