
Welcome to vLLM


Easy, fast, and cheap LLM serving for everyone


vLLM is a fast and easy-to-use library for LLM inference and serving.

Originally developed in the Sky Computing Lab at UC Berkeley, vLLM has evolved into a community-driven project with contributions from both academia and industry.

vLLM is fast with:

- State-of-the-art serving throughput
- Efficient management of attention key and value memory with PagedAttention
- Continuous batching of incoming requests
- Fast model execution with CUDA/HIP graphs
- Optimized CUDA kernels, including integration with FlashAttention and FlashInfer
- Quantization support: GPTQ, AWQ, INT4, INT8, and FP8
- Speculative decoding
- Chunked prefill
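Much of vLLM's speed comes from PagedAttention, which manages the attention KV cache in fixed-size blocks allocated on demand, rather than reserving one contiguous region per sequence. The following is a toy sketch of that block-allocation idea only; the class and method names are illustrative inventions, not vLLM's actual internals.

```python
# Toy sketch of block-based KV-cache allocation in the spirit of
# PagedAttention. All names here (BlockAllocator, append_token) are
# hypothetical; this is not vLLM's implementation.

BLOCK_SIZE = 16  # tokens per KV-cache block


class BlockAllocator:
    def __init__(self, num_blocks: int):
        self.free = list(range(num_blocks))        # free physical block ids
        self.tables: dict[str, list[int]] = {}     # seq id -> its block table

    def append_token(self, seq_id: str, pos: int) -> int:
        """Return the physical block holding token `pos`, allocating a new
        block only when the sequence crosses a block boundary."""
        table = self.tables.setdefault(seq_id, [])
        if pos // BLOCK_SIZE == len(table):        # entered a new logical block
            table.append(self.free.pop())
        return table[pos // BLOCK_SIZE]

    def free_seq(self, seq_id: str) -> None:
        """Return all of a finished sequence's blocks to the free pool."""
        self.free.extend(self.tables.pop(seq_id, []))


alloc = BlockAllocator(num_blocks=8)
for pos in range(20):       # a 20-token sequence spans ceil(20/16) = 2 blocks
    alloc.append_token("seq0", pos)
print(len(alloc.tables["seq0"]))  # 2
```

Because blocks are allocated lazily and returned to a shared pool when a sequence finishes, many concurrent sequences can share the same fixed memory budget with little waste, which is what makes continuous batching of requests practical.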

vLLM is flexible and easy to use with:

- Seamless integration with popular Hugging Face models
- High-throughput serving with various decoding algorithms, including parallel sampling and beam search
- Tensor parallelism and pipeline parallelism support for distributed inference
- Streaming outputs
- An OpenAI-compatible API server
- Support for NVIDIA GPUs, AMD CPUs and GPUs, Intel CPUs and GPUs, PowerPC CPUs, TPU, and AWS Neuron
- Prefix caching support
- Multi-LoRA support

For more information, check out the following:

- The vLLM announcing blog post, which introduces PagedAttention
- The vLLM paper, "Efficient Memory Management for Large Language Model Serving with PagedAttention" (SOSP 2023)