
Welcome to vLLM


Easy, fast, and cheap LLM serving for everyone


vLLM is a fast and easy-to-use library for LLM inference and serving.

Originally developed in the Sky Computing Lab at UC Berkeley, vLLM has evolved into a community-driven project with contributions from both academia and industry.

vLLM is fast with:

- State-of-the-art serving throughput
- Efficient management of attention key and value memory with PagedAttention
- Continuous batching of incoming requests
- Fast model execution with CUDA/HIP graph
- Quantization: GPTQ, AWQ, INT4, INT8, and FP8
- Optimized CUDA kernels, including integration with FlashAttention and FlashInfer
- Speculative decoding
- Chunked prefill

vLLM is flexible and easy to use with:

- Seamless integration with popular Hugging Face models
- High-throughput serving with various decoding algorithms, including parallel sampling, beam search, and more
- Tensor parallelism and pipeline parallelism support for distributed inference
- Streaming outputs
- OpenAI-compatible API server (see the serving sketch below)
- Support for a range of hardware, including NVIDIA GPUs, AMD GPUs and CPUs, Intel CPUs and GPUs, TPUs, and AWS Neuron
- Prefix caching support
- Multi-LoRA support
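As a quick taste of the Python API, here is a minimal sketch of offline batch inference; the facebook/opt-125m checkpoint is just a small example model.

```python
# Minimal offline batch inference with vLLM's Python API.
from vllm import LLM, SamplingParams

prompts = [
    "Hello, my name is",
    "The future of AI is",
]
# Sampling settings shared by all prompts in the batch.
sampling_params = SamplingParams(temperature=0.8, top_p=0.95)

# Load the model (downloaded from Hugging Face on first use).
llm = LLM(model="facebook/opt-125m")

# generate() runs all prompts through vLLM's continuously batched engine.
outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    print(f"Prompt: {output.prompt!r}")
    print(f"Generated: {output.outputs[0].text!r}")
```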

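For online serving, vLLM exposes an OpenAI-compatible HTTP API. The sketch below queries it with the official openai client, assuming a server was started locally with `vllm serve facebook/opt-125m`; the model name and default port 8000 are illustrative.

```python
# Query a running vLLM OpenAI-compatible server.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="EMPTY",  # vLLM does not require a real key by default
)
completion = client.completions.create(
    model="facebook/opt-125m",
    prompt="San Francisco is a",
    max_tokens=32,
)
print(completion.choices[0].text)
```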
For more information, check out the following:

- The vLLM announcing blog post (introduction to PagedAttention)
- The vLLM paper (SOSP 2023)
- "How continuous batching enables 23x throughput in LLM inference while reducing p50 latency" by Cade Daniel et al.

Documentation

Models

Features

Deployment

Design Documents

V1 Design Documents

Developer Guide

API Reference
