GitHub - pytorch/serve: Serve, optimize and scale PyTorch models in production

⚠️ Notice: Limited Maintenance

This project is no longer actively maintained. While existing releases remain available, there are no planned updates, bug fixes, new features, or security patches. Users should be aware that vulnerabilities may not be addressed.

❗ANNOUNCEMENT: Security Changes❗

TorchServe now enables token authorization and disables model API control by default. These security features are intended to prevent unauthorized API calls and to keep potentially malicious code from being introduced to the model server. Refer to the following documentation for more information: Token Authorization, Model API control
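As a rough illustration of what token authorization means for a client, the sketch below attaches an inference token to a request as a bearer header. It assumes, following the Token Authorization documentation, that TorchServe writes a key_file.json to its working directory and that the file has an "inference" entry with a "key" field. The helper names, the file layout as parsed here, and the model name bert are assumptions for illustration, not part of TorchServe's API.

```python
import json
import urllib.request


def load_inference_token(key_file_text: str) -> str:
    """Extract the inference key from the text of key_file.json.

    Assumed layout (not verified here): {"inference": {"key": "..."}, ...}
    """
    return json.loads(key_file_text)["inference"]["key"]


def authorized_request(url: str, token: str, body: bytes) -> urllib.request.Request:
    """Build, but do not send, a POST request that carries the bearer token."""
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Authorization": f"Bearer {token}"},
    )


def send(request: urllib.request.Request) -> str:
    """Send a prepared request and return the response body as text."""
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

With a running server, one would read key_file.json, pass its text to load_inference_token, build the request with authorized_request, and hand it to send. Management calls would presumably use the "management" key in the same way.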

TorchServe

TorchServe is a flexible and easy-to-use tool for serving and scaling PyTorch models in production.

Requires Python >= 3.8

curl http://127.0.0.1:8080/predictions/bert -T input.txt
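The curl call above uploads input.txt to the prediction endpoint with an HTTP PUT (that is what -T does). A minimal Python sketch of the same request, using only the standard library, might look like the following. The function names are hypothetical, and because token authorization is enabled by default, a real call would also need an Authorization header unless the server was started with token auth disabled.

```python
import urllib.request
from pathlib import Path


def build_prediction_request(host: str, model_name: str, payload: bytes) -> urllib.request.Request:
    """Build a PUT request for /predictions/<model_name>, mirroring `curl -T`."""
    url = f"{host}/predictions/{model_name}"
    return urllib.request.Request(url, data=payload, method="PUT")


def predict_from_file(host: str, model_name: str, path: str) -> str:
    """Upload the file at `path` and return the raw response body as text."""
    request = build_prediction_request(host, model_name, Path(path).read_bytes())
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

For the example above, the call would be predict_from_file("http://127.0.0.1:8080", "bert", "input.txt").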

🚀 Quick start with TorchServe

Install dependencies

python ./ts_scripts/install_dependencies.py

Include dependencies for accelerator support with the relevant optional flags

python ./ts_scripts/install_dependencies.py --rocm=rocm61
python ./ts_scripts/install_dependencies.py --cuda=cu121

Latest release

pip install torchserve torch-model-archiver torch-workflow-archiver

Nightly build

pip install torchserve-nightly torch-model-archiver-nightly torch-workflow-archiver-nightly

🚀 Quick start with TorchServe (conda)

Install dependencies

python ./ts_scripts/install_dependencies.py

Include dependencies for accelerator support with the relevant optional flags

python ./ts_scripts/install_dependencies.py --rocm=rocm61
python ./ts_scripts/install_dependencies.py --cuda=cu121

Latest release

conda install -c pytorch torchserve torch-model-archiver torch-workflow-archiver

Nightly build

conda install -c pytorch-nightly torchserve torch-model-archiver torch-workflow-archiver

Getting started guide

🐳 Quick Start with Docker

Latest release

docker pull pytorch/torchserve

Nightly build

docker pull pytorch/torchserve-nightly

Refer to torchserve docker for details.

🤖 Quick Start LLM Deployment

VLLM Engine

Make sure to install torchserve with pip or conda as described above, and log in with huggingface-cli login

python -m ts.llm_launcher --model_id meta-llama/Llama-3.2-3B-Instruct --disable_token_auth

Try it out

curl -X POST -d '{"model":"meta-llama/Llama-3.2-3B-Instruct", "prompt":"Hello, my name is", "max_tokens": 200}' --header "Content-Type: application/json" "http://localhost:8080/predictions/model/1.0/v1/completions"
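For readers who prefer to call the endpoint from code, here is a minimal standard-library sketch of the same completions request. The URL and payload fields are taken from the curl command above; the function names are hypothetical, and the assumption that the server replies with a JSON document is not verified here.

```python
import json
import urllib.request

# Endpoint used by the curl example above.
COMPLETIONS_URL = "http://localhost:8080/predictions/model/1.0/v1/completions"


def build_completion_request(
    model_id: str,
    prompt: str,
    max_tokens: int,
    url: str = COMPLETIONS_URL,
) -> urllib.request.Request:
    """Build, but do not send, a JSON POST request for the completions endpoint."""
    body = json.dumps(
        {"model": model_id, "prompt": prompt, "max_tokens": max_tokens}
    ).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def complete(model_id: str, prompt: str, max_tokens: int = 200) -> dict:
    """Send the request and decode the reply, assuming it is a JSON object."""
    request = build_completion_request(model_id, prompt, max_tokens)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Keeping request construction separate from sending makes the payload easy to inspect or log before any network traffic happens.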

TRT-LLM Engine

Make sure to install torchserve with python venv as described above, and log in with huggingface-cli login

pip install -U --use-deprecated=legacy-resolver -r requirements/trt_llm.txt

python -m ts.llm_launcher --model_id meta-llama/Meta-Llama-3.1-8B-Instruct --engine trt_llm --disable_token_auth

Try it out

curl -X POST -d '{"prompt":"count from 1 to 9 in french ", "max_tokens": 100}' --header "Content-Type: application/json" "http://localhost:8080/predictions/model"

🚢 Quick Start LLM Deployment with Docker

#export token=
docker build --pull . -f docker/Dockerfile.vllm -t ts/vllm

docker run --rm -ti --shm-size 10g --gpus all -e HUGGING_FACE_HUB_TOKEN=$token -p 8080:8080 -v data:/data ts/vllm --model_id meta-llama/Meta-Llama-3-8B-Instruct --disable_token_auth
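The container can take a while to download and load the model before it accepts requests. One way to wait for it, sketched below, is to poll TorchServe's /ping health endpoint on the inference port until it reports a healthy status. The retry count, delay, and function names are illustrative assumptions, and the sketch assumes the healthy response is a JSON object with "status" set to "Healthy".

```python
import json
import time
import urllib.request
from typing import Callable


def fetch_ping(url: str = "http://localhost:8080/ping", timeout: float = 2.0) -> str:
    """Return the raw body of the health endpoint."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def is_healthy(body: str) -> bool:
    """True if the body is a JSON object whose "status" is "Healthy"."""
    try:
        return json.loads(body).get("status") == "Healthy"
    except (ValueError, AttributeError):
        # Not JSON, or JSON that is not an object.
        return False


def wait_until_healthy(
    fetch: Callable[[], str] = fetch_ping,
    attempts: int = 30,
    delay: float = 2.0,
) -> bool:
    """Poll `fetch` up to `attempts` times, sleeping `delay` seconds after each failed try."""
    for _ in range(attempts):
        try:
            if is_healthy(fetch()):
                return True
        except OSError:
            # URLError and connection-refused errors derive from OSError.
            pass
        time.sleep(delay)
    return False
```

Calling wait_until_healthy() after docker run returns True once the server answers, or False if it never becomes healthy within the allotted attempts.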

Try it out

curl -X POST -d '{"model":"meta-llama/Meta-Llama-3-8B-Instruct", "prompt":"Hello, my name is", "max_tokens": 200}' --header "Content-Type: application/json" "http://localhost:8080/predictions/model/1.0/v1/completions"

Refer to LLM deployment for details and other methods.

⚡ Why TorchServe

🤔 How does TorchServe work

🏆 Highlighted Examples

For more examples

🛡️ TorchServe Security Policy

SECURITY.md

🤓 Learn More

https://pytorch.org/serve

🫂 Contributing

We welcome all contributions!

To learn more about how to contribute, see the contributor guide here.

📰 News

💖 All Contributors

Made with contrib.rocks.

⚖️ Disclaimer

This repository is jointly operated and maintained by Amazon, Meta and a number of individual contributors listed in the CONTRIBUTORS file. For questions directed at Meta, please send an email to opensource@fb.com. For questions directed at Amazon, please send an email to torchserve@amazon.com. For all other questions, please open up an issue in this repository here.

TorchServe acknowledges the Multi Model Server (MMS) project, from which it was derived.