BentoML Documentation



BentoML is a Unified Inference Platform for deploying and scaling AI systems with any model, on any cloud.

LLM inference with vLLM: serve large language models with OpenAI-compatible APIs and the vLLM inference backend.

LLM safety with ShieldGemma: protect your LLM API endpoint from harmful input using Google's ShieldGemma content moderation model.

Explore what developers are building with BentoML.

Overview

What is BentoML

BentoML is a Unified Inference Platform for deploying and scaling AI models with production-grade reliability, all without the complexity of managing infrastructure. It enables your developers to build AI systems 10x faster with custom models, scale efficiently in your cloud, and maintain complete control over security and compliance.

The architecture diagram of the BentoML unified inference platform

To get started, install BentoML with pip. Python 3.9+ is recommended:

pip install bentoml

How-tos

Stay informed

The BentoML team uses the following channels to announce important updates, such as major product releases, and to share tutorials, case studies, and community news.

To receive release notifications, star and watch the BentoML project on GitHub. For release notes and detailed changelogs, see the Releases page.