GitHub - triton-inference-server/tutorials: This repository contains tutorials and examples for Triton Inference Server

Triton Tutorials

For users accustomed to the "Tensor in" & "Tensor out" approach to Deep Learning Inference, getting started with Triton can raise many questions. The goal of this repository is to familiarize users with Triton's features and provide guides and examples to ease migration. For a feature-by-feature explanation, refer to the Triton Inference Server documentation.

Getting Started Checklist

Overview Video
Conceptual Guide: Deploying Models

Quick Deploy

These examples demonstrate deployment for models trained with various frameworks. They are quick demonstrations that assume the user is already somewhat familiar with Triton.

Deploy a ...

PyTorch Model | TensorFlow Model | ONNX Model | TensorRT Accelerated Model | vLLM Model | OpenVINO Model
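Whatever the framework, a Triton deployment starts from a model repository: one directory per model holding a `config.pbtxt` and numbered version subdirectories that contain the model file. The sketch below only lays out that structure on disk; it does not export or serve a model. The model name `resnet_onnx`, the tensor names `input` and `output`, and the tensor shapes are hypothetical placeholders, and the helper `make_model_repository` is not part of Triton or of these tutorials. Follow the tutorial for your framework for the actual values.

```python
from pathlib import Path
import tempfile

# Minimal config.pbtxt template. Tensor names and dims are hypothetical
# placeholders; they must match the real model's inputs and outputs.
CONFIG_TEMPLATE = """\
name: "{name}"
backend: "{backend}"
max_batch_size: {max_batch_size}
input [
  {{
    name: "input"
    data_type: TYPE_FP32
    dims: [ 3, 224, 224 ]
  }}
]
output [
  {{
    name: "output"
    data_type: TYPE_FP32
    dims: [ 1000 ]
  }}
]
"""


def make_model_repository(root, name, backend, model_filename, max_batch_size=8):
    """Create <root>/<name>/config.pbtxt and the version directory <root>/<name>/1/.

    Returns the config path and the path where the exported model file
    should be copied. The model file itself is not created here.
    """
    model_dir = Path(root) / name
    version_dir = model_dir / "1"
    version_dir.mkdir(parents=True, exist_ok=True)

    config_path = model_dir / "config.pbtxt"
    config_path.write_text(
        CONFIG_TEMPLATE.format(
            name=name, backend=backend, max_batch_size=max_batch_size
        )
    )
    return config_path, version_dir / model_filename


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        config_path, model_path = make_model_repository(
            tmp, "resnet_onnx", "onnxruntime", "model.onnx"
        )
        print("config written to:", config_path)
        print("copy the exported model to:", model_path)
```

After copying the exported model into the version directory, the server can be pointed at the repository root with `tritonserver --model-repository=<root>`.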

LLM Tutorials

The table below lists some popular models that are supported in our tutorials:

| Example Models | Tutorial Link |
| --- | --- |
| Llama-2-7B | TensorRT-LLM Tutorial |
| Persimmon-8B | HuggingFace Transformers Tutorial |
| Falcon-7B | HuggingFace Transformers Tutorial |
| LLaVA-v1.5-7B | TensorRT-LLM Tutorial |

**Note:** This is not an exhaustive list of what Triton supports, just what is included in the tutorials.

What does this repository contain?

This repository contains the following resources:

The Triton Inference Server GitHub organization contains multiple repositories housing different features of the Triton Inference Server. The following is not a complete description of all the repositories, but just a simple guide to build intuitive understanding.

Adding Requests

To request a new example, open an issue and describe what you need. Want to make a contribution? Open a pull request and tag an Admin.