TensorRT - Get Started
NVIDIA® TensorRT™ is an ecosystem of APIs for high-performance deep learning inference. The TensorRT inference library provides a general-purpose AI compiler and an inference runtime that deliver low latency and high throughput for production applications. TensorRT-LLM builds on top of TensorRT, adding an open-source Python API with large language model (LLM)-specific optimizations such as in-flight batching and custom attention. TensorRT Model Optimizer provides state-of-the-art techniques such as quantization and sparsity to reduce model complexity, enabling TensorRT, TensorRT-LLM, and other inference libraries to further optimize speed during deployment.
TensorRT 10.0 GA is a free download for members of the NVIDIA Developer Program.
Ways to Get Started With NVIDIA TensorRT
TensorRT and TensorRT-LLM are freely available for development on multiple platforms. Simplify the deployment of AI models across cloud, data center, and GPU-accelerated workstations with NVIDIA NIM for generative AI and NVIDIA Triton™ Inference Server for every workload, both part of NVIDIA AI Enterprise.
TensorRT
TensorRT is available to download for free as a binary for multiple platforms or as a container on NVIDIA NGC™. A minimal engine-build sketch follows the resource lists below.
Beginner
- Getting Started with NVIDIA TensorRT (video)
- Introductory blog
- Getting started notebooks (Jupyter Notebook)
- Quick-start guide
Intermediate
- Sample code (C++)
- BERT, EfficientDet inference using TensorRT (Jupyter Notebook)
- Serving a model with NVIDIA Triton™ (blog, docs)
Expert
- Using quantization-aware training (QAT) with TensorRT (blog)
- PyTorch-quantization toolkit (Python code)
- TensorFlow quantization toolkit (blog)
- Sparsity with TensorRT (blog)
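As a quick illustration of the workflow these resources cover, the sketch below parses an ONNX model and builds a serialized engine with the TensorRT Python API. It assumes a TensorRT 10.x install; the ONNX path, the FP16 flag, and the output filename are placeholder choices for illustration.

```python
# Minimal sketch: build a TensorRT engine from an ONNX model (TensorRT 10.x API).
# "model.onnx" and "model.engine" are placeholder paths.
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(0)  # explicit batch is the default in TensorRT 10

parser = trt.OnnxParser(network, logger)
if not parser.parse_from_file("model.onnx"):
    for i in range(parser.num_errors):
        print(parser.get_error(i))
    raise RuntimeError("failed to parse ONNX model")

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)  # optional: allow FP16 kernels

# Serialize the optimized engine so it can be deserialized at deployment time.
engine_bytes = builder.build_serialized_network(network, config)
with open("model.engine", "wb") as f:
    f.write(engine_bytes)
```

At runtime, the serialized engine is deserialized with trt.Runtime and executed through an execution context; the quick-start guide above walks through that half of the workflow.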
TensorRT-LLM
TensorRT-LLM is available for free on GitHub. A minimal usage sketch follows the resources below.
Beginner
- Introduction to how TensorRT-LLM supercharges inference (blog)
- How to get started with TensorRT-LLM (blog)
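To make the getting-started path concrete, here is a minimal sketch of the high-level LLM API shipped in recent TensorRT-LLM releases; the Hugging Face model ID and sampling settings are placeholder assumptions, and the exact API surface can vary between versions.

```python
# Minimal sketch of the TensorRT-LLM high-level LLM API (recent releases).
# The model ID and sampling settings are placeholder assumptions.
from tensorrt_llm import LLM, SamplingParams

llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")
sampling = SamplingParams(temperature=0.8, top_p=0.95)

# generate() builds or loads the engine as needed and runs batched inference.
for output in llm.generate(["What is TensorRT?"], sampling):
    print(output.outputs[0].text)
```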
TensorRT Model Optimizer
TensorRT Model Optimizer is available for free on NVIDIA PyPI, with examples and recipes on GitHub. A minimal quantization sketch follows the resources below.
Beginner
- TensorRT Model Optimizer Quick-Start Guide
- Introduction to Model Optimizer (blog)
- Optimize Generative AI Inference With Quantization (video)
- Optimizing Diffusion models with 8-bit quantization (blog)
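As a rough sketch of the post-training quantization workflow described in these resources, the example below applies Model Optimizer's default INT8 recipe to a PyTorch module via modelopt.torch.quantization; the toy model and calibration loop are placeholder assumptions.

```python
# Minimal sketch: post-training INT8 quantization with TensorRT Model Optimizer.
# The toy model and random calibration data are placeholder assumptions.
import torch
import modelopt.torch.quantization as mtq

model = torch.nn.Sequential(
    torch.nn.Linear(128, 256), torch.nn.ReLU(), torch.nn.Linear(256, 10)
)
calib_data = [torch.randn(32, 128) for _ in range(8)]

def forward_loop(m):
    # Feed representative data through the model to collect calibration statistics.
    for batch in calib_data:
        m(batch)

# Insert quantizers and calibrate with the default INT8 configuration.
model = mtq.quantize(model, mtq.INT8_DEFAULT_CFG, forward_loop)
```

The quantized module can then be exported and deployed through TensorRT or TensorRT-LLM; see the quick-start guide above for the supported export paths.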
Ways to Get Started With NVIDIA TensorRT Frameworks
Torch-TensorRT and TensorFlow-TensorRT are available for free as containers on the NGC catalog, or you can purchase NVIDIA AI Enterprise for mission-critical AI inference with enterprise-grade security, stability, manageability, and support. Contact sales or apply for a 90-day NVIDIA AI Enterprise evaluation license to get started.
Torch-TensorRT
Torch-TensorRT is available in the PyTorch container from the NGC catalog. A minimal compile sketch follows the resource lists below.
Beginner
- Getting started with NVIDIA Torch-TensorRT (video)
- Accelerate inference up to 6X in PyTorch (blog)
- Object detection with SSD (Jupyter Notebook)
Intermediate
- Post-training quantization with Hugging Face BERT (Jupyter Notebook)
- Quantization-aware training (Jupyter Notebook)
- Serving a model with Triton (blog, docs)
- Using dynamic shapes (Jupyter Notebook)
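A minimal compile sketch, assuming a torchvision ResNet-50 and an illustrative input shape and precision; any PyTorch module the compiler can trace may be substituted.

```python
# Minimal sketch: compile a PyTorch model with Torch-TensorRT.
# torchvision's ResNet-50 and the 1x3x224x224 input are placeholder assumptions.
import torch
import torch_tensorrt
import torchvision.models as models

model = models.resnet50(weights=None).eval().cuda()
example_input = torch.randn(1, 3, 224, 224, device="cuda")

trt_model = torch_tensorrt.compile(
    model,
    inputs=[example_input],
    enabled_precisions={torch.half},  # allow FP16 kernels where beneficial
)

with torch.no_grad():
    output = trt_model(example_input)
```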
TensorFlow-TensorRT
TensorFlow-TensorRT is available in the TensorFlow container from the NGC catalog. A minimal conversion sketch follows the resources below.
Beginner
- Getting started with TensorFlow-TensorRT (video)
- Leverage TF-TRT Integration for Low-Latency Inference (blog)
- Image classification with TF-TRT (video)
- Quantization with TF-TRT (sample code)
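A minimal conversion sketch using TF-TRT's TrtGraphConverterV2; the SavedModel directories are placeholder paths, and the constructor arguments vary somewhat across TensorFlow versions.

```python
# Minimal sketch: convert a TensorFlow SavedModel with TF-TRT.
# "saved_model_dir" and "trt_saved_model_dir" are placeholder paths.
from tensorflow.python.compiler.tensorrt import trt_convert as trt

converter = trt.TrtGraphConverterV2(
    input_saved_model_dir="saved_model_dir",
    precision_mode=trt.TrtPrecisionMode.FP16,
)
converter.convert()                    # replace supported subgraphs with TRT ops
converter.save("trt_saved_model_dir")  # write the converted SavedModel
```

The converted SavedModel loads like any other with tf.saved_model.load; operations TF-TRT cannot convert fall back to native TensorFlow execution.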
Explore More TensorRT Resources
Conversational AI
- Real-Time NLP With BERT (blog)
- Optimizing T5 and GPT-2 (blog)
- Quantize BERT with PTQ and QAT for INT8 Inference (sample)
- ASR With TensorRT (Jupyter Notebook)
- How to Deploy Real-Time TTS (blog)
- NLU With BERT Notebook (Jupyter Notebook)
- Real-Time Text-to-Speech (sample)
- Building an RNN Network Layer by Layer (sample code)
Image and Vision
- Optimize Object Detection (Jupyter Notebook)
- Estimating Depth With ONNX Models and Custom Layers (blog)
- Speeding Up Inference Using TensorFlow, ONNX, and TensorRT (blog)
- Object Detection With EfficientDet, YOLOv3 Networks (Python code samples)
- Using NVIDIA Ampere Architecture and TensorRT (blog)
- Achieving FP32 Accuracy in INT8 using Quantization-Aware Training (blog)
Stay up to date on the latest inference news from NVIDIA.