Torch-TensorRT — Torch-TensorRT v2.8.0.dev0+ee32da0 documentation
In-framework compilation of PyTorch inference code for NVIDIA GPUs
Torch-TensorRT is an inference compiler for PyTorch, targeting NVIDIA GPUs via NVIDIA’s TensorRT Deep Learning Optimizer and Runtime. It supports both just-in-time (JIT) compilation workflows via the torch.compile interface and ahead-of-time (AOT) workflows. Torch-TensorRT integrates seamlessly into the PyTorch ecosystem, supporting hybrid execution of optimized TensorRT code alongside standard PyTorch code.
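For orientation, here is a minimal sketch of both workflows. The tiny convolutional model is a placeholder, and the calls follow the public torch_tensorrt API; treat it as an illustration rather than a complete recipe.

```python
import torch
import torch_tensorrt

# Placeholder model; any torch.nn.Module works
model = torch.nn.Sequential(
    torch.nn.Conv2d(3, 16, kernel_size=3),
    torch.nn.ReLU(),
).eval().cuda()
inputs = [torch.randn(1, 3, 224, 224, device="cuda")]

# JIT workflow: torch.compile with the TensorRT backend;
# TensorRT engines are built lazily on the first call
jit_model = torch.compile(model, backend="tensorrt")
jit_model(*inputs)

# AOT workflow: export the graph, compile it ahead of time,
# and serialize the result for later deployment
exported = torch.export.export(model, tuple(inputs))
trt_model = torch_tensorrt.dynamo.compile(exported, inputs=inputs)
torch_tensorrt.save(trt_model, "trt_model.ep", inputs=inputs)
```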
Getting Started
User Guide
- Torch-TensorRT Explained
- Dynamic shapes with Torch-TensorRT
- Post Training Quantization (PTQ)
- Saving models compiled with Torch-TensorRT
- Deploying Torch-TensorRT Programs
- DLA
- Compile Mixed Precision models with Torch-TensorRT
Tutorials
- Torch Compile Advanced Usage
- Deploy Quantized Models using Torch-TensorRT
- Engine Caching
- Engine Caching (BERT)
- Refitting Torch-TensorRT Programs with New Weights
- Serving a Torch-TensorRT model with Triton
- Torch Export with Cudagraphs
- Overloading Torch-TensorRT Converters with Custom Converters
- Using Custom Kernels within TensorRT Engines with Torch-TensorRT
- Automatically Generate a Converter for a Custom Kernel
- Automatically Generate a Plugin for a Custom Kernel
- Mutable Torch TensorRT Module
- Weight Streaming
- Pre-allocated output buffer
Dynamo Frontend
TorchScript Frontend
- Creating a TorchScript Module
- Using Torch-TensorRT in Python
- Using Torch-TensorRT in C++
- Using Torch-TensorRT TorchScript Frontend Directly From PyTorch
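To make the pages above concrete, here is a minimal sketch of the TorchScript path in Python. The module below is a placeholder, and ir="ts" selects this frontend in torch_tensorrt.compile; consider it an illustration under those assumptions.

```python
import torch
import torch_tensorrt

# Placeholder module; anything scriptable (or traceable) works
model = torch.nn.Sequential(
    torch.nn.Linear(64, 32),
    torch.nn.ReLU(),
).eval().cuda()
script_model = torch.jit.script(model)

# Compile through the TorchScript frontend (ir="ts") with an
# explicit input spec, allowing FP16 kernels
trt_ts_model = torch_tensorrt.compile(
    script_model,
    ir="ts",
    inputs=[torch_tensorrt.Input((1, 64))],
    enabled_precisions={torch.half},
)

# The result is itself a TorchScript module, so it can be saved
# and reloaded like any other ScriptModule
torch.jit.save(trt_ts_model, "trt_ts_model.ts")
```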
FX Frontend
Model Zoo
- Compiling ResNet with dynamic shapes using the torch.compile backend
- Compiling BERT using the torch.compile backend
- Compiling Stable Diffusion model using the torch.compile backend
- Compiling GPT2 using the Torch-TensorRT torch.compile frontend
- Compiling GPT2 using the dynamo backend
- Compiling Llama2 using the dynamo backend
- Compiling SAM2 using the dynamo backend
- Compiling FLUX.1-dev model using the Torch-TensorRT dynamo backend
- Legacy notebooks
Python API Documentation
- torch_tensorrt
- torch_tensorrt.dynamo
- torch_tensorrt.logging
- torch_tensorrt.fx
- torch_tensorrt.ts
- torch_tensorrt.ts.ptq
C++ API Documentation
- Namespace torch_tensorrt
- Namespace torch_tensorrt::logging
- Namespace torch_tensorrt::ptq
- Namespace torch_tensorrt::torchscript
CLI Documentation
Contributor Documentation
- System Overview
- Writing Dynamo Converters
- Writing Dynamo ATen Lowering Passes
- Writing TorchScript Converters
- Useful Links for Torch-TensorRT Development