Support Matrix — Model Optimizer 0.27.1

Feature Support Matrix

Linux

| Quantization Format | Details | Supported Model Formats | Deployment |
|---|---|---|---|
| FP4 | Per-block FP4 weights & activations. GPUs: Blackwell and later. | PyTorch | TensorRT, TensorRT-LLM |
| FP8 | Per-tensor FP8 weights & activations. GPUs: Ada and later. | PyTorch, ONNX* | TensorRT*, TensorRT-LLM |
| INT8 | Per-channel INT8 weights, per-tensor INT8 activations. Uses the SmoothQuant algorithm. GPUs: Ampere and later. | PyTorch, ONNX* | TensorRT*, TensorRT-LLM |
| W4A16 (INT4 Weights Only) | Block-wise INT4 weights, FP16 activations. Uses the AWQ algorithm. GPUs: Ampere and later. | PyTorch, ONNX | TensorRT, TensorRT-LLM |
| W4A8 (INT4 Weights, FP8 Activations) | Block-wise INT4 weights, per-tensor FP8 activations. Uses the AWQ algorithm. GPUs: Ada and later. | PyTorch*, ONNX* | TensorRT-LLM |
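The PyTorch formats above are applied through the `modelopt.torch.quantization` API. Below is a minimal sketch of post-training quantization with calibration; the toy model and synthetic calibration data are placeholders, and config names such as `mtq.INT8_SMOOTHQUANT_CFG` should be verified against your installed 0.27.1 release.

```python
# Minimal PTQ sketch with ModelOpt's PyTorch API. The model and calibration
# data here are toy placeholders; substitute your own.
import torch
import torch.nn as nn
import modelopt.torch.quantization as mtq

model = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 8)).cuda()
calib_data = [torch.randn(16, 64, device="cuda") for _ in range(8)]

def forward_loop(model):
    # Run representative inputs through the model so ModelOpt can collect
    # the activation statistics needed for calibration.
    with torch.no_grad():
        for batch in calib_data:
            model(batch)

# Pick the config matching a row of the table above, e.g. INT8 SmoothQuant
# (Ampere and later) or mtq.FP8_DEFAULT_CFG (Ada and later).
model = mtq.quantize(model, mtq.INT8_SMOOTHQUANT_CFG, forward_loop)
```

The quantized module can then be exported for the deployment targets listed in the table (TensorRT, TensorRT-LLM).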

Windows

| Quantization Format | Details | Supported Model Formats | Deployment |
|---|---|---|---|
| W4A16 (INT4 Weights Only) | Block-wise INT4 weights, FP16 activations. Uses the AWQ algorithm. GPUs: Ampere and later. | PyTorch*, ONNX | ORT-DirectML, TensorRT*, TensorRT-LLM* |
| W4A8 (INT4 Weights, FP8 Activations) | Block-wise INT4 weights, per-tensor FP8 activations. Uses the AWQ algorithm. GPUs: Ada and later. | PyTorch* | TensorRT-LLM* |
| FP8 | Per-tensor FP8 weights & activations (PyTorch); per-tensor activations and per-channel weights (ONNX). Uses Max calibration. GPUs: Ada and later. | PyTorch*, ONNX | TensorRT*, TensorRT-LLM*, ORT-CUDA |
| INT8 | Per-channel INT8 weights, per-tensor INT8 activations. Uses SmoothQuant (PyTorch)* and Max calibration (ONNX). GPUs: Ada and later. | PyTorch*, ONNX | TensorRT*, TensorRT-LLM*, ORT-CUDA |
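The ONNX formats above are applied through the `modelopt.onnx.quantization` module. The sketch below is hedged: the exact argument names (`onnx_path`, `quantize_mode`, `calibration_data`, `output_path`) are assumptions based on that module's quantize entry point, so confirm them against the 0.27.1 documentation before use.

```python
# Hedged sketch of ONNX PTQ with ModelOpt on Windows. Argument names are
# assumptions; check the modelopt.onnx.quantization docs for your release.
import numpy as np
from modelopt.onnx.quantization import quantize

# Representative inputs for calibration (placeholder shape).
calibration_data = np.random.rand(32, 3, 224, 224).astype(np.float32)

quantize(
    onnx_path="model.onnx",
    quantize_mode="int8",            # or "fp8" / "int4" per the table above
    calibration_data=calibration_data,
    output_path="model.quant.onnx",
)
```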

Note

Features marked with an asterisk (*) are considered experimental.

Model Support Matrix

Linux

Please check out the model support matrix here.

Windows

| Model | ONNX INT4 AWQ (W4A16) | ONNX INT8 Max (W8A8) | ONNX FP8 Max (W8A8) |
|---|---|---|---|
| Llama3.1-8B-Instruct | Yes | No | No |
| Phi3.5-mini-Instruct | Yes | No | No |
| Mistral-7B-Instruct-v0.3 | Yes | No | No |
| Llama3.2-3B-Instruct | Yes | No | No |
| Gemma-2b-it | Yes | No | No |
| Gemma-2-2b | Yes | No | No |
| Gemma-2-9b | Yes | No | No |
| Nemotron Mini 4B Instruct | Yes | No | No |
| Qwen2.5-7B-Instruct | Yes | No | No |
| DeepSeek-R1-Distill-Llama-8B | Yes | No | No |
| DeepSeek-R1-Distill-Qwen-1.5B | Yes | No | No |
| DeepSeek-R1-Distill-Qwen-7B | Yes | No | No |
| DeepSeek-R1-Distill-Qwen-14B | Yes | No | No |
| Mistral-NeMo-Minitron-2B-128k-Instruct | Yes | No | No |
| Mistral-NeMo-Minitron-4B-128k-Instruct | Yes | No | No |
| Mistral-NeMo-Minitron-8B-128k-Instruct | Yes | No | No |
| whisper-large | No | Yes | Yes |
| sam2-hiera-large | No | Yes | Yes |
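Quantized ONNX models from this table deploy on Windows through the backends listed in the feature matrix, including ORT-DirectML. Below is a minimal sketch of loading such a model with ONNX Runtime's DirectML execution provider; the file name is a hypothetical quantization output, and the actual input names and shapes depend on how the model was exported.

```python
# Minimal sketch: run a quantized ONNX model via ORT-DirectML on Windows.
# Requires the onnxruntime-directml package; the model path is a placeholder.
import onnxruntime as ort

session = ort.InferenceSession(
    "llama3.1-8b-instruct.int4_awq.onnx",   # hypothetical INT4 AWQ output
    providers=["DmlExecutionProvider", "CPUExecutionProvider"],
)

# Inspect the graph inputs to build a feed dict for the actual model.
for inp in session.get_inputs():
    print(inp.name, inp.shape, inp.type)
```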

Note