# Parameters Extension — NVIDIA Triton Inference Server
This document describes Triton’s parameters extension. The parameters extension allows an inference request to provide custom parameters that cannot be provided as inputs. Because this extension is supported, Triton reports “parameters” in the extensions field of its Server Metadata. This extension uses the optional “parameters” field in the KServe Protocol in HTTP and GRPC.
The following parameters are reserved for Triton’s usage and should not be used as custom parameters:
- sequence_id
- priority
- timeout
- sequence_start
- sequence_end
- headers
- All keys that start with the "triton_" prefix. Some examples used today:
  - "triton_enable_empty_final_response" request parameter
  - "triton_final_response" response parameter
Whether you use the GRPC or the HTTP endpoint, make sure not to use any of the reserved parameter names listed above, as doing so can cause unexpected behavior. The reserved parameters are not accessible through the Triton C API.
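Since the server does not expose the reserved list programmatically, a client can guard against collisions before sending a request. A minimal sketch, using the reserved names and the "triton_" prefix rule listed above (the helper function name is illustrative, not part of any Triton client library):

```python
# Reserved parameter names from the list above.
RESERVED = {"sequence_id", "priority", "timeout",
            "sequence_start", "sequence_end", "headers"}


def validate_custom_parameters(params):
    """Raise ValueError if any key collides with a Triton-reserved name."""
    for key in params:
        if key in RESERVED or key.startswith("triton_"):
            raise ValueError(f"'{key}' is reserved by Triton")
    return params


# A custom parameter name like this one is safe to send.
validate_custom_parameters({"my_custom_parameter": 42})
```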
## HTTP/REST
The following example shows how a request can include custom parameters.
```
POST /v2/models/mymodel/infer HTTP/1.1
Host: localhost:8000
Content-Type: application/json
Content-Length: <xx>

{
  "parameters" : { "my_custom_parameter" : 42 },
  "inputs" : [
    {
      "name" : "input0",
      "shape" : [ 2, 2 ],
      "datatype" : "UINT32",
      "data" : [ 1, 2, 3, 4 ]
    }
  ],
  "outputs" : [
    {
      "name" : "output0"
    }
  ]
}
```
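The same request body can be assembled programmatically. A minimal sketch in Python, reusing the model name and tensor values from the example above (the actual POST, e.g. with urllib or requests, is omitted):

```python
import json

# Request body matching the HTTP example above: custom parameters
# sit in the top-level "parameters" object alongside inputs/outputs.
payload = {
    "parameters": {"my_custom_parameter": 42},
    "inputs": [
        {
            "name": "input0",
            "shape": [2, 2],
            "datatype": "UINT32",
            "data": [1, 2, 3, 4],
        }
    ],
    "outputs": [{"name": "output0"}],
}

# This string would be POSTed to /v2/models/mymodel/infer
# with Content-Type: application/json.
body = json.dumps(payload)
```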
## GRPC
The "parameters" field in the ModelInferRequest message can be used to send custom parameters.
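In the GRPC protocol, each parameter value is an InferParameter message whose oneof selects the value type. A rough sketch of that mapping, with plain Python dicts standing in for the protobuf messages (the field names bool_param, int64_param, and string_param follow the KServe GRPC protocol; the helper name is illustrative):

```python
def to_infer_parameter(value):
    """Pick the InferParameter variant for a Python value.

    Dicts stand in for the protobuf oneof; a real client would set
    the corresponding field on an InferParameter message instead.
    """
    if isinstance(value, bool):  # bool must be checked before int
        return {"bool_param": value}
    if isinstance(value, int):
        return {"int64_param": value}
    if isinstance(value, str):
        return {"string_param": value}
    raise TypeError(f"unsupported parameter type: {type(value).__name__}")


# The "parameters" map of a ModelInferRequest would then look like:
params = {name: to_infer_parameter(v)
          for name, v in {"my_custom_parameter": 42}.items()}
```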