Model Configuration Extension — NVIDIA Triton Inference Server (original) (raw)

This document describes Triton’s model configuration extension. The model configuration extension allows Triton to return server-specific information. Because this extension is supported, Triton reports “model_configuration” in the extensions field of its Server Metadata.

HTTP/REST#

In all JSON schemas shown in this document $number, $string, $boolean,$object and $array refer to the fundamental JSON types. #optional indicates an optional JSON field.

Triton exposes the model configuration endpoint at the following URL. The versions portion of the URL is optional; if not provided Triton will return model configuration for the highest-numbered version of the model.

GET v2/models/${MODEL_NAME}[/versions/${MODEL_VERSION}]/config

A model configuration request is made with an HTTP GET to the model configuration endpoint.A successful model configuration request is indicated by a 200 HTTP status code. The model configuration response object, identified as $model_configuration_response, is returned in the HTTP body for every successful request.

$model_configuration_response = {

configuration JSON

}

The contents of the response will be the JSON representation of the model’s configuration described by the ModelConfig message from model_config.proto.

A failed model configuration request must be indicated by an HTTP error status (typically 400). The HTTP body must contain the$model_configuration_error_response object.

$model_configuration_error_response = { "error": }

GRPC#

The GRPC definition of the service is:

service GRPCInferenceService { …

// Get model configuration. rpc ModelConfig(ModelConfigRequest) returns (ModelConfigResponse) {} }

Errors are indicated by the google.rpc.Status returned for the request. The OK code indicates success and other codes indicate failure. The request and response messages for ModelConfig are:

message ModelConfigRequest { // The name of the model. string name = 1;

// The version of the model. If not given the version of the model // is selected automatically based on the version policy. string version = 2; }

message ModelConfigResponse { // The model configuration. ModelConfig config = 1; }

Where the ModelConfig message is defined inmodel_config.proto.