Model Configuration Extension — NVIDIA Triton Inference Server
This document describes Triton’s model configuration extension. The model configuration extension allows Triton to return server-specific information. Because this extension is supported, Triton reports “model_configuration” in the extensions field of its Server Metadata.
HTTP/REST
In all JSON schemas shown in this document, $number, $string, $boolean, $object and $array refer to the fundamental JSON types. #optional indicates an optional JSON field.
Triton exposes the model configuration endpoint at the following URL. The versions portion of the URL is optional; if it is not provided, Triton returns the model configuration for the highest-numbered version of the model.
GET v2/models/${MODEL_NAME}[/versions/${MODEL_VERSION}]/config
A model configuration request is made with an HTTP GET to the model configuration endpoint. A successful model configuration request is indicated by a 200 HTTP status code. The model configuration response object, identified as $model_configuration_response, is returned in the HTTP body for every successful request.
$model_configuration_response = {
configuration JSON
}
The contents of the response will be the JSON representation of the model’s configuration described by the ModelConfig message from model_config.proto.
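As a rough illustration of the request described above, the following Python sketch builds the endpoint URL and fetches the configuration using only the standard library. The function names, the base URL, and the model name in the usage note are assumptions made for this example and are not part of the extension.

```python
import json
import urllib.parse
import urllib.request


def model_config_url(base_url, model_name, model_version=None):
    """Build the model configuration endpoint URL.

    The /versions/<version> segment is left out when no version is given,
    in which case the server chooses the version.
    """
    path = "v2/models/" + urllib.parse.quote(model_name, safe="")
    if model_version is not None:
        path += "/versions/" + urllib.parse.quote(str(model_version), safe="")
    return base_url.rstrip("/") + "/" + path + "/config"


def fetch_model_config(base_url, model_name, model_version=None, timeout=10.0):
    """Send the HTTP GET and decode the JSON body of a successful response."""
    url = model_config_url(base_url, model_name, model_version)
    # urlopen raises urllib.error.HTTPError for non-2xx status codes.
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Assuming a server is reachable at a hypothetical address such as http://localhost:8000, a call like fetch_model_config("http://localhost:8000", "my_model") would return the configuration as a Python dict.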
A failed model configuration request must be indicated by an HTTP error status (typically 400). The HTTP body must contain the $model_configuration_error_response object.
$model_configuration_error_response = { "error": <error message string> }
- “error” : The descriptive message for the error.
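One way a client might separate the success and error cases is sketched below in Python. It takes a status code and a response body that have already been received. The ModelConfigError class and the parse_model_config_response function are hypothetical names introduced for this example, and the fallback message for a missing "error" field is likewise an assumption.

```python
import json


class ModelConfigError(Exception):
    """Hypothetical exception carrying the server's "error" message."""

    def __init__(self, status, message):
        super().__init__(f"HTTP {status}: {message}")
        self.status = status
        self.message = message


def parse_model_config_response(status, body):
    """Return the configuration dict on 200, otherwise raise ModelConfigError."""
    payload = json.loads(body) if body else {}
    if status == 200:
        return payload
    # Failed requests are expected to carry {"error": "<message>"};
    # fall back to a generic message if the field is absent.
    message = "unknown error"
    if isinstance(payload, dict):
        message = payload.get("error", message)
    raise ModelConfigError(status, message)
```

Keeping the parsing separate from the transport makes the error path easy to exercise without a running server.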
GRPC
The GRPC definition of the service is:
service GRPCInferenceService
{
  …

  // Get model configuration.
  rpc ModelConfig(ModelConfigRequest) returns (ModelConfigResponse) {}
}
Errors are indicated by the google.rpc.Status returned for the request. The OK code indicates success and other codes indicate failure. The request and response messages for ModelConfig are:
message ModelConfigRequest
{
  // The name of the model.
  string name = 1;

  // The version of the model. If not given the version of the model
  // is selected automatically based on the version policy.
  string version = 2;
}

message ModelConfigResponse
{
  // The model configuration.
  ModelConfig config = 1;
}
The ModelConfig message is defined in model_config.proto.
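For illustration, the ModelConfig RPC can be called from Python roughly as follows. This sketch assumes the separately installed tritonclient package and its InferenceServerClient.get_model_config method. The helper names and the server address in the usage note are hypothetical. The import is kept inside the function so that the version helper can be used without the package installed.

```python
def normalize_version(version=None):
    """Convert an optional version to the string form the request expects.

    An empty string leaves version selection to the server's version policy.
    """
    return "" if version is None else str(version)


def fetch_model_config_grpc(url, model_name, version=None):
    """Call the ModelConfig RPC through the tritonclient package."""
    # Assumption: tritonclient is installed, e.g. pip install "tritonclient[grpc]".
    import tritonclient.grpc as grpcclient

    client = grpcclient.InferenceServerClient(url=url)
    # as_json=True asks the client for a dict rather than a protobuf message.
    return client.get_model_config(
        model_name=model_name,
        model_version=normalize_version(version),
        as_json=True,
    )
```

With a server listening on a hypothetical gRPC address such as localhost:8001, fetch_model_config_grpc("localhost:8001", "my_model") would return the response as a dict. Failures surface as exceptions raised by the client library.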