Auto-Complete Example — NVIDIA Triton Inference Server

This example shows how to implement the `auto_complete_config` function in the Python backend to provide the `max_batch_size`, `input`, and `output` properties. These properties allow Triton to load the Python model with Minimal Model Configuration in the absence of a configuration file.

The model repository should contain the nobatch_auto_complete and batch_auto_complete models. The `max_batch_size` of the nobatch_auto_complete model is set to zero, whereas the `max_batch_size` of the batch_auto_complete model is set to 4. For models with a non-zero value of `max_batch_size`, the configuration can specify a different value of `max_batch_size` as long as it does not exceed the value set in the model file.

The nobatch_auto_complete and batch_auto_complete models calculate the sum and difference of INPUT0 and INPUT1 and put the results in OUTPUT0 and OUTPUT1, respectively.
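The sketch below illustrates the general shape of such a model. It is a minimal, illustrative version rather than a copy of nobatch_model.py: the FP32 data type and the [4] dimensions are assumptions made for the example, and the files in examples/auto_complete/ remain the reference.

```python
import triton_python_backend_utils as pb_utils


class TritonPythonModel:
    @staticmethod
    def auto_complete_config(auto_complete_model_config):
        # Describe the tensors Triton should expect. The names match the
        # example; the FP32 type and [4] dims are assumptions for this sketch.
        inputs = [
            {"name": "INPUT0", "data_type": "TYPE_FP32", "dims": [4]},
            {"name": "INPUT1", "data_type": "TYPE_FP32", "dims": [4]},
        ]
        outputs = [
            {"name": "OUTPUT0", "data_type": "TYPE_FP32", "dims": [4]},
            {"name": "OUTPUT1", "data_type": "TYPE_FP32", "dims": [4]},
        ]

        # A max_batch_size of 0 means the model does not support batching.
        auto_complete_model_config.set_max_batch_size(0)
        for tensor in inputs:
            auto_complete_model_config.add_input(tensor)
        for tensor in outputs:
            auto_complete_model_config.add_output(tensor)

        return auto_complete_model_config

    def execute(self, requests):
        # Compute the sum and difference of the two inputs for every request.
        responses = []
        for request in requests:
            in_0 = pb_utils.get_input_tensor_by_name(request, "INPUT0").as_numpy()
            in_1 = pb_utils.get_input_tensor_by_name(request, "INPUT1").as_numpy()
            out_0 = pb_utils.Tensor("OUTPUT0", in_0 + in_1)
            out_1 = pb_utils.Tensor("OUTPUT1", in_0 - in_1)
            responses.append(
                pb_utils.InferenceResponse(output_tensors=[out_0, out_1])
            )
        return responses
```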

Deploying the Auto-Complete Models

  1. Create the model repository:

```
mkdir -p models/nobatch_auto_complete/1/
mkdir -p models/batch_auto_complete/1/
```

Copy the Python models:

```
cp examples/auto_complete/nobatch_model.py models/nobatch_auto_complete/1/model.py
cp examples/auto_complete/batch_model.py models/batch_auto_complete/1/model.py
```

Note that we don’t need a model configuration file since Triton will use the auto-complete model configuration provided in the Python model. The expected layout of the model repository is sketched after this list.

  2. Start the tritonserver:

```
tritonserver --model-repository `pwd`/models
```
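Once the files are copied, the model repository passed to tritonserver should look roughly like this; note that neither model directory contains a config.pbtxt:

```
models/
├── nobatch_auto_complete/
│   └── 1/
│       └── model.py
└── batch_auto_complete/
    └── 1/
        └── model.py
```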

Running inferences on Nobatch and Batch models

Send inference requests using client.py.

```
python3 examples/auto_complete/client.py
```

You should see output similar to the following:

```
'nobatch_auto_complete' configuration matches the expected auto complete configuration
'batch_auto_complete' configuration matches the expected auto complete configuration

PASS: auto_complete
```

The nobatch_model.py and batch_model.py model files are heavily commented with explanations about how to utilize the `set_max_batch_size`, `add_input`, and `add_output` functions to set the `max_batch_size`, `input`, and `output` properties of the model.
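For the batching variant, the key auto-complete difference is the batch size. The following is only a hedged sketch of how batch_model.py's `auto_complete_config` might set it; the value 4 comes from the description above, and everything else mirrors the earlier nobatch sketch:

```python
    @staticmethod
    def auto_complete_config(auto_complete_model_config):
        # Accept batched requests with a batch size of up to 4.
        # Inputs and outputs are added exactly as in the nobatch sketch above;
        # their dims describe a single item, without the batch dimension.
        auto_complete_model_config.set_max_batch_size(4)
        # ... add_input(...) / add_output(...) calls as before ...
        return auto_complete_model_config
```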

Explanation of the Client Output

For each model, client.py first requests the model configuration from Triton to validate that the model configuration has been registered as expected. The client then sends an inference request to verify that the inference ran properly and that the result is correct.
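The snippet below sketches those two steps with the tritonclient HTTP API. It is an illustration rather than a copy of client.py: the input values, shapes, and FP32 type are assumptions carried over from the earlier model sketch.

```python
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# Step 1: fetch the model configuration that Triton auto-completed
# from the Python model and inspect it.
config = client.get_model_config("nobatch_auto_complete")
print(config)

# Step 2: send an inference request with two FP32 inputs of shape [4].
input0 = np.arange(4, dtype=np.float32)
input1 = np.ones(4, dtype=np.float32)

inputs = [
    httpclient.InferInput("INPUT0", input0.shape, "FP32"),
    httpclient.InferInput("INPUT1", input1.shape, "FP32"),
]
inputs[0].set_data_from_numpy(input0)
inputs[1].set_data_from_numpy(input1)

outputs = [
    httpclient.InferRequestedOutput("OUTPUT0"),
    httpclient.InferRequestedOutput("OUTPUT1"),
]

results = client.infer("nobatch_auto_complete", inputs, outputs=outputs)
print("OUTPUT0 (sum):       ", results.as_numpy("OUTPUT0"))
print("OUTPUT1 (difference):", results.as_numpy("OUTPUT1"))
```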