C API Description - NVIDIA Triton Inference Server

Triton server functionality is encapsulated in a shared library, which is built from the source contained in the core repository. You can include the full capabilities of Triton by linking the shared library into your application and using the C API defined in tritonserver.h.

When you link the Triton shared library into your application you are not spawning a separate Triton process. Instead, you are including the Triton core logic directly in your application. The Triton HTTP/REST and GRPC protocols are not used to communicate with this Triton core logic; all communication between your application and the Triton core logic must take place via the Server API.

The top-level abstraction used by the Server API is TRITONSERVER_Server, which represents the Triton core logic that is capable of implementing all of the features and capabilities of Triton. A TRITONSERVER_Server object is created by calling TRITONSERVER_ServerNew with a set of options that indicate how the object should be initialized. Use of TRITONSERVER_ServerNew is demonstrated in simple.cc. Once you have created a TRITONSERVER_Server object, you can begin using the rest of the Server API as described below.

Error Handling

Most Server API functions return an error object indicating success or failure. Success is indicated by returning nullptr (NULL). Failure is indicated by returning a TRITONSERVER_Error object. The error code and message can be retrieved from a TRITONSERVER_Error object with TRITONSERVER_ErrorCode and TRITONSERVER_ErrorMessage.

The lifecycle and ownership of all Server API objects are documented in tritonserver.h. For TRITONSERVER_Error, ownership of the object passes to the caller of the Server API function. As a result, your application is responsible for managing the lifecycle of the returned TRITONSERVER_Error object. You must delete the error object using TRITONSERVER_ErrorDelete when you are done using it. Macros such as FAIL_IF_ERR shown in common.h are useful for managing error object lifetimes.
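The ownership rule above can be illustrated with a small helper that reports a failure and then releases the error object. This is a minimal sketch, not the FAIL_IF_ERR macro from common.h. So that it compiles on its own, it defines stand-in versions of the error functions that only mimic the shape of the declarations in tritonserver.h. The helper name `CheckAndRelease` and the counter `g_live_errors` are hypothetical and exist only for illustration.

```cpp
#include <iostream>
#include <string>

// ---- Stand-ins (assumption: illustrative only, NOT the real library) ----
// They mimic the error functions declared in tritonserver.h so this sketch
// builds without Triton. In a real application, remove this section and
// #include "tritonserver.h" instead.
enum TRITONSERVER_Error_Code {
  TRITONSERVER_ERROR_UNKNOWN,
  TRITONSERVER_ERROR_INTERNAL,
  TRITONSERVER_ERROR_NOT_FOUND
};

struct TRITONSERVER_Error {
  TRITONSERVER_Error_Code code;
  std::string msg;
};

// Hypothetical counter of error objects that have not been deleted yet.
static int g_live_errors = 0;

TRITONSERVER_Error*
TRITONSERVER_ErrorNew(TRITONSERVER_Error_Code code, const char* msg)
{
  ++g_live_errors;
  return new TRITONSERVER_Error{code, msg};
}

TRITONSERVER_Error_Code
TRITONSERVER_ErrorCode(TRITONSERVER_Error* error)
{
  return error->code;
}

const char*
TRITONSERVER_ErrorMessage(TRITONSERVER_Error* error)
{
  return error->msg.c_str();
}

void
TRITONSERVER_ErrorDelete(TRITONSERVER_Error* error)
{
  --g_live_errors;
  delete error;
}
// ---- End of stand-ins ----

// Hypothetical helper: returns true on success (nullptr). On failure it logs
// the code and message, deletes the error object the caller now owns, and
// returns false.
bool
CheckAndRelease(TRITONSERVER_Error* err, const std::string& what)
{
  if (err == nullptr) {
    return true;
  }
  std::cerr << what << ": error code " << TRITONSERVER_ErrorCode(err) << ", "
            << TRITONSERVER_ErrorMessage(err) << std::endl;
  TRITONSERVER_ErrorDelete(err);  // the caller owns the error, so release it
  return false;
}
```

Wrapping every Server API call in a helper of this kind keeps the delete next to the check, which makes it harder to leak an error object on an early return.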

Versioning and Backwards Compatibility

A typical pattern, demonstrated in simple.cc and shown below, compares the Server API version provided by the shared library against the Server API version that you compiled your application against. The Server API is backwards compatible within a major version. Your application can use the Server API as long as the major version provided by the shared library matches the major version that you compiled against, and the minor version provided by the shared library is greater than or equal to the minor version that you compiled against.

    #include "tritonserver.h"
    // Error checking removed for clarity...
    uint32_t api_version_major, api_version_minor;
    TRITONSERVER_ApiVersion(&api_version_major, &api_version_minor);
    if ((TRITONSERVER_API_VERSION_MAJOR != api_version_major) ||
        (TRITONSERVER_API_VERSION_MINOR > api_version_minor)) {
      // Error, the shared library implementing the Server API is older than
      // the version of the Server API that you compiled against.
    }
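The compatibility rule itself can be isolated into a small pure function, which makes it easy to unit test without loading the shared library. The sketch below is an illustration only: `IsServerApiCompatible` is a hypothetical name and is not part of tritonserver.h.

```cpp
#include <cstdint>

// Hypothetical helper (not part of tritonserver.h): returns true when a shared
// library reporting (library_major, library_minor) can serve an application
// compiled against (compiled_major, compiled_minor).
bool
IsServerApiCompatible(
    uint32_t compiled_major, uint32_t compiled_minor, uint32_t library_major,
    uint32_t library_minor)
{
  // The major versions must match exactly.
  if (compiled_major != library_major) {
    return false;
  }
  // Within a major version the library may be newer, but never older.
  return library_minor >= compiled_minor;
}
```

In an application that includes tritonserver.h, the first two arguments would be TRITONSERVER_API_VERSION_MAJOR and TRITONSERVER_API_VERSION_MINOR, and the last two would be the values filled in by TRITONSERVER_ApiVersion.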

Non-Inference APIs

The Server API contains functions for checking health and readiness, getting model information, getting model statistics and metrics, loading and unloading models, etc. The use of these functions is straightforward. Some of them are demonstrated in simple.cc and all are documented in tritonserver.h.
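Health and readiness checks are often polled at startup until the server reports that it is usable. The sketch below shows one way to structure such a loop, assuming the check is supplied as a callable. `WaitUntil` is a hypothetical helper and is not part of the Server API. In a real application the probe would wrap a Server API call such as TRITONSERVER_ServerIsLive or TRITONSERVER_ServerIsReady and handle the returned error object.

```cpp
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical helper: calls `probe` up to `max_attempts` times, sleeping for
// `delay` after each failed attempt except the last. Returns true as soon as
// a probe succeeds, or false if every attempt fails.
bool
WaitUntil(
    const std::function<bool()>& probe, int max_attempts,
    std::chrono::milliseconds delay)
{
  for (int attempt = 1; attempt <= max_attempts; ++attempt) {
    if (probe()) {
      return true;
    }
    if (attempt < max_attempts) {
      std::this_thread::sleep_for(delay);
    }
  }
  return false;
}
```

Passing the probe as a callable keeps the retry policy separate from the Server API call, so the attempt count and delay can be tuned or tested without a running server.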

Inference APIs

Performing an inference request requires the use of many Server API functions and objects, as demonstrated in simple.cc. The general usage requires the following steps.

A simple example using the C API can be found in simple.cc. A more complicated example can be found in the source that implements the HTTP/REST and GRPC endpoints for Triton. These endpoints use the C API to communicate with the core of Triton. The primary source files for the endpoints are grpc_server.cc and http_server.cc.