NVIDIA NIM Operator

About the Operator#

The NVIDIA NIM Operator enables Kubernetes cluster administrators to operate the software components and services necessary to run NVIDIA NIM microservices across domains such as reasoning, retrieval, speech, and biology. It also supports NVIDIA NeMo microservices for fine-tuning, evaluating, and applying guardrails to your models.

The Operator manages the lifecycle of the following microservices and the models they use:

Benefits of Using the Operator#

Using the NIM Operator simplifies the operation and lifecycle management of NIM and NeMo microservices at scale, across the cluster. Its custom resources streamline the deployment and lifecycle management of multiple AI inference pipelines, such as RAG pipelines and multiple LLM inference services. The NIM Operator also supports caching models to reduce initial inference latency and to enable autoscaling.
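As an illustration of the caching capability, a minimal sketch of a model-caching resource is shown below. The exact field names (`source`, `storage`, the image path, and the secret names) are assumptions for illustration and should be checked against the CRD schema installed in your cluster:

```yaml
# Hypothetical sketch: pre-caches a NIM model into a PVC so that
# service pods start without re-downloading the model.
apiVersion: apps.nvidia.com/v1alpha1
kind: NIMCache
metadata:
  name: meta-llama3-8b-instruct     # illustrative name
spec:
  source:
    ngc:
      modelPuller: nvcr.io/nim/meta/llama3-8b-instruct:1.0.0  # assumed image path
      pullSecret: ngc-secret        # assumed image-pull secret
      authSecret: ngc-api-secret    # assumed NGC API key secret
  storage:
    pvc:
      create: true
      storageClass: standard        # replace with a storage class in your cluster
      size: 50Gi
```

A service resource that references this cache can then mount the pre-populated volume instead of pulling the model at pod startup, which is what reduces the initial inference latency mentioned above.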

The Operator uses the following custom resources:
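To give a sense of how these custom resources are used, here is a hedged sketch of a service-type resource that deploys a NIM behind a Kubernetes Service. Field names, the API group/version, the image reference, and the secret name are assumptions for illustration; consult the CRD reference for the authoritative schema:

```yaml
# Hypothetical sketch: deploys a NIM as a managed service with one GPU
# and exposes it inside the cluster.
apiVersion: apps.nvidia.com/v1alpha1
kind: NIMService
metadata:
  name: meta-llama3-8b-instruct     # illustrative name
spec:
  image:
    repository: nvcr.io/nim/meta/llama3-8b-instruct  # assumed image path
    tag: "1.0.0"
    pullSecrets:
      - ngc-secret                  # assumed image-pull secret
  replicas: 1
  resources:
    limits:
      nvidia.com/gpu: 1             # one GPU per replica
  expose:
    service:
      type: ClusterIP
      port: 8000                    # assumed NIM HTTP port
```

Applying such a manifest with `kubectl apply -f` hands the deployment, scaling, and exposure of the microservice over to the Operator's reconciliation loop.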

Sample Applications#

NVIDIA provides the following sample applications and tutorials for you to explore the NIM Operator and supported workflows.

Licenses#

The following table identifies the licenses for the software components related to the Operator.

Third Party Software#

The Chain Server that you can deploy with the sample pipeline uses third-party software. You can download the Third Party Licenses.