NVIDIA NIM Operator

About the Operator#

The NVIDIA NIM Operator enables Kubernetes cluster administrators to operate the software components and services necessary to run NVIDIA NIM microservices across domains such as reasoning, retrieval, speech, and biology. It also supports NVIDIA NeMo microservices for fine-tuning, evaluating, and applying guardrails to your models.

The Operator manages the lifecycle of the following microservices and the models they use:

Benefits of Using the Operator#

Using the NIM Operator simplifies the operation and lifecycle management of NIM and NeMo microservices at scale, across the cluster. Its custom resources streamline the deployment and lifecycle management of multiple AI inference pipelines, such as RAG pipelines and multiple LLM inference services. The NIM Operator also supports caching models to reduce initial inference latency and to enable autoscaling.
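As an illustration of the caching capability, a minimal sketch of a model-caching resource is shown below. The exact field names (`source`, `storage`, the image path, and the secret names) are assumptions for illustration and should be checked against the CRD schema installed in your cluster:

```yaml
# Hypothetical sketch: pre-caches a NIM model into a PVC so that
# service pods start without re-downloading the model.
apiVersion: apps.nvidia.com/v1alpha1
kind: NIMCache
metadata:
  name: meta-llama3-8b-instruct     # illustrative name
spec:
  source:
    ngc:
      modelPuller: nvcr.io/nim/meta/llama3-8b-instruct:1.0.0  # assumed image path
      pullSecret: ngc-secret        # assumed image-pull secret
      authSecret: ngc-api-secret    # assumed NGC API key secret
  storage:
    pvc:
      create: true
      storageClass: standard        # replace with a storage class in your cluster
      size: 50Gi
```

A service resource that references this cache can then mount the pre-populated volume instead of pulling the model at pod startup, which is what reduces the initial inference latency mentioned above.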

The Operator uses the following custom resources:
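To give a sense of how these custom resources are used, here is a hedged sketch of a service-type resource that deploys a NIM behind a Kubernetes Service. Field names, the API group/version, the image reference, and the secret name are assumptions for illustration; consult the CRD reference for the authoritative schema:

```yaml
# Hypothetical sketch: deploys a NIM as a managed service with one GPU
# and exposes it inside the cluster.
apiVersion: apps.nvidia.com/v1alpha1
kind: NIMService
metadata:
  name: meta-llama3-8b-instruct     # illustrative name
spec:
  image:
    repository: nvcr.io/nim/meta/llama3-8b-instruct  # assumed image path
    tag: "1.0.0"
    pullSecrets:
      - ngc-secret                  # assumed image-pull secret
  replicas: 1
  resources:
    limits:
      nvidia.com/gpu: 1             # one GPU per replica
  expose:
    service:
      type: ClusterIP
      port: 8000                    # assumed NIM HTTP port
```

Applying such a manifest with `kubectl apply -f` hands the deployment, scaling, and exposure of the microservice over to the Operator's reconciliation loop.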

Sample Applications#

NVIDIA provides the following sample applications and tutorials for you to explore the NIM Operator and supported workflows.

Licenses#

The following table identifies the licenses for the software components related to the Operator.

Third Party Software#

The Chain Server that you can deploy with the sample pipeline uses third-party software. You can download the Third Party Licenses.