Configuring a NIM - NVIDIA NIM for Large Language Models (LLMs)

NVIDIA NIM for LLMs (NIM for LLMs) runs in Docker containers. Each NIM is its own Docker container, and there are several ways to configure it. This page is a reference for those configuration options.

GPU Selection

Passing --gpus all to docker run is acceptable in homogeneous environments, where every GPU in the system is the same model.

Note

--gpus all only works if your configuration has the same number of GPUs as specified for the model in Supported Models. Running inference on a configuration with fewer or more GPUs can result in a runtime error.

In heterogeneous environments with a combination of GPUs (for example, an A6000 and a GeForce display GPU), workloads should run only on compute-capable GPUs. Expose specific GPUs inside the container using either:

the --gpus flag (for example: --gpus='"device=1"')
the environment variable NVIDIA_VISIBLE_DEVICES (for example: -e NVIDIA_VISIBLE_DEVICES=1)

The device ID(s) to use as input(s) are listed in the output of nvidia-smi -L:

GPU 0: Tesla H100 (UUID: GPU-b404a1a1-d532-5b5c-20bc-b34e37f3ac46)
GPU 1: NVIDIA GeForce RTX 3080 (UUID: GPU-b404a1a1-d532-5b5c-20bc-b34e37f3ac46)
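As a rough sketch of how the device IDs could be pulled out of that listing, the helper below filters nvidia-smi -L style lines by GPU name and joins the matching indices with commas. The function name gpu_ids_matching and the name-based filter are illustrative assumptions, not part of NIM or the NVIDIA Container Toolkit, and matching on the product name is only a heuristic for "compute-capable".

```shell
# Hypothetical helper (not part of NIM): read `nvidia-smi -L` style lines on
# stdin and print the indices of GPUs whose line matches the pattern in $1,
# joined with commas.
gpu_ids_matching() {
  grep -E "$1" | sed -E 's/^GPU ([0-9]+):.*/\1/' | paste -sd, -
}

# The sample listing is inlined here so the snippet runs without a GPU;
# on a real host, pipe in the output of `nvidia-smi -L` instead.
listing='GPU 0: Tesla H100 (UUID: GPU-b404a1a1-d532-5b5c-20bc-b34e37f3ac46)
GPU 1: NVIDIA GeForce RTX 3080 (UUID: GPU-b404a1a1-d532-5b5c-20bc-b34e37f3ac46)'

ids=$(printf '%s\n' "$listing" | gpu_ids_matching 'H100')
echo "$ids"   # prints 0
```

The resulting value can then be used as the device list in either form described above, for instance as the value of NVIDIA_VISIBLE_DEVICES.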

Refer to the NVIDIA Container Toolkit documentation for more instructions.

How many GPUs do I need?

Each profile has a TP (Tensor Parallelism) value and a PP (Pipeline Parallelism) value, which you can read from its name (for example, tensorrt_llm-trtllm_buildable-bf16-tp8-pp2 has TP 8 and PP 2).

In most cases, you need TP * PP GPUs to run a specific profile.

For example, the profile tensorrt_llm-trtllm_buildable-bf16-tp8-pp2 requires 8 * 2 = 16 GPUs: either 2 nodes with 8 GPUs each, or 16 GPUs on one node.
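The arithmetic above can be sketched in shell. The function name gpus_for_profile is hypothetical, and the sketch assumes the profile name carries -tpN and -ppN suffixes as in the example; treating a missing suffix as 1 is also an assumption made only for this illustration.

```shell
# Hypothetical helper: estimate the GPU count for a profile from its readable
# name by multiplying the TP and PP values found in it.
# Assumption: a missing -tpN or -ppN suffix is treated as 1.
gpus_for_profile() {
  tp=$(printf '%s\n' "$1" | sed -nE 's/.*-tp([0-9]+).*/\1/p')
  pp=$(printf '%s\n' "$1" | sed -nE 's/.*-pp([0-9]+).*/\1/p')
  echo $(( ${tp:-1} * ${pp:-1} ))
}

gpus_for_profile tensorrt_llm-trtllm_buildable-bf16-tp8-pp2   # prints 16
```

This covers only the common TP * PP case described above; it does not say how those GPUs should be split across nodes.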

Shared memory flag

Passing --shm-size=16GB to docker run is required for multi-GPU setups that do not use NVLink. It is not required on SXM systems or with profiles that use only 1 GPU (for example, NIM_TENSOR_PARALLEL_SIZE=1).
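One way to encode this rule in a launch script is sketched below. The helper shm_flag and its two arguments are hypothetical; it only restates the condition above (more than one GPU and no NVLink) and leaves it to you to determine whether your GPUs are NVLink-connected.

```shell
# Hypothetical helper: print the shared-memory flag only when it is needed.
#   $1 = number of GPUs the profile uses
#   $2 = "yes" if the GPUs communicate over NVLink, anything else otherwise
shm_flag() {
  if [ "$1" -gt 1 ] && [ "$2" != "yes" ]; then
    printf '%s\n' "--shm-size=16GB"
  fi
}

shm_flag 2 no    # prints --shm-size=16GB
shm_flag 1 no    # prints nothing (single GPU)
shm_flag 8 yes   # prints nothing (NVLink in use)
```

The printed value, when present, would be added to the other docker run arguments.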

Environment Variables

Below is a reference for the required and optional environment variables that can be passed into a NIM (by adding -e to docker run):

Volumes

Local paths can be mounted to the following container paths.