NVIDIA Speech NIM Microservices Overview

NVIDIA Speech NIM microservices are GPU-accelerated Docker containers that provide speech AI capabilities as building blocks for your applications. Each NIM microservice packages a Nemotron model, the full NVIDIA inference stack (CUDA, TensorRT, Triton), and a unified API into a single container that you deploy, scale, and interact with through standard gRPC and HTTP interfaces.

You do not interact with models directly. Instead, each NIM microservice provides the API layer that your application calls to run inference on the containerized models.

NIM Microservices

Each NIM microservice serves a Nemotron model and exposes it through a dedicated API.

You can deploy only the NIM microservices your application needs. Each runs as an independent Docker container with GPU acceleration. You select a specific model at deploy time by setting the CONTAINER_ID and NIM_TAGS_SELECTOR environment variables. For supported models and container IDs, refer to the Support Matrix.
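
As a rough illustration of deploy-time model selection, the sketch below reads the two variables named above and assembles a `docker run` command. It is not the documented launch command: using `CONTAINER_ID` for both the container name and the image path, the `image_template` parameter, and every placeholder value are assumptions made for this example. Credentials, port mappings, and other settings are omitted. Take real values from the Support Matrix.

```python
import shlex


def build_docker_run(env, image_template):
    """Build a `docker run` argument list for one NIM container.

    `env` is a mapping such as os.environ. `image_template` is a hypothetical
    pattern containing `{container_id}`; it is not an official image path.
    """
    container_id = env["CONTAINER_ID"]        # KeyError if unset
    tags_selector = env["NIM_TAGS_SELECTOR"]  # KeyError if unset
    return [
        "docker", "run", "--rm",
        "--name", container_id,               # assumption: name the container after its ID
        "--gpus", "all",                      # expose the host GPUs to the container
        "-e", f"NIM_TAGS_SELECTOR={tags_selector}",
        image_template.format(container_id=container_id),
    ]


# Placeholder values only; they do not correspond to real containers or models.
example_env = {
    "CONTAINER_ID": "example-asr-container",
    "NIM_TAGS_SELECTOR": "name=example-model",
}
command = build_docker_run(example_env, "registry.example/{container_id}:latest")
print(shlex.join(command))
```

Building the command as a list rather than a single string avoids shell-quoting problems if you later pass it to `subprocess.run`.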

To try each NIM microservice, visit build.nvidia.com.

Building Applications with Speech NIM Microservices

Speech NIM microservices are building blocks. Your application sends requests to the NIM container APIs and receives results. The NIM handles model loading, GPU execution, batching, and streaming internally.

graph LR
    App("Your Application") -->|gRPC / HTTP| ASR("ASR NIM")
    App -->|gRPC / HTTP| TTS("TTS NIM")
    App -->|gRPC / HTTP| NMT("NMT NIM")
    style App fill:#ffffff,stroke:#000000,color:#000000
    style ASR fill:#76b900,stroke:#000000,color:#000000
    style TTS fill:#76b900,stroke:#000000,color:#000000
    style NMT fill:#76b900,stroke:#000000,color:#000000
    linkStyle default stroke:#76b900,stroke-width:2px

You can chain multiple NIM microservices into a pipeline. For example, a real-time translation application calls the ASR NIM to transcribe audio, passes the transcript to the NMT NIM for translation, and sends the translated text to the TTS NIM for speech synthesis. Your application orchestrates the data flow between the stages, and you can scale each NIM microservice independently.
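
The translation pipeline described above can be sketched as three stages composed by the application. This is a minimal illustration and not the NIM client API: `SpeechTranslationPipeline` and the three stage callables are hypothetical names, and the stand-in stages only manipulate strings so that the example runs without any container. In a real application, each callable would wrap a gRPC or HTTP call to the corresponding NIM microservice.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SpeechTranslationPipeline:
    """Chains ASR -> NMT -> TTS; each stage is an injected callable."""
    transcribe: Callable[[bytes], str]   # would call the ASR NIM
    translate: Callable[[str], str]      # would call the NMT NIM
    synthesize: Callable[[str], bytes]   # would call the TTS NIM

    def run(self, audio: bytes) -> bytes:
        transcript = self.transcribe(audio)
        if not transcript.strip():
            # Nothing was recognized, so skip the downstream stages.
            return b""
        translated = self.translate(transcript)
        return self.synthesize(translated)


# Stand-in stages for a local dry run; no NIM microservice is contacted.
def fake_asr(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_nmt(text: str) -> str:
    return text.upper()


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


pipeline = SpeechTranslationPipeline(fake_asr, fake_nmt, fake_tts)
result = pipeline.run(b"hello")
```

Because the stages are injected, each one can point at a separately deployed and separately scaled NIM container without changes to the orchestration code.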

Use Cases

Next Steps