CUDA Buffer Sharing Failure Between Triton and DeepStream Containers on WSL2
December 3, 2025, 10:30am 1
I’m trying to run inference over gRPC between two Docker containers on the same machine (Ubuntu 24.04 on WSL2).
One container runs Triton Inference Server and the other runs the DeepStream Python apps; both images are deepstream:8.0-triton-multiarch. The app I’m running is deepstream-test3.
CUDA buffer sharing is enabled, but I encounter CUDA IPC errors on both sides.
I have set enable_cuda_buffer_sharing: true in config_triton_grpc_infer_primary_peoplenet.txt and got the following errors:
Triton side errors: “failed to open CUDA IPC handle: invalid resource handle”
DeepStream side errors:
INFO: TritonGrpcBackend id:1 initialized for model: peoplenet
ERROR: Failed to register CUDA shared memory.
ERROR: Failed to set inference input: failed to register shared memory region: invalid args
ERROR: gRPC backend run failed to create request for model: peoplenet
According to the documentation, CUDA buffer sharing should work when both processes are on the same machine, so I am trying to identify what is wrong.
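For context, in the Gst-nvinferserver gRPC config this flag sits under the backend/triton/grpc block. A minimal sketch of that fragment follows; the model name matches the one in the errors above, and the gRPC URL is an assumption based on the port mappings shown later in this post:

infer_config {
  backend {
    triton {
      model_name: "peoplenet"
      version: -1
      grpc {
        url: "localhost:8001"              # assumed: Triton gRPC port published on the same host
        enable_cuda_buffer_sharing: true   # the flag that triggers the CUDA IPC path
      }
    }
  }
}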
• Hardware Platform: GeForce RTX 3050 6GB laptop
• DeepStream Version: deepstream 8.0-triton-multiarch
• TensorRT Version: 10.9.0.34-1+cuda12.8
• NVIDIA GPU Driver Version: 581.80
• How to reproduce the issue:
- Triton container:
docker run --name triton_from_ds --gpus '"device=0"' -it --rm -p8000:8000 -p8001:8001 -p8002:8002 --ipc=host --pid=host --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 -v ${PWD}/model_repository:/models -v ${PWD}:/workspace deepstream-python-8.0 bash
- download and export PeopleNet to TensorRT as in the app README: deepstream_python_apps/apps/deepstream-test3/README at master · NVIDIA-AI-IOT/deepstream_python_apps · GitHub. I used the config.pbtxt provided.
- /model_repository/peoplenet/
* model.plan (exported as described in the README linked above)
* config.pbtxt (same as in the repo)
- run the server with (a quick readiness check is sketched after these steps):
tritonserver --model-repository=/models
- DeepStream container:
docker run --name deepstream --gpus '"device=0"' -it --rm --network=host --ipc=host --pid=host --privileged --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 -v /tmp/.X11-unix:/tmp/.X11-unix -e DISPLAY=$DISPLAY -w /opt/nvidia/deepstream/deepstream-8.0 deepstream-python-8.0 bash
- follow deepstream_python_apps/apps/deepstream-test3/README at master · NVIDIA-AI-IOT/deepstream_python_apps · GitHub to download the PeopleNet labels.txt. I have the same setup.
- in the DeepStream container:
cd /opt/nvidia/deepstream/deepstream/sources/deepstream_python_apps/apps/deepstream-test3
python3 deepstream_test_3.py \
    -i file:///opt/nvidia/deepstream/deepstream/samples/streams/sample_office.mp4 \
    --pgie nvinferserver-grpc \
    -c config_triton_grpc_infer_primary_peoplenet.txt
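As a sanity check before launching the app (not part of the original steps, just a sketch assuming the port mappings above), the Triton endpoints can be probed from the DeepStream container, since it uses host networking:

# install curl first if it is missing: apt-get update && apt-get install -y curl
# server liveness/readiness over the HTTP port published by the Triton container
curl -sf localhost:8000/v2/health/ready && echo "server ready"
# confirm the peoplenet model is loaded and ready
curl -sf localhost:8000/v2/models/peoplenet/ready && echo "model ready"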
Note: deepstream-python-8.0 is just deepstream:8.0-triton-multiarch with the Python bindings installed. You can build the same image with this Dockerfile:
# Base image
FROM nvcr.io/nvidia/deepstream:8.0-triton-multiarch
# Set environment variables
ENV DEBIAN_FRONTEND=noninteractive
ENV DS_PYTHON_APPS_PATH=/opt/nvidia/deepstream/deepstream/sources/deepstream_python_apps
# 1. Install System Dependencies
RUN apt-get update && apt-get install -y \
python3-gi \
python3-dev \
python3-gst-1.0 \
python3-venv \
python3-pip \
git \
wget \
libgstrtspserver-1.0-0 \
gstreamer1.0-rtsp \
libgirepository1.0-dev \
gobject-introspection \
gir1.2-gst-rtsp-server-1.0 \
ffmpeg \
&& rm -rf /var/lib/apt/lists/*
# 2. Clone DeepStream Python Apps Repository
WORKDIR /opt/nvidia/deepstream/deepstream/sources/
RUN git clone https://github.com/NVIDIA-AI-IOT/deepstream_python_apps.git
# 3. Setup Virtual Environment (with system packages)
# We create it in a standard location
ENV VIRTUAL_ENV=/opt/pyds
RUN python3 -m venv --system-site-packages $VIRTUAL_ENV
ENV PATH="$VIRTUAL_ENV/bin:$PATH"
# 4. Install Python Dependencies
RUN pip3 install cuda-python
# 5. Download and Install Prebuilt Bindings Wheel
# download the specific wheel for DeepStream 8.0 / Python 3.12 / x86_64
# Note: Update the URL if a newer version is released.
WORKDIR $DS_PYTHON_APPS_PATH
RUN wget https://github.com/NVIDIA-AI-IOT/deepstream_python_apps/releases/download/v1.2.2/pyds-1.2.2-cp312-cp312-linux_x86_64.whl \
&& pip3 install pyds-1.2.2-cp312-cp312-linux_x86_64.whl \
&& rm pyds-1.2.2-cp312-cp312-linux_x86_64.whl
# 6. Set Working Directory to Test App 1
WORKDIR $DS_PYTHON_APPS_PATH/apps/deepstream-test1
CMD ["/bin/bash"]
fanzh December 4, 2025, 2:07am 3
Please refer to the prerequisites for WSL. The driver version should be 572.60 for the GeForce RTX 3050.
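A quick way to confirm which driver WSL is seeing (a sketch; on WSL2 the version reported is the Windows host driver’s):

# run inside the WSL distribution or inside either container
nvidia-smi --query-gpu=driver_version --format=csv,noheader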
fanzh December 8, 2025, 9:34am 4
Sorry for the late reply. Is this still a DeepStream issue to support? Thanks!
Yes, the issue remains. I have downloaded and installed driver version 572.60:
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 570.124.03             Driver Version: 572.60         CUDA Version: 12.8     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3050 ...    On  |   00000000:01:00.0 Off |                  N/A |
| N/A   40C    P0              9W /  75W  |      0MiB /   6144MiB  |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
Inside the container I run nvcc --version, and it is indeed 12.8 as well.
But when I set enable_cuda_buffer_sharing to true I get the error:
ERROR: Failed to register CUDA shared memory.
ERROR: Failed to set inference input: failed to register shared memory region: invalid args
and on the Triton side:
E1215 08:50:32.875460 1349 shared_memory_manager.cc:260] "failed to open CUDA IPC handle: invalid resource handle"
I run both containers with the same commands as earlier.
fanzh December 16, 2025, 2:47am 6
Did you start the two Docker containers in the same WSL virtual machine?
Yes, the two Docker containers are running in the WSL Ubuntu virtual machine on my laptop.
fanzh December 17, 2025, 2:36pm 8
As written in this topic, CUDA shared memory is not supported on Windows yet. In the latest Triton Windows release, this function is still not supported.
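In practice that means the gRPC path on WSL2 has to fall back to copying tensors through system memory. Assuming the flag was enabled in the config file as described earlier in this thread, it can simply be turned back off, for example:

# revert to the default (system-memory transfer over gRPC); CUDA IPC is unavailable on WSL2
sed -i 's/enable_cuda_buffer_sharing: true/enable_cuda_buffer_sharing: false/' \
    config_triton_grpc_infer_primary_peoplenet.txt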