CUDA Buffer Sharing Failure Between Triton and DeepStream Containers on WSL2
December 3, 2025, 10:30am 1
I’m trying to run inference over gRPC between two Docker containers on the same machine (Ubuntu 24.04 on WSL2).
One container runs Triton Inference Server and the other runs the DeepStream Python apps; both images are deepstream:8.0-triton-multiarch. The app I’m running is deepstream-test3.
CUDA buffer sharing is enabled, but I encounter CUDA IPC errors on both sides.
I have set enable_cuda_buffer_sharing: true in config_triton_grpc_infer_primary_peoplenet.txt and got the following errors:
Triton side errors: “failed to open CUDA IPC handle: invalid resource handle”
DeepStream side errors:
INFO: TritonGrpcBackend id:1 initialized for model: peoplenet
ERROR: Failed to register CUDA shared memory.
ERROR: Failed to set inference input: failed to register shared memory region: invalid args
ERROR: gRPC backend run failed to create request for model: peoplenet
According to the documentation, CUDA buffer sharing should work when both processes are on the same machine, so I am trying to identify what is wrong.
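For context, in the Gst-nvinferserver gRPC config this flag sits under the backend/triton/grpc block. A minimal sketch of that fragment follows; the model name matches the one in the errors above, and the gRPC URL is an assumption based on the port mappings shown later in this post:

infer_config {
  backend {
    triton {
      model_name: "peoplenet"
      version: -1
      grpc {
        url: "localhost:8001"              # assumed: Triton gRPC port published on the same host
        enable_cuda_buffer_sharing: true   # the flag that triggers the CUDA IPC path
      }
    }
  }
}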
• Hardware Platform: GeForce RTX 3050 6GB laptop
• DeepStream Version: deepstream 8.0-triton-multiarch
• TensorRT Version: 10.9.0.34-1+cuda12.8
• NVIDIA GPU Driver Version: 581.80
• How to reproduce the issue:
- Triton container:
docker run --name triton_from_ds --gpus '"device=0"' -it --rm -p8000:8000 -p8001:8001 -p8002:8002 --ipc=host --pid=host --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 -v ${PWD}/model_repository:/models -v ${PWD}:/workspace deepstream-python-8.0 bash
- download and export PeopleNet to TensorRT as in the app README: deepstream_python_apps/apps/deepstream-test3/README at master · NVIDIA-AI-IOT/deepstream_python_apps · GitHub. I used the config.pbtxt provided.
- /model_repository/peoplenet/
* model.plan (exported as described in the README linked above)
* config.pbtxt (same as in the repo)
- run the server with (a quick readiness check is sketched after these steps):
tritonserver --model-repository=/models
- DeepStream container:
docker run --name deepstream --gpus '"device=0"' -it --rm --network=host --ipc=host --pid=host --privileged --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 -v /tmp/.X11-unix:/tmp/.X11-unix -e DISPLAY=$DISPLAY -w /opt/nvidia/deepstream/deepstream-8.0 deepstream-python-8.0 bash
- follow deepstream_python_apps/apps/deepstream-test3/README at master · NVIDIA-AI-IOT/deepstream_python_apps · GitHub to download the PeopleNet labels.txt. I have the same setup.
- in the DeepStream container:
cd /opt/nvidia/deepstream/deepstream/sources/deepstream_python_apps/apps/deepstream-test3
python3 deepstream_test_3.py \
    -i file:///opt/nvidia/deepstream/deepstream/samples/streams/sample_office.mp4 \
    --pgie nvinferserver-grpc \
    -c config_triton_grpc_infer_primary_peoplenet.txt
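As a sanity check before launching the app (not part of the original steps, just a sketch assuming the port mappings above), the Triton endpoints can be probed from the DeepStream container, since it uses host networking:

# install curl first if it is missing: apt-get update && apt-get install -y curl
# server liveness/readiness over the HTTP port published by the Triton container
curl -sf localhost:8000/v2/health/ready && echo "server ready"
# confirm the peoplenet model is loaded and ready
curl -sf localhost:8000/v2/models/peoplenet/ready && echo "model ready"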
Note: deepstream-python-8.0 is just deepstream:8.0-triton-multiarch with the Python bindings installed. You can build the same image with this Dockerfile:
# Base image
FROM nvcr.io/nvidia/deepstream:8.0-triton-multiarch
# Set environment variables
ENV DEBIAN_FRONTEND=noninteractive
ENV DS_PYTHON_APPS_PATH=/opt/nvidia/deepstream/deepstream/sources/deepstream_python_apps
# 1. Install System Dependencies
RUN apt-get update && apt-get install -y \
python3-gi \
python3-dev \
python3-gst-1.0 \
python3-venv \
python3-pip \
git \
wget \
libgstrtspserver-1.0-0 \
gstreamer1.0-rtsp \
libgirepository1.0-dev \
gobject-introspection \
gir1.2-gst-rtsp-server-1.0 \
ffmpeg \
&& rm -rf /var/lib/apt/lists/*
# 2. Clone DeepStream Python Apps Repository
WORKDIR /opt/nvidia/deepstream/deepstream/sources/
RUN git clone https://github.com/NVIDIA-AI-IOT/deepstream_python_apps.git
# 3. Setup Virtual Environment (with system packages)
# We create it in a standard location
ENV VIRTUAL_ENV=/opt/pyds
RUN python3 -m venv --system-site-packages $VIRTUAL_ENV
ENV PATH="$VIRTUAL_ENV/bin:$PATH"
# 4. Install Python Dependencies
RUN pip3 install cuda-python
# 5. Download and Install Prebuilt Bindings Wheel
# download the specific wheel for DeepStream 8.0 / Python 3.12 / x86_64
# Note: Update the URL if a newer version is released.
WORKDIR $DS_PYTHON_APPS_PATH
RUN wget https://github.com/NVIDIA-AI-IOT/deepstream_python_apps/releases/download/v1.2.2/pyds-1.2.2-cp312-cp312-linux_x86_64.whl \
&& pip3 install pyds-1.2.2-cp312-cp312-linux_x86_64.whl \
&& rm pyds-1.2.2-cp312-cp312-linux_x86_64.whl
# 6. Set Working Directory to Test App 1
WORKDIR $DS_PYTHON_APPS_PATH/apps/deepstream-test1
CMD ["/bin/bash"]
fanzh December 4, 2025, 2:07am 3
Please refer to the prerequisites for WSL. The driver version should be 572.60 for the GeForce RTX 3050.
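A quick way to confirm which driver WSL is seeing (a sketch; on WSL2 the version reported is the Windows host driver’s):

# run inside the WSL distribution or inside either container
nvidia-smi --query-gpu=driver_version --format=csv,noheader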
fanzh December 8, 2025, 9:34am 4
Sorry for the late reply. Is this still a DeepStream issue to support? Thanks!
Yes, the issue remains. I have downloaded and installed driver version 572.60:
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 570.124.03             Driver Version: 572.60         CUDA Version: 12.8     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3050 ...    On  |   00000000:01:00.0 Off |                  N/A |
| N/A   40C    P0              9W /  75W  |      0MiB /   6144MiB  |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
Inside the container I run nvcc --version, and it is indeed 12.8 as well.
But when I set enable_cuda_buffer_sharing to true I get the error:
ERROR: Failed to register CUDA shared memory.
ERROR: Failed to set inference input: failed to register shared memory region: invalid args
and on the Triton side:
E1215 08:50:32.875460 1349 shared_memory_manager.cc:260] "failed to open CUDA IPC handle: invalid resource handle"
I run both containers with the same commands as earlier.
fanzh December 16, 2025, 2:47am 6
Did you start the two Docker containers in the same WSL virtual machine?
Yes, the two Docker containers are running in the WSL Ubuntu virtual machine on my laptop.
fanzh December 17, 2025, 2:36pm 8
As written in this topic, CUDA shared memory is not supported on Windows yet. In the latest Triton Windows release, this function is still not supported.
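In practice that means the gRPC path on WSL2 has to fall back to copying tensors through system memory. Assuming the flag was enabled in the config file as described earlier in this thread, it can simply be turned back off, for example:

# revert to the default (system-memory transfer over gRPC); CUDA IPC is unavailable on WSL2
sed -i 's/enable_cuda_buffer_sharing: true/enable_cuda_buffer_sharing: false/' \
    config_triton_grpc_infer_primary_peoplenet.txt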