Deepstream efficient pipeline for tiling
Hi,
On a production Orin Nano 8GB I have a DeepStream pipeline like the one below to process images captured at full resolution (1920x1200). I am working in Python with pyds and everything works fine.
Now, I need to:
1 - Split the full image into tiles of approximately 224x224 pixels (5 rows x 8 columns = 40 tiles)
2 - Process each tile independently through the neural network to get the 40 inferences
3 - Also get the 40 tiles as JPEG-compressed images
I expect inference to take longer, but not 40 times longer.
What is the most efficient way to implement this modification?
I am considering these alternatives:
A) Configure nvinfer as a secondary GIE and artificially inject the bounding boxes using probes before the mux (a rough sketch of this idea is shown after the pipeline below)
B) Use nvdspreprocess to extract the tiles
C) Develop a custom GStreamer plugin to do the tiling
Are all these options feasible?
Which one makes the most sense?
Any other alternative?
Deepstream pipeline:
gst-launch-1.0 nvarguscamerasrc sensor_id=0 \
! "video/x-raw(memory:NVMM)" ! nvvideoconvert src-crop="0:0:1920:1200" ! "video/x-raw(memory:NVMM),format=NV12,width=1920,height=1200" \
! nvv4l2decoder name=decoder_rtsp0 ! nvvideoconvert name=crop0 \
! capsfilter name=crop_caps0 caps="video/x-raw,width=1920,height=1200" \
! tee name=tee0 \
tee0. ! queue name=infer_queue0 ! mux.sink_0 nvstreammux name=mux width=1920 height=1200 batch-size=1 batched-push-timeout=40000 ! nvinfer config-file-path='…./deploy/nvinfer_config.txt' ! nvstreamdemux name=demux demux.src_0 ! appsink name=infer_appsink0 \
tee0. ! queue name=image_queue0 ! nvvideoconvert name=convert0 ! jpegenc name=jpegenc0 ! appsink name=image_appsink0
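To illustrate option A (only a rough sketch, not something validated in this thread): since NvDsBatchMeta only exists downstream of nvstreammux, the dummy boxes would have to be injected on the mux src pad (or the nvinfer sink pad), with nvinfer configured in secondary mode (process-mode=2) so it crops and infers on each injected object. The tile size (240x240), class id and probe placement are assumptions; whether the SGIE accepts these objects also depends on its operate-on-gie-id / filtering settings.

import pyds
from gi.repository import Gst

ROWS, COLS = 5, 8
TILE_W, TILE_H = 240, 240   # placeholder: 1920/8 x 1200/5

def inject_tile_rois_probe(pad, info, u_data):
    """Attach one dummy object per tile so a secondary-mode nvinfer infers on each tile."""
    gst_buffer = info.get_buffer()
    if not gst_buffer:
        return Gst.PadProbeReturn.OK
    batch_meta = pyds.gst_buffer_get_nvds_batch_meta(hash(gst_buffer))
    l_frame = batch_meta.frame_meta_list
    while l_frame is not None:
        frame_meta = pyds.NvDsFrameMeta.cast(l_frame.data)
        for row in range(ROWS):
            for col in range(COLS):
                obj_meta = pyds.nvds_acquire_obj_meta_from_pool(batch_meta)
                obj_meta.rect_params.left = col * TILE_W
                obj_meta.rect_params.top = row * TILE_H
                obj_meta.rect_params.width = TILE_W
                obj_meta.rect_params.height = TILE_H
                obj_meta.confidence = 1.0
                obj_meta.class_id = 0                  # placeholder class the SGIE operates on
                obj_meta.object_id = row * COLS + col  # tile index
                pyds.nvds_add_obj_meta_to_frame(frame_meta, obj_meta, None)
        l_frame = l_frame.next
    return Gst.PadProbeReturn.OK

# usage (assuming the mux element is named "mux"):
# pipeline.get_by_name("mux").get_static_pad("src").add_probe(
#     Gst.PadProbeType.BUFFER, inject_tile_rois_probe, None)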
Are the 40 inferences the same model or 40 different models?
If each tile should be inferred by the same model, they can be put into one batch. Take the 2x2 = 4 tile case as an example:
gst-launch-1.0 v4l2src device=/dev/video0 ! 'video/x-raw,width=1280,height=720,framerate=10/1,format=YUY2' ! nvvideoconvert ! 'video/x-raw(memory:NVMM),format=NV12' ! tee name=t \
t.src_0 ! queue ! nvvideoconvert src-crop="0:0:640:360" ! mux.sink_0 \
nvstreammux name=mux width=640 height=360 live-source=1 batch-size=4 ! nvinfer config-file-path=/opt/nvidia/deepstream/deepstream/samples/configs/deepstream-app/config_infer_primary.txt model-engine-file=/opt/nvidia/deepstream/deepstream/samples/models/Primary_Detector/resnet18_trafficcamnet_pruned.onnx_b4_gpu0_fp16.engine batch-size=4 ! nvstreamdemux name=demux \
demux.src_0 ! queue ! nvdsosd ! nvvideoconvert ! jpegenc ! fakesink \
t.src_1 ! queue ! nvvideoconvert src-crop="640:0:640:360" ! mux.sink_1 \
t.src_2 ! queue ! nvvideoconvert src-crop="0:360:640:360" ! mux.sink_2 \
t.src_3 ! queue ! nvvideoconvert src-crop="640:360:640:360" ! mux.sink_3 \
demux.src_1 ! queue ! nvdsosd ! nvvideoconvert ! jpegenc ! fakesink \
demux.src_2 ! queue ! nvdsosd ! nvvideoconvert ! jpegenc ! fakesink \
demux.src_3 ! queue ! nvdsosd ! nvvideoconvert ! jpegenc ! fakesink
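Not part of the original reply, just a sketch of how the same crop-based approach could be scaled to the 5x8 = 40 tile case without hand-writing every branch: build the pipeline description in Python and hand it to Gst.parse_launch. The tile size (240x240), element names, the source stage and the config path are placeholders to adapt to the real pipeline.

import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst

Gst.init(None)

ROWS, COLS = 5, 8                         # 40 tiles
TILE_W, TILE_H = 240, 240                 # 1920/8 x 1200/5 (close to the desired ~224x224)
INFER_CFG = "deploy/nvinfer_config.txt"   # placeholder path

def build_pipeline_description():
    n_tiles = ROWS * COLS
    parts = [
        # capture stage (placeholder; replace with the real capture/crop elements)
        "nvarguscamerasrc sensor_id=0 ! "
        "video/x-raw(memory:NVMM),format=NV12,width=1920,height=1200 ! tee name=t",
        # one mux/nvinfer/demux chain shared by all tiles, batch-size = number of tiles
        f"nvstreammux name=mux width={TILE_W} height={TILE_H} live-source=1 batch-size={n_tiles} ! "
        f"nvinfer config-file-path={INFER_CFG} batch-size={n_tiles} ! "
        "nvstreamdemux name=demux",
    ]
    for r in range(ROWS):
        for c in range(COLS):
            i = r * COLS + c
            x, y = c * TILE_W, r * TILE_H
            # crop one tile and feed it into its own nvstreammux sink pad
            parts.append(
                f"t.src_{i} ! queue ! nvvideoconvert src-crop={x}:{y}:{TILE_W}:{TILE_H} "
                f"! mux.sink_{i}"
            )
            # JPEG-encode each demuxed tile and hand it to an appsink for the application
            parts.append(
                f"demux.src_{i} ! queue ! nvvideoconvert ! jpegenc "
                f"! appsink name=tile_sink_{i} emit-signals=true"
            )
    return " ".join(parts)

pipeline = Gst.parse_launch(build_pipeline_description())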
Thanks Fiona for your quick response.
That looks like a good solution. I hope that when I reproduce it for 40 tiles it does not produce any memory (or other) problems.
I was inspired by Applying Inference over Specific Frame Regions with NVIDIA DeepStream | NVIDIA Technical Blog to use the DeepStream nvdspreprocess plugin.
I am still working on it and having problems, because it seems pyds v1.2.0 (I have DeepStream v7.1) has neither pyds.NvDsMetaType.NVDS_PREPROCESS_BATCH_META nor pyds.GstNvDsPreProcessBatchMeta. Any help here?
Do you have any estimate of the performance of your solution (tiling with nvvideoconvert crop) vs. tiling with nvdspreprocess?
Any other hint?
The DeepStream nvdspreprocess plugin does not meet your requirement. Please take some time to read the source code of gst-nvdspreprocess: it is an "in-place" transform plugin, so the output is the same as the input. Your requirement needs the outputs to be parts of the input.
Can you tell us why you need this metadata to be exported?
Please refer to the pipeline I provided; the settings try to avoid unnecessary copying as much as possible. E.g. the nvstreammux "width" and "height" are set to the same resolution as the input videos, so the nvstreammux internal video scaling is avoided.
Hi Fiona,
Ok, I understand. nvdspreprocess only tells nvinfer which parts of the input to process, but the whole frame is transferred, right? So at the output I would get 40 complete images.
In case of using nvdspreprocess, I need the metadata (inferences) to continue my business logic and make decisions in my Python code. Any other alternative?
Yes.
Do you mean the model inputs or the inference results?
I need the inference results for the 40 tiles in my Python code.
pyds.NvDsMetaType.NVDS_PREPROCESS_BATCH_META is not the inference result. The inference results are attached in the batch meta.
Please refer to ObjectCounterMarker() in /opt/nvidia/deepstream/deepstream/service-maker/sources/apps/python/pipeline_api/deepstream_test3_app/deepstream_test3.py; the frame meta and object meta are available in the probe function.
More details are available in Introduction to Pipeline APIs — DeepStream documentation
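The sample above uses the Service Maker pipeline API; since this thread works with pyds, a roughly equivalent pad-probe sketch (e.g. attached to the nvinfer src pad of the crop-based pipeline) would look like the following. With that pipeline each tile is a separate source, so frame_meta.pad_index identifies the tile; names and the exact attachment point are assumptions.

import pyds
from gi.repository import Gst

def collect_tile_results_probe(pad, info, u_data):
    """Collect the parsed detections per tile from the batch meta."""
    gst_buffer = info.get_buffer()
    if not gst_buffer:
        return Gst.PadProbeReturn.OK
    batch_meta = pyds.gst_buffer_get_nvds_batch_meta(hash(gst_buffer))
    results = {}                         # tile index -> list of (class_id, confidence, bbox)
    l_frame = batch_meta.frame_meta_list
    while l_frame is not None:
        frame_meta = pyds.NvDsFrameMeta.cast(l_frame.data)
        tile_id = frame_meta.pad_index   # which mux.sink_N the tile was fed into
        detections = []
        l_obj = frame_meta.obj_meta_list
        while l_obj is not None:
            obj_meta = pyds.NvDsObjectMeta.cast(l_obj.data)
            r = obj_meta.rect_params
            detections.append((obj_meta.class_id, obj_meta.confidence,
                               (r.left, r.top, r.width, r.height)))
            l_obj = l_obj.next
        results[tile_id] = detections
        l_frame = l_frame.next
    # hand `results` over to the business logic here
    return Gst.PadProbeReturn.OK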
According to the documentation: Gst-nvinfer — DeepStream documentation
The NvDsInferTensorMeta object's metadata type is set to NVDSINFER_TENSOR_OUTPUT_META. To get this metadata you must iterate over the NvDsUserMeta objects in the list referenced by frame_user_meta_list or obj_user_meta_list. However, for preprocessed tensor input mode, you must first find the GstNvDsPreProcessBatchMeta in batch_meta->batch_user_meta_list and then access its roi_vector, which contains NvDsRoiMeta objects; each NvDsRoiMeta carries frame_meta and object_meta references to identify the source, and you look for NVDSINFER_TENSOR_OUTPUT_META within each ROI meta's roi_user_meta_list.
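For the crop-based pipeline (no nvdspreprocess), the frame_user_meta_list path described above can be walked with pyds. A minimal sketch, assuming output-tensor-meta=1 is set in the nvinfer config; num_elements (the flattened size of the model output layer) is a placeholder that depends on the model:

import pyds

def tensor_outputs_for_frame(frame_meta, num_elements):
    """Yield (layer_name, list of floats) for each output layer attached to this frame."""
    l_user = frame_meta.frame_user_meta_list
    while l_user is not None:
        user_meta = pyds.NvDsUserMeta.cast(l_user.data)
        if user_meta.base_meta.meta_type == pyds.NvDsMetaType.NVDSINFER_TENSOR_OUTPUT_META:
            tensor_meta = pyds.NvDsInferTensorMeta.cast(user_meta.user_meta_data)
            for i in range(tensor_meta.num_output_layers):
                layer = pyds.get_nvds_LayerInfo(tensor_meta, i)
                values = [pyds.get_detections(layer.buffer, k) for k in range(num_elements)]
                yield layer.layerName, values
        l_user = l_user.next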
Just realized that the 7.1 documentation (Gst-nvinfer — DeepStream documentation)
does not say anything about how to get the inference results specifically when you use the nvdspreprocess plugin.
Can you clarify? Does it depend on the version? Is the documentation for 7.1 incomplete? Or something else?
DeepStream 7.1 does not support it.
We have implemented it with DeepStream 8.0.
Ok, thanks for clarification.
Also, I want to confirm that your proposed pipeline architecture works pretty well.
Thanks.
Sorry, is the "full frame getting transferred tile times" claim true? I'm having major latency issues when using nvdspreprocess to tile the whole frame.