Unexpected Delay When Setting Model Interval > 0 in Custom RetinaNet Pipeline

• Hardware Platform (Jetson / GPU) : NVIDIA Jetson AGX Orin
• DeepStream Version : 7.1
• JetPack Version (valid for Jetson only) : 6.1
• TensorRT Version : 8.6.2.3
• Issue Type( questions, new requirements, bugs) : question
Hello everyone,

I’m working on a DeepStream pipeline using a custom RetinaNet (ResNet50 backbone) model. Below is a visual representation of the pipeline:

[pipeline diagram]

Model and Performance

The model is deployed via TensorRT (FP16 mode), and here’s the inference performance summary:

=== Performance summary ===
Throughput: 19.9816 qps
Latency: min = 49.9731 ms, max = 50.7459 ms, mean = 50.045 ms, median = 50.0352 ms

The pipeline includes a camera source (nvarguscamerasrc) and a custom NvDsInferParseCustomDropperWire parser. Here’s the relevant configuration:

[property]
gpu-id=0
net-scale-factor=0.017352074
offsets=123.675;116.28;103.53
model-color-format=0 # 0=RGB, 1=BGR
onnx-file=models/dropper_wire/dropper_wire_model.onnx
model-engine-file=models/dropper_wire/dropper_wire_model.onnx_b1_gpu0_fp16.engine
labelfile-path=dropper_wire_labels.txt
network-input-order=0
batch-size=1
network-mode=2 # 0=FP32, 1=INT8, 2=FP16 mode
network-type=0 # 0 for detector
num-detected-classes=2
process-mode=1
gie-unique-id=4
interval=0 # setting this to 10 skips inference on 10 consecutive batches
scaling-compute-hw=2
parse-bbox-func-name=NvDsInferParseCustomDropperWire
custom-lib-path=libnvds_dw_bboxparser.so
cluster-mode=2

[class-attrs-all]
pre-cluster-threshold=0.5
nms-iou-threshold=0.3
topk=50
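
For context, here is roughly how this config file gets attached to the nvinfer element in the Python pipeline code. This is only a minimal sketch; the element name "primary-gie" and the file name "dropper_wire_config.txt" are placeholders, not my exact code:

import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst

Gst.init(None)

# Minimal sketch: create the primary inference element and point it at the
# config file shown above. Names here are placeholders for illustration only.
pgie = Gst.ElementFactory.make("nvinfer", "primary-gie")
pgie.set_property("config-file-path", "dropper_wire_config.txt")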

Probe Function and Observation

To monitor source frame timing and check whether any delay is introduced, I added a simple probe on the camera source's src pad that prints the wall-clock time difference every 60 frames:

import time

import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst

old_pts = 0
frame_counter = 0


def buffer_probe_callback(pad, info):
    """Print the wall-clock time elapsed for every 60 buffers passing the pad."""
    global old_pts
    global frame_counter

    frame_counter += 1
    if frame_counter % 60 == 0:
        new_pts = time.time_ns()
        print(f"new pts: {new_pts} | old pts: {old_pts} | diff: {new_pts - old_pts}")
        old_pts = new_pts

    return Gst.PadProbeReturn.OK
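
The probe is attached to the camera source's src pad roughly like this (again a minimal sketch: the element name is a placeholder, and it assumes the imports and buffer_probe_callback defined above):

# Minimal sketch: attach the probe to the camera's src pad so it sees every
# buffer the camera pushes downstream. The element name is a placeholder.
source = Gst.ElementFactory.make("nvarguscamerasrc", "camera-source")
src_pad = source.get_static_pad("src")
src_pad.add_probe(Gst.PadProbeType.BUFFER, buffer_probe_callback)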

Problem: Delay with interval=10

When I set interval=10, which should reduce GPU load by skipping inference on 10 consecutive frames between inference runs, I noticed increased latency between consecutive timestamps on the source pad (even though the probe sits on the camera source, not on the inference output). Here are my logs:

new pts: 1745933478882083548 | old pts: 1745933477854277917 | diff: 1027805631
new pts: 1745933479881374875 | old pts: 1745933478882083548 | diff: 999291327
new pts: 1745933480919815184 | old pts: 1745933479881374875 | diff: 1038440309
new pts: 1745933481974549303 | old pts: 1745933480919815184 | diff: 1054734119
new pts: 1745933483012883547 | old pts: 1745933481974549303 | diff: 1038334244
new pts: 1745933484048887011 | old pts: 1745933483012883547 | diff: 1036003464
new pts: 1745933485061569819 | old pts: 1745933484048887011 | diff: 1012682808
new pts: 1745933486080445473 | old pts: 1745933485061569819 | diff: 1018875654
new pts: 1745933487112439325 | old pts: 1745933486080445473 | diff: 1031993852
new pts: 1745933488129228906 | old pts: 1745933487112439325 | diff: 1016789581
new pts: 1745933489178694631 | old pts: 1745933488129228906 | diff: 1049465725
new pts: 1745933490195145097 | old pts: 1745933489178694631 | diff: 1016450466
new pts: 1745933491215312214 | old pts: 1745933490195145097 | diff: 1020167117

However, with interval=0, the timestamps are as expected — around 1 second apart:


new pts: 1745933583804167777 | old pts: 1745933582804185553 | diff: 999982224
new pts: 1745933584807426806 | old pts: 1745933583804167777 | diff: 1003259029
new pts: 1745933585804810812 | old pts: 1745933584807426806 | diff: 997384006
new pts: 1745933586806014311 | old pts: 1745933585804810812 | diff: 1001203499
new pts: 1745933587806854886 | old pts: 1745933586806014311 | diff: 1000840575
new pts: 1745933588805590885 | old pts: 1745933587806854886 | diff: 998735999
new pts: 1745933589806389363 | old pts: 1745933588805590885 | diff: 1000798478
new pts: 1745933590808212773 | old pts: 1745933589806389363 | diff: 1001823410
new pts: 1745933591809462850 | old pts: 1745933590808212773 | diff: 1001250077
new pts: 1745933592807678617 | old pts: 1745933591809462850 | diff: 998215767
new pts: 1745933593808176268 | old pts: 1745933592807678617 | diff: 1000497651
new pts: 1745933594809096377 | old pts: 1745933593808176268 | diff: 1000920109
new pts: 1745933595808256813 | old pts: 1745933594809096377 | diff: 999160436
new pts: 1745933596810548820 | old pts: 1745933595808256813 | diff: 1002292007
new pts: 1745933597812056947 | old pts: 1745933596810548820 | diff: 1001508127
new pts: 1745933598809397080 | old pts: 1745933597812056947 | diff: 997340133

This is very strange, as I would have expected the difference to rise with interval=0 instead, since the model then needs to perform inference on every frame rather than only on every 10th frame.
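
To make the comparison concrete, here is a small sketch that converts the 60-frame diffs into an effective frame rate (the diff values below are representative numbers taken from the logs above, not exact means):

# Convert a 60-frame wall-clock diff (in nanoseconds) into an effective frame rate.
FRAMES_PER_SAMPLE = 60

def effective_fps(diff_ns):
    return FRAMES_PER_SAMPLE / (diff_ns / 1e9)

print(effective_fps(1_030_000_000))  # ~58.3 fps, typical diff with interval=10
print(effective_fps(1_000_000_000))  # ~60.0 fps, typical diff with interval=0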

Question

Why does increasing the interval in the primary inference element cause larger delays in the upstream camera source timestamps?

Could this be due to using interpipesink and interpipesrc elements in the pipeline?

I want to use interval=10 to reduce GPU and power consumption, but I cannot afford the added delay in source frames. Is there a way to set the interval without stalling or delaying frame generation at the source?

Any insights or suggestions on how to handle inference interval without affecting the upstream timing would be greatly appreciated!