High GPU usage during simple model inference on TensorRT

Description

High GPU usage during simple model inference on TensorRT.

Environment

TensorRT Version: 10.3.0.30
GPU Type: Tegra
Nvidia Driver Version: 540.4.0
CUDA Version: 12.6.68
CUDNN Version: 9.3.0.75
Operating System + Version: Ubuntu 22.04
Python Version (if applicable): 3.10.12
Baremetal or Container (if container which image + tag): ultralytics/ultralytics:latest-jetson-jetpack6

I am encountering an issue where the GPU usage spikes to 99% during inference with a lightweight image detection model. This behavior is unexpected given the simplicity of the model and the low computational requirements of the inference task.
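For context, the inference is invoked roughly as in the minimal sketch below. This assumes the Ultralytics YOLO Python API shipped in the container image listed above; the engine and image file names are placeholders, not the actual files used:

```python
# Minimal sketch of the inference path (Ultralytics API assumed from the
# ultralytics/ultralytics:latest-jetson-jetpack6 image; file names are placeholders).
from ultralytics import YOLO

model = YOLO("yolov8n.engine")               # TensorRT engine exported beforehand
results = model("test.jpg", verbose=False)   # single-image inference
print(results[0].boxes)                      # detections for the one input image
```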

Expected Behavior:
GPU usage should remain low (ideally <20–30%) during inference with such a lightweight model and small input data.

Actual Behavior:
GPU usage spikes to 99% and remains there until the process exits. Memory usage stays low and the batch size is small, so there is no corresponding increase in workload that would justify this load.

Relevant Files

[Screenshot: GPU utilization metric showing 99%]

Y-T-G May 2, 2025, 5:45am 2

Why do you expect this? This is not true. Usage is high when inference isn't bottlenecked by something else, like a slow CPU or I/O. The point of TensorRT is to minimize those bottlenecks, achieve higher GPU utilization, and get more done in less time.
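As a rough illustration of that point (again assuming the Ultralytics API from the container image; file names and the frame-rate cap are placeholders): running inference back to back keeps the GPU saturated, while pacing the same inference leaves the GPU idle between frames, so reported utilization drops along with throughput.

```python
# Sketch: same model, two loops. Watch utilization with tegrastats while each runs.
import time
from ultralytics import YOLO

model = YOLO("yolov8n.engine")
N = 300

# 1) Back-to-back inference: the next frame is submitted as soon as the previous
#    one finishes, so the GPU stays busy and utilization reads near 99%.
t0 = time.perf_counter()
for _ in range(N):
    model("test.jpg", verbose=False)
print(f"uncapped: {N / (time.perf_counter() - t0):.1f} img/s")

# 2) Paced inference (~20 FPS cap): the GPU idles between frames, so utilization
#    drops, but so does throughput. Same work per frame, spread over more wall time.
t0 = time.perf_counter()
for _ in range(N):
    model("test.jpg", verbose=False)
    time.sleep(0.05)
print(f"paced:    {N / (time.perf_counter() - t0):.1f} img/s")
```

In other words, 99% utilization here just means the GPU is never waiting on anything; it is a sign the pipeline is efficient, not that the model is too heavy.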