High GPU usage during simple model inference on TensorRT
Description
High GPU usage during simple model inference on TensorRT.
Environment
TensorRT Version: 10.3.0.30
GPU Type: Tegra
Nvidia Driver Version: 540.4.0
CUDA Version: 12.6.68
CUDNN Version: 9.3.0.75
Operating System + Version: Ubuntu 22.04
Python Version (if applicable): 3.10.12
Baremetal or Container (if container which image + tag): ultralytics/ultralytics:latest-jetson-jetpack6
I am encountering an issue where the GPU usage spikes to 99% during inference with a lightweight image detection model. This behavior is unexpected given the simplicity of the model and the low computational requirements of the inference task.
Expected Behavior:
GPU usage should remain low (ideally <20–30%) during inference with such a lightweight model and small input data.
Actual Behavior:
GPU usage spikes to 99% and remains there until the process exits. Neither the memory usage nor the batch size is large enough to justify this load.
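The original code is not included in the report; as a rough illustration of the kind of inference loop being described, here is a minimal sketch assuming an Ultralytics YOLO model already exported to a TensorRT engine (the file names "yolov8n.engine" and "bus.jpg" are placeholders, not taken from the thread):

```python
# Hypothetical repro sketch, not the reporter's actual code.
# Assumes a YOLO model exported to a TensorRT engine and a sample image.
from ultralytics import YOLO

model = YOLO("yolov8n.engine")  # load the exported TensorRT engine

# Repeated single-image inference; while this loop runs, tegrastats/jtop
# report GPU utilization near 99%.
for _ in range(1000):
    results = model("bus.jpg", verbose=False)
```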
Y-T-G May 2, 2025, 5:45am
Why do you expect this? This is not true. Usage is high when inference isn’t bottlenecked by something else, such as a slow CPU or I/O. The point of TensorRT is to minimize those bottlenecks, achieve higher GPU utilization, and get more done in less time.
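To illustrate this point (not from the thread itself): if lower utilization is genuinely required, the practical approach is to cap how often frames are submitted, so the GPU sits idle between requests. A sketch under the same placeholder engine/image assumptions as above:

```python
# Sketch of rate-limiting inference to reduce average GPU utilization.
# Placeholder engine and image names; the cap (10 FPS) is arbitrary.
import time
from ultralytics import YOLO

model = YOLO("yolov8n.engine")
target_interval = 1.0 / 10  # seconds per frame at the capped rate

for _ in range(1000):
    start = time.perf_counter()
    results = model("bus.jpg", verbose=False)
    elapsed = time.perf_counter() - start
    if elapsed < target_interval:
        time.sleep(target_interval - elapsed)  # GPU idles here, lowering utilization
```

Without such throttling, a continuously fed engine will keep the GPU busy by design, which is what the 99% figure reflects.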