High GPU usage during simple model inference on TensorRT

Description

High GPU usage during simple model inference on TensorRT.

Environment

TensorRT Version: 10.3.0.30
GPU Type: Tegra
Nvidia Driver Version: 540.4.0
CUDA Version: 12.6.68
CUDNN Version: 9.3.0.75
Operating System + Version: Ubuntu 22.04
Python Version (if applicable): 3.10.12
Baremetal or Container (if container which image + tag): ultralytics/ultralytics:latest-jetson-jetpack6

I am encountering an issue where the GPU usage spikes to 99% during inference with a lightweight image detection model. This behavior is unexpected given the simplicity of the model and the low computational requirements of the inference task.
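For context, the inference is invoked roughly as in the minimal sketch below. This assumes the Ultralytics YOLO Python API shipped in the container image listed above; the engine and image file names are placeholders, not the actual files used:

```python
# Minimal sketch of the inference path (Ultralytics API assumed from the
# ultralytics/ultralytics:latest-jetson-jetpack6 image; file names are placeholders).
from ultralytics import YOLO

model = YOLO("yolov8n.engine")               # TensorRT engine exported beforehand
results = model("test.jpg", verbose=False)   # single-image inference
print(results[0].boxes)                      # detections for the one input image
```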

Expected Behavior:
GPU usage should remain low (ideally <20–30%) during inference with such a lightweight model and small input data.

Actual Behavior:
GPU usage spikes to 99% and remains there until the process exits. Memory usage stays low and the batch size is small, so there is no corresponding increase in workload that would justify this load.

Relevant Files

[Screenshot: GPU utilization metric showing 99%]

Y-T-G May 2, 2025, 5:45am 2

Why do you expect this? This is not true. Usage is high when inference isn't bottlenecked by something else, like a slow CPU or I/O. The point of TensorRT is to minimize those bottlenecks, achieve higher GPU utilization, and get more done in less time.
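As a rough illustration of that point (again assuming the Ultralytics API from the container image; file names and the frame-rate cap are placeholders): running inference back to back keeps the GPU saturated, while pacing the same inference leaves the GPU idle between frames, so reported utilization drops along with throughput.

```python
# Sketch: same model, two loops. Watch utilization with tegrastats while each runs.
import time
from ultralytics import YOLO

model = YOLO("yolov8n.engine")
N = 300

# 1) Back-to-back inference: the next frame is submitted as soon as the previous
#    one finishes, so the GPU stays busy and utilization reads near 99%.
t0 = time.perf_counter()
for _ in range(N):
    model("test.jpg", verbose=False)
print(f"uncapped: {N / (time.perf_counter() - t0):.1f} img/s")

# 2) Paced inference (~20 FPS cap): the GPU idles between frames, so utilization
#    drops, but so does throughput. Same work per frame, spread over more wall time.
t0 = time.perf_counter()
for _ in range(N):
    model("test.jpg", verbose=False)
    time.sleep(0.05)
print(f"paced:    {N / (time.perf_counter() - t0):.1f} img/s")
```

In other words, 99% utilization here just means the GPU is never waiting on anything; it is a sign the pipeline is efficient, not that the model is too heavy.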