GPU Usage Stuck at Placeholder in C++ Llama 3.2 App - Need NVML Help!

Description

Hey NVIDIA crew, I’m working on a C++ terminal app for Llama 3.2 (shoutout to your GPU tech in the README!), and I’ve hit a snag. The GPU usage stat is hardcoded as int gpu_usage = 0; (no real measurement, just a placeholder). I’m on the “optimize-algorithm” branch trying to squeeze out more performance, but without real GPU stats I’m flying blind. The README teases “GPU usage: 5%” in an example, but that number is fake too. How do I hook up something like NVML to get the real deal? Appreciate any pointers!

Environment

Here’s my setup (yours may differ):

Relevant Files

Check my repo: https://github.com/bniladridas/cpp_terminal_app (branch: optimize-algorithm)

Steps To Reproduce

  1. Clone it: git clone -b optimize-algorithm https://github.com/bniladridas/cpp_terminal_app.git
  2. Build: mkdir build && cd build && cmake .. && make
  3. Run: ./LlamaTerminalApp
  4. Output shows “GPU usage: 0%” (or 5% in README example)—all fake, no crash, just no real data.

No traceback, just a quiet fail on the GPU front.
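
Side note on the build, in case it matters: I assume wiring in NVML means linking against it in CMakeLists.txt. This is just a guess on my part based on CMake’s FindCUDAToolkit module (3.17+), reusing my existing LlamaTerminalApp target name:

```cmake
# Untested guess: FindCUDAToolkit (CMake >= 3.17) exposes NVML
# as the imported target CUDA::nvml (libnvidia-ml under the hood).
find_package(CUDAToolkit REQUIRED)
target_link_libraries(LlamaTerminalApp PRIVATE CUDA::nvml)
```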

Question

I’m thinking NVML could fix this since you guys rock CUDA. How do I plug it in to measure actual GPU usage for this Llama 3.2 beast? Code snippets or tips would be clutch—thanks!
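
For reference, here’s roughly what I’m picturing from a skim of the NVML docs. Completely untested on my end (device index 0 is hardcoded, no multi-GPU handling), so please correct anything that’s off:

```cpp
#include <nvml.h>
#include <cstdio>

// Sketch: query real GPU utilization instead of the hardcoded placeholder.
int main() {
    nvmlReturn_t rc = nvmlInit();
    if (rc != NVML_SUCCESS) {
        std::fprintf(stderr, "NVML init failed: %s\n", nvmlErrorString(rc));
        return 1;
    }

    nvmlDevice_t dev;
    rc = nvmlDeviceGetHandleByIndex(0, &dev);  // assuming GPU 0
    if (rc == NVML_SUCCESS) {
        nvmlUtilization_t util;  // .gpu and .memory, both in percent
        rc = nvmlDeviceGetUtilizationRates(dev, &util);
        if (rc == NVML_SUCCESS) {
            std::printf("GPU usage: %u%%\n", util.gpu);  // real number, not 0 or 5
        }
    }

    nvmlShutdown();
    return rc == NVML_SUCCESS ? 0 : 1;
}
```

If that’s the right direction, I’d fold the query into wherever gpu_usage gets set today, and call nvmlInit()/nvmlShutdown() once at app startup/teardown rather than per query. Does that sound sane?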

Hi @bniladridas
Apologies for the delay.
This forum covers issues related to TensorRT (TRT), so I suggest you raise this in the CUDA forum instead.

Thanks