GPU Usage Stuck at Placeholder in C++ Llama 3.2 App - Need NVML Help!

Description

Hey NVIDIA crew, I’m working on a C++ terminal app for Llama 3.2 (shoutout to your GPU tech in the README!), and I’ve hit a snag. The GPU usage stat is hardcoded as int gpu_usage = 0; (no real measurement, just a placeholder). I’m on the “optimize-algorithm” branch trying to squeeze out more performance, but without real GPU stats I’m flying blind. The README teases “GPU usage: 5%” in an example, but that number is fake too. How do I hook up something like NVML to get the real deal? Appreciate any pointers!

Environment

Here’s my setup (yours may differ):

Relevant Files

Check my repo: https://github.com/bniladridas/cpp_terminal_app (branch: optimize-algorithm)

Steps To Reproduce

  1. Clone it: git clone -b optimize-algorithm https://github.com/bniladridas/cpp_terminal_app.git
  2. Build: mkdir build && cd build && cmake .. && make
  3. Run: ./LlamaTerminalApp
  4. Output shows “GPU usage: 0%” (or 5% in README example)—all fake, no crash, just no real data.

No traceback, just a quiet fail on the GPU front.
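
Side note on the build, in case it matters: I assume wiring in NVML means linking against it in CMakeLists.txt. This is just a guess on my part based on CMake’s FindCUDAToolkit module (3.17+), reusing my existing LlamaTerminalApp target name:

```cmake
# Untested guess: FindCUDAToolkit (CMake >= 3.17) exposes NVML
# as the imported target CUDA::nvml (libnvidia-ml under the hood).
find_package(CUDAToolkit REQUIRED)
target_link_libraries(LlamaTerminalApp PRIVATE CUDA::nvml)
```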

Question

I’m thinking NVML could fix this since you guys rock CUDA. How do I plug it in to measure actual GPU usage for this Llama 3.2 beast? Code snippets or tips would be clutch—thanks!
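
For reference, here’s roughly what I’m picturing from a skim of the NVML docs. Completely untested on my end (device index 0 is hardcoded, no multi-GPU handling), so please correct anything that’s off:

```cpp
#include <nvml.h>
#include <cstdio>

// Sketch: query real GPU utilization instead of the hardcoded placeholder.
int main() {
    nvmlReturn_t rc = nvmlInit();
    if (rc != NVML_SUCCESS) {
        std::fprintf(stderr, "NVML init failed: %s\n", nvmlErrorString(rc));
        return 1;
    }

    nvmlDevice_t dev;
    rc = nvmlDeviceGetHandleByIndex(0, &dev);  // assuming GPU 0
    if (rc == NVML_SUCCESS) {
        nvmlUtilization_t util;  // .gpu and .memory, both in percent
        rc = nvmlDeviceGetUtilizationRates(dev, &util);
        if (rc == NVML_SUCCESS) {
            std::printf("GPU usage: %u%%\n", util.gpu);  // real number, not 0 or 5
        }
    }

    nvmlShutdown();
    return rc == NVML_SUCCESS ? 0 : 1;
}
```

If that’s the right direction, I’d fold the query into wherever gpu_usage gets set today, and call nvmlInit()/nvmlShutdown() once at app startup/teardown rather than per query. Does that sound sane?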

Hi @bniladridas
Apologies for the delay.
This forum covers issues related to TensorRT (TRT), so I suggest you raise this in the CUDA forum instead.

Thanks