Advanced API Performance: SetStablePowerState
This post covers best practices for using SetStablePowerState on NVIDIA GPUs. To get a high and consistent frame rate in your applications, see all Advanced API Performance tips.
Most modern processors, including GPUs, change processor core and memory clock rates during application execution. These changes cause performance to vary, which introduces errors in measurements and makes comparisons between runs difficult.
Recommended
- Use the nvidia-smi utility to set the GPU core and memory clocks before attempting measurements. This command is installed by typical driver installations on Windows and Linux. Installation locations may vary by OS version but should be fairly stable.
  - Run commands on an administrator console on Windows, or prepend sudo to the following commands on Linux-like OSs.
  - To query supported clock rates:
    * nvidia-smi --query-supported-clocks=timestamp,gpu_name,gpu_uuid,memory,graphics --format=csv
  - To set the core and memory clock rates, respectively:
    * nvidia-smi --lock-gpu-clocks=<core_clock_rate>
    * nvidia-smi --lock-memory-clocks=<memory_clock_rate>
  - Perform performance capture or other work.
  - To reset the core and memory clock rates, respectively:
    * nvidia-smi --reset-gpu-clocks
    * nvidia-smi --reset-memory-clocks
  - For general use during a project, it may be convenient to write a simple script to lock the clocks, launch your application, and after exit, reset the clocks.
  - For command-line help, run nvidia-smi --help. There are shortened versions of the commands listed earlier for your convenience.
  - For more information, see NVIDIA System Management Interface.
- Use the DX12 function SetStablePowerState to read the GPU’s predetermined stable power clock rate. The stable GPU clock rate may vary by board.
  - Modify a DX12 sample to invoke SetStablePowerState.
  - Execute nvidia-smi -q -d CLOCK, and record the Graphics clock frequency with the SetStablePowerState sample running. Use this frequency with the --lock-gpu-clocks option.
- Use Nsight Graphics’s GPU Trace activity with the option to lock core and memory clock rates during profiling (Figure 1).
Figure 1. Lock Clocks to Base checkbox
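The lock, run, and reset sequence from the nvidia-smi steps above can be wrapped in a small script. The following is a minimal Python sketch, not an official NVIDIA tool: the function name run_with_locked_clocks and its injectable run parameter are assumptions made for illustration. It uses only the nvidia-smi options listed above and needs the same privileges (an administrator console on Windows, or sudo on Linux).

```python
import subprocess
from typing import Callable, Sequence


def run_with_locked_clocks(
    app_cmd: Sequence[str],
    core_mhz: int,
    memory_mhz: int,
    run: Callable[..., object] = subprocess.run,
) -> None:
    """Lock GPU core and memory clocks, run app_cmd, then reset the clocks.

    `run` defaults to subprocess.run; making it injectable is an assumption
    of this sketch so the command sequence can be checked without a GPU.
    """
    try:
        # Fail fast if a lock is rejected (for example, insufficient privileges).
        run(["nvidia-smi", f"--lock-gpu-clocks={core_mhz}"], check=True)
        run(["nvidia-smi", f"--lock-memory-clocks={memory_mhz}"], check=True)
        # Performance capture or other work happens while the clocks are locked.
        run(list(app_cmd), check=False)
    finally:
        # Always attempt both resets, even if locking or the application failed.
        run(["nvidia-smi", "--reset-gpu-clocks"], check=False)
        run(["nvidia-smi", "--reset-memory-clocks"], check=False)
```

For example, run_with_locked_clocks(["MyApp.exe"], core_mhz=1500, memory_mhz=5001) would lock both clocks, run the hypothetical MyApp.exe, and reset the clocks even if the application fails to launch. The clock values here are placeholders; pick rates reported by --query-supported-clocks for your board.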
Not recommended
- Don’t rely solely on the SetStablePowerState function when profiling. SetStablePowerState does not lock the memory clock, which makes the results less comparable than when the appropriate clocks are locked with nvidia-smi.
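Because SetStablePowerState leaves the memory clock unlocked, it can be useful to confirm that both clocks actually hold steady during a capture. The sketch below polls nvidia-smi and compares the samples. It is an illustration rather than part of the original guidance: it swaps in the --query-gpu=clocks.gr,clocks.mem query in place of nvidia-smi -q -d CLOCK because CSV output is simpler to parse, and the function names are hypothetical.

```python
import subprocess
from typing import List, Sequence, Tuple

# Current graphics and memory clocks in MHz, one CSV line per GPU.
QUERY_CMD = [
    "nvidia-smi",
    "--query-gpu=clocks.gr,clocks.mem",
    "--format=csv,noheader,nounits",
]


def parse_clocks(csv_text: str) -> List[Tuple[int, int]]:
    """Parse 'core, memory' MHz pairs, one line per GPU.

    Raises ValueError if a field is not an integer (for example, when the
    driver reports a clock as unavailable).
    """
    pairs = []
    for line in csv_text.strip().splitlines():
        core, memory = (int(field.strip()) for field in line.split(","))
        pairs.append((core, memory))
    return pairs


def sample_clocks() -> List[Tuple[int, int]]:
    """Run nvidia-smi once and return the current clocks for every GPU."""
    result = subprocess.run(QUERY_CMD, capture_output=True, text=True, check=True)
    return parse_clocks(result.stdout)


def clocks_are_stable(samples: Sequence[Sequence[Tuple[int, int]]]) -> bool:
    """Return True if every sample reports identical clocks for all GPUs."""
    return len({tuple(sample) for sample in samples}) <= 1
```

One possible use is to call sample_clocks() a few times while the workload is running and pass the collected list to clocks_are_stable(). A False result suggests that at least one clock changed between samples, so measurements from that run may not be comparable with others.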