Monitoring - AWS Deep Learning AMIs
Your DLAMI comes preinstalled with several GPU monitoring tools. This guide also mentions tools that are available to download and install.
- Monitor GPUs with CloudWatch - a preinstalled utility that reports GPU usage statistics to Amazon CloudWatch.
- nvidia-smi CLI - a utility to monitor overall GPU compute and memory utilization. This is preinstalled on your AWS Deep Learning AMIs (DLAMI).
- NVML C library - a C-based API to directly access GPU monitoring and management functions. This is used by the nvidia-smi CLI under the hood and is preinstalled on your DLAMI. It also has Python and Perl bindings to facilitate development in those languages. The gpumon.py utility preinstalled on your DLAMI uses the pynvml package from nvidia-ml-py.
- NVIDIA DCGM - a cluster management tool. Visit the developer page to learn how to install and configure this tool.
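As a rough illustration of the NVML Python bindings mentioned in the list above, the following minimal sketch polls per-GPU compute utilization and memory use through pynvml. It is not the gpumon.py utility and does not reproduce its behavior. The helper names `read_gpu_stats` and `format_stats` are hypothetical, chosen only for this example, and the sketch assumes the nvidia-ml-py package and an NVIDIA driver are present. When either is missing, it returns an empty list instead of failing.

```python
"""Poll per-GPU utilization and memory through NVML's Python bindings."""

try:
    import pynvml  # provided by the nvidia-ml-py package
except ImportError:
    pynvml = None  # keep the sketch importable on hosts without the bindings


def read_gpu_stats():
    """Return one dict per GPU, or an empty list if NVML is unavailable.

    Hypothetical helper for illustration; not part of pynvml or the DLAMI.
    """
    if pynvml is None:
        return []
    try:
        pynvml.nvmlInit()
    except pynvml.NVMLError:
        return []  # no NVIDIA driver or NVML library on this host
    try:
        stats = []
        for index in range(pynvml.nvmlDeviceGetCount()):
            handle = pynvml.nvmlDeviceGetHandleByIndex(index)
            util = pynvml.nvmlDeviceGetUtilizationRates(handle)
            mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
            stats.append({
                "index": index,
                "gpu_util_percent": util.gpu,    # percent of time the GPU was busy
                "memory_used_bytes": mem.used,   # NVML reports memory in bytes
                "memory_total_bytes": mem.total,
            })
        return stats
    finally:
        pynvml.nvmlShutdown()  # always release NVML once initialized


def format_stats(stat):
    """Render one GPU's stats as a single human-readable line."""
    used_mib = stat["memory_used_bytes"] / 2**20
    total_mib = stat["memory_total_bytes"] / 2**20
    return (
        f"GPU {stat['index']}: {stat['gpu_util_percent']}% compute, "
        f"{used_mib:.0f}/{total_mib:.0f} MiB memory"
    )


if __name__ == "__main__":
    for stat in read_gpu_stats():
        print(format_stats(stat))
```

Run on a GPU instance, the script prints one line per device. Collection is kept separate from formatting so the same dictionaries could be passed to another reporting path.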