Deploy Neuron Monitor Daemonset (AWS Neuron Documentation)
This document is relevant for: Inf1, Inf2, Trn1, Trn2
Neuron Monitor is the primary observability tool for Neuron devices. For details, refer to the Neuron Monitor guide. This tutorial describes how to deploy Neuron Monitor as a DaemonSet on a Kubernetes cluster.
Deploy Neuron Monitor Daemonset
- Download the Neuron Monitor YAML file: k8s-neuron-monitor-daemonset.yml
- Apply the Neuron Monitor YAML file to create a DaemonSet on the cluster with the following command:
kubectl apply -f k8s-neuron-monitor-daemonset.yml
- Verify that the Neuron Monitor DaemonSet is running:
kubectl get ds neuron-monitor --namespace neuron-monitor
Expected result (with 2 nodes in cluster):
NAME             DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
neuron-monitor   2         2         2       2            2           <none>          27h
- Get the neuron-monitor pod names:
kubectl get pods --namespace neuron-monitor
Expected result:
NAME                   READY   STATUS    RESTARTS   AGE
neuron-monitor-slsxf   1/1     Running   0          17m
neuron-monitor-wc4f5   1/1     Running   0          17m
- Verify the Prometheus endpoint is available:
kubectl exec neuron-monitor-wc4f5 --namespace neuron-monitor -- wget -q --output-document - http://127.0.0.1:8000
Expected result:
# HELP python_gc_objects_collected_total Objects collected during gc
# TYPE python_gc_objects_collected_total counter
python_gc_objects_collected_total{generation="0"} 362.0
python_gc_objects_collected_total{generation="1"} 0.0
python_gc_objects_collected_total{generation="2"} 0.0
# HELP python_gc_objects_uncollectable_total Uncollectable objects found during GC
# TYPE python_gc_objects_uncollectable_total counter
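The verification steps above can be chained into a few shell helpers. This is a minimal sketch and not part of the Neuron documentation: it assumes the `neuron-monitor` namespace and DaemonSet name and the port 8000 shown in the steps above, and the function names (`wait_for_daemonset`, `first_monitor_pod`, `fetch_metrics`, `looks_like_prometheus`) are hypothetical names chosen for illustration.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical; the namespace, DaemonSet name,
# and port are taken from the tutorial steps above.
NAMESPACE="neuron-monitor"

# Block until the DaemonSet has finished rolling out (default timeout: 120s).
wait_for_daemonset() {
    kubectl rollout status daemonset/neuron-monitor \
        --namespace "$NAMESPACE" --timeout="${1:-120s}"
}

# Print the name of the first pod in the namespace, without the "pod/" prefix
# that "--output name" adds.
first_monitor_pod() {
    kubectl get pods --namespace "$NAMESPACE" --output name \
        | head -n 1 | sed 's|^pod/||'
}

# Fetch the metrics page from inside the pod given as $1.
fetch_metrics() {
    kubectl exec "$1" --namespace "$NAMESPACE" -- \
        wget -q --output-document - http://127.0.0.1:8000
}

# Read text on stdin and succeed if it contains at least one "# TYPE" line,
# which the Prometheus text format uses to declare a metric's type.
looks_like_prometheus() {
    grep -q '^# TYPE '
}

# Example usage (requires kubectl access to the cluster):
#   wait_for_daemonset &&
#   pod=$(first_monitor_pod) &&
#   fetch_metrics "$pod" | looks_like_prometheus &&
#   echo "Prometheus endpoint is available on $pod"
```

Keeping each step in its own function makes it possible to run one check at a time when a step fails, for example calling only `fetch_metrics` against a specific pod name.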