Prometheus Metrics

F5 NGINX Service Mesh integrates with Prometheus for metrics and Grafana for visualizations.

Note: To configure NGINX Service Mesh to use Prometheus when deploying, refer to the Monitoring and Tracing guide for instructions.

NGINX Service Mesh supports the SMI spec, including traffic metrics. The mesh creates an extension API server and shim that query Prometheus and return the results in the SMI traffic metrics format. See SMI Traffic Metrics for more information.
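If you want to inspect these SMI metrics directly, you can query the extension API server through the Kubernetes API. A minimal sketch, assuming the metrics.smi-spec.io/v1alpha1 API group defined by the SMI spec; <pod-name> is a placeholder for one of your own meshed pods:

# Fetch SMI traffic metrics for a single pod via the extension API server.
# <pod-name> is a placeholder for one of your own meshed pods.
kubectl get --raw "/apis/metrics.smi-spec.io/v1alpha1/namespaces/default/pods/<pod-name>"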

Note: Occasionally metrics are reset when the nginx-mesh-sidecar reloads NGINX Plus. If traffic is flowing and you fail to see metrics, retry after 30 seconds.

If you are deploying NGINX Plus Ingress Controller with the NGINX Service Mesh, make sure to configure the NGINX Plus Ingress Controller to export metrics. Refer to the Metrics section of the NGINX Plus Ingress Controller Deployment tutorial for instructions.
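As a hedged sketch of what that configuration involves, the NGINX Plus Ingress Controller exposes Prometheus metrics when started with the -enable-prometheus-metrics command-line argument (on port 9113 by default); the deployment and namespace names below are assumptions based on a standard installation, so adjust them to match your environment:

# Append the -enable-prometheus-metrics flag to the Ingress Controller's
# container args; the "nginx-ingress" deployment and namespace names
# are assumed defaults, not confirmed by this guide.
kubectl -n nginx-ingress patch deployment nginx-ingress --type=json \
  -p='[{"op": "add", "path": "/spec/template/spec/containers/0/args/-", "value": "-enable-prometheus-metrics"}]'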

The NGINX Service Mesh sidecar exposes NGINX Plus metrics in Prometheus format via the /metrics path on port 8887.

All metrics have the namespace nginxplus, for example nginxplus_http_requests_total and nginxplus_upstream_server_response_latency_ms_count.
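To spot-check the endpoint directly, you can port-forward any pod with an injected sidecar and fetch the path; a minimal sketch, with <pod-name> as a placeholder:

# Forward the sidecar's metrics port locally, then list a few nginxplus_ metrics.
kubectl port-forward <pod-name> 8887:8887 &
curl -s http://localhost:8887/metrics | grep '^nginxplus_' | head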

This section includes a set of example queries that you can plug into your existing Prometheus-based tooling to gain insight into the traffic flowing through your applications.

Per-second HTTP request rate, calculated over the last 30 seconds:

irate(nginxplus_http_requests_total[30s])

Upstream server responses, filtered to the redirection and error status code classes:

nginxplus_upstream_server_responses{code=~"3xx|4xx|5xx"}

The responses metric can be used to form more complex queries, such as the current success rate:

sum(irate(nginxplus_upstream_server_responses{code=~"1xx|2xx"}[30s])) by (app, version) / sum(irate(nginxplus_upstream_server_responses[30s])) by (app, version)

TCP/UDP throughput, as the per-second rate of bytes sent to upstream servers:

irate(nginxplus_stream_upstream_server_sent[30s])

Active TCP/UDP connections to upstream servers:

nginxplus_stream_upstream_server_connections

Time to establish a connection to an upstream server:

nginxplus_stream_upstream_server_connect_time

Time to receive the first byte of a response from an upstream server:

nginxplus_stream_upstream_server_first_byte_time

Time to receive the full response from an upstream server:

nginxplus_stream_upstream_server_response_time

All metrics have the following labels:

| Label | Description |
|-------|-------------|
| job | Prometheus job name. All metrics scraped from an nginx-mesh-sidecar have a job name of nginx-mesh-sidecars, and all metrics scraped from an NGINX Plus Ingress Controller have a job name of nginx-plus-ingress. |
| pod | Name of the Pod. |
| namespace | Namespace where the Pod resides. |
| instance | Address of the Pod. |
| pod_template_hash | Value of the pod-template-hash Kubernetes label. |
| deployment, statefulset, or daemonset | Name of the Deployment, StatefulSet, or DaemonSet that the Pod belongs to. |

Metrics for upstream servers, such as nginxplus_upstream_server_requests, have these additional labels:

| Label | Description |
|-------|-------------|
| code | Response code of the upstream server. For NGINX Plus metrics, the code is one of the following: 1xx, 2xx, 3xx, 4xx, or 5xx. For the upstream_server_response_latency_ms metrics, the code is the specific response code, such as 201. |
| upstream | Name of the upstream server group. |
| server | Address of the upstream server selected by NGINX. |

Metrics for outgoing requests have the following destination labels:

| Label | Description |
|-------|-------------|
| dst_pod | Name of the Pod that the request was sent to. |
| dst_service | Name of the Service that the request was sent to. |
| dst_deployment, dst_statefulset, or dst_daemonset | Name of the Deployment, StatefulSet, or DaemonSet that the request was sent to. |
| dst_namespace | Namespace that the request was sent to. |

Metrics exported by NGINX Plus Ingress Controller have these additional labels:

| Label | Description |
|-------|-------------|
| ingress | Set to true if ingress traffic is enabled. |
| egress | Set to true if egress traffic is enabled. |
| class | Ingress class of the NGINX Plus Ingress Controller. |
| resource_type | Type of resource: VirtualServer, VirtualServerRoute, or Ingress. |
| resource_name | Name of the VirtualServer, VirtualServerRoute, or Ingress resource. |
| resource_namespace | Namespace of the resource. This value is kept for backwards compatibility; for consistency with NGINX Service Mesh metrics you can use dst_namespace for queries and filters. |
| service | Service the request was sent to. This value is kept for backwards compatibility; for consistency with NGINX Service Mesh metrics you can use dst_service for queries and filters. |
| pod_name | Name of the Pod that the request was sent to. This value is kept for backwards compatibility; for consistency with NGINX Service Mesh metrics you can use dst_pod for queries and filters. |

Filter Prometheus Metrics using Labels

Here are some examples of how you can use the labels above to filter your Prometheus metrics:

5xx responses for the productpage-v1 deployment in the prod namespace:

nginxplus_upstream_server_responses{deployment="productpage-v1",namespace="prod",code="5xx"}

Successful (1xx and 2xx) responses for the productpage-v1 deployment in the prod namespace:

nginxplus_upstream_server_responses{deployment="productpage-v1",namespace="prod",code=~"1xx|2xx"}

99th percentile of response latency for requests from productpage-v1 to the details service:

histogram_quantile(0.99, sum(irate(nginxplus_upstream_server_response_latency_ms_bucket{namespace="prod",deployment="productpage-v1",dst_service="details"}[30s])) by (le))

90th percentile of response latency for requests from productpage-v1 to the details service, excluding 301 responses:

histogram_quantile(0.90, sum(irate(nginxplus_upstream_server_response_latency_ms_bucket{namespace="prod",deployment="productpage-v1",dst_service="details",code!="301"}[30s])) by (le))

Median (50th percentile) response latency for requests from productpage-v1 to the details service, counting only 200 and 201 responses:

histogram_quantile(0.50, sum(irate(nginxplus_upstream_server_response_latency_ms_bucket{namespace="prod",deployment="productpage-v1",dst_service="details",code=~"200|201"}[30s])) by (le))

Active connections reported by the NGINX Plus Ingress Controller:

nginxplus_connections_active{job="nginx-plus-ingress"}

The custom NGINX Service Mesh Grafana dashboard NGINX Mesh Top can be imported into your Grafana instance. For instructions and a list of features, see the Grafana example in the nginx-service-mesh GitHub repo.

To view Grafana, port-forward your Grafana Service:

kubectl port-forward -n <grafana-namespace> svc/grafana 3000
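
You can then view the dashboard at http://localhost:3000 in your browser.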