Alerts Reference

Objective

This document provides reference information on the various types of alerts supported by F5® Distributed Cloud Services. Use this information to understand the details of each alert and the actions you need to perform.

In Distributed Cloud Console, the Alerts page displays two tabs: Active Alerts and All Alerts.

Active Alerts

An alert is generated when its alert condition evaluates to true. Alert rules are evaluated periodically, and the alert remains active as long as the condition holds.

Note: There are two alert APIs. The Get Alerts API returns currently active alerts; the state of these alerts is always active. The Alerts History API returns the history of alert notifications for a selected time interval; the status of each notification is either firing (equivalent to active) or resolved.
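
The following is a minimal sketch of how a client might call these two APIs and distinguish the results. The endpoint paths, tenant URL, and token handling shown here are assumptions for illustration only; consult the Distributed Cloud API reference for the actual request formats.

```python
# Minimal sketch, not the authoritative API client. Endpoint paths and
# payload shapes below are assumptions for illustration only.
import os
import requests

TENANT = "https://<tenant>.console.ves.volterra.io"  # hypothetical tenant URL
HEADERS = {"Authorization": f"APIToken {os.environ.get('VES_API_TOKEN', '<token>')}"}

# Get Alerts API: returns alerts whose state is currently "active".
active = requests.get(
    f"{TENANT}/api/alerts",            # hypothetical path for the Get Alerts API
    headers=HEADERS,
    timeout=30,
).json()

# Alerts History API: returns firing/resolved notifications for a time window.
history = requests.post(
    f"{TENANT}/api/alerts/history",    # hypothetical path for the Alerts History API
    headers=HEADERS,
    json={"start_time": "2024-01-01T00:00:00Z", "end_time": "2024-01-02T00:00:00Z"},
    timeout=30,
).json()

# In the history response, each notification carries a status of
# "firing" (same as active) or "resolved".
resolved = [a for a in history.get("alerts", []) if a.get("status") == "resolved"]
```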

The following are some keys and their corresponding values for an active alert (these can be viewed in the JSON view of an alert in Console):


All Alerts

The All Alerts tab shows the history of alerts triggered during the selected date and time interval. The following are some keys and their corresponding values (these can be viewed in the JSON view of an alert in Console):


Key Points

An alert is resolved in the following cases:

Note: For an active alert, you can ignore the endsAt time. The entity generating the alert may set this endsAt time, and the alert manager resolves the alert after this time has elapsed.
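
As an illustration of this behavior, the snippet below shows one way a client could interpret alert records: endsAt is only meaningful once an alert is resolved. The field names used here (status, startsAt, endsAt) are assumptions about the exact JSON layout, which you can confirm in the Console JSON view.

```python
# Illustrative only: assumes Alertmanager-style keys ("status", "startsAt",
# "endsAt"); confirm the actual layout in the Console JSON view.
from datetime import datetime
from typing import Optional

def alert_duration(alert: dict) -> Optional[float]:
    """Return how long a resolved alert fired, in seconds.

    For active (firing) alerts the endsAt value should be ignored, so this
    returns None instead of computing a duration from it.
    """
    if alert.get("status") != "resolved":
        return None  # active alert: endsAt is not meaningful yet
    starts = datetime.fromisoformat(alert["startsAt"].replace("Z", "+00:00"))
    ends = datetime.fromisoformat(alert["endsAt"].replace("Z", "+00:00"))
    return (ends - starts).total_seconds()
```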

There is no separate alert for health score because the health score is composed of multiple components. For example, the health score of a Site is computed based on the data-plane connection status to the Regional Edge (RE) sites, the control-plane connection status, and the K8s API server status in the Site. Individual alerts are defined for each of these conditions, but no alert is available for the health score itself.

Note: You can obtain the health score of a Site in F5® Distributed Cloud Console (Console). You can also obtain it using the API https://www.volterra.io/docs/api/graph-connectivity#operation/ves.io.schema.graph.connectivity.CustomAPI.NodeQuery with "field_selector":{"healthscore":{"types":["HEALTHSCORE_OVERALL"]}}.
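A minimal sketch of such a query is shown below. The request URL and the surrounding body fields are assumptions for illustration (the documented operation is ves.io.schema.graph.connectivity.CustomAPI.NodeQuery); only the field_selector value is taken from the note above.

```python
# Minimal sketch of a NodeQuery health-score request. The endpoint path and
# the surrounding body fields are assumptions; only the field_selector comes
# from the documentation above.
import os
import requests

TENANT = "https://<tenant>.console.ves.volterra.io"  # hypothetical tenant URL
HEADERS = {"Authorization": f"APIToken {os.environ.get('VES_API_TOKEN', '<token>')}"}

body = {
    "field_selector": {
        "healthscore": {"types": ["HEALTHSCORE_OVERALL"]}
    },
    # Additional selectors (site name, time range, ...) would normally be
    # required here; they are omitted because they are not covered above.
}

resp = requests.post(
    f"{TENANT}/api/data/namespaces/system/graph/connectivity/node",  # hypothetical path
    headers=HEADERS,
    json=body,
    timeout=30,
)
print(resp.json())
```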

The amount of time before an alert is generated is not the same for all alerts. This duration is determined by the severity of the alert. For example, an alert is raised as soon as the tunnel connection to an RE goes down, whereas a health-check alert for a service is raised only if the condition persists for 10 minutes. This keeps the alert volume at a manageable level and avoids generating alerts for temporary or transient failure conditions.
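
The persistence requirement can be thought of as a hold-down timer on the alert condition. The following is an illustrative sketch of that logic, not the actual evaluation code used by the service:

```python
# Illustrative sketch of a "condition must persist" check, similar in spirit
# to the 10-minute hold described above. Not the service's actual logic.
from datetime import datetime, timedelta
from typing import Optional

class HoldDownAlert:
    """Raise an alert only after the condition has been true continuously
    for `hold` time; clear the pending state as soon as the condition clears."""

    def __init__(self, hold: timedelta):
        self.hold = hold
        self.first_seen: Optional[datetime] = None

    def evaluate(self, condition_true: bool, now: datetime) -> bool:
        if not condition_true:
            self.first_seen = None   # transient failure: no alert
            return False
        if self.first_seen is None:
            self.first_seen = now    # condition just started failing
        return now - self.first_seen >= self.hold

# Example: a health-check alert that fires only after 10 minutes of failure.
health_check_alert = HoldDownAlert(hold=timedelta(minutes=10))
```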

You cannot change the threshold for alerts.

You cannot define new alerts using an API. However, if the existing alerts do not meet your requirements, you can create a support request for a new alert in Console.


Alerts and Descriptions

The following table presents alerts and their details, such as group, type, severity, and required actions.

TSA Severity vs Anomaly Scores

Time-Series Anomaly (TSA) alerts are generated when the anomaly detection algorithm detects anomalies in any one of the following metrics:

- Request rate
- Request throughput
- Response throughput
- Response latency
- Error rate

Note: The metrics are evaluated in requests per second (rps), errors per second (erps), seconds (s), and Megabits per second (Mbps).

The alerts are classified into three severity groups: minor, major, and critical. The minimum anomaly scores and absolute metric thresholds required to trigger these alerts are provided in the following table:

| Metric | Severity | Score | Absolute Threshold | Alert |
| --- | --- | --- | --- | --- |
| Request Rate | Minor | 0.6 | 5 rps | RequestRateAnomaly |
| Request Rate | Major | 1.5 | 50 rps | RequestRateAnomaly |
| Request Rate | Critical | 3.0 | 100 rps | RequestRateAnomaly |
| Request Throughput | Minor | 0.6 | 0.25 Mbps | RequestThroughputAnomaly |
| Request Throughput | Major | 1.5 | 2.5 Mbps | RequestThroughputAnomaly |
| Request Throughput | Critical | 3.0 | 5 Mbps | RequestThroughputAnomaly |
| Response Throughput | Minor | 0.6 | 2.5 Mbps | ResponseThroughputAnomaly |
| Response Throughput | Major | 1.5 | 25 Mbps | ResponseThroughputAnomaly |
| Response Throughput | Critical | 3.0 | 50 Mbps | ResponseThroughputAnomaly |
| Response Latency | Minor | 0.6 | 5 s | ResponseLatencyAnomaly |
| Response Latency | Major | 1.5 | 50 s | ResponseLatencyAnomaly |
| Response Latency | Critical | 3.0 | 100 s | ResponseLatencyAnomaly |
| Error Rate | Minor | 0.6 | 2.5 erps | ErrorRateAnomaly |
| Error Rate | Major | 1.5 | 25 erps | ErrorRateAnomaly |
| Error Rate | Critical | 3.0 | 50 erps | ErrorRateAnomaly |
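
As an illustration of how these values could be combined, the sketch below maps an observed metric value and its anomaly score to a TSA severity using the thresholds from the table above. Whether both the score and the absolute threshold must be exceeded is an assumption here; the exact evaluation logic is internal to the service.

```python
# Illustrative only: thresholds are taken from the table above, but the rule
# that both the anomaly score and the absolute value must be exceeded is an
# assumption about how the service combines them.
from typing import Optional

# metric -> [(severity, min_score, absolute_threshold), ...], highest first
TSA_THRESHOLDS = {
    "request_rate":        [("critical", 3.0, 100), ("major", 1.5, 50),  ("minor", 0.6, 5)],     # rps
    "request_throughput":  [("critical", 3.0, 5),   ("major", 1.5, 2.5), ("minor", 0.6, 0.25)],  # Mbps
    "response_throughput": [("critical", 3.0, 50),  ("major", 1.5, 25),  ("minor", 0.6, 2.5)],   # Mbps
    "response_latency":    [("critical", 3.0, 100), ("major", 1.5, 50),  ("minor", 0.6, 5)],     # s
    "error_rate":          [("critical", 3.0, 50),  ("major", 1.5, 25),  ("minor", 0.6, 2.5)],   # erps
}

def tsa_severity(metric: str, value: float, anomaly_score: float) -> Optional[str]:
    """Return the highest severity whose score and absolute threshold are both met."""
    for severity, min_score, min_value in TSA_THRESHOLDS[metric]:
        if anomaly_score >= min_score and value >= min_value:
            return severity
    return None  # no anomaly alert

# Example: a request rate of 120 rps with anomaly score 3.2 -> "critical"
print(tsa_severity("request_rate", 120, 3.2))
```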

Note: For more information on TSA, see the Configure DDoS Detection guide.

For an L7 DDoS event, the minimum thresholds are similar to the absolute thresholds defined for critical TSA alerts. That is, the following minimum thresholds are defined for the metrics and are not configurable by end users: