Alerts Reference

Objective

This document provides reference information on the various types of alerts supported by F5® Distributed Cloud Services. Use this information to understand the details of each alert and the actions you need to perform.

In Distributed Cloud Console, the Alerts page displays two tabs: Active Alerts and All Alerts.

Active Alerts

An alert is generated when its alert condition evaluates to true. Alert rules are evaluated periodically, and the alert remains active as long as the condition holds.

Note: There are two alert APIs. The Get Alerts API returns currently active alerts; the state of these alerts is always active. The Alerts History API returns the history of alert notifications for a selected time interval; the status of each notification is either firing (equivalent to active) or resolved.
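
The following is a minimal sketch of how a client might call these two APIs and distinguish the results. The endpoint paths, tenant URL, and token handling shown here are assumptions for illustration only; consult the Distributed Cloud API reference for the actual request formats.

```python
# Minimal sketch, not the authoritative API client. Endpoint paths and
# payload shapes below are assumptions for illustration only.
import os
import requests

TENANT = "https://<tenant>.console.ves.volterra.io"  # hypothetical tenant URL
HEADERS = {"Authorization": f"APIToken {os.environ.get('VES_API_TOKEN', '<token>')}"}

# Get Alerts API: returns alerts whose state is currently "active".
active = requests.get(
    f"{TENANT}/api/alerts",            # hypothetical path for the Get Alerts API
    headers=HEADERS,
    timeout=30,
).json()

# Alerts History API: returns firing/resolved notifications for a time window.
history = requests.post(
    f"{TENANT}/api/alerts/history",    # hypothetical path for the Alerts History API
    headers=HEADERS,
    json={"start_time": "2024-01-01T00:00:00Z", "end_time": "2024-01-02T00:00:00Z"},
    timeout=30,
).json()

# In the history response, each notification carries a status of
# "firing" (same as active) or "resolved".
resolved = [a for a in history.get("alerts", []) if a.get("status") == "resolved"]
```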

The following are some keys and their corresponding values for an active alert (these can be viewed in the JSON view of an alert in Console):


All Alerts

The All Alerts tab shows the history of alerts triggered during the selected date and time interval. The following are some keys and their corresponding values (these can be viewed in the JSON view of an alert in Console):


Key Points

An alert is resolved in the following cases:

Note: For an active alert, you can ignore the endsAt time. The entity generating the alert may set this endsAt time, and the alert manager resolves the alert after this time has elapsed.
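
As an illustration of this behavior, the snippet below shows one way a client could interpret alert records: endsAt is only meaningful once an alert is resolved. The field names used here (status, startsAt, endsAt) are assumptions about the exact JSON layout, which you can confirm in the Console JSON view.

```python
# Illustrative only: assumes Alertmanager-style keys ("status", "startsAt",
# "endsAt"); confirm the actual layout in the Console JSON view.
from datetime import datetime
from typing import Optional

def alert_duration(alert: dict) -> Optional[float]:
    """Return how long a resolved alert fired, in seconds.

    For active (firing) alerts the endsAt value should be ignored, so this
    returns None instead of computing a duration from it.
    """
    if alert.get("status") != "resolved":
        return None  # active alert: endsAt is not meaningful yet
    starts = datetime.fromisoformat(alert["startsAt"].replace("Z", "+00:00"))
    ends = datetime.fromisoformat(alert["endsAt"].replace("Z", "+00:00"))
    return (ends - starts).total_seconds()
```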

There is no separate alert for health score because the health score is composed of multiple components. For example, the health score of a Site is computed based on the data-plane connection status to the Regional Edge (RE) sites, the control-plane connection status, and the K8s API server status in the Site. Individual alerts are defined for each of these conditions, but no alert is available for the health score itself.

Note: You can obtain the health score of a Site in F5® Distributed Cloud Console (Console). You can also obtain it using the API https://www.volterra.io/docs/api/graph-connectivity#operation/ves.io.schema.graph.connectivity.CustomAPI.NodeQuery with "field_selector":{"healthscore":{"types":["HEALTHSCORE_OVERALL"]}}.
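A minimal sketch of such a query is shown below. The request URL and the surrounding body fields are assumptions for illustration (the documented operation is ves.io.schema.graph.connectivity.CustomAPI.NodeQuery); only the field_selector value is taken from the note above.

```python
# Minimal sketch of a NodeQuery health-score request. The endpoint path and
# the surrounding body fields are assumptions; only the field_selector comes
# from the documentation above.
import os
import requests

TENANT = "https://<tenant>.console.ves.volterra.io"  # hypothetical tenant URL
HEADERS = {"Authorization": f"APIToken {os.environ.get('VES_API_TOKEN', '<token>')}"}

body = {
    "field_selector": {
        "healthscore": {"types": ["HEALTHSCORE_OVERALL"]}
    },
    # Additional selectors (site name, time range, ...) would normally be
    # required here; they are omitted because they are not covered above.
}

resp = requests.post(
    f"{TENANT}/api/data/namespaces/system/graph/connectivity/node",  # hypothetical path
    headers=HEADERS,
    json=body,
    timeout=30,
)
print(resp.json())
```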

The amount of time before an alert is generated is not the same for all alerts. This duration is determined by the severity of the alert. For example, an alert is raised as soon as the tunnel connection to an RE goes down, whereas a health-check alert for a service is raised only if the condition persists for 10 minutes. This keeps the alert volume at a manageable level and avoids generating alerts for temporary or transient failure conditions.
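
The persistence requirement can be thought of as a hold-down timer on the alert condition. The following is an illustrative sketch of that logic, not the actual evaluation code used by the service:

```python
# Illustrative sketch of a "condition must persist" check, similar in spirit
# to the 10-minute hold described above. Not the service's actual logic.
from datetime import datetime, timedelta
from typing import Optional

class HoldDownAlert:
    """Raise an alert only after the condition has been true continuously
    for `hold` time; clear the pending state as soon as the condition clears."""

    def __init__(self, hold: timedelta):
        self.hold = hold
        self.first_seen: Optional[datetime] = None

    def evaluate(self, condition_true: bool, now: datetime) -> bool:
        if not condition_true:
            self.first_seen = None   # transient failure: no alert
            return False
        if self.first_seen is None:
            self.first_seen = now    # condition just started failing
        return now - self.first_seen >= self.hold

# Example: a health-check alert that fires only after 10 minutes of failure.
health_check_alert = HoldDownAlert(hold=timedelta(minutes=10))
```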

You cannot change the threshold for alerts.

You cannot define new alerts using an API. However, if the existing alerts do not meet your requirements, you can create a support request for a new alert in Console.


Alerts and Descriptions

The following table presents alerts and their details, such as group, type, severity, and required actions.

TSA Severity vs Anomaly Scores

Time-Series Anomaly (TSA) alerts are generated when the anomaly detection algorithm detects anomalies in any one of the following metrics:

- Request rate
- Request throughput
- Response throughput
- Response latency
- Error rate

Note: The metrics are evaluated in requests per second (rps), errors per second (erps), seconds (s), and Megabits per second (Mbps).

The alerts are classified into three severity groups: minor, major, and critical. The minimum anomaly scores and absolute metric thresholds required to trigger these alerts are provided in the following table:

| Metric | Severity | Score | Absolute Threshold | Alert |
| --- | --- | --- | --- | --- |
| Request Rate | Minor | 0.6 | 5 rps | RequestRateAnomaly |
| Request Rate | Major | 1.5 | 50 rps | RequestRateAnomaly |
| Request Rate | Critical | 3.0 | 100 rps | RequestRateAnomaly |
| Request Throughput | Minor | 0.6 | 0.25 Mbps | RequestThroughputAnomaly |
| Request Throughput | Major | 1.5 | 2.5 Mbps | RequestThroughputAnomaly |
| Request Throughput | Critical | 3.0 | 5 Mbps | RequestThroughputAnomaly |
| Response Throughput | Minor | 0.6 | 2.5 Mbps | ResponseThroughputAnomaly |
| Response Throughput | Major | 1.5 | 25 Mbps | ResponseThroughputAnomaly |
| Response Throughput | Critical | 3.0 | 50 Mbps | ResponseThroughputAnomaly |
| Response Latency | Minor | 0.6 | 5 s | ResponseLatencyAnomaly |
| Response Latency | Major | 1.5 | 50 s | ResponseLatencyAnomaly |
| Response Latency | Critical | 3.0 | 100 s | ResponseLatencyAnomaly |
| Error Rate | Minor | 0.6 | 2.5 erps | ErrorRateAnomaly |
| Error Rate | Major | 1.5 | 25 erps | ErrorRateAnomaly |
| Error Rate | Critical | 3.0 | 50 erps | ErrorRateAnomaly |
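
As an illustration of how these values could be combined, the sketch below maps an observed metric value and its anomaly score to a TSA severity using the thresholds from the table above. Whether both the score and the absolute threshold must be exceeded is an assumption here; the exact evaluation logic is internal to the service.

```python
# Illustrative only: thresholds are taken from the table above, but the rule
# that both the anomaly score and the absolute value must be exceeded is an
# assumption about how the service combines them.
from typing import Optional

# metric -> [(severity, min_score, absolute_threshold), ...], highest first
TSA_THRESHOLDS = {
    "request_rate":        [("critical", 3.0, 100), ("major", 1.5, 50),  ("minor", 0.6, 5)],     # rps
    "request_throughput":  [("critical", 3.0, 5),   ("major", 1.5, 2.5), ("minor", 0.6, 0.25)],  # Mbps
    "response_throughput": [("critical", 3.0, 50),  ("major", 1.5, 25),  ("minor", 0.6, 2.5)],   # Mbps
    "response_latency":    [("critical", 3.0, 100), ("major", 1.5, 50),  ("minor", 0.6, 5)],     # s
    "error_rate":          [("critical", 3.0, 50),  ("major", 1.5, 25),  ("minor", 0.6, 2.5)],   # erps
}

def tsa_severity(metric: str, value: float, anomaly_score: float) -> Optional[str]:
    """Return the highest severity whose score and absolute threshold are both met."""
    for severity, min_score, min_value in TSA_THRESHOLDS[metric]:
        if anomaly_score >= min_score and value >= min_value:
            return severity
    return None  # no anomaly alert

# Example: a request rate of 120 rps with anomaly score 3.2 -> "critical"
print(tsa_severity("request_rate", 120, 3.2))
```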

Note: For more information on TSA, see the Configure DDoS Detection guide.

For an L7 DDoS event, the minimum thresholds are similar to the absolute thresholds defined for critical TSA alerts. That is, the following minimum thresholds are defined for the metrics and are not configurable by end users: