Alerting overview

This document describes how you can get notified when your application fails or when the performance of an application doesn't meet defined criteria.

How alerting works

The Cloud Monitoring alerting process contains three parts:

Alerting policies can evaluate three types of data:

The alerting process helps you respond to issues when the performance of an application fails to meet acceptable values. For example, you deploy a web application onto a Compute Engine virtual machine (VM) instance. While you expect the HTTP response latency to fluctuate, you want your support team to respond when the application has high latency for a significant time period. You could create a metric-based alerting policy that monitors the application's HTTP response latency metric. If the response latency is higher than two seconds for at least five minutes, then Monitoring creates an incident and sends email notifications to your support team.
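The "higher than two seconds for at least five minutes" rule in the example above can be illustrated with a small simulation. This is a minimal sketch, not how Monitoring is implemented: the `Sample` type, the `first_violation` function, and the rule that a single sample at or below the threshold resets the window are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Sample:
    """One latency measurement (hypothetical type, for illustration only)."""
    timestamp_s: float  # seconds since an arbitrary reference time
    latency_s: float    # observed HTTP response latency, in seconds


def first_violation(samples: Iterable[Sample],
                    threshold_s: float = 2.0,
                    duration_s: float = 300.0) -> Optional[float]:
    """Return the timestamp at which latency has stayed above threshold_s
    for at least duration_s, or None if that never happens."""
    breach_start: Optional[float] = None
    for sample in sorted(samples, key=lambda s: s.timestamp_s):
        if sample.latency_s > threshold_s:
            if breach_start is None:
                breach_start = sample.timestamp_s
            if sample.timestamp_s - breach_start >= duration_s:
                return sample.timestamp_s
        else:
            # Assumption: any sample at or below the threshold resets the window.
            breach_start = None
    return None
```

With one sample per minute, a two-minute latency spike returns `None`, while latency that stays above two seconds from the 60-second mark onward is reported at the 360-second mark, five minutes after the breach began.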

How to create an alerting policy

There are multiple ways to create an alerting policy. For example, you can use pre-configured alerting policies by enabling recommended alerts from integrations or certain pages in the Google Cloud console. You can also configure a new alerting policy by using the Google Cloud console, the Cloud Monitoring API, the Google Cloud CLI, or Terraform.

Monitoring provides pre-built packages to let you create alerting policies for your Google Cloud services and third-party integrations. The packages include recommended alerting policies, sample dashboards, and key metrics for the service. These packages are available for Google Cloud services such as Google Kubernetes Engine, Compute Engine, and Cloud SQL, and common third-party integrations such as MongoDB, Kafka, and Elasticsearch.

When you install a package, you can enable the package's recommended alerting policies. When you enable a recommended alerting policy, you configure its notification channel and optionally modify other values. After configuration, the alerting policy begins monitoring its target immediately, with no further user input required.

Recommended alerting policies are helpful when you've deployed a new service and want to alert on important metrics. For example, the Cloud SQL integration package comes with recommended alerting policies for failed instances and slow transactions:

Two of the recommended alerting policies for the Cloud SQL integration package.

To learn more, see the following documents:

Create new alerting policies

You can create alerting policies to monitor different types of data depending on your alerting needs. The following sections list the different types of data that you can monitor with alerting policies.

Monitor time series data

| Condition type | Description | Example |
| --- | --- | --- |
| Metric-threshold condition | Metric-threshold conditions are met when the values of a metric are more than, or less than, a threshold for a specific retest window. For more information, see Create metric-threshold alerting policies and Create alerting policies by using the API. | You want an alerting policy that sends a notification when response latency is 500 ms or higher for five consecutive uptime checks over 10 minutes. |
| Metric-absence condition | Metric-absence conditions are met when a monitored time series has no data for a specific retest window. The maximum retest window is 23.5 hours. For more information, see Create metric-absence alerting policies and Create alerting policies by using the API. | You want an alerting policy that opens an incident with your support team when a resource doesn't respond to any HTTP requests over the course of five minutes. |
| Forecasted metric-value condition | Forecasted metric-value conditions are met when the alerting policy predicts that the threshold will be violated within the upcoming forecast window. The forecast window can range from 1 hour to 7 days. For more information, see Create forecasted metric-value alerting policies and Create alerting policies by using the API. | You want an alerting policy that opens an incident with your support team when a resource is likely to reach 80% disk space usage within the next 24 hours. |
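As a rough illustration of what a metric-threshold policy might look like when created through the Cloud Monitoring API, the following sketch builds a request body as a plain Python dictionary. The field names follow the JSON form of the API's AlertPolicy resource as best understood here, so verify them against the API reference before use. The helper name `metric_threshold_policy`, the metric type in the usage note, and the project and channel IDs are hypothetical.

```python
import json
from typing import Iterable


def metric_threshold_policy(display_name: str,
                            metric_filter: str,
                            threshold: float,
                            duration_s: int,
                            channels: Iterable[str]) -> dict:
    """Build an alerting-policy body with one 'greater than' threshold condition.

    Hypothetical helper: it only assembles a dictionary and calls no API.
    """
    return {
        "displayName": display_name,
        "combiner": "OR",  # how multiple conditions combine; only one is used here
        "conditions": [{
            "displayName": f"{display_name} condition",
            "conditionThreshold": {
                "filter": metric_filter,        # selects the time series to watch
                "comparison": "COMPARISON_GT",  # met when the value exceeds the threshold
                "thresholdValue": threshold,
                "duration": f"{duration_s}s",   # how long the value must stay in violation
            },
        }],
        "notificationChannels": list(channels),
    }
```

For example, `json.dumps(metric_threshold_policy("High latency", 'metric.type="custom.googleapis.com/http/response_latency" AND resource.type="gce_instance"', 2.0, 300, ["projects/my-project/notificationChannels/123"]), indent=2)` produces a JSON document that could be adapted for the API, the Google Cloud CLI, or a Terraform configuration. The custom metric type shown is a placeholder, not a metric that Monitoring provides.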

Monitor log entry data

To monitor individual log entries, use a log-based alerting policy. A condition on a log-based alerting policy is met when the alerting policy detects that a phrase from a log entry matches the alerting policy criteria. For example, you want an alerting policy that opens an incident with your support team when a log entry's message contains product_ids=['tier_1_support', 'tier_2_support'].
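The matching idea can be sketched as a per-entry check. This is only an illustration under stated assumptions: real log-based alerting policies are configured with Logging filters, not Python code, and the `matches_phrase` and `matching_entries` helpers and the dictionary shape of a log entry are hypothetical.

```python
from typing import Iterable, Iterator

# The phrase from the example above, copied verbatim.
PHRASE = "product_ids=['tier_1_support', 'tier_2_support']"


def matches_phrase(log_entry: dict, phrase: str = PHRASE) -> bool:
    """Return True when the entry's 'message' field contains the phrase."""
    message = log_entry.get("message", "")
    return isinstance(message, str) and phrase in message


def matching_entries(entries: Iterable[dict], phrase: str = PHRASE) -> Iterator[dict]:
    """Yield each entry that would meet the condition, one at a time."""
    return (entry for entry in entries if matches_phrase(entry, phrase))
```

Because each entry is evaluated on its own, a single matching log entry is enough to meet the condition. No aggregation across entries is involved.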

For more information, see Configure log-based alerting policies in the Logging documentation.

Monitor SQL query results

To monitor SQL query results, use a SQL-based alerting policy. The condition of a SQL-based alerting policy periodically analyzes your log entry data and then creates an incident when the table of query results meets certain criteria. This type of alerting policy is helpful when you need an alerting policy that monitors aggregations of data or complex patterns across multiple log entries. For example, you want to get notified when more than 50 log entries in the last 60 minutes have a severity of WARNING.
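The aggregation in that example (count the WARNING entries in the last 60 minutes, then compare the count to 50) can be sketched in plain Python. This stands in for the SQL query and is not the query itself. The `warning_count_exceeded` helper and the `severity` and `timestamp` keys are assumptions made for illustration.

```python
from datetime import datetime, timedelta
from typing import Iterable


def warning_count_exceeded(entries: Iterable[dict],
                           now: datetime,
                           window: timedelta = timedelta(minutes=60),
                           limit: int = 50) -> bool:
    """Return True when more than `limit` WARNING entries fall inside the window.

    Each entry is assumed to be a dict with a 'severity' string and a
    timezone-aware 'timestamp' datetime.
    """
    cutoff = now - window
    count = sum(
        1
        for entry in entries
        if entry["severity"] == "WARNING" and cutoff <= entry["timestamp"] <= now
    )
    return count > limit
```

Unlike a check on a single log entry, this condition depends on many entries at once, which is the kind of pattern a SQL-based alerting policy is suited to.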

For more information, see Monitor your SQL query results with an alerting policy in the Logging documentation.

Alerting policy components

Each alerting policy has the following components:

Query languages

Use Prometheus Query Language (PromQL) and filters in your alerting policies to take greater control over your metric evaluation. Monitoring supports the following query types:

Manage alerting policies and incidents

After an alerting policy is enabled, Monitoring continuously monitors the conditions of that policy. You can't configure the alerting policy to monitor conditions only for certain time periods. If you want to disable the alerting policy for a certain time period, then create a snooze.
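A snooze is defined by the policies it applies to and a time interval. The following sketch assembles such a definition as a dictionary. The field names follow the JSON form of the Monitoring API's Snooze resource as best understood here, so check them against the API reference. The `snooze_body` helper and the policy name in the usage note are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable


def _rfc3339(moment: datetime) -> str:
    """Format a timezone-aware datetime as an RFC 3339 UTC timestamp."""
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def snooze_body(display_name: str,
                policy_names: Iterable[str],
                start: datetime,
                duration: timedelta) -> dict:
    """Build a snooze definition covering [start, start + duration].

    Hypothetical helper: it only assembles a dictionary and calls no API.
    """
    return {
        "displayName": display_name,
        "criteria": {"policies": list(policy_names)},  # policies to silence
        "interval": {
            "startTime": _rfc3339(start),
            "endTime": _rfc3339(start + duration),
        },
    }
```

For example, a two-hour maintenance window starting at 22:00 UTC could be expressed as `snooze_body("Weekend maintenance", ["projects/my-project/alertPolicies/456"], datetime(2024, 1, 6, 22, 0, tzinfo=timezone.utc), timedelta(hours=2))`.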

If an incident is open and Monitoring determines that the conditions of the metric-based policy are no longer met, then Monitoring automatically closes the incident and sends a notification about the closure.

Pricing

To learn about pricing for Cloud Monitoring, see the Google Cloud Observability pricing page.

For information about how to monitor the number of trace spans or logs that are ingested, or how to be notified when specific content is included in a log entry, see the following documents:

What's next