Monitoring

High availability, backups, and disaster recovery systems help when something goes wrong with your PostgreSQL cluster. Monitoring helps you anticipate problems before they happen. Additionally, monitoring can help you diagnose and resolve issues that degrade performance.

There are many different ways to monitor systems within Kubernetes, including tools that come with Kubernetes itself. Here we review what Crunchy Postgres for Kubernetes provides for an out-of-the-box monitoring solution.

Getting Started

If you want to install the metrics stack, please visit the installation instructions for the PostgreSQL Operator Monitoring stack.

Components

The PostgreSQL Operator Monitoring stack is made up of several open source components:

In versions before CPK v5.8.0, this stack included postgres_exporter. postgres_exporter both provided the queries used to collect metrics about a PostgreSQL instance and served as the mechanism for running queries defined by pgMonitor or supplied as custom queries.

Starting with CPK v5.8.0, CPK offers a choice of mechanisms for querying and exporting metrics from Postgres instances. While postgres_exporter is still an option, users can enable the OpenTelemetryMetrics feature gate for individual clusters. If you are using OpenTelemetry Metrics, then instead of postgres_exporter providing metric queries, CPK manages those queries directly and uses an OpenTelemetry SQL query library to expose those metrics. For more information on the OpenTelemetry architecture, see our Database Observability page and our guide to OpenTelemetry metrics.
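As a point of reference, the following is a minimal sketch of how the feature gate might be switched on, assuming feature gates are passed to the operator through the PGO_FEATURE_GATES environment variable on the postgres-operator Deployment. The Deployment and container names here are illustrative; confirm the variable name and the per-cluster configuration against the CPK documentation for your version.

```yaml
# Sketch: enable the OpenTelemetryMetrics feature gate on the operator.
# The Deployment/container names and the PGO_FEATURE_GATES variable are
# assumptions to verify against your CPK version's documentation.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: pgo                      # illustrative operator Deployment name
  namespace: postgres-operator
spec:
  template:
    spec:
      containers:
        - name: operator
          env:
            - name: PGO_FEATURE_GATES
              value: "OpenTelemetryMetrics=true"
```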

PGO Metrics

Starting in CPK v5.8.0, the metrics endpoint provided by controller-runtime is exposed on the postgres-operator Pod and is secure by default, using https to encrypt traffic and Kubernetes authentication and authorization to ensure only service accounts with proper RBAC permissions can scrape the endpoint. The provided metrics can give you insight into the behavior and performance of the different controllers in the postgres-operator.
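Because the endpoint requires Kubernetes authentication and authorization, the service account that scrapes it needs permission on the /metrics path. A minimal sketch of what that RBAC might look like; all names here are illustrative:

```yaml
# Illustrative RBAC sketch: allow a Prometheus service account to scrape the
# protected /metrics endpoint. Names and namespaces are placeholders.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: pgo-metrics-reader
rules:
  - nonResourceURLs: ["/metrics"]
    verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: pgo-metrics-reader
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: pgo-metrics-reader
subjects:
  - kind: ServiceAccount
    name: prometheus-sa          # the service account Prometheus runs as
    namespace: monitoring
```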

Warning

The certificates used for https are self-signed certificates generated by controller-runtime. If you wish to provide your own certificates, see the section below.

Installing Custom Certificates for the PGO Metrics endpoint

To provide your own certificates, place them in a Secret created in the same Namespace as the postgres-operator. The Secret should contain the TLS key (tls.key) and TLS certificate (tls.crt) needed to enable encryption, and they should be named accordingly in the Secret.
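A minimal sketch of such a Secret; the name pgo-metrics-cert is illustrative and should match whatever name you reference from the Deployment:

```yaml
# Sketch of a TLS Secret holding the custom certificates. The Secret name is
# illustrative; the keys must be tls.crt and tls.key.
apiVersion: v1
kind: Secret
metadata:
  name: pgo-metrics-cert
  namespace: postgres-operator   # the Namespace where the postgres-operator runs
type: kubernetes.io/tls
data:
  tls.crt: <base64-encoded certificate>
  tls.key: <base64-encoded private key>
```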

With the Secret in place, you need to adjust your postgres-operator Deployment so that the certificates from the Secret are mounted into a Volume for the operator's metrics server to use. This entails adding a Volume and a VolumeMount, as seen in the example below.
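A sketch of the relevant parts of the Deployment spec; the Secret name and mount path are illustrative, so match them to your Secret and to the certificate directory your operator version expects:

```yaml
# Sketch: mount the certificate Secret into the operator container.
# The Secret name and mountPath are illustrative assumptions.
spec:
  template:
    spec:
      containers:
        - name: operator
          volumeMounts:
            - name: metrics-certs
              mountPath: /certs        # illustrative certificate directory
              readOnly: true
      volumes:
        - name: metrics-certs
          secret:
            secretName: pgo-metrics-cert
```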

After you configure the certificates for the controller-runtime metrics endpoint, you will need to update your Prometheus deployment to use these certificates so that your connection to the metrics endpoint is encrypted. Check out the Prometheus documentation for more information on configuring TLS for Prometheus.

If the certificates are properly signed and the Prometheus configuration is correct, you should be able to turn off the insecure_skip_verify setting in the Prometheus configuration, which can be found in the prometheus/config/prometheus.yml file in the CPK Monitoring installer.
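In a Prometheus scrape configuration, those TLS settings might look roughly like the following; the ca_file path is illustrative:

```yaml
# Sketch of the TLS settings for a scrape job once properly signed
# certificates are in place; the ca_file path is a placeholder.
scrape_configs:
  - job_name: pgo-metrics
    scheme: https
    tls_config:
      ca_file: /etc/prometheus/certs/ca.crt   # CA that signed the metrics certificate
      insecure_skip_verify: false
```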

The CPK Monitoring installer and CPK Operator metrics

The most recent CPK Monitoring installer includes changes to automatically scrape CPK operator metrics. If you are not seeing these metrics in the Prometheus or Grafana set up by the CPK Monitoring installer, double-check that the Prometheus configuration includes a pgo-metrics scrape job. That job is configured to discover CPK Operator pods and scrape the metrics endpoint.

If you are missing that job, you may need to download and install a newer version of the CPK Monitoring installer. The same advice applies to users who were running CPK versions earlier than v5.8.0 and altered their monitoring stack to handle that case.
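For reference, a rough sketch of what such a pgo-metrics scrape job could look like; the Pod label used for discovery and the authentication settings are illustrative, and the job shipped by the installer may differ:

```yaml
# Illustrative sketch of a pgo-metrics scrape job that discovers operator Pods.
# The label selector below is a placeholder; the installer's own job may use
# different discovery and relabeling rules.
- job_name: pgo-metrics
  scheme: https
  kubernetes_sd_configs:
    - role: pod
  authorization:
    credentials_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  tls_config:
    insecure_skip_verify: true   # or configure a ca_file as described above
  relabel_configs:
    # Placeholder selector: keep only Pods labeled as the CPK operator.
    - source_labels: [__meta_kubernetes_pod_label_postgres_operator_crunchydata_com_control_plane]
      action: keep
      regex: postgres-operator
```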

pgnodemx and the DownwardAPI

pgnodemx is able to pull and format container-specific metrics by accessing several Kubernetes fields that are mounted from the pod to the database container's filesystem. By default, these fields include the pod's labels and annotations, as well as the database pod's CPU and memory. These fields are mounted at the /etc/database-containerinfo path.
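CPK sets up this mount automatically, but for context the underlying mechanism is a Kubernetes Downward API volume that looks roughly like the following simplified sketch; the projected item names are approximations rather than the exact spec CPK generates:

```yaml
# Simplified sketch of a Downward API volume like the one mounted at
# /etc/database-containerinfo; the projected items CPK uses may differ.
spec:
  volumes:
    - name: database-containerinfo
      downwardAPI:
        items:
          - path: labels
            fieldRef:
              fieldPath: metadata.labels
          - path: annotations
            fieldRef:
              fieldPath: metadata.annotations
          - path: cpu_limit
            resourceFieldRef:
              containerName: database
              resource: limits.cpu
          - path: mem_limit
            resourceFieldRef:
              containerName: database
              resource: limits.memory
  containers:
    - name: database
      volumeMounts:
        - name: database-containerinfo
          mountPath: /etc/database-containerinfo
          readOnly: true
```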

Visualizations

Below is a brief description of all the visualizations provided by the PostgreSQL Operator Monitoring stack. Some of the descriptions include directional guidance on how to interpret the charts, though this is only a starting point: the actual causes and effects of issues can vary between systems.

Many of the visualizations can be broken down based on the following groupings:

Overview

The Overview provides a high-level view of all of the PostgreSQL clusters that are being monitored by the PostgreSQL Operator Monitoring stack. This includes the following information:

Each entry is clickable to provide additional cluster details.

PostgreSQL Details

The PostgreSQL Details view provides more information about a specific PostgreSQL cluster that is being managed and monitored by the PostgreSQL Operator. It surfaces many key PostgreSQL-specific metrics that help inform decisions around managing a PostgreSQL cluster. These include:

pgBouncer

Info

The pgBouncer dashboard will only have relevant metrics when using the OpenTelemetryMetrics feature gate, available in CPK v5.8.0 and above. Check the OpenTelemetry observability page for more information.

The pgBouncer dashboard provides details from the pgBouncer metrics exposed by the OpenTelemetry collector sidecar. The OpenTelemetry collector sidecar is configured to query pgBouncer with the built-in SHOW command views found in the pgBouncer documentation. For instance, metrics prefixed with ccp_pgbouncer_pools_ are derived from pgBouncer's SHOW POOLS command. See the pgBouncer documentation for more on those commands.

Metrics here can be filtered by PostgresCluster, by pgBouncer pod, and finally by database pool. These metrics/visualizations include:

Pod Details

Pod details provide information about a given Pod or Pods that are being used by a PostgreSQL cluster. These are similar to "operating system" or "node" metrics, with the difference that they look at resource utilization by a container, not the entire node.

It may be helpful to view these metrics on a "pod" basis, by using the Pod filter at the top of the dashboard.

Backups

There are a variety of reasons why you need to monitor your backups, starting with answering the fundamental question of "do I have backups available?" Backups can be used for a variety of situations, from cloning new clusters to restoring clusters after a disaster. Additionally, Postgres can run into issues if your backup repository is not healthy, e.g. if it cannot push WAL archives. If your backups are configured properly and healthy, you will be well positioned to mitigate the risk of data loss!

The backup, or pgBackRest, panel provides information about the overall state of your backups. This includes:

PostgreSQL Service Health Overview

The Service Health Overview provides information about the Kubernetes Services that sit in front of the PostgreSQL Pods. This provides information about the status of the network.

Query Runtime

Looking at the overall performance of queries can help optimize a Postgres deployment, from provisioning resources to tuning queries in the application itself.

You can get a sense of the overall activity of a PostgreSQL cluster from the chart that is visualized above:

PostgreSQL Operator Monitoring also further breaks down the queries so you can identify queries that are being executed too frequently or are taking up too much time.

Alerts

Alerting lets you view and receive alerts about conditions that require intervention, for example, an HA cluster that cannot self-heal. The alerting system is powered by Alertmanager.
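Alert routing and notification channels live in the Alertmanager configuration, separate from the alert rules themselves. A minimal sketch of a receiver configuration; the receiver name and webhook URL are placeholders:

```yaml
# Minimal Alertmanager sketch: route all alerts to a single placeholder
# webhook receiver. Adapt the route tree and receivers to your environment.
route:
  receiver: default-receiver
  group_by: ["alertname", "cluster"]
  group_wait: 30s
  repeat_interval: 4h
receivers:
  - name: default-receiver
    webhook_configs:
      - url: "https://example.com/alert-hook"   # placeholder notification endpoint
```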

The alerts that come installed by default include:

Optional alerts that can be enabled:

You can modify these alerts as you see fit, and add your own alerts as well! Please see the installation instructions for general setup of the PostgreSQL Operator Monitoring stack.
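As an example of adding your own alert, the following is a generic sketch of a custom Prometheus alerting rule; it uses the standard up metric rather than any CPK-specific metric:

```yaml
# Generic sketch of a custom alerting rule: fire when a metrics target in the
# monitoring stack has been unreachable for five minutes.
groups:
  - name: custom-postgres-alerts
    rules:
      - alert: MetricsTargetDown
        expr: up == 0
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Metrics target {{ $labels.instance }} is down"
          description: "Prometheus has been unable to scrape {{ $labels.job }}/{{ $labels.instance }} for 5 minutes."
```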