Discrepancy between queries executed in thanos-query UI vs grafana · Issue #3859 · thanos-io/thanos

Thanos, Prometheus and Golang version used:

Thanos image: thanosio/thanos:v0.18.0
Prometheus image: quay.io/prometheus/prometheus:v2.25.0
Grafana image: grafana/grafana:7.3.2

Object Storage Provider:

What happened:

We are seeing a large discrepancy between a query executed directly in the thanos-query UI and the same query executed from a Grafana dashboard. The query takes around 1 second to complete in the thanos-query UI, but is 5 to 6 times slower in Grafana.

The screenshots below show the query taking 1049ms in the thanos-query UI and the same query taking around 5 times longer to complete and render results in the Grafana UI.

Screenshot 2021-02-26 at 16 23 57

Screenshot 2021-02-26 at 16 22 48

What you expected to happen:
The queries should take the same time to be executed both on thanos-query UI and grafana.

How to reproduce it (as minimally and precisely as possible):

Thanos-query and thanos-store configs:

Query:
- args:
  - query
  - --grpc-address=0.0.0.0:10901
  - --http-address=0.0.0.0:10902
  - --log.level=debug
  - --log.format=json
  - --query.auto-downsampling
  - --query.replica-label=replica
  - --store=thanos-sidecar.crm.svc.cluster.local:10901
  - --store=thanos-sidecar.observability.svc.cluster.local:10901
  - --store=thanos-store.crm.svc.cluster.local:10901

Note that we list two sidecars in the store parameters because some metrics are stored only on the "observability" Prometheus, and we also need to be able to query that data from thanos-query, in the "crm" namespace, while it is not shipped to S3.

Store:
- args:
  - store
  - --log.level=debug
  - --log.format=json
  - --objstore.config-file=/config/bucket.yml
  - --grpc-address=0.0.0.0:10901
  - --http-address=0.0.0.0:10902
  - --data-dir=/var/thanos/store
  - |-
    --selector.relabel-config=
      - action: keep
        regex: "(crm|observability)"
        source_labels:
        - prometheus
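
For readers unfamiliar with the selector above, here is a minimal sketch of how we understand the `keep` rule to behave: the store only serves blocks whose `prometheus` external label fully matches the regex (Prometheus-style relabel regexes are anchored at both ends). The function name `keeps_block` and the sample label sets are purely illustrative and not part of Thanos.

```python
import re

# Regex copied from the relabel config above. Relabel regexes are
# fully anchored, which re.fullmatch reproduces here.
KEEP_REGEX = re.compile(r"(crm|observability)")


def keeps_block(external_labels):
    """Return True if a block with these external labels would be kept.

    Hypothetical helper: it only models a single-source-label `keep`
    action. A missing label is treated as an empty string.
    """
    value = external_labels.get("prometheus", "")
    return KEEP_REGEX.fullmatch(value) is not None
```

Under this reading, a block labelled `prometheus="crm"` is served by the store, while one labelled `prometheus="crm-staging"` or one with no `prometheus` label is not.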

Finally, below is a screenshot of the Grafana datasource configuration:

Screenshot 2021-03-01 at 17 47 21
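
To separate backend latency from dashboard rendering, one option is to time the request directly against the thanos-query HTTP API, outside both UIs. The sketch below is only an assumption about how such a check could look: it builds a Prometheus-compatible `/api/v1/query_range` URL (with the Thanos `dedup` parameter) and times a single GET. The helper names, the base URL, and the query are hypothetical; only port 10902 comes from the `--http-address` flag above.

```python
import time
import urllib.parse
import urllib.request


def build_query_range_url(base_url, query, start, end, step, dedup=True):
    """Build a /api/v1/query_range URL for a Prometheus-compatible API."""
    params = {
        "query": query,
        "start": f"{start:.3f}",  # Unix timestamp, seconds
        "end": f"{end:.3f}",      # Unix timestamp, seconds
        "step": str(step),        # resolution step, seconds
        # `dedup` is a Thanos Query extension to the Prometheus API.
        "dedup": "true" if dedup else "false",
    }
    encoded = urllib.parse.urlencode(params)
    return f"{base_url.rstrip('/')}/api/v1/query_range?{encoded}"


def time_request(url, timeout=60):
    """Issue one GET and return (elapsed_seconds, response_size_bytes)."""
    started = time.perf_counter()
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        body = resp.read()
    return time.perf_counter() - started, len(body)


def compare_steps(base_url, query, start, end, steps):
    """Time the same query at several steps; returns {step: seconds}."""
    timings = {}
    for step in steps:
        url = build_query_range_url(base_url, query, start, end, step)
        elapsed, _ = time_request(url)
        timings[step] = elapsed
    return timings
```

For example, calling `compare_steps("http://localhost:10902", "up", now - 3600, now, [15, 60, 300])` (with `now = time.time()` and the query port forwarded locally) would show whether the time range and step that Grafana sends, rather than Grafana itself, account for the difference. The exact parameters Grafana uses can be read from the query inspector or the browser network tab.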

Full logs to relevant components:

Anything else we need to know: