Significant increase in Gitaly single node incidents (#23532) · Issues · GitLab.com / GitLab Infrastructure Team / Production Engineering · GitLab

Significant increase in Gitaly single node incidents

We've had a huge increase in the number of single node Gitaly incidents in April.

https://gitlab.com/gitlab-com/gl-infra/production/-/issues/?label_name%5B%5D=a%3AGitalyServiceGoserverApdexSLOViolationSingleNode

If you look at the past six months' worth of apdex data, you can see a significant change in the third or fourth week of March.

Screenshot_2023-04-26_at_2.13.54_PM

source

We also see an increase in CPU usage and `schedstat_waiting`.

CPU usage forecast

source

schedstat_waiting forecast

source

Incidents by category in May

For the April breakdown, see Reliability::Practices April 2023 service avail... (reliability-reports#166 - closed)

Action Items