Significant increase in Gitaly single node incidents (#23532)
We've had a huge increase in the number of single node Gitaly incidents in April.
If you look at the past six months' worth of Apdex, you can see a significant change in the third or fourth week of March.
We also see an increase in CPU usage and `schedstat_waiting`.
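For readers unfamiliar with the metric, here is a minimal sketch of the textbook Apdex score: satisfied requests count fully, tolerated requests count half, and frustrated requests count zero. The function name, thresholds, and sample latencies are illustrative assumptions. The Apdex SLI actually used for Gitaly may be computed differently (for example from histogram buckets), which is part of what the Apdex investigation in the action items would establish.

```python
def apdex(latencies_s, satisfied_s, tolerated_s=None):
    """Textbook Apdex: (satisfied + tolerating / 2) / total samples.

    latencies_s: request durations in seconds (illustrative input).
    satisfied_s: threshold T; requests at or under T are "satisfied".
    tolerated_s: threshold F; defaults to 4 * T, the conventional Apdex choice.
    """
    if not latencies_s:
        raise ValueError("need at least one latency sample")
    if tolerated_s is None:
        tolerated_s = 4 * satisfied_s
    satisfied = sum(1 for x in latencies_s if x <= satisfied_s)
    tolerating = sum(1 for x in latencies_s if satisfied_s < x <= tolerated_s)
    # Anything slower than tolerated_s is "frustrated" and contributes nothing.
    return (satisfied + tolerating / 2) / len(latencies_s)


# Hypothetical samples: three fast requests, one slow, one very slow.
score = apdex([0.1, 0.2, 0.5, 1.5, 5.0], satisfied_s=0.5, tolerated_s=2.0)
```

With these made-up samples, a single very slow request out of five is enough to pull the score well below 1, which is why a saturated node shows up so clearly in the Apdex graph.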
Incidents by category in May
For April, see Reliability::Practices April 2023 service avail... (reliability-reports#166 - closed)
- cgroups:
- pack-object/git spawn contention:
Action Items
- @qmnguyen0711: Fix cache key for `pack-objects` 👉 gitlab-org/gitaly#5087 (closed)
  - 2023-05-04: Blocked by security release: gitlab-org/gitlab!119574 (closed)
- Investigate Apdex calculation 👉 scalability#2319
- @steveazz: Rate limits:
  - `ListCommitsByOid` 👉 https://gitlab.com/gitlab-com/gl-infra/reliability/-/issues/23570
  - `CommitDiff` 👉 https://gitlab.com/gitlab-com/gl-infra/reliability/-/issues/23571
  - `projects/explore` 👉 https://ops.gitlab.net/gitlab-com/gl-infra/config-mgmt/-/merge_requests/5693/diffs
  - Pack Objects limiting 👉 gitlab-org/gitaly#4413 (closed)
- @steveazz: cgroups
  - Discuss using a single parent cgroup 👉 https://gitlab.com/gitlab-com/gl-infra/reliability/-/issues/23532#note_1377531701
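
To make the rate-limit action items above more concrete, here is a rough sketch of per-RPC limiting using a token bucket. This is an illustration only: the issue does not say which mechanism or which limits are used for `ListCommitsByOid`, `CommitDiff`, or `projects/explore`, so the class names, rate, and burst below are hypothetical and this is not how Gitaly or GitLab implements it.

```python
import time


class TokenBucket:
    """Allows `burst` requests at once, refilled at `rate_per_s` tokens per second."""

    def __init__(self, rate_per_s, burst, clock=time.monotonic):
        self.rate = rate_per_s
        self.capacity = burst
        self.tokens = float(burst)
        self.clock = clock  # injectable so the behaviour can be tested
        self.last = clock()

    def allow(self):
        now = self.clock()
        # Refill in proportion to elapsed time, capped at the burst size.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class RpcLimiter:
    """Keeps one bucket per RPC name; RPCs without a configured limit pass through."""

    def __init__(self, limits, clock=time.monotonic):
        self.buckets = {
            rpc: TokenBucket(rate, burst, clock) for rpc, (rate, burst) in limits.items()
        }

    def allow(self, rpc):
        bucket = self.buckets.get(rpc)
        return True if bucket is None else bucket.allow()


# Hypothetical limit: 1 request/second with a burst of 2 for CommitDiff.
limiter = RpcLimiter({"CommitDiff": (1.0, 2)})
decisions = [limiter.allow("CommitDiff") for _ in range(3)]
```

Keeping a separate bucket per RPC means a flood of one expensive call is rejected early and does not use up the budget of other RPCs on the same node.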