Split resource_label_events table (#412705) · Issues · GitLab.org / GitLab · GitLab

Split resource_label_events table

Summary

Because of the ongoing load issues on the primary database, we need to reduce the size of our largest tables, starting with those over 100 GB. At 123.8 GiB, resource_label_events has been selected as a candidate.

As a follow-up to the discussions in Consider partitioning strategies for resource_l... (#396809 - closed), and following a similar strategy to Consider partitioning strategies for descriptio... (#396805 - closed), we have agreed on the following implementation plan for partitioning resource_label_events:

  1. Prepare partitioned tables for each resource according to https://docs.gitlab.com/development/database/partitioning/hash/ (milestone M). The number of partitions should be large enough to ensure we don't hit the 100 GB limit again soon
  2. Create triggers which keep data between both tables in sync (milestone M, blocked by 1)
  3. Add a background migration to copy records from resource_label_events to the resource-specific tables (milestone M, blocked by 2)
  4. Finalize background migration (milestone M+1 or maybe M+2 depending on estimated migration runtime)
  5. Update models which use resource_label_events to use the new tables instead (milestone M+1, blocked by 3)
  6. Drop old resource_label_events table (milestone M+2 or later)
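
A minimal sketch of how steps 2 and 3 fit together, not the actual migration. It assumes SQLite (via Python's standard library) as a stand-in for PostgreSQL, so the target is a plain table rather than a hash-partitioned one. The schema is heavily simplified, and `issue_label_events`, the trigger name, and the `backfill` helper are hypothetical names chosen for illustration. Only an insert trigger is shown; a real sync would also need to cover updates and deletes.

```python
import sqlite3

# ASSUMPTION: SQLite stands in for PostgreSQL and the columns are simplified.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE resource_label_events (
        id INTEGER PRIMARY KEY,
        issue_id INTEGER,
        merge_request_id INTEGER,
        label_id INTEGER,
        action INTEGER NOT NULL
    )
""")

# Rows that exist before the new table and trigger are introduced.
conn.executemany(
    "INSERT INTO resource_label_events VALUES (?, ?, ?, ?, ?)",
    [(1, 10, None, 100, 1), (2, None, 20, 100, 1), (3, 11, None, 101, 2)],
)
conn.commit()

# Steps 1 and 2: create the resource-specific table (hypothetical name) and a
# trigger that mirrors new writes for that resource into it.
conn.executescript("""
    CREATE TABLE issue_label_events (
        id INTEGER PRIMARY KEY,
        issue_id INTEGER NOT NULL,
        label_id INTEGER,
        action INTEGER NOT NULL
    );
    CREATE TRIGGER sync_issue_label_events_insert
    AFTER INSERT ON resource_label_events
    WHEN NEW.issue_id IS NOT NULL
    BEGIN
        INSERT OR REPLACE INTO issue_label_events (id, issue_id, label_id, action)
        VALUES (NEW.id, NEW.issue_id, NEW.label_id, NEW.action);
    END;
""")

# Writes that arrive after the trigger exists: only the issue row is mirrored.
conn.execute("INSERT INTO resource_label_events VALUES (4, 10, NULL, 102, 1)")
conn.execute("INSERT INTO resource_label_events VALUES (5, NULL, 21, 102, 1)")
conn.commit()


def backfill(conn, batch_size=2):
    """Step 3: copy pre-existing rows in id-ordered batches."""
    last_id = 0
    while True:
        batch = conn.execute(
            "SELECT id FROM resource_label_events WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, batch_size),
        ).fetchall()
        if not batch:
            return
        upper = batch[-1][0]
        # OR IGNORE skips rows the trigger has already copied.
        conn.execute(
            """
            INSERT OR IGNORE INTO issue_label_events (id, issue_id, label_id, action)
            SELECT id, issue_id, label_id, action
            FROM resource_label_events
            WHERE id > ? AND id <= ? AND issue_id IS NOT NULL
            """,
            (last_id, upper),
        )
        conn.commit()
        last_id = upper


backfill(conn)
ids = [row[0] for row in conn.execute("SELECT id FROM issue_label_events ORDER BY id")]
print(ids)  # [1, 3, 4]
```

The ordering mirrors the "blocked by" notes in the plan: the trigger has to be in place before the backfill starts, so rows written during the copy are not missed, and the backfill has to tolerate rows the trigger has already copied.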

Going with a partitioned table does not block us from doing further optimizations later, for example next year or whenever there is capacity for more experimentation (e.g. compressing or otherwise removing the reference_html field).

Notes

Edited Apr 02, 2025 by Max Orefice