Triage Operations

Automation and tooling for processing un-triaged issues at GitLab

Any GitLab team-member can triage issues. Keeping the number of un-triaged issues low is essential for maintainability, and is our collective responsibility.

We have implemented automation and tooling to handle this at scale and distribute the load to each team or group.

Video introduction to triage operations, triage report, priority and severity labels.

Accountability

The Quality Engineering Department ensures that every Product and Engineering group is held accountable to deliver on the SLA set forth.

Our defect SLA can be viewed at:

The Quality Engineering department uses a number of tools and automation, in addition to manual intervention, to help us achieve this goal. The work in this area can be seen in our department roadmap under the Triage and Measure tracks of work.

Label renaming

There is a large amount of automation that uses stage, group, and category labels. We ask that Product Managers create an issue in triage-ops when any of the following changes occur. This issue helps ensure limited to no impact to automation and reports.

Auto-labelling of issues and merge requests

Our triage bot will automatically infer section, stage, and group labels based on the category/feature already set on an issue or MR. This is available for open issues/MRs within the gitlab-org group.

The most important rules are:

The following logic was initially implemented in this merge request:

graph TB
A{"Stage label<br />is present?"} -- Yes --> B
B{"Group label<br />is present?"} -- Yes --> D
B -- No --> E
D{"Group has<br />one category?"} -- Yes --> X9["Set category label."]
D -- No --> X1["Nothing to do."]
E{"Group is detected based on category labels<br />with a match rate > 50% among<br />all category labels?"} -- Yes --> H
E -- No --> K
H{"Does detected group label<br />match stage label?"} -- Yes --> X2["Set detected<br />group label."]
H -- No --> K
K{"Several potential groups in<br />current stage detected<br />from category labels?"} -- Yes --> X3["Manual triage<br />required."]
K -- No --> L
L{"Does the stage have<br />a single group?"} -- Yes --> X4["Set this<br />group label."]
L -- No --> X5["Manual triage<br />required."]
A -- No --> C
C{"Group label<br />is present?"} -- Yes --> F
F{"Group has<br />one category?"} -- Yes --> X10["Set stage and category labels<br />based on group label,<br />we're done!"]
F -- No --> X6["Set stage label<br />based on group label,<br />we're done!"]
C -- No --> G
G{"Group is detected based on category labels<br />with a match rate > 50% among<br />all category labels?"} -- Yes --> X7["Set group and<br />stage labels."]
G -- No --> X8["Manual triage<br />required."]

After the above inference is done, a section label will be added based on the stage or group label. An explanation will not be added in this step if the inferred labels contain only a section label.
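The stage and group decision flow can be sketched in Python as follows. This is a minimal illustration, not the triage-ops implementation: the `Taxonomy` structure, the function names, and the return values are hypothetical, and the match rate is assumed to be the share of an item's category labels that map to the most frequently matched group. The section label step is left out.

```python
from collections import Counter
from dataclasses import dataclass

MANUAL = "manual triage required"


@dataclass
class Taxonomy:
    """Hypothetical lookup tables standing in for the stage and group definitions."""
    group_stage: dict[str, str]              # group -> stage
    group_categories: dict[str, list[str]]   # group -> categories

    def groups_in(self, stage):
        return [g for g, s in self.group_stage.items() if s == stage]

    def group_of(self, category):
        for group, categories in self.group_categories.items():
            if category in categories:
                return group
        return None


def detect_group(tax, categories):
    """Return the group matched by more than 50% of the category labels, else None."""
    counts = Counter(g for g in map(tax.group_of, categories) if g)
    if not counts:
        return None
    group, hits = counts.most_common(1)[0]
    return group if hits / len(categories) > 0.5 else None


def infer_labels(tax, stage=None, group=None, categories=()):
    """Walk the flow and return the labels to add, or MANUAL."""
    categories = list(categories)
    if group:
        # Group label present: fill in the stage if missing, and the
        # category when the group owns exactly one.
        added = {} if stage else {"stage": tax.group_stage[group]}
        owned = tax.group_categories.get(group, [])
        if len(owned) == 1:
            added["category"] = owned[0]
        return added  # an empty dict means "nothing to do"
    detected = detect_group(tax, categories)
    if not stage:
        if detected:
            return {"group": detected, "stage": tax.group_stage[detected]}
        return MANUAL
    if detected and tax.group_stage[detected] == stage:
        return {"group": detected}
    # Several candidate groups inside the current stage: a human must decide.
    candidates = {tax.group_of(c) for c in categories} & set(tax.groups_in(stage))
    if len(candidates) > 1:
        return MANUAL
    stage_groups = tax.groups_in(stage)
    return {"group": stage_groups[0]} if len(stage_groups) == 1 else MANUAL
```

With a toy taxonomy in which a stage has a single group, `infer_labels(tax, stage=...)` returns that group; with two groups and no category labels it falls through to manual triage.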

Check out the list of actual use-cases to better understand what this flow means in practice.

If your issue/MR doesn’t belong to a particular stage, you can remove the stage label and add the ~"automation:devops-mapping-disable" label to prevent this automation from happening in the future.

Triage reports

A triage report is an issue containing a checklist of issues or merge requests requiring attention. Usually, each task corresponds to an issue or a merge request that needs labels, prioritization, scheduling, or other attention. Some reports also include heatmaps or other information.

Triage reports are automatically assigned to specific team members, listed in the stages definition file.

To change who an issue gets assigned to, open a merge request for the above files. If the group definition file is changed, we’ll need to run some scripts to update the generated files as well.

These reports are owned by the Contributor Success team.

This report contains community merge requests requiring partial triage. The goal is for coaches to add type, stage, and group labels, so that the relevant people can be pinged later on based on these labels.

This report contains community merge requests that may require some attention from GitLab team members.

Team reports

Group level bugs, features, and Deferred UX

This report contains the relevant bugs, feature requests, and Deferred UX issues that belong to a group in our DevOps stages. The goal is to achieve complete triage by the Product Manager, Engineering Manager, and UX team member in that area.

The report itself is divided into 4 main parts.

The bug sections also contain a heatmap.

An example: https://gitlab.com/gitlab-org/quality/triage-ops/issues/118

Video overview of the triage report.

There is also an optional stage policy for missing categories. If your team has enabled this, you will receive a list of up to 100 items that have the stage label but have zero appropriate category labels for that stage.

Feature proposals

This section contains issues with the ~"type::feature" label without a milestone. It is divided further into issues with and without the ~"customer" label.

Frontend bugs

This section contains issues with the ~"type::bug" and ~"frontend" labels without priority and severity. It is divided further into issues with and without the ~"customer" label.

Non-frontend bugs (likely backend)

This section contains issues with the ~"type::bug" label without priority and severity. It is divided further into issues with and without the ~"customer" label.
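As a rough illustration, the three sections above amount to a small classification over an issue's labels and milestone. The function names below are hypothetical, and "without priority and severity" is assumed to mean that the issue carries neither a `priority::` nor a `severity::` label; the actual policy may define this differently.

```python
def has_scoped(labels, scope):
    """True if any label belongs to the given scope, e.g. 'severity::2'."""
    return any(label.startswith(scope + "::") for label in labels)


def report_section(labels, milestone=None):
    """Return (section, audience) for an issue, or None if it is not listed."""
    labels = set(labels)
    audience = "customer" if "customer" in labels else "non-customer"
    if "type::feature" in labels and milestone is None:
        return ("feature proposals", audience)
    # Assumption: an untriaged bug has neither a priority nor a severity label.
    untriaged = not has_scoped(labels, "priority") and not has_scoped(labels, "severity")
    if "type::bug" in labels and untriaged:
        section = "frontend bugs" if "frontend" in labels else "non-frontend bugs"
        return (section, audience)
    return None
```

For example, an issue labeled only ~"type::bug" and ~"frontend" lands in the frontend bugs section under non-customer issues.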

severity::1 & severity::2 bugs past SLO

This section contains bugs that have passed our targeted SLO based on the severity label set. This is based on our missed SLO detection triage policy.

Heatmap for ~customer bugs

This section contains a table displaying the open issues for a group labeled with ~"customer" and ~"bug", broken down by the assigned severity and priority labels.
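The heatmap is essentially a cross-tabulation of severity against priority. A minimal sketch, assuming the caller has already filtered the issues down to open customer bugs for one group, and using hypothetical helper names:

```python
from collections import Counter


def scoped_label(labels, scope):
    """Return the first label in the given scope, or 'none' if it is missing."""
    prefix = scope + "::"
    for label in labels:
        if label.startswith(prefix):
            return label
    return "none"


def heatmap(issues):
    """Count issues per (severity, priority) cell.

    `issues` is a list of label lists, one per issue.
    """
    cells = Counter()
    for labels in issues:
        cell = (scoped_label(labels, "severity"), scoped_label(labels, "priority"))
        cells[cell] += 1
    return cells
```

Each key of the returned counter corresponds to one cell of the table, with `'none'` collecting issues that still lack a severity or priority label.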

Group level merge requests that may need attention

This report contains idle group merge requests authored by GitLab team members.

Merge requests are considered idle when they have had no human activity for 28 days. This report collects them to prompt any action needed to move the MR forward, such as nudging the author, reviewer, or maintainer.
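The idle check can be sketched as below. The event representation (a list of timestamp and bot-flag pairs) and the fallback to the creation date are assumptions made for illustration; only the 28-day threshold comes from the text above.

```python
from datetime import datetime, timedelta

IDLE_AFTER = timedelta(days=28)  # threshold stated in the text above


def last_human_activity(events):
    """Return the newest timestamp among events not authored by a bot.

    `events` is a hypothetical list of (timestamp, author_is_bot) pairs.
    """
    human = [ts for ts, author_is_bot in events if not author_is_bot]
    return max(human) if human else None


def is_idle(created_at, events, now):
    """True when no human has touched the MR for at least 28 days."""
    last = last_human_activity(events) or created_at
    return now - last >= IDLE_AFTER
```

Bot comments are ignored on purpose, so an automated nudge does not reset the idle clock.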

An example report: Merge requests requiring attention for group::access - 2020-11-08. Current reports can be found in the triage-reports project.

Group level feature flags that may need attention

This report contains feature flags that have been enabled in the codebase for 2 or more releases for groups within our DevOps stages.

The DRI is responsible for reviewing these feature flags to determine whether they can be removed entirely, and otherwise for creating separate issues to ensure the overdue feature flags are removed.

An example report: Feature Flags requiring attention for group::continuous integration - 2021-03-01. Current reports can be found in the triage-reports project.

The feature flag triage reports are generated in a quality toolbox scheduled pipeline with the gitlab-feature-flag-alert project.

Group level Bug Prioritization report

This report contains each group's top 10 open ~"type::bug" issues that need to be prioritized for the upcoming milestone. It is divided further into issues with ~"severity::, ~"bug::vulnerability" and ~"customer" labels, listed by age, oldest first.

An example report: 2023-11-01 - Bugs Prioritization for “group::source code” for upcoming milestone - 16.7. Current reports can be found in the triage-reports project.

Auto closure of triage reports

Reports open for more than 2 weeks with the ~"triage report" label will be closed automatically with the close old triage reports automation.

Reactive workflow automation

Reactive triage automation complements scheduled triage automation in cases where real-time feedback improves the developer experience. This is handled by triage-ops.

Note: reactive command arguments between brackets ([]) are optional.

The following diagram shows how all the automations fit together:

graph LR
classDef triageOpsClass fill:#FC6D26,stroke:#333,stroke-width:3px;

MR_INITIAL(["Wider Community Merge request<br />(author is not a member of `gitlab-org`)"])
MR_COMMUNITY(["Merge request with the `Community contribution` label"])
MR_OPENED[MR is opened]
MR_UPDATED[MR is updated]
MR_MERGED[MR is merged]
MR_CLOSED[MR is closed]
MR_AUTHOR_NOTE[MR author posts a note]
ANYONE_NOTE[Anyone posts a note]
AUTOMATED_THANK(["1. Post a 'Thank you' note<br/>2. Add the `Community contribution` label<br />3. Add the `workflow::in dev` label<br />4. Assign MR to its author"])
WORKFLOW_READY_FOR_REVIEW_LABEL{"Was the<br />`workflow::ready for review`<br />label added?"}
AUTOMATED_REVIEWER_REQUEST_GENERIC(["If reviewers are present, ask them to review.<br />Otherwise, ask (and assign) an MR coach<br />(selected based on group label) to review"])
AUTOMATED_REVIEW_DOC{"Does the MR touch<br/>documentation files?"}
AUTOMATED_REVIEWER_REQUEST_DOC(["Post a note asking a<br />technical writer to review"])
AUTOMATED_REVIEW_UX{"Does the MR have<br />the `UX` label?"}
AUTOMATED_REVIEWER_REQUEST_UX(["Post a message in the<br />`#ux-community-contributions`<br />Slack channel, and on the MR"])
AUTOMATED_FEEDBACK_REQUEST(["Post a note asking<br />for feedback"])
AUTOMATED_HACKATHON_LABEL{Is a Hackathon<br />currently running?}
AUTOMATED_HACKATHON_LABEL_ADDITION(["Add the `Hackathon` label"])
WHAT_AUTHOR_NOTE{What note is it?}
WHAT_ANYONE_NOTE{What note is it?}

AUTOMATED_LABEL_COMMAND_REPLY(["Add the requested label"])
AUTOMATED_HELP_COMMAND_REPLY(["Ask (and assign as reviewer)<br />an MR coach for help"])
AUTOMATED_REVIEW_COMMAND_REPLY(["Add the `workflow::ready for review` label"])
AUTOMATED_FEEDBACK_COMMAND_REPLY(["Post the feedback in the<br />`#mr-feedback` Slack channel"])

MR_INITIAL -.-> MR_OPENED
MR_COMMUNITY -.-> MR_UPDATED & MR_MERGED & MR_CLOSED & MR_AUTHOR_NOTE & ANYONE_NOTE

MR_OPENED ----> AUTOMATED_THANK
MR_UPDATED -.-> WORKFLOW_READY_FOR_REVIEW_LABEL
MR_UPDATED -.-> AUTOMATED_HACKATHON_LABEL
MR_MERGED & MR_CLOSED ----> AUTOMATED_FEEDBACK_REQUEST
MR_AUTHOR_NOTE -.-> WHAT_AUTHOR_NOTE
ANYONE_NOTE -.-> WHAT_ANYONE_NOTE

WORKFLOW_READY_FOR_REVIEW_LABEL ---> |Yes| AUTOMATED_REVIEWER_REQUEST_GENERIC
WORKFLOW_READY_FOR_REVIEW_LABEL -.-> |Yes| AUTOMATED_REVIEW_DOC & AUTOMATED_REVIEW_UX
AUTOMATED_REVIEW_DOC -->|Yes| AUTOMATED_REVIEWER_REQUEST_DOC
AUTOMATED_REVIEW_UX -->|Yes| AUTOMATED_REVIEWER_REQUEST_UX
AUTOMATED_HACKATHON_LABEL --->|Yes| AUTOMATED_HACKATHON_LABEL_ADDITION

WHAT_AUTHOR_NOTE --->|"@gitlab-bot label ..."| AUTOMATED_LABEL_COMMAND_REPLY
WHAT_AUTHOR_NOTE --->|"@gitlab-bot feedback"| AUTOMATED_FEEDBACK_COMMAND_REPLY

WHAT_ANYONE_NOTE --->|"@gitlab-bot help"| AUTOMATED_HELP_COMMAND_REPLY
WHAT_ANYONE_NOTE --->|"@gitlab-bot ready"| AUTOMATED_REVIEW_COMMAND_REPLY

class AUTOMATED_THANK,AUTOMATED_LABEL_COMMAND_REPLY,AUTOMATED_HELP_COMMAND_REPLY triageOpsClass;
class AUTOMATED_REVIEW_COMMAND_REPLY,AUTOMATED_FEEDBACK_REQUEST,AUTOMATED_REVIEW_DOC triageOpsClass;
class AUTOMATED_REVIEW_UX,AUTOMATED_REVIEWER_REQUEST_DOC,AUTOMATED_REVIEWER_REQUEST_UX triageOpsClass;
class AUTOMATED_FEEDBACK_COMMAND_REPLY,AUTOMATED_HACKATHON_LABEL triageOpsClass;
class AUTOMATED_HACKATHON_LABEL_ADDITION,WHAT_AUTHOR_NOTE,WHAT_ANYONE_NOTE triageOpsClass;
class WORKFLOW_READY_FOR_REVIEW_LABEL,AUTOMATED_REVIEWER_REQUEST_GENERIC triageOpsClass;

Automated review request

Automated review request for doc contributions

Automated review request for UX contributions

Reactive help command

Reactive ready command

Reactive unassign_review command

Reactive label and unlabel commands

Idle/Stale label remover

Code Review Experience Feedback

Reactive feedback command

Leading Organizations labeler

Hackathon labeler

Spam detector

Engineering workflow automation

Ensure priorities for availability issues

For issues labeled ~"availability", the minimal priorities are enforced in line with the guidelines at https://handbook.gitlab.com/handbook/engineering/infrastructure/engineering-productivity/issue-triage/#availability-prioritization

Ensure no deprecated backstage labels are added

Whenever ~"backstage [DEPRECATED]" is added, the automation removes it, explains why it should not be added, and suggests alternatives.

The ~"customer" label is applied when a customer-associated link is added.

The following URLs are considered customer associated links:

Add type label from subtype

Whenever a subtype label is added, the corresponding type label is added. Current type labels with subtype labels are:

Reactive retry_job command

Reactive retry_pipeline command

Reactive delete_bot_comment command

Database Review Experience Feedback

Scheduled workflow automation

Scheduled triage automation runs to label and update issues, which helps with reporting and milestone transitions. This is handled by triage-ops.

When an issue is assigned, it shouldn’t accept any new contribution to prevent duplicated work.

When an issue has the Seeking community contributions label set, but also an incompatible workflow label, the issue isn’t actually ready to accept a contribution.

It doesn’t make sense to have Seeking community contributions set on merge requests.

Merge requests which have an author that is not a member of gitlab-org will have the Community contribution label applied. This scheduled automation is a backup for the reactive automation that applies Community contribution in the welcome message.

Merged merge requests with the Community contribution label and no milestone will automatically get the relevant milestone set. This helps keep the community contributions numbers accurate.

Engineering workflow automation

Milestone reschedule

Open issues and merge requests that have missed the current release will be rescheduled to the next active milestone. This identifies pending work that was not completed within the planned milestone.

Note: Confidential issues will be skipped as part of the missed label application. Please see this issue for more information.

Missed deliverable

Open issues and merge requests that were planned as ~Deliverable but have a ~missed:x.y label will have the ~missed-deliverable label applied.

Note: Confidential issues will be skipped as part of the missed label application. Please see this issue for more information.

Deliverable with no milestone

Issues which have a label of ~Deliverable without a milestone will have the milestone set to %Backlog.
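The two ~Deliverable rules above can be combined into one small sketch. The function name and return shape are hypothetical, the ~missed:x.y pattern is assumed to be `missed:` followed by a `major.minor` version, and the item is assumed to be open.

```python
import re

# Assumption: ~missed:x.y labels look like "missed:16.6".
MISSED_LABEL = re.compile(r"^missed:\d+\.\d+$")


def deliverable_updates(labels, milestone, confidential=False):
    """Return the label and milestone changes for an open issue or MR."""
    updates = {"add_labels": [], "milestone": milestone}
    if "Deliverable" not in labels:
        return updates
    missed = any(MISSED_LABEL.match(label) for label in labels)
    # Confidential issues are skipped for the missed label application.
    if missed and not confidential and "missed-deliverable" not in labels:
        updates["add_labels"].append("missed-deliverable")
    # A ~Deliverable without a milestone is moved to %Backlog.
    if milestone is None:
        updates["milestone"] = "Backlog"
    return updates
```

Returning a description of the changes, rather than applying them, keeps the rule easy to test without touching a real project.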

Missed SLO

Issues which have a severity label and missed the SLO target will be labeled with ~missed-SLO. The calculation for elapsed time starts from the date the severity label was applied. This enables reporting on SLO target adherence.
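A minimal sketch of the elapsed-time check follows. The function name is hypothetical, and the SLO targets are deliberately left as a parameter supplied by the caller, since the actual values are defined in the defect SLA rather than in this section.

```python
from datetime import datetime, timedelta


def missed_slo(severity, severity_applied_at, now, slo_days):
    """True when the time since the severity label was applied exceeds the target.

    `slo_days` maps a severity label to its target in days; the caller
    provides it, as the real targets are not restated here.
    """
    target = slo_days.get(severity)
    if target is None:
        return False  # no target known for this severity
    return now - severity_applied_at > timedelta(days=target)
```

Note that the clock starts at `severity_applied_at`, not at issue creation, which matches the rule described above.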

Bug priority label inference

Bugs which have a severity 1 or severity 2 label without a priority label will be labeled with the matching priority label. For example, a ~severity::1 ~"type::bug" without a priority label will have ~priority::1 applied.
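Expressed as code, the rule looks roughly like this (a sketch with a hypothetical function name, not the policy definition used by triage-ops):

```python
def infer_priority(labels):
    """Return the priority label to add to a bug, or None if the rule does not apply."""
    labels = set(labels)
    if "type::bug" not in labels:
        return None
    if any(label.startswith("priority::") for label in labels):
        return None  # an existing priority label is never overridden
    for level in ("1", "2"):
        if f"severity::{level}" in labels:
            return f"priority::{level}"
    return None  # lower severities are left for manual prioritization
```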

Master broken categorization

Issues or merge requests that have a label of ~"master:broken" will have labels of ~"priority::1" and ~"severity::1" applied. This ensures that requests which break master are sufficiently categorized for reporting.

Identify interesting feature proposals

This automation identifies potential and popular proposals using upvotes. This helps identify feature proposals that people have indicated they would like.

Auto-close inactive bugs

GitLab values the time spent by contributors on reporting bugs. However, if a bug remains inactive for a very long period, it will qualify for auto-closure. The following is the policy for identification and auto-closure of inactive bugs.

Prompt for Tier labels on issues

Tier labels should be applied to issues to specify the license tier of the feature. This policy prompts the Product Manager for the applied group label to add the license tier label to issues that are scheduled for the current milestone and labeled with ~direction.

The possible tier labels to be applied are:

Prompt for Type labels on issues

Type labels are applied to issues to increase their visibility and discoverability during team issue refinement. This policy applies to issues created by gitlab-org team members and prompts the author to apply a type label to the issue within the first week.

Type labels ensure that issues are present in the group triage report and added to the correct section.

Data

Bug SLO Warning

Bugs have a severity label that indicates the SLO for a fix. This automated policy aims to prompt managers about bugs in their group that are approaching the SLO threshold.

Reminder on ~infradev issues to set severity label, priority label, and milestone

Issues with the ~infradev label should have a severity label, a priority label, and a milestone set. This automated policy aims to prompt managers about such issues missing one of these attributes.

Note:

  1. The ~"automation:infradev-missing-labels" is automatically removed when a severity label, a priority label, and a milestone are set on the issue.
  2. The ~"automation:infradev-missing-labels" is automatically removed after two weeks, leading to a new message being posted if the Automation Conditions above are still met. This effectively ensures that a reminder is posted on the issue every two weeks.
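The reminder cycle described in these notes can be sketched as a small decision function. This is an illustration under assumptions: the function name, the action strings, and the `reminder_added_at` timestamp are hypothetical, and the reminder label is assumed to be added at the same time the reminder is posted.

```python
from datetime import datetime, timedelta

REMINDER_LABEL = "automation:infradev-missing-labels"
REMINDER_INTERVAL = timedelta(weeks=2)


def is_complete(labels, milestone):
    """True when severity, priority, and milestone are all set."""
    has_severity = any(label.startswith("severity::") for label in labels)
    has_priority = any(label.startswith("priority::") for label in labels)
    return has_severity and has_priority and milestone is not None


def next_action(labels, milestone, reminder_added_at, now):
    """Decide what a scheduled run should do with one ~infradev issue."""
    if "infradev" not in labels:
        return None
    complete = is_complete(labels, milestone)
    if REMINDER_LABEL in labels:
        if complete:
            return "remove label"  # note 1: everything is set
        if reminder_added_at is not None and now - reminder_added_at >= REMINDER_INTERVAL:
            return "remove label"  # note 2: a later run posts a fresh reminder
        return None
    return None if complete else "post reminder and add label"
```

Because the label doubles as a "recently reminded" marker, removing it after two weeks is what allows the next run to post again.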

Reminder on ~customer ~type::bug issues to set severity label

Issues with the ~customer and ~type::bug labels should have a severity label set. This automated policy aims to prompt team members to set a severity so that ~customer bugs are triaged in a timely fashion.

Note:

  1. The ~"automation:customer-bug-missing-labels" is automatically removed when a severity label is set on the issue.

Resources


Onboarding guidelines for Issue Triaging team