Which code review metrics should you track and analyze?
Code review is a vital practice for improving the quality, security, and maintainability of software projects. It involves checking the code changes made by other developers before they are merged into the main branch or released to production. But how do you measure the effectiveness and efficiency of your code review process? What are some of the metrics and indicators that you can track and analyze to optimize your code review performance and outcomes? This article explores some of the most useful code review metrics and indicators that you can use to monitor and improve your code review skills.
Top experts in this article
Selected by the community from 10 contributions.
Code reviews rarely reveal bugs that result from the interaction of the changes with code that wasn't changed. Why? Because reviewers rarely think about those possibilities, and many may not understand the code well enough to even guess at them. Further, reviews only capture attention at a moment in time. Well-designed tests address both of these problems.
All changing code should be reviewed, and this should be enforced with branch protections & CODEOWNERS. I agree with Steve; I think testing coverage is a more meaningful metric for preventing bugs in unreviewed code. This metric should pretty much always be 100% for the lines of change.
Review Participation
Review participation is the ratio of reviews performed by a developer to reviews requested by a developer. It is a metric that reflects how active and collaborative a developer is in the code review process. A high review participation means that a developer is contributing to the code quality and knowledge sharing of the team by reviewing other developers' code changes and providing constructive feedback. A low review participation means that a developer is either too busy, too reluctant, or too isolated to engage in the code review process and benefit from the peer learning and improvement opportunities. You can calculate the review participation by dividing the number of reviews performed by a developer by the number of reviews requested by the same developer in a given period.
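As a rough illustration, here is a minimal sketch of that calculation in Python. The data shape is an assumption for the example: plain lists of developer usernames, one entry per review event, which in practice would come from your code host's API or audit log.

```python
from collections import Counter

def review_participation(reviews_performed, reviews_requested):
    """Reviews a developer performed divided by reviews they requested.

    Both arguments are iterables of developer usernames, one entry per
    review event in the period being measured.
    """
    performed = Counter(reviews_performed)
    requested = Counter(reviews_requested)
    return {
        dev: performed.get(dev, 0) / requested[dev]
        for dev in requested
        if requested[dev] > 0
    }

# Alice performed 3 reviews and requested 4; Bob performed 1 and requested 2.
print(review_participation(
    reviews_performed=["alice", "alice", "alice", "bob"],
    reviews_requested=["alice"] * 4 + ["bob"] * 2,
))
# {'alice': 0.75, 'bob': 0.5}
```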
Review Participation is a vital metric for understanding the collaborative dynamics and overall health of a development team.
- Encourages team members to share knowledge and best practices.
- Promotes a sense of ownership and accountability among developers.
- Diverse perspectives help in identifying bugs and potential improvements.
- Constructive feedback leads to continuous learning and skill enhancement.
- Prevents overburdening of a few developers with review tasks.
- Streamlines the code review process, reducing bottlenecks.
Review Speed
Review speed measures the time from a pull request (PR) submission to its review completion. To analyze it, track the time to the first review and the total review duration, including change cycles. Delays often highlight issues like large PRs, unclear requirements, or scheduling conflicts. Use tools like GitHub Insights or custom scripts to track trends and identify gaps, enabling teams to improve the review process effectively.
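If the built-in dashboards don't cut it, a small script can surface time to first review per pull request. The sketch below is illustrative only: it uses GitHub's public `pulls` and `reviews` REST endpoints, ignores pagination for brevity, and the owner, repo, and token values are placeholders you would supply.

```python
from datetime import datetime
import requests

def hours_to_first_review(owner, repo, pr_number, token):
    """Hours between PR creation and its first submitted review."""
    headers = {"Authorization": f"Bearer {token}"}
    base = f"https://api.github.com/repos/{owner}/{repo}"

    pr = requests.get(f"{base}/pulls/{pr_number}", headers=headers).json()
    reviews = requests.get(f"{base}/pulls/{pr_number}/reviews", headers=headers).json()

    # Only count reviews that were actually submitted (pagination ignored here).
    submitted = [
        datetime.fromisoformat(r["submitted_at"].replace("Z", "+00:00"))
        for r in reviews
        if r.get("submitted_at")
    ]
    if not submitted:
        return None  # still waiting on a first review

    created = datetime.fromisoformat(pr["created_at"].replace("Z", "+00:00"))
    return (min(submitted) - created).total_seconds() / 3600
```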
This metric is crucial in larger codebases where `CODEOWNERS` spans different teams and it takes multiple reviewers to get a change merged. I do find it important to remove some of the top results to account for automation and for people who are reviewing "too quickly" (as in not actually reviewing but giving an `LGTM`, which requires a separate conversation). One of the things we recently did at our org to improve this metric was to allow a comment on the pull request, `BOTNAME coderereview`, which would get the list of required `CODEOWNERS`, look up their Slack channel, and post a request for review.
Review Depth
Review depth is a metric that measures the level of scrutiny applied during code reviews. It focuses on how thoroughly the code is examined, ranging from checking for simple syntax errors to evaluating overall design, performance, and maintainability. It ensures that all necessary aspects, such as logic correctness, code style, and potential technical debt, are addressed during the process. This not only helps identify issues but also improves code quality over time by ensuring that each change is consistent with the project's standards and long-term objectives.
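There is no single formula for review depth, but one way to approximate it is to bucket review comments by what they touch. The keyword heuristic below is purely illustrative; the category names and patterns are assumptions, not a standard taxonomy.

```python
import re

# Illustrative, not exhaustive: keywords hinting at what a review comment touches.
DEPTH_SIGNALS = {
    "design":      r"\b(architecture|coupling|abstraction|interface|pattern)\b",
    "performance": r"\b(latency|complexity|allocation|cache|n\+1|big-?o)\b",
    "correctness": r"\b(bug|race|off-by-one|null|overflow|edge case)\b",
    "style":       r"\b(naming|typo|format|lint|whitespace)\b",
}

def classify_comment(text):
    """Return the depth categories a review comment appears to touch."""
    lowered = text.lower()
    return [label for label, pattern in DEPTH_SIGNALS.items()
            if re.search(pattern, lowered)] or ["other"]

def depth_profile(comments):
    """Share of category hits across a set of review comments."""
    counts = {}
    for comment in comments:
        for label in classify_comment(comment):
            counts[label] = counts.get(label, 0) + 1
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}

print(depth_profile([
    "Typo in the docstring",
    "This query looks like an N+1; can we batch it?",
    "Consider extracting an interface here to reduce coupling",
]))
```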
I find this to be an interesting metric, though sometimes it is hard to make sense of without zooming in. I would generally say that even engineers who are very thorough have two kinds of reviews, usually depending on the perceived complexity, the testing, and trust that the author understands the impact of their changes. Sometimes we get massive deltas that really have little or no impact; if we trust the tests and the author, it can be a simple approval with a comment indicating as much, not a claim that you have given it a thorough review. When we and the author agree about the complexity, we give it a thorough review, and this is where the metric does very well in quantifying the points made above.
Review Quality
Review quality ensures that code reviews effectively catch issues and maintain standards. It can be assessed by evaluating the relevance and depth of review comments: whether they address bugs, design concerns, or adherence to guidelines. Comment density, which measures the number of comments relative to the lines of code changed, can indicate whether the feedback matches the complexity of the changes. Tracking how often code requires revisions post-review also highlights gaps in thoroughness. Together, these signals help keep reviews consistent and meaningful.
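Here is a minimal sketch of the two quantitative signals mentioned above, comment density and post-review revisions. The function names and the per-100-lines normalization are illustrative choices, not standard definitions.

```python
def comment_density(review_comments, lines_changed):
    """Review comments per 100 changed lines; a rough proxy for review quality.

    Very low density on a large, complex change can signal a rubber-stamp
    review; very high density on a trivial change may signal nitpicking.
    """
    if lines_changed == 0:
        return 0.0
    return 100 * review_comments / lines_changed

def post_review_revision_rate(prs_revised_after_approval, prs_merged):
    """Share of merged PRs that needed follow-up fixes after approval."""
    return prs_revised_after_approval / prs_merged if prs_merged else 0.0

# A 400-line change that drew 12 substantive comments.
print(f"{comment_density(12, 400):.1f} comments per 100 lines")   # 3.0
print(f"{post_review_revision_rate(4, 50):.0%} revised post-review")  # 8%
```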
I love these "squishy metrics" that are little more than user experience interviews in various forms, as they are less susceptible to being "gamed".
Review Impact
Review impact is the extent to which the code review process influences the final outcome and performance of the software project. It evaluates the value the review process delivers to the software product, its users, and its stakeholders. A high review impact means the process has helped reduce bugs, errors, or vulnerabilities, improved the software's functionality, usability, and reliability, and enhanced user satisfaction, retention, and loyalty. A low review impact means the process has made little measurable difference to the software's quality, security, or maintainability, or has even introduced new problems. You can measure review impact through testing, debugging, monitoring, analytics, feedback, or reviews that track the software's quality, security, and maintainability metrics and indicators.
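One concrete, if partial, way to quantify review impact is a defect escape rate: of the defects found for a release, how many slipped past review into production. The sketch below assumes you can pull those two counts from your issue tracker and monitoring; the metric name and framing are one possible choice rather than a standard.

```python
def defect_escape_rate(defects_caught_in_review, defects_found_after_release):
    """Share of defects that slipped past review into production.

    Lower is better; compare across releases to see whether changes to the
    review process (more reviewers, stricter checklists) move the needle.
    """
    total = defects_caught_in_review + defects_found_after_release
    return defects_found_after_release / total if total else 0.0

# Last release: 18 issues flagged in review, 3 reported by users afterwards.
print(f"{defect_escape_rate(18, 3):.1%} of defects escaped review")  # 14.3%
```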
In my experience this is the most important aspect, even if I am not sure I know of an easy and good metric to measure it, TBH. At my organization we use pull request templates to help outline what the intended impact is. This reduces the burden on the reviewer to divine the intended impact versus the actual impact the reviewer sees from the deltas and other artifacts on the pull request.