RFC: Updating and Aligning the LLVM Release Process before LLVM 21
Hello LLVM community,
During the LLVM 20.x release cycle, @tstellar pointed out a difference between how we approached the RC process and what is documented in the official LLVM Release Process. You can find that discussion in this thread.
Summary of the Current Process
Here’s a quick summary of the currently documented process:
- main is branched into release/21.x.
- One week later, RC1 is cut from roughly the same point, with some fixes potentially merged in during that week.
- RC2 is released two weeks later. During this time, eligible patches can be merged into the release branch.
- RC3 is optional and only produced if major blockers are found. In the case of LLVM 20.1.0, patches were pending but not merged into the release branch, and RC3 was skipped because none of the patches were considered “blocking.”
- Minor releases (e.g., 20.1.1 through 20.1.6) follow every two weeks, including eligible fixes.
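As a rough illustration of the timing in the summary above, here is a minimal Python sketch that derives the RC1 and RC2 dates from a branch date. The function name and the branch date are hypothetical, chosen only for the example. RC3 and the final release are left out because the summary does not state their intervals.

```python
from datetime import date, timedelta


def documented_schedule(branch_date: date) -> dict:
    """Milestone dates implied by the summary: RC1 one week after
    branching, RC2 two weeks after RC1."""
    rc1 = branch_date + timedelta(weeks=1)
    rc2 = rc1 + timedelta(weeks=2)
    return {"branch": branch_date, "rc1": rc1, "rc2": rc2}


# Hypothetical branch date, used only for illustration.
schedule = documented_schedule(date(2025, 7, 8))
for name, day in schedule.items():
    print(f"{name}: {day.isoformat()}")
```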
My Approach in LLVM 19.x
During the LLVM 19.x release, I followed a slightly different approach, closer to the Linux kernel model, where the number and scale of changes between RCs influence whether another RC is necessary. If a large volume of patches accumulates between RC2 and RC3, that signals ongoing instability, and I prefer to extend the RC phase (e.g., by doing RC3 and RC4) until patch flow slows and the tree stabilizes.
Why Change?
In my view, it’s better to invest more time in the RC phase, since the first final release is what most users will download and use. Many won’t upgrade until their distribution picks up a newer version, so the quality of the initial .0 release is critical. This also avoids the situation where, for example, LLVM 20.1.1 ends up being a “larger” update than the difference between RC2/3 and the final release.
Proposed Process Changes
I suggest we revise the release process as follows:
1. RC1 = Branch Freeze
- We continue to branch from main on schedule and tag RC1 one week later, as we do today.
2. RC2 and Beyond = Gradual Stabilization
- Patches accepted during RC1 → RC2 may include:
- Regression fixes
- Near-complete feature work
- Major correctness or performance improvements
- With each subsequent RC, patch acceptance becomes more restrictive:
- RC3–RC4: Only bug fixes and regressions
- RC5+: Only critical regressions or blockers
- The release manager, with input from the community, will determine whether additional RCs are needed. We won’t cap the number of RCs in advance, though RC5+ should be rare.
3. Release Happens When Ready
- Rather than locking into a strict 6-week stabilization window, we will aim to release when the branch is stable and the patch flow has slowed.
- Increase the RC phase to a minimum of 8 weeks, but treat that as a guideline that may be extended if warranted by incoming changes.
4. Improved Communication
- The release manager will provide regular status updates after RC1 to report:
- Current state of the release branch
- Tentative schedule for next RCs or final release
- Whether we’re entering the “strict phase” (RC4+)
5. Post-Final Releases
- Minor versions (e.g., 21.1.1 through 21.1.6) will still be released every other week.
- Accepted patches will follow similar criteria as the RC3–RC4 phase (i.e., bug fixes and regressions only).
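The staged acceptance criteria in points 2 and 5 can be sketched as a small lookup. This is only one reading of the proposal: it assumes the criteria apply to the RC a patch would ship in, and the phase names ("early", "mid", "strict") and function names are invented for the example. The listed categories are the ones the proposal names, not an exhaustive policy.

```python
# Categories named in the proposal for each phase. The phase names are
# hypothetical labels introduced for this sketch.
LISTED_CATEGORIES = {
    "early": ("regression fixes", "near-complete feature work",
              "major correctness or performance improvements"),
    "mid": ("bug fixes", "regressions"),
    "strict": ("critical regressions", "blockers"),
}


def phase_for_rc(target_rc: int) -> str:
    """Phase for a patch that would ship in release candidate `target_rc`."""
    if target_rc < 2:
        raise ValueError("RC1 is tagged from the branch; staged criteria start at RC2")
    if target_rc == 2:
        return "early"      # RC1 -> RC2
    if target_rc <= 4:
        return "mid"        # RC3-RC4
    return "strict"         # RC5+


def phase_for_point_release() -> str:
    """Post-final releases follow criteria similar to RC3-RC4."""
    return "mid"


for rc in (2, 3, 5):
    print(f"RC{rc}: {', '.join(LISTED_CATEGORIES[phase_for_rc(rc)])}")
```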
Benefits
- A more stable and reliable .0 release
- Better alignment between release managers
- Clearer expectations for patch acceptance at each RC stage
- Flexibility to respond to actual stabilization needs rather than arbitrary timelines
I hope we can reach agreement on this approach before the LLVM 21.x release cycle begins in early July. Feedback and discussion are very welcome!
Thanks,
Tobias
nikic May 23, 2025, 11:01am 2
If I understand correctly, your proposal still operates under the premise where the .0 release is somehow special. That is, you become stricter with accepted patches as the RC phase goes on, and then become less strict again after the .0 release. This general approach just doesn’t make sense to me. We should be treating the .0 release exactly the same way as the RC releases and following point releases.
My reasoning is as follows: If a backported patch fixes an issue, it is always better to fix it earlier rather than later. If a backported patch causes a regression, it is always better for that regression to happen earlier than later. If a regression is introduced between the last RC and .0, that’s still better than it being introduced between .0 and .1, which is better than it being introduced between .6 and .7. Our willingness to accept backports should be monotonically decreasing, without a discontinuity around the .0 release.
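To make the "no discontinuity" argument concrete, here is a minimal Python sketch that scans a sequence of milestones for the first point where acceptance becomes less strict again. The numeric strictness levels and the milestone list are assumptions made for illustration (higher means stricter); they model a cycle in which an RC5 occurs and point releases then revert to RC3-RC4 criteria.

```python
from typing import List, Optional, Tuple


def first_relaxation(milestones: List[Tuple[str, int]]) -> Optional[str]:
    """Return the first milestone whose strictness is lower than that of
    the milestone before it, or None if strictness never decreases."""
    for (_, prev), (name, cur) in zip(milestones, milestones[1:]):
        if cur < prev:
            return name
    return None


# Hypothetical strictness levels: 1 = most permissive, 3 = critical fixes only.
proposal_with_rc5 = [("RC2", 1), ("RC3", 2), ("RC4", 2), ("RC5", 3), ("21.1.1", 2)]
monotonic = [("RC2", 1), ("RC3", 2), ("RC4", 2), ("21.1.1", 2)]

print(first_relaxation(proposal_with_rc5))
print(first_relaxation(monotonic))
```

A schedule satisfies the monotonicity requirement exactly when the function returns None.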
Now, to the point of the release schedule: Having a predictable release schedule is valuable, and other projects operate around the current schedule. For example, if you move the .0 release by four weeks due to a longer RC phase, that’s probably going to delay new LLVM versions in Fedora by a whole 6 months.
If we’re going to extend the RC phase, it would be nice to do it to the left rather than the right, i.e. move the branching point and first RC earlier.
I’m also not entirely sure it makes sense to try and judge “actual stability” as opposed to having a fixed timeline. Because process improvements have made backports so easy, I think we’re now much more liberal with backports for minor issues than we used to be; e.g. we’ll backport pretty much all miscompile fixes as a matter of course, even if we don’t have any expectation of them being relevant in practice. The 20.1.6 release contained 30 new patches relative to 20.1.5. Is that stable? Is that unstable? Where and how would we draw the line?
I for one welcome this proposal. I find it strange that releases get cut on such a strict schedule that they may contain known severe bugs, or be cut despite finished PRs to fix those issues which haven’t been merged.
While I agree with @nikic’s point about the “willingness to accept backports should be monotonically decreasing”, that’s not in conflict with ensuring a .0 release without known major issues.
People are probably familiar with how GCC does it[1], which is to have bug priorities, and a release condition is that there are no outstanding bugs of the highest priority (so either such a bug needs to be fixed or its priority downgraded).
Perhaps a simpler solution (that avoids having to assign a priority to everything) would be to let maintainers / code owners add a release blocker label[2].
[1] not that that’s the only way to achieve it, but it’s an intuitive approach, given the goal
[2] or some variation like using release blocker candidate, which then gets upgraded once relevant people agree it should be a blocker
Endill May 25, 2025, 8:52am 4
I’ve seen multiple decade-old issues which were given this treatment, only to be ignored in the end. Unless there are maintainers willing to commit time and labor to address them in a timely manner, “release blocker” status should come with an expiration date, to ensure that release is not jeopardized.
alexrp May 25, 2025, 9:52am 5
Some thoughts from the Zig side:
It’s perhaps interesting to note that LLVM 19 was the only LLVM release in recent memory where Zig was actually able to upgrade to the .0 release; with every other release, we’ve had to wait for multiple patch releases that include important regression fixes. As a result, before that point, I think Andrew had basically given up on aiming for .0 upgrades.
For LLVM 20, I had obtained commit/triage access for LLVM. So that time around, I filed issues that affected us very early in the release process and added them to the LLVM 20.X Release milestone. To the credit of all involved, every issue I filed like this was resolved and a fix backported, so zero complaints from us in that regard. However, we did still end up having to wait for 20.1.2 to pull the trigger on the upgrade because not all fixes made it in in time for 20.1.0.
Obviously we don’t have an amazing sample size here, but I do get the sense that there was something about the LLVM 19 release process that resulted in a more stable .0 release. Just some food for thought here.
nikic May 25, 2025, 4:46pm 6
Thanks a lot for testing LLVM releases proactively and reporting issues. Zig seems to be a great stress-test for LLVM, finding issues we don’t encounter elsewhere.
I can make a few guesses on why not all fixes made the LLVM 20.1.0 release:
- There was one less RC release than for most other releases.
- The special treatment of the .0 release I mentioned above. The diff between rc3 and .0 was basically empty, because only “critical” fixes were accepted. So effectively that’s two wasted weeks in the release cycle (and why I’m arguing against this approach).
- Please do correct me if I’m wrong, but I believe your general approach is to test a release candidate, report an issue you encountered (or multiple independent issues), wait for it to be fixed and released and then test again with the next release candidate. So even though the six issues you reported for the initial update were all fixed in ~1 day on average, you still have to wait out the 14 days between RCs, which drags things out a lot. When integrating new LLVM versions in Rust, we approach this differently, by immediately backporting patches and trying again. This allows us to have a working integration already at rc1/rc2. What your approach would probably benefit from most is to have more RC releases in the same time window.
alexrp May 25, 2025, 5:15pm 7
That’s right. Ideally we’d do the same thing Rust does, but we don’t quite have the resources for that to be practical at the moment, so testing RCs (and patch releases) is the best we can do.
I agree that more RCs would be beneficial for Zig. I think an RC a week would be the sweet spot for us, FWIW.
Agreed that it makes no sense on old issues, and an expiration date also sounds good to me. What I’m talking about are regressions that are discovered during the RC phase or just before, with a big enough blast radius that it makes sense to wait.
One other reason is differences in style / philosophy between the respective release managers (@tobiashieta for odd releases, @tstellar for even ones), and aligning those is what this RFC is about. I think your experience can be counted as supporting evidence for Tobias’ proposal.
fhahn May 26, 2025, 1:44pm 10
Thanks for sharing this proposal. It would be great to clearly specify the requirements for backports at different stages.
jh7370 May 27, 2025, 7:36am 11
I was just randomly thinking about this and @nikic’s comments re the “special” .0 release and missing distro releases by taking too long. Here’s a perhaps slightly wacky and more radical alternative: don’t have release candidates at all. The .0 release is then the equivalent of the first RC after branching and we continue to produce new releases (.1, .2 etc) every couple of weeks, but significantly, we don’t label any given release as “final” specifically. This always allows us to produce another build for a given release branch, if we feel it is warranted and eventually we’d tail off and stop producing more releases. Distros can then choose to grab LLVM at any arbitrary point in this cycle, depending on their own timelines and confidence in stability.
For this to work, we’d need a clear list of known significant issues with each point release, which is updated if issues are discovered post release. This would go hand-in-hand with release notes that clearly indicate which point release any given fix is in.
As a partial aside, I believe a natural consequence of this is that we don’t then need dedicated patch releases. They’re just another release from the branch.
nikic May 27, 2025, 7:51am 12
The thing that distinguishes the RC releases from the patch releases is that the RC releases do not have stable ABI. For distro purposes, we can only ship releases with a stable ABI. However, having a stable ABI also means that some fixes cannot be backported (at least not without significant effort). As such, I do think we need an RC phase during which more intrusive, ABI breaking fixes can still land.
Reading @nikic’s post, I agree with the following:
- I think we are way too lax about adding changes after .0 and we should probably be much more restrictive with the fixes that go in with the final RC and then keep the same criteria for accepting patches for the rest of the life cycle.
- In order to get this to work I think we need to extend the RC period so that maintainers and developers have time to fix things that will no longer be accepted after we hit the more restrictive period.
In terms of the fixed date and flexible RC point, it seems like you think it is important to have a very predictable date. I don’t have that view at all. I think it’s more important that we have enough time to fix and merge issues before we put out a release, especially if we want to be super-restrictive about the patches we take after a point. That said, I am willing to listen to distributions and downstream vendors if this is very important for them.
With these things in mind - maybe a new timeline would look like this:
- Branching to RC1, basically anything is accepted at this point.
- RC1->RC3 - finish features, API/ABI breaks and major patches are still allowed.
- RC3->RC5 - Bug fixes, regressions, API breaks if needed are allowed. No ABI breaks, no major patches, no new features (at this point we would rather people disable unfinished features).
- .0 release and forward. Regressions, crash fixes, security fixes and important other bugs can be fixed, nothing else.
That would give maintainers roughly 12 weeks from branching to the initial release and 6 weeks of major fixes.
We can discuss the actual definitions of the acceptance criteria if we can agree on:
- Prolonging the RC period
- Making the acceptance criteria much more strict after RC3.
- Flexible or fixed RC period.
How does this proposal sound?
nikic May 27, 2025, 9:20am 14
This actually isn’t what I was going for. It is one way to remove the discontinuity at the .0 release, but what I actually had in mind was “be less strict about fixes at the end of the RC phase”. The main thing I want to avoid is things like the 20.1.0 release which only contained a single fix despite many pending patches, because they were not sufficiently critical.
I think an important thing to keep in mind here is that a lot of people only start testing new LLVM versions after the .0 release. Extending the RC phase and being more restrictive about patches afterwards does not help them. (In particular this is going to include most “users” of Clang, rather than people who integrate LLVM somewhere.)
Having a fixed date is important for Fedora at least. The current release timeline allows us to barely ship new LLVM releases (and only due to process exemptions). If the first ABI stable release happens later, we need to delay updating to a new LLVM version until the next Fedora version half a year later. I think this is a bad outcome not just for Fedora, but also the LLVM community at large (as it removes one of the big forcing functions to quickly support new LLVM versions in lots of random projects).
Oh well I think we have the same intention but maybe different ways to get there. I think your approach works as well, as long as we “open the floodgates” after the .0 version. How strict we should be is an important discussion as well. As long as the no API/ABI breakage is part of that.
As a hypothetical let’s call the acceptance criteria C4 for the most relaxed and C1 for the most strict. Then I could see a schedule like this:
Branch until RC1 - C4
RC1 til RC3 - C3
RC3, RC4 and all the way to 21.1.3 - C2
Anything after that C1.
Does that match up with what you would like to see as well?
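The hypothetical C4 to C1 schedule above can be sketched as a lookup keyed on the release a patch would first ship in. That keying is an assumption about where the window boundaries fall, the "rcN" and "21.1.P" string formats are chosen for the example, and the function name is invented.

```python
def criteria_for(target: str) -> str:
    """Criteria level (C4 = most relaxed, C1 = most strict) for a patch
    that would first ship in `target`, given as "rcN" or "21.1.P"."""
    if target.startswith("rc"):
        rc_number = int(target[2:])
        if rc_number <= 1:
            return "C4"   # branch until RC1
        if rc_number <= 3:
            return "C3"   # RC1 until RC3
        return "C2"       # RC3 onward, through the remaining RCs
    patch = int(target.split(".")[2])
    # 21.1.0 through 21.1.3 stay at C2; anything later is C1.
    return "C2" if patch <= 3 else "C1"


for target in ("rc1", "rc2", "rc4", "21.1.0", "21.1.3", "21.1.4"):
    print(target, criteria_for(target))
```

With this mapping the level only ever tightens as the cycle progresses, and the .0 release falls in the middle of the C2 window rather than at a boundary.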
This is true and a good point. I would love to have a way to push people to test the RCs, but that doesn’t seem very likely. I do think a longer RC period probably wouldn’t hurt.
While I sympathize with this and we shouldn’t go out of our way to sabotage this, the same argument could probably be made by another distribution / vendor in favor of pushing the release date in either direction. There are always going to be timing issues, and I don’t think it’s realistic to try to accommodate this that much when having this discussion. That said, if we are extending the RC period, I don’t mind doing it in “the other direction”.
I know that @tstellar has expressed some concern about running two releases at the same time. I am not sure how he feels about that anymore.
Would it make more sense to shift the releases ~3 months earlier then? That way, the versions OSes pick up aren’t the .0 versions, and would instead pick up versions with more bugfixes.
nikic May 27, 2025, 3:00pm 17
Yeah, something along those lines sounds reasonable to me.
Sure. I’m not asking LLVM to accommodate our specific schedule. I mainly mention this for two reasons:
- Having a known date (or at least reasonably tight range) for the .0 release is important for planning. If we know in advance that the timing of the release will make it impossible for us to ship it, that’s one thing. But if we may or may not be able to ship it depending on how many RCs there’s going to be this time, that’s much more problematic.
- I believe that, if possible, it is best to not move the .0 release (or at least not move it to the right), because that’s the date LLVM distributors/consumers generally plan around. Moving around internal milestones (like the start of the RC phase) is less likely to be disruptive or have negative effects.
As Tobias already mentioned, there’s lots of different vendors. We shouldn’t make a big timeline shift to accommodate any specific vendor.
preames May 27, 2025, 3:31pm 18
I want to chime in and strongly support having releases stick to a fixed release date. From past experience, if you don’t, you start having serious problems as there’s always a push to get one more feature in, but no individual (or in our case organization) bears the cost of the late release. We really need to stick to a model where the release goes out on a (very close to) fixed schedule, and changes which aren’t ready don’t go in(*). I know it seems counter intuitive, but this actually results in higher software quality in the long run.
Note that I’m strictly speaking about the initial release date. There’s more flexibility on the timing of point releases, provided the quality bar is maintained.
(*) Or if they do, the blast radius is generally less than the whole project, and users not using that feature/backend/whatever continue to see stable releases, and that author/organization gets strongly negative feedback from their immediate user community.
The point Nikita makes about the ability to plan (in a distributed manner!) around release dates is also really important.
I think the releases should act more as checkpoints along the stabilization process rather than something that represents the code quality of the project or the lack of ‘critical issues’.
Personally, I think we should lean even more into the time-based release process and have just 2 release candidates for every release. This would give us 5 weeks of ‘stabilization’, which I think is plenty of time.
Personally I find 5 weeks and only 2 release candidates very tight, if one aims for having the .0 release usable for one’s case.
I do the majority of my testing continuously on git main, so in principle, I’m ready for release whenever - so I’m probably better off than many downstreams like e.g. Zig. But there’s some set of tests that I used to run manually; I usually try to run those tests around RC1. (Side note; Thanks to github actions having windows on arm runners these days, I’m able to run those configurations continuously now as well, so this should be less of an issue for me now.)
In any case, whether it is me testing my manual cases, or a downstream language testing integrating the new release… Practically, one may get to testing RC1 within a week of it being tagged. If there’s some new breakage since the last release, one has to pinpoint the breakage, reduce it, possibly bisect it. Then figure out how to fix it. On a nontrivial issue this can easily take a week, if this isn’t one’s main occupation. Then one probably has to account for yet another week to have the fix reviewed. (And if one can’t figure out the right fix oneself, but needs to ask for help from others, this can easily take yet another week.) So for a nontrivial bug, three to four weeks is probably the bare minimum time window for getting any issues fixed, provided that one can get on it right away.
So if we want to have a good .0, I would definitely prefer at least three release candidates, with a possibility to merge fixes between RC3 and .0 as well.