Potential ABI break in 19.1.3 (original) (raw)

October 31, 2024, 2:25pm 1

Hello,

Unfortunately an oversight on my part made it so that a potential ABI break got merged into 19.1.3.

The PR in question is here - [LLVM] [Clang] Backport "Support for Gentoo `*t64` triples (64-bit time_t ABIs)" by mgorny · Pull Request #112364 · llvm/llvm-project · GitHub

We have a GitHub action that checks for ABI breaks, but this seems to have missed this case or have an exception for enums - it’s not clear. And I missed that the PR modified a public header.

The break was discovered by @alexrp and @andrewrk since it caused breaks in Zig - mea culpa.

The discussion about this can be read here: [LLVM] [Clang] Backport "Support for Gentoo `*t64` triples (64-bit time_t ABIs)" by mgorny · Pull Request #112364 · llvm/llvm-project · GitHub

What to do now?

We can either:

  1. Revert the change in 19.1.4, causing another soft ABI break and possibly release 19.2.4 with the new enum in it.
  2. Ignore this issue and just do better in the future.

I am currently leaning towards 2. But I am very open to feedback if this creates problems for other downstream users.

How to improve this in the future

I want to put in place a github action that warns the release manager if a PR is changing the public include files. In theory the abi verifier should catch this and make this action unnecessary but obviously that’s not the case here. If the abi action would have caught this I would have pushed back more on the PR before merging it.

Again - I am sorry that I missed this and created any issues around it.

Thanks,
Tobias

mgorny October 31, 2024, 2:40pm 2

For the record, there’s also a third option of reverting the enum change and instead mapping *t64 tuples to the existing enum values. This would avoid the need to create 19.2.x, at the cost of inconsistent behavior between clang 20.x (where *t64 triples imply -D_TIME_BITS=64) and 19.x (where they would merely not cause an error, as they did in 19.1.2).

alexrp October 31, 2024, 3:01pm 3

As noted on the PR, a revert wouldn’t really fix anything for us in Zig. That was true yesterday, but especially true today as we’ve just stopped using Triple.h enums in our LLVM usage (we have our own bitcode writer, so we didn’t actually need to anymore).

So at least on our end, there’s no pressing need for a revert, and so I think our vote, for what it’s worth, would be for option 2.

andrewrk October 31, 2024, 6:31pm 4

Thanks for the discussion, Tobias. I know being a release manager is like herding cats and you end up taking a lot of the heat from frustrated users. No hard feelings over here.

Accidents happen! I really appreciate that you’re proactively thinking about how to catch this sort of scenario as part of the release process in the future.

I’m weakly in favor of #2 unless this turns out to be significantly disruptive, but we could perhaps have a release note in 19.1.4 telling folks about the ABI mismatch coming from 19.1.3 and that it’s still present in 19.1.4.

MaskRay November 1, 2024, 4:43am 6

Adding new enum members at the end of an enum class is considered a safe practice and actually the only reasonable change for an evolving library (alternative: introduce a new enum type).
While it is not common to have a special “Last” member and in this case we bumped it and it could lead to potential pitfalls, I respectively disagree the statement that this is an ABI break.

Users should be mindful adding such change detector and adjust their code accordingly, not reading too much from what LLVM defines.

While I agree that we should exercise caution when making changes in releases, I’d like to reiterate that this is not an ABI break.

tstellar November 1, 2024, 4:56am 7

Our current policy is that all public symbols are part of the ABI, so technically this is an ABI break. The case where this would cause problems is if you have a library that is linked against LLVM 19.1.2 and has a function like

const char *getEnvName(llvm::Triple::EnvironmentType type) {
   const char *names[llvm::Triple::LastEnvironmentType] = {
      ...
   };
  return names[type];
}

Any caller that passes one of the new enum values to this function will segfault.

Now, I highly doubt there exists a function like this somewhere, so we are probably fine just keeping things as is, but one way to fix this would be to clarify that these “Last” enum members are not considered part of the ABI and may change from release to release.

rnk November 1, 2024, 6:37pm 8

I think it’s widely accepted that changing the value of an existing enum constitutes an ABI break. The LastEnvironmentType sentinel does exist, and this backport changed it, so that’s probably against our policy and something we should seek to avoid in the future.

More to the concern from Zig, LLVM uses a lot of “closed” enums, where the recommended style is to switch, handle all cases, and assert on unknown enums. Adding new enums can break existing switches following LLVM’s own style guide to use no-default fully covered switches. If we wanted to improve the situation here, I believe there have been some proposals floating around to add attributes to explicitly mark enums as open, closed, flags, or other modes I may have missed. If we had tags like that, it would make cases like this easier to reason about.

It sounds to me like no-one is leaning towards that we revert the ABI break for 19.1.4 but that we definitely should try to avoid these kinds of changes for future releases.

So I suggest:

Thanks to everyone chiming in.

I would like to point out that such an ABI break would occur even without the LastEnvironmentType sentinel. The size could for instance be hardcoded to 43, respectively have their own definition of the enum. The fact that an LLVM function (e.g. llvm::Triple::getEnvironment()) could return an enum value outside the original range already might result in a buffer overflow or other unhandled behaviour.

mgorny November 5, 2024, 2:06pm 11

Perhaps the key point here is to clarify what assumptions should people make with regards to enum values — in particular whether LLVM guarantees that no new values are going to be added within a release, or whether programs should be expected to check for values outside the known range, e.g.:

switch (foo) {
  case A:
    ...
  case B:
    ...
}

// not default: to trigger warnings about unhandled values
if (foo > LastValue)
  ...;