[RFC] Add CallBr Intrinsic Support

CallBr is currently used to provide branch targets to inline ASM calls. This RFC proposes to also allow selected intrinsics as call targets.

Current State of CallBr

The ‘callbr’ instruction causes control to transfer to a specified function, with the possibility of control flow transfer to either the ‘fallthrough’ label or one of the ‘indirect’ labels.
This instruction should only be used to implement the “goto” feature of gcc style inline assembly. Any other usage is an error in the IR verifier.
[…]
callbr void asm "", "r,!i"(i32 %x) to label %fallthrough [label %indirect]

(LLVM Language Reference Manual — LLVM 21.0.0git documentation)

Problem

There are some control-flow affecting target intrinsics, such as llvm.amdgcn.kill, which act and are implemented as terminators. However, this is currently not reflected by the IR, which leads to the issue that the EXEC mask of the currently executing lane is manipulated by an instruction which is not handled as a terminator. Consequently, the containing block currently needs to be split during compilation.
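To make the problem concrete, here is a minimal sketch of how the kill intrinsic appears in IR today, as an ordinary call in the middle of a block. The function name, the amdgpu_ps calling convention, and the store are hypothetical and only there for illustration; the intrinsic declaration is the existing one.

```llvm
; Sketch of the current (pre-RFC) form. @kill_as_plain_call and the store
; are made up for illustration and are not taken from the draft PR.
declare void @llvm.amdgcn.kill(i1)

define amdgpu_ps void @kill_as_plain_call(i1 %c, ptr addrspace(1) %out) {
entry:
  ; Lanes where %c is false are disabled here, yet in the IR this is an
  ; ordinary call, not a terminator.
  call void @llvm.amdgcn.kill(i1 %c)
  ; Only the surviving lanes should execute this store, but it sits in the
  ; same basic block, so the backend has to split the block at the call.
  store i32 1, ptr addrspace(1) %out
  ret void
}
```

Nothing in the CFG of this function shows that control may stop at the call, which is why the block has to be split later.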
Another example is the llvm.amdgcn.cs.chain intrinsic, which must be followed by an unreachable instruction because it is effectively a terminator as well. However, there is currently no way to properly enforce that this intrinsic is followed by unreachable. The current workaround is a check in the IR Verifier, but that guarantees nothing for the later stages of compilation. What is needed is a construct that allows intrinsic calls to be handled as terminators. See also "[AMDGPU] Enhance verification of amdgcn.cs.chain intrinsic" by ro-i (llvm/llvm-project Pull Request #128162).

Proposed Solution

Allow CallBr to have such intrinsics as call targets. For example (the order of the blocks kill and cont after the CallBr instruction doesn’t matter):

  callbr void @llvm.amdgcn.kill(i1 %c) to label %cont [label %kill]
kill:
  unreachable
cont:
  ...

(Note that %c == false would lead to kill and %c == true to cont since %c represents the value which is written to EXEC.)
A solution like this would properly reflect the semantics of certain target intrinsics as terminators and make the control flow explicitly visible.

See "[IR] Add CallBr intrinsics support" by ro-i (llvm/llvm-project Pull Request #133907) for the current draft PR.

nikic April 29, 2025, 10:24am 2

Allowing (whitelisted) intrinsics in callbr generally sounds reasonable to me.

For the specific examples listed, these both sound like intrinsics that can be modeled as normal non-willreturn calls, which permit control flow to implicitly diverge at the call. Why is that not sufficient? Is it just a matter of the backend not having good support for non-willreturn intrinsics, or something more?

Basically, any call of the form call void @fn(), where @fn is not willreturn, is really a callbr void @fn to label %cont [label %unreachable] (as in your llvm.amdgcn.kill example). This is just not made explicit in the CFG, but all analyses/transforms still need to treat it as such. At least in the middle-end we are pretty good at this nowadays, but I’m not sure what the current backend state is.

arsenm April 29, 2025, 3:53pm 3

The problem is that, conceptually, what is going on is beyond the scope of what is represented in the IR CFG. We think of the IR as presenting the single-lane view of the program, but these intrinsics behave differently across the lanes. If you only consider this lane, it’s a not-willreturn call. The other lanes are willreturn and will continue on. This introduces mechanical edge cases we need to deal with in the backend. We don’t want to have to deal with the instructions between the exit and the “real” branch terminator at the end of the block.

nhaehnle April 29, 2025, 5:17pm 4

Thank you so much for this proposal! I’ve been thinking for years now that we should do something along these lines, it just never rose to the point where I could prioritize it.

To add a bit to @arsenm’s point, these intrinsics really are “like branches”. Their lowering involves actual branch instructions and separate basic blocks in MIR today already. The lowering of control flow in AMDGPU is quite complex due to the transform from a thread-level CFG to a wave-level CFG, and it would be helpful to be able to be closer to the MIR representation in a number of ways already in LLVM IR.

arsenm April 30, 2025, 9:39am 5

Also, the current handling breaks an assumption we make in various intrinsic combines. We want to guarantee that operations that manipulate exec only occur at the end of the block; otherwise we need to scan through the block whenever we do anything with one of them.

ro-i May 6, 2025, 1:23pm 6

Thanks for the discussion! Are there any further questions, comments or concerns?