[RFC] Introducing elementwise clz/ctz builtins
I’m proposing that we add elementwise versions of clz/ctz builtins. These will be useful, e.g., for implementing the corresponding OpenCL builtins in libclc, where we can get optimal vector IR code generation.
I’ve done most of the work in PR #131995 but have taken it to an RFC to get consensus on the naming.
For reference, these builtins will map directly onto the LLVM ctlz/cttz intrinsics. I’ve modelled them on __builtin_clzg & __builtin_ctzg (GCC docs) with an optional second argument which determines the result if (any element of) the first argument is 0. Without the second argument, a 0 input returns an undefined value.
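To make the intended per-element semantics concrete, here is a plain C reference sketch. It does not use the proposed builtins; the names `ref_clz32`, `ref_ctz32`, `ref_elementwise_clz` and `ref_elementwise_ctz` are hypothetical and exist only for illustration. It assumes 32-bit lanes and, for simplicity, a single fallback value shared by all lanes, which may not match the signature the PR ends up with.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: leading-zero count of one 32-bit lane.
   Returns `fallback` when the lane is 0, mirroring the optional
   second argument described above. */
static int ref_clz32(uint32_t x, int fallback) {
    if (x == 0) return fallback;
    int n = 0;
    while ((x & 0x80000000u) == 0) { x <<= 1; ++n; }
    return n;
}

/* Hypothetical helper: trailing-zero count of one 32-bit lane. */
static int ref_ctz32(uint32_t x, int fallback) {
    if (x == 0) return fallback;
    int n = 0;
    while ((x & 1u) == 0) { x >>= 1; ++n; }
    return n;
}

/* Elementwise: each output lane depends only on the matching input lane.
   Assumption: one fallback shared by every lane. */
static void ref_elementwise_clz(const uint32_t *in, int *out,
                                size_t lanes, int fallback) {
    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < lanes; ++i) out[i] = ref_clz32(in[i], fallback);
}

static void ref_elementwise_ctz(const uint32_t *in, int *out,
                                size_t lanes, int fallback) {
    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < lanes; ++i) out[i] = ref_ctz32(in[i], fallback);
}
```

With a fallback supplied, a zero lane yields that fallback and the other lanes are unaffected. The one-argument form has no equivalent in this sketch, since its result for a zero lane is undefined.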
I can see at least three methods to name these builtins:
__builtin_elementwise_clz and __builtin_elementwise_ctz
- This is the shortest option (by a whopping one character) and mirrors the GCC builtins. The upside is it’s perhaps most recognisable to developers?
- The downside is the baggage and expectations that come with those builtins, which these new builtins don’t adhere to. So, is this misleading?
__builtin_elementwise_clzg and __builtin_elementwise_ctzg
- This better mirrors the non-elementwise builtins on which they’ve been modelled, with respect to the behaviour surrounding the second argument.
- But do we really need a g if there are no non-g elementwise versions?
__builtin_elementwise_ctlz and __builtin_elementwise_cttz
- This breaks from tradition. It has the advantage that it quite plainly spells out what the builtins are there to do: to generate these specific LLVM intrinsics in a target-agnostic way. These will be clang-specific builtins so we don’t need to worry much about GCC behaviour or conventions.
- One argument against this scheme is that other elementwise builtins don’t correspond to LLVM intrinsic naming, such as popcount → llvm.ctpop.
I don’t really have a strong opinion on this matter either way, but want to get the naming right first time.
jhuber6 April 15, 2025, 4:43pm 2
My understanding is that ctzg is supposed to be type generic anyway, so we could presumably make it work for vector types as well. However, it changes the output to a vector so that might be out of scope.
Yes, that’s true - I didn’t include that option. As you say it’s a bit more fiddly to change an existing builtin. It would increase discrepancies between clang’s and GCC’s behaviour of the same builtin (whether or not that’s a concern I don’t know). Plus we already have analogous “elementwise” builtins so I feel they fit better there as it creates a fuller set.
I wonder also whether or not people could interpret that __builtin_ctzg on a vector would do a reduction-like sum of counts over all elements. At least with “elementwise” it’s unambiguous.
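As a rough illustration of that ambiguity, the sketch below contrasts the two readings in plain C. The names `lane_ctz32`, `ctz_per_lane` and `ctz_sum` are hypothetical, lanes are assumed to be 32 bits wide, and a zero lane is arbitrarily treated as 32 so the sketch stays well defined.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: trailing-zero count of one 32-bit lane.
   Assumption for this sketch only: a zero lane counts as 32. */
static int lane_ctz32(uint32_t x) {
    if (x == 0) return 32;
    int n = 0;
    while ((x & 1u) == 0) { x >>= 1; ++n; }
    return n;
}

/* Reading A (elementwise): one count per lane, result has the
   same number of lanes as the input. */
static void ctz_per_lane(const uint32_t *in, int *out, size_t lanes) {
    for (size_t i = 0; i < lanes; ++i) out[i] = lane_ctz32(in[i]);
}

/* Reading B (reduction-like): a single scalar, the sum of the
   per-lane counts. */
static int ctz_sum(const uint32_t *in, size_t lanes) {
    int total = 0;
    for (size_t i = 0; i < lanes; ++i) total += lane_ctz32(in[i]);
    return total;
}
```

For the input {8, 4, 1, 16}, reading A gives {3, 2, 0, 4} while reading B gives the single value 9, so the two interpretations differ in both result type and value.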