[RFC] Introducing elementwise clz/ctz builtins

I’m proposing that we add elementwise versions of clz/ctz builtins. These will be useful, e.g., for implementing the corresponding OpenCL builtins in libclc, where we can get optimal vector IR code generation.

I’ve done most of the work in PR #131995, but I’m bringing it to an RFC to get consensus on the naming.

For reference, these builtins will map directly onto the LLVM ctlz/cttz intrinsics. I’ve modelled them on __builtin_clzg & __builtin_ctzg (GCC docs) with an optional second argument which determines the result if (any element of) the first argument is 0. Without the second argument, a 0 input returns an undefined value.
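To make the intended semantics concrete, here is a rough reference model in portable C. It is only a sketch and not the code from the PR: the helper names (`ref_clz32`, `ref_ctz32`, `ref_elementwise_clz32`, `ref_elementwise_ctz32`) are hypothetical, the lanes are assumed to be 32 bits wide, and I’m assuming the second argument supplies one fallback value per lane. It models only the two-argument form, since the one-argument form has no defined result for a 0 input.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model: leading-zero count of a 32-bit value.
   Returns `fallback` when x == 0, like the optional second argument. */
static int ref_clz32(uint32_t x, int fallback) {
    if (x == 0) return fallback;
    int n = 0;
    while ((x & 0x80000000u) == 0) { x <<= 1; ++n; }
    return n;
}

/* Hypothetical scalar model: trailing-zero count of a 32-bit value. */
static int ref_ctz32(uint32_t x, int fallback) {
    if (x == 0) return fallback;
    int n = 0;
    while ((x & 1u) == 0) { x >>= 1; ++n; }
    return n;
}

/* Elementwise form: each output lane depends only on the matching
   input lane and (assumed) the matching fallback lane. */
static void ref_elementwise_clz32(const uint32_t *in, const int *fallback,
                                  int *out, size_t len) {
    assert(in != NULL && fallback != NULL && out != NULL);
    for (size_t i = 0; i < len; ++i)
        out[i] = ref_clz32(in[i], fallback[i]);
}

static void ref_elementwise_ctz32(const uint32_t *in, const int *fallback,
                                  int *out, size_t len) {
    assert(in != NULL && fallback != NULL && out != NULL);
    for (size_t i = 0; i < len; ++i)
        out[i] = ref_ctz32(in[i], fallback[i]);
}
```

With inputs `{1, 0, 0x80000000, 16}` and a fallback of 32 in every lane, this model gives `{31, 32, 0, 27}` for clz and `{0, 32, 31, 4}` for ctz.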

I can see at least three methods to name these builtins:

I don’t really have a strong opinion on this matter either way, but want to get the naming right first time.

jhuber6 April 15, 2025, 4:43pm 2

My understanding is that ctzg is supposed to be type generic anyway, so we could presumably make it work for vector types as well. However, it changes the output to a vector so that might be out of scope.

Yes, that’s true - I didn’t include that option. As you say, it’s a bit more fiddly to change an existing builtin. It would also increase the discrepancies between clang’s and GCC’s behaviour for the same builtin (I don’t know whether that’s a concern). Plus, we already have analogous “elementwise” builtins, so I feel these fit better there, as that creates a fuller set.

I also wonder whether people might read __builtin_ctzg on a vector as doing a reduction-like sum of counts over all elements. At least with “elementwise” it’s unambiguous.
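To illustrate the two readings, here is a small portable C sketch. The names (`ref_ctz32`, `ctz_elementwise`, `ctz_reduce_sum`) are hypothetical, the lanes are assumed to be 32 bits wide, and a 0 lane is arbitrarily given a count of 32 just to keep the model total. Neither function is an existing or proposed builtin.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of a trailing-zero count.
   Assumption for this sketch only: a 0 input counts as 32. */
static int ref_ctz32(uint32_t x) {
    if (x == 0) return 32;
    int n = 0;
    while ((x & 1u) == 0) { x >>= 1; ++n; }
    return n;
}

/* Reading 1, "elementwise": one count per lane, vector in, vector out. */
static void ctz_elementwise(const uint32_t *in, int *out, size_t len) {
    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < len; ++i)
        out[i] = ref_ctz32(in[i]);
}

/* Reading 2, "reduction": a single scalar that sums the per-lane counts. */
static int ctz_reduce_sum(const uint32_t *in, size_t len) {
    assert(in != NULL);
    int total = 0;
    for (size_t i = 0; i < len; ++i)
        total += ref_ctz32(in[i]);
    return total;
}
```

For the input `{8, 4, 2, 1}` the elementwise reading gives `{3, 2, 1, 0}`, while the reduction reading gives the scalar 6, so the same spelling could plausibly mean either without the “elementwise” prefix.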