[RFC][llvm] Proposing llvm.loop.vectorize.reassociation.enable metadata

Hello,

I would like to propose the llvm.loop.vectorize.reassociation.enable loop metadata, which allows unsafe reassociation of computations during loop vectorization: [RFC][llvm] Added llvm.loop.vectorize.reassociation.enable metadata. by vzakhari · Pull Request #141685 · llvm/llvm-project · GitHub

At least one HPC compiler vectorizes loops with floating-point reductions without enabling reassociation broadly. Reassociating FP computations introduces, as expected, numerical errors, which are not always acceptable in HPC applications. At the same time, the performance of the generated code matters a lot, and vectorizing loops with FP reductions is critical for some users. Those users are willing to account for the errors introduced by reordered reduction computations, but they do not expect FP reassociation to affect non-performance-critical parts of the code.
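To make the reassociation error concrete, here is a minimal C sketch. It is not part of the proposal, and the function names sum_left and sum_right are made up for illustration. It assumes IEEE 754 double arithmetic and compilation without fast-math options, so the compiler keeps the parentheses as written.

```c
/* Adds three values left to right: (a + b) + c. */
double sum_left(double a, double b, double c) {
    return (a + b) + c;
}

/* Adds the same three values with the other grouping: a + (b + c).
   This is the kind of regrouping that reassociation permits. */
double sum_right(double a, double b, double c) {
    return a + (b + c);
}
```

With a = 1e17, b = -1e17 and c = 1.0, sum_left returns 1.0 while sum_right returns 0.0: adjacent doubles near 1e17 are 16 apart, so adding 1.0 to -1e17 rounds back to -1e17 and the 1.0 is lost.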

The proposed metadata narrows the scope of FP reassociation to vectorizable loops only. The metadata does not force vectorization itself, so a frontend may set it on all loops in a compilation module, given buy-in from the user (e.g. via a command-line option).

I am planning to allow Flang to generate this metadata under a user-visible option.

Please let me know if this sounds reasonable or if there are any concerns.

Thanks,
Slava

fhahn May 28, 2025, 3:56pm 2

Do we need a new kind of metadata? Could the frontend just add the reassoc fast-math flag to all FP instructions in the loop?

What happens if the loop doesn't end up vectorizing? The motivation here seems to be particular to reductions, but you wouldn't want other scalar FP optimizations to apply, would you? The same goes for the remainder/residual loop, if any.

So you are saying we want to allow reassociation only if we vectorize, but not otherwise? If that’s the case, what is the rationale? How would the user even tell, and why would they care? If I got it wrong, maybe elaborate a bit on the general rationale and the differences you see.

This sounds like the users want to enable reassociation on a part of the code. AFAIR, Clang has special pragmas for that, something like the ones below. IMO, this is a much better and more universal solution, though I’m not sure whether that’s possible in Flang…
#pragma float_control(push)
#pragma float_control(precise, off)
…
#pragma float_control(pop)
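For comparison, Clang also documents a narrower pragma, `#pragma clang fp reassociate(on)`, which may appear at the start of a compound statement. The sketch below only illustrates scoping reassociation to a single loop body at the source level. It is not what the RFC proposes, the function name sum_reassoc is hypothetical, and the pragma is guarded so that other compilers skip it. Whether the loop actually vectorizes is still up to the optimizer.

```c
#include <stddef.h>

/* Hypothetical helper: sums n doubles. When compiled with Clang, the
   additions inside the loop body may be reassociated; code outside
   that compound statement keeps strict FP semantics. */
double sum_reassoc(const double *x, size_t n) {
    double s = 0.0;
    for (size_t i = 0; i < n; ++i) {
#ifdef __clang__
#pragma clang fp reassociate(on) /* scoped to this compound statement */
#endif
        s += x[i];
    }
    return s;
}
```

As with putting reassoc flags on the instructions directly, this permits reassociation whether or not the loop ends up vectorized.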

The way I see it, the request is for a way to enable vectorization of FP reductions only. Perhaps the name of the metadata is too broad? Can it just be llvm.loop.vectorize.fpreductions.enable?

I don’t like the idea that any FP reassociation can apply before/after LV, regardless of whether or not the loop vectorized, when the intent is simply to enable reductions.

I’ve seen users report that codes do not validate because the final reduction is done as a tree and not sequentially, despite all of the reassociation that happened during the vector loop itself. Is this “correct” or “rational”? No, but reductions are a common type of loop, and it seems like there should be a way to enable them to vectorize without needing to check that the ops have reassoc on them.
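The effect described above can be sketched in plain C by emulating a 4-wide vectorized reduction by hand. This is an illustration under assumptions, not what the loop vectorizer literally emits: the function names and the tree_final switch are hypothetical, and IEEE 754 doubles without fast-math options are assumed.

```c
#include <stddef.h>

/* In-order reduction: the order a strict scalar loop uses. */
double sum_in_order(const double *x, size_t n) {
    double s = 0.0;
    for (size_t i = 0; i < n; ++i)
        s += x[i];
    return s;
}

/* Emulates a 4-lane vectorized reduction: each lane accumulates every
   fourth element, then the lanes are combined either as a tree
   (tree_final != 0) or sequentially (tree_final == 0). */
double sum_four_lanes(const double *x, size_t n, int tree_final) {
    double lane[4] = {0.0, 0.0, 0.0, 0.0};
    size_t i = 0;
    for (; i + 4 <= n; i += 4)
        for (size_t k = 0; k < 4; ++k)
            lane[k] += x[i + k];

    double s = tree_final ? (lane[0] + lane[1]) + (lane[2] + lane[3])
                          : ((lane[0] + lane[1]) + lane[2]) + lane[3];

    for (; i < n; ++i) /* scalar remainder loop */
        s += x[i];
    return s;
}
```

For the input {1e17, 1.0, -1e17, 1.0, 1.0, 1.0, 1.0, 1.0}, whose exact sum is 6, the in-order loop returns 5.0, the lane version with a tree final reduction returns 0.0, and the lane version with a sequential final reduction returns 2.0. All three orders are "a sum of the array", yet a validation check against the scalar result would fail for both vectorized variants.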

(my reply below may duplicate some of the comments of other people, because I started writing it 4 hours ago and then got distracted)

That is an option, though setting reassoc on FP operations inside loops has a different scope of effect. For example, it will allow reassociation in non-vectorizable loops and will also allow reassociation beyond the reduction computations.

Another difference is that reassoc will only be applied to FP operations that are seen inside loops before optimizations such as LLVM inlining. For example:

test1.c:

double reduce_add(double s, double x) {
  return s + x;
}

test2.c:

#include <stdio.h>
#include <string.h>
double reduce_add(double, double);

int main() {
  double x[100], s = 0.0;
  memset(x, 0, 100 * sizeof(double));
  for (int i = 0; i < 100; ++i) {
    s = reduce_add(s, x[i]);
  }
  printf("%e\n", s);
  return 0;
}

clang -O3 -flto test1.c -c
clang -O3 -flto -ffast-math test2.c -c
clang -O3 -flto test1.o test2.o -fuse-ld=lld -Wl,-mllvm -Wl,-print-after-all 2>log

Even though the reduce_add call inside the loop has fast-math flags, after inlining there are no fast-math flags on the fadd coming from reduce_add.

I know, the test is quite artificial, but the loop metadata seems to convey the actual intention better than FMFs. And the actual intention is to allow vectorizing FP reductions without enabling reassociation in a broader context.

I agree with Scott that the name of the metadata might not be the best.

Thanks for the hint, Sergey!

float_control maps to the fast FMF, which has a different (broader) effect compared to the proposed loop metadata (see the comments from Scott and myself below).

The intention of the loop metadata is to enable vectorizing FP reductions without allowing other FP reassociations. I am not aiming to introduce a source-level control for this metadata yet; the intention is to provide a compiler option that enables FP reduction vectorization per compilation unit.