[RFC] Fast-math flags semantics: contract
jcranmer February 6, 2025, 10:04pm 1
Continuing my previous RFCs on fast-math flags, this time I want to focus the discussion on a single flag in particular: `contract`.
Defining contraction
The goal of the `contract` fast-math flag is to enable contractions, or (as the term is defined in the C specification):

> A floating expression may be contracted, that is, evaluated as though it were a single operation, thereby omitting rounding errors implied by the source code and the expression evaluation method.
>
> [Footnote: The intermediate operations in the contracted expression are evaluated as if to infinite range and precision, while the final operation is rounded to the format determined by the expression evaluation method. A contracted expression might also omit the raising of floating-point exceptions.]
This is not the clearest description of what can and can’t be contracted, for it gives little guidance on what constitutes an operation, and its relevance to operations whose accuracy is not well-constrained (such as the libm intrinsics) is unclear. I am not aware of any other language specification that has attempted to give semantics to a contract-like flag in as much detail as C.
The single most common contracted expression is of course the FMA, and most discussion of contract tends to treat it as synonymous with trying to form FMAs. The other example that comes up in the history of `STDC FP_CONTRACT` is the conversion of `(float)sqrt(x)` to `sqrtf(x)` where `x` is a `float` (i.e., contracting `fpext` + `sqrt.f64` + `fptrunc` to a `sqrt.f32`). Note that `sqrt` is a core IEEE 754 operation, and Annex F (governing what C considers full IEEE 754 compliance) requires it to be correctly rounded.
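A minimal IR sketch of that narrowing (function names are illustrative; the RFC contemplates the conversions carrying `contract` as well, but only the call is flagged here to keep the example conservative):

```llvm
; Source form: (float)sqrt(x) with x a float, widened through double.
define float @narrow_sqrt(float %x) {
  %ext   = fpext float %x to double
  %sqrt  = call contract double @llvm.sqrt.f64(double %ext)
  %trunc = fptrunc double %sqrt to float
  ret float %trunc
}

; Possible contracted form: a single correctly-rounded float sqrt.
define float @narrow_sqrt_contracted(float %x) {
  %sqrt = call float @llvm.sqrt.f32(float %x)
  ret float %sqrt
}

declare double @llvm.sqrt.f64(double)
declare float @llvm.sqrt.f32(float)
```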
For the wording of the `contract` flag, I propose to clarify its semantics by essentially adopting the C definition, with some extra clarification as to what I mentioned above:

A sequence of LLVM instructions that all have the `contract` FMF may be combined into a new sequence of instructions that:

- Computes the same value if both were evaluated at infinite range and precision. Note that some rewrites may only be legal if the compiler can infer bounds on the input values, some of which may come from value FMF like `nnan` and `ninf`.
- Contains at most one instruction that causes rounding (e.g., you can turn `a * b - c` into `fma(a, b, -c)`, because `fneg` does not do any rounding).
- Only contains an incorrectly-rounded function if the old sequence has a to-be-added FMF that indicates that it is legal to optimize the appropriate functions. Incorrectly-rounded here means a C library function, or corresponding LLVM intrinsic, listed in C23 Annex F.3p20 (and if anyone has a better name for these functions, please share it).
- Does not reassociate any of the instructions, unless the original sequence also had the `reassoc` flag. Note that since contractions can only rarely produce multiple instructions, this requirement is largely superfluous.
Some concrete examples of permissible transformations (and examples will be explicitly laid out in the LangRef when I start working on a patch for these changes):

- `a * b + c` → `fma(a, b, c)`
- `a * b - c` → `fma(a, b, -c)` [because `fneg` doesn't round; see the IR sketch after this list]
- `fpext; sqrt.f80; fptrunc` → `sqrt.f64` [note that `sqrt.f64` → `sqrt.f32` doesn't require `contract`, because the transformation can be done without double rounding, see this paper for a proof]
- `x / sqrt(x)` → `sqrt(x)`, but only if `x` is known to be non-zero and non-infinite. [Note that the results differ for x = +/-0.0 and x = +infinity, where the former produces NaN and the latter is just the input argument.]
- `pow(x, 0.5)` → `sqrt(x)` [although `pow` is an incorrectly-rounded function, its replacement is correctly-rounded, so the replacement is legal]
- `ldexp(ldexp(x, a), b)` → `ldexp(x, a + b)`, but only if `a + b` is known not to overflow
- `sin(x) / cos(x)` → `tan(x)`, but only if they have the `optimize_libm` flag enabled
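A minimal IR sketch of the `a * b - c` example (function names are illustrative), assuming both operations carry `contract`:

```llvm
; a * b - c with contract on both operations...
define double @mulsub(double %a, double %b, double %c) {
  %mul = fmul contract double %a, %b
  %sub = fsub contract double %mul, %c
  ret double %sub
}

; ...may be contracted to a single fused multiply-add; the fneg introduces no
; rounding, so the new sequence still has only one rounding instruction.
define double @mulsub_contracted(double %a, double %b, double %c) {
  %negc = fneg double %c
  %fma  = call double @llvm.fma.f64(double %a, double %b, double %negc)
  ret double %fma
}

declare double @llvm.fma.f64(double, double, double)
```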
Interactions with non-fast-math optimizations
This was prompted by the discovery of an existing DAGCombine pattern: `-(x * y) - z` → `fma(-x, y, -z)`, where the intermediate `fneg` wasn't checked for a contract flag, but the outer `fsub` and `fmul` were. At first glance, this transformation is incorrect: all the nodes need to have the `contract` flag for a rewrite to be valid, per previous consensus. But `fneg` can be freely commuted with multiplication and division, so that `-(x * y) - z` becomes `(-x * y) - z`; if this reordering does not affect the fast-math flags of the `fmul`, then the resulting transformation is legal.

In short, is `(fneg (fmul flags x, y))` → `(fmul flags (fneg x), y)` legal?
The question is easy to answer for value-based fast-math flags, since their formal semantics are straightforward. The Alive2 tool already supports them, and confirms the legality here for those flags: Compiler Explorer

But for the rewrite-based flags, that analysis doesn't work. We do have a rule that fast-math rewrites need to place the intersection of the rewrite-based flags on the resulting instruction, but it's not immediately clear to me that this is or should be the case for rewrites that are fully applicable even to precise semantics under IEEE 754 (such as `-(x * y)` → `-x * y`).
Consider also an optimization that deletes an instruction as a no-op. If you have `(fadd fast (fmul (fmul fast x, y), 1.0), z)`, the presence of a non-fast `fmul` in the middle would normally act as a barrier to any rewrites, since it's lacking any fast-math flags. However, it is also legal to delete that instruction, which means we now have `(fadd fast (fmul fast x, y), z)`, which now allows contraction to an FMA. If we want to say that the non-fast `fmul` is a hard optimization barrier, that means any optimization that deletes a non-fast floating-point instruction must make sure that it doesn't accidentally create new opportunities for fast-math rewrites. But I don't think this is tenable, for this could also impact things such as deleting trivial `select` and `phi` instructions; the blast radius in the compiler for this concern is too wide.
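Spelled out as IR, the sequence above is roughly (a hypothetical sketch):

```llvm
; The inner multiply by 1.0 carries no FMF, so it separates the two 'fast'
; operations. But x * 1.0 == x holds exactly under IEEE 754, so deleting it
; requires no flags at all.
define float @before(float %x, float %y, float %z) {
  %mul1 = fmul fast float %x, %y
  %mul2 = fmul float %mul1, 1.000000e+00
  %add  = fadd fast float %mul2, %z
  ret float %add
}

; After the no-op multiply is removed, the remaining fast fmul/fadd pair is
; suddenly eligible for contraction into an FMA.
define float @after(float %x, float %y, float %z) {
  %mul1 = fmul fast float %x, %y
  %add  = fadd fast float %mul1, %z
  ret float %add
}
```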
We do have an intrinsic, `llvm.arithmetic.fence`, which effectively acts as a barrier for fast-math optimizations. However, this intrinsic can be difficult to use for frontends that want to insert optimization barriers on a transition from a non-contractable to a contractable region or vice versa (e.g., transitioning from `#pragma STDC FP_CONTRACT ON` to `#pragma STDC FP_CONTRACT OFF`), since it would require wrapping every live-in/out variable with this intrinsic.
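For reference, a minimal sketch of what wrapping a single live-out value at such a region boundary could look like (the overloaded intrinsic is mangled by type; the exact placement a frontend would choose is illustrative here):

```llvm
define float @region_boundary(float %a, float %b, float %c) {
  ; Value computed in a region where contraction is not allowed: no FMF.
  %mul   = fmul float %a, %b
  ; Fence the live-out value so it cannot later be merged with operations in
  ; the contract-enabled region below.
  %fence = call float @llvm.arithmetic.fence.f32(float %mul)
  ; Value computed in a contract-enabled region.
  %add   = fadd contract float %fence, %c
  ret float %add
}

declare float @llvm.arithmetic.fence.f32(float)
```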
The clarifications here look good to me.

> incorrectly-rounded function

This name is probably OK. "Not-required-to-be-correctly-rounded function" would be more precise but is too wordy.
nikic March 16, 2025, 5:25pm 3
Unlike all the other FMF flags, Clang enables contract by default, so I think we need to be exceedingly careful about extending its scope beyond the primary FMA use-case.
Do I understand correctly that under this proposal, it would be legal to lift `a + b + c` on `half` to `fptrunc(fpext(a)+fpext(b)+fpext(c))` on `float` by "contracting" away the intermediate rounding step in `fptrunc(fpext(fptrunc(fpext(a)+fpext(b)))+fpext(c))`?

Or, maybe more simply put, can we eliminate `fpext(fptrunc(x))` pairs, as these operations are no-ops in infinite precision? (For the sake of argument, let's assume the argument is known non-NaN.)
arsenm March 17, 2025, 12:36am 4
The default is `-ffp-contract=on` (for most of the languages), but this is not the same as adding contract flags by default. It only means using `fmuladd`. You need `-ffp-contract=fast` enabled, or use `#pragma clang fp contract`, to get contract flags.
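A sketch of that distinction in IR (function names are illustrative):

```llvm
; -ffp-contract=on: the frontend emits the fmuladd intrinsic for a*b+c, which
; the backend may lower either as a fused multiply-add or as separate mul/add.
define float @contract_on(float %a, float %b, float %c) {
  %r = call float @llvm.fmuladd.f32(float %a, float %b, float %c)
  ret float %r
}

; -ffp-contract=fast (or the pragma): the individual operations carry the
; contract flag, which is what this RFC is defining semantics for.
define float @contract_fast(float %a, float %b, float %c) {
  %mul = fmul contract float %a, %b
  %add = fadd contract float %mul, %c
  ret float %add
}

declare float @llvm.fmuladd.f32(float, float, float)
```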
nikic March 17, 2025, 9:43am 5
Thank you, I wasn’t aware of that distinction! That does address my concern.
I think I'd still appreciate some explicit discussion of `fpext(fptrunc(x))` elision under the contract flag. One thing I'm wondering is which additional FMF flags (or `nofpclass` preconditions) it requires.
It should not need ninf, right? The transform can convert an inf into a non-inf, but that is in line with avoiding an intermediate rounding step.
I assume it requires nnan, because even if we pick the “Unchanged NaN propagation” semantics for both operations, the result should have zeros in the low bits of the payload. (Though I could also see an argument that preserving more payload bits under contract is valid.)
I’m curious what the motivation for this requirement is.
arsenm March 17, 2025, 11:09am 6
They are both potentially canonicalizing operations; you have no expectation of payload bit preservation.
nikic March 17, 2025, 12:08pm 7
Sure, there's no expectation of preservation. The question is whether it is legal to preserve the full NaN payload under the contract flag. The NaN semantics using "Unchanged NaN propagation", combined with the fptrunc semantics and the fpext semantics, permit `fpext(fptrunc(x))` to preserve the original NaN with the low bits zeroed.

They do not allow preserving the original NaN value fully (including the low bits). I'm basically just trying to confirm that the presence of `contract` does not change that.
jcranmer March 17, 2025, 9:24pm 8
> Contains at most one instruction that causes rounding (e.g., you can turn `a * b - c` into `fma(a, b, -c)`, because `fneg` does not do any rounding).

> I'm curious what the motivation for this requirement is.
The motivation here is the example given. The original design I had was to require the contraction to go to a single "instruction" (keeping in mind that LLVM IR is not 1-1 with actual hardware instructions), but I didn't want to overly exclude cases like the x86 VFMSUB instructions (which compute `fma(a, b, -c)`). At the same time, I was worried that including multiple instructions in general would make the contraction overly aggressive (I agree that making `contract` too general makes me uneasy). Make it too general, and you can argue for every fast-math transform to be validated by `contract`.

Limiting multiple-instruction contractions to having just one rounding instruction in them feels like the best compromise. You still get the ability to throw in the `fneg` in the middle of the pattern, and you don't accidentally allow a lot of other stuff.
> Do I understand correctly that under this proposal, it would be legal to lift `a + b + c` on `half` to `fptrunc(fpext(a)+fpext(b)+fpext(c))` on `float` by "contracting" away the intermediate rounding step in `fptrunc(fpext(fptrunc(fpext(a)+fpext(b)))+fpext(c))`?
By the definition I've given, yes, if you put the `contract` flag on the `fpext(fptrunc(x))` pairs. I'm not entirely comfortable with this conclusion, though.

I should note that it's not clear to me that `fadd contract half %a, %b` should be legal to expand to `fptrunc contract(fadd contract(fpext contract, fpext contract))`, although that does work against our existing fast-math flag machinery.
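For concreteness, the expansion in question would look roughly like this (a hypothetical sketch; `contract` is shown only on the `fadd`s here, since carrying FMF on the conversion instructions is itself part of what is being debated):

```llvm
; The half addition as written in the source.
define half @fadd_half(half %a, half %b) {
  %r = fadd contract half %a, %b
  ret half %r
}

; The candidate expansion through float: the addition rounds to float and the
; fptrunc rounds again to half, which is why treating this as a "contraction"
; of the flagged instructions is the uncomfortable part.
define half @fadd_half_expanded(half %a, half %b) {
  %ae  = fpext half %a to float
  %be  = fpext half %b to float
  %add = fadd contract float %ae, %be
  %r   = fptrunc float %add to half
  ret half %r
}
```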