[RFC] Improving IR fast-math semantics
This is a proposal to fix (or at least make less broken) the specification of fast-math flags in LLVM IR, and to better rationalize what optimizations are allowed to do to floating-point arithmetic with and without fast-math flags or other attributes. Given the length of this RFC, I have opted to provide, at the end, a summary of the questions whose answers I am particularly interested in reaching consensus on.
Regular FP semantics
The semantics of floating-point in LLVM IR are generally underdefined, and I’m not going to attempt to fix all the issues now. However, it is necessary to understand what the baseline semantics are before saying how fast-math flags affect these semantics. Nothing I say in this section is intended to be a change to existing semantics, merely a summary of my understanding of semantics.
In the absence of any other annotations, floating-point in LLVM IR is largely intended to match the semantics of a C compiler that has `FENV_ACCESS` disabled, `FE_SNANS_ALWAYS_SIGNAL` undefined, and `FLT_EVAL_METHOD` set to 0 [1]. Furthermore, it is intended that the semantics of the major floating-point types comport to IEEE 754 semantics, albeit with the rounding mode assumed to be `FE_TONEAREST`, the FP exception word always being in an unspecified state, and NaN payload handling rules that are out of scope of this RFC.
What this essentially means is that if we have something like an `fdiv` instruction, and we constrain values to be in the set {any valid finite FP value, ±∞, one NaN value}, the optimizer must exactly preserve the resulting value in that set. It is not legal to do a transformation which could change the numeric result: `a / b` is not the same as `a * (1.0 / b)`. We also assume that we can introduce or remove arbitrary FP operations at our pleasure, without regard for the environment.
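As a minimal IR illustration of this strictness (my own example):

```llvm
define double @strict_div(double %a, double %b) {
  ; With no fast-math flags, this must return the correctly-rounded
  ; quotient for every input in the set above; rewriting it as
  ; %a * (1.0 / %b) would round twice and change results, so it is illegal.
  %q = fdiv double %a, %b
  ret double %q
}
```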
A final note is that for the various intrinsics (and math library functions, though there does seem to be an emerging consensus of moving from libm functions to intrinsics), we should assume that any intrinsic corresponding to an operation in IEEE 754 §9.2 (or equivalently the table of C2x §F.3¶20), such as `sin` or `pow`, has unknown accuracy (and thus shouldn't be constant-folded), while the other intrinsics (such as `sqrt` or `rint`) should be regarded as always having the correctly-rounded result (for the default rounding mode, of course).
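To illustrate the accuracy split with constant operands (my own example; the folded constant shown is the correctly-rounded double for sqrt(2)):

```llvm
; sqrt is specified as correctly rounded, so this call may be
; constant-folded to 0x3FF6A09E667F3BCD (sqrt(2.0) rounded to nearest).
define double @fold_ok() {
  %r = call double @llvm.sqrt.f64(double 2.0)
  ret double %r
}
; sin has unknown accuracy, so a compile-time fold could disagree with
; whatever the runtime implementation would have produced; leave it alone.
define double @no_fold() {
  %r = call double @llvm.sin.f64(double 2.0)
  ret double %r
}
declare double @llvm.sqrt.f64(double)
declare double @llvm.sin.f64(double)
```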
[1] This means our behavior on i386 when using x87 and not SSE is buggy, but given the difficulty of the fix, this is unlikely to be fixed anytime soon.
Fast-math semantics
The user goal of specifying fast-math flags is to override the above semantics of floating-point, trading numerical consistency for speed. Clang is somewhat unusual among C compilers in allowing these flags to be specified on a per-block basis, although the C standard's pragmas for affecting floating-point computation permit a similar per-block granularity, and there is a TS that extends the minimal C pragma set to include many more pragmas for optimization control. As a result of optimization, it is possible for floating-point expressions to contain operations with different sets of flags enabled, and this calls for greater specificity in our semantics than we have historically provided.
Currently, we have 7 flags, with an 8th flag (`fast`) being the conjunction of the other 7. Because `fast`, which is set by the most general `-ffast-math` option (and therefore indicates a desire to perform any optimization one might consider acceptable in such a mode), is defined only as the conjunction of the other flags, we have to shoehorn every optimization into one of those 7 flags. And there are optimizations which don't fit well under any of the existing flags, which has led to the undesirable situation of either pushing one of the flags well beyond reasonable user expectations or checking specifically for the `fast` flag (both approaches are used today).
I believe the flags fall into one of two categories. The `nnan`, `ninf`, and `nsz` flags all fall into what I call Value FMF. The `nnan` and `ninf` flags remove NaNs and infinities, respectively, from the set of valid values and replace them with `poison`. `nsz` has somewhat more complicated semantics, but can be interpreted along the lines of removing both `-0.0` and `+0.0` from the set of values and replacing them with a single `0.0` of indeterminate sign.
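To make the value semantics concrete, here is a small IR sketch of my reading of these flags (my own illustration):

```llvm
define double @value_flags(double %x) {
  ; nnan: if an argument or the result is NaN, the result is poison.
  ; ninf: likewise for infinities, as arguments or as results.
  %a = fadd nnan ninf double %x, 1.0
  ; nsz: if the result is zero, its sign is indeterminate; the optimizer
  ; may treat it as +0.0 or -0.0 interchangeably.
  %b = fadd nsz double %a, -1.0
  ret double %b
}
```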
The second category is what I call Rewrite FMF and comprises `contract`, `reassoc`, `arcp`, and `afn`. For these flags, the semantics cannot be described purely by defining allowable values; instead, they are meant to describe allowable transformations [2]. For example, `contract` enables the rewrite of `(a * b) + c` into `fma(a, b, c)`, while `arcp` enables the rewrite of `a / b` into `a * (1.0 / b)`. The current definitions have severe gaps, both in explaining which rewrites are considered permissible for these flags and in defining which flags are necessary in more complex expression trees involving multiple operations.
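Expressed in IR, the `contract` example reads as follows (a sketch of my own):

```llvm
define double @muladd(double %a, double %b, double %c) {
  ; Both instructions carry contract, so the expression tree does too.
  %mul = fmul contract double %a, %b
  %add = fadd contract double %mul, %c
  ret double %add
}
; The tree may be rewritten as one fused operation, with no intermediate
; rounding of the product:
;   %add = call contract double @llvm.fma.f64(double %a, double %b, double %c)
```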
There is one more issue worth mentioning here. The way fast-math flags are currently being represented in LLVM instructions, there is room for exactly 7 bits to store the flags. All 7 bits are currently being used for flags; adding a new flag requires adding a new bit, and I do not see any easy place where a new bit can be scrounged for fast-math flags.
[2] I am not all that experienced in formal semantics. Nevertheless, I do find operational semantics much easier to reason about, and so the idea of rewriting-based semantics is somewhat terrifying to me. In my attempts to find academic work on fast-math transformations, the only relevant example I can seem to find is Icing: Supporting Fast-Math Style Optimizations in a Verified Compiler, which similarly uses rewriting-based semantics. I would greatly appreciate feedback from those more knowledgeable about formal semantics.
Transformations without flags
Some transformations are legal in the absence of fast-math flags, even in partial violation of IEEE 754. The most notable such transformations are identity transformations, which convert expressions like `x * 1.0` to `x`. These are justifiable in our semantics by our treatment of sNaNs as largely synonymous with qNaNs. Additionally, the commutative law for commutative operations (such as addition and multiplication) can be freely applied [3]. See C2x §F.9.2 for some more examples of allowable and disallowable transformations.
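A pair of concrete instances (my own illustration): the multiplicative identity may be stripped without flags, but the additive one may not.

```llvm
define double @identities(double %x) {
  ; Legal to fold to %x with no flags: x * 1.0 is an identity for every
  ; input, treating sNaN quieting as unobservable.
  %m = fmul double %x, 1.0
  ; NOT legal to fold to %m without nsz: if %m is -0.0, then
  ; -0.0 + 0.0 evaluates to +0.0 under round-to-nearest.
  %s = fadd double %m, 0.0
  ret double %s
}
```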
An additional concern is what assumptions we can make about math library intrinsics. For example, are we allowed to assume that `-sin(x)` is equivalent to `sin(-x)` without any fast-math flags?
[3] Friendly reminder: IEEE 754 makes FP addition and multiplication commutative (i.e., `x + y == y + x`), because the effect of operand order on which NaN payload is returned is not considered an observable effect in IEEE 754. It is the associative property (i.e., `(x + y) + z != x + (y + z)` in general) which does not generally hold for FP. Please keep your terminology straight!
Fixing value fast-math flags
It has been noted in the past that the existing definitions of `nnan` and `ninf` can be unexpectedly strong for users: poison is a potent tool for exploiting undefined behavior. Despite this, their existing semantics are clear, and I do not propose to change them now.
The `nsz` flag is more interesting. The current definition is "Allow optimizations to treat the sign of a zero argument or zero result as insignificant", implying nondeterministic choice. But which nondeterministic choice is available is unclear. Is it:
- If an {argument, result} would be -0.0 per IEEE 754, replace it with nondeterministic choice of {+0.0, -0.0}
- If an {argument, result} would be +0.0 or -0.0 per IEEE 754, replace it with nondeterministic choice of {+0.0, -0.0}
The intent of `nsz` is clearly to enable optimizations that won't correctly preserve the sign of zeroes, and one such optimization is converting `x < 0 ? -x : x` to `fabs(x)`: in the former expression, the result is `-0.0` if `x` is `-0.0`, whereas `fabs(x)` returns `+0.0` for the same input. However, this does bring up a hole in the current definition: the key case we care about in this transformation is the false case, where there are no operations done to `x` other than the `select`, so we can only do the transformation when the `select` itself has the `nsz` flag. But `select`, and its cousin `phi`, are data-movement instructions that are as likely as not to come from some transformation like SROA, which lacks information as to what fast-math flags it should set on the new instructions, as opposed to the frontend, which knows what fast-math flags to apply to all instructions in a block. In other words, as it stands, `nsz` is too weak to really enable the optimizations it wants to enable.
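To make the pattern concrete, here is a minimal IR sketch of the `select`-based absolute-value idiom (names and flag placement are my own illustration):

```llvm
define double @select_abs(double %x) {
  %neg = fneg nsz double %x            ; -x
  %cmp = fcmp olt double %x, 0.0       ; x < 0
  ; The select itself must carry nsz for the fold to fabs to be legal:
  ; the false arm passes %x through untouched, so the select is the only
  ; instruction whose flags can bless the sign change of a zero.
  %res = select nsz i1 %cmp, double %neg, double %x
  ret double %res
}
; With nsz on the select, this may become:
;   %res = call double @llvm.fabs.f64(double %x)
```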
However, I don't have a good solution to the problem here. The flag is essentially asking to be a different type (an `nsz` float is one that has no semantic difference between `+0.0` and `-0.0`, unlike a regular float), but at the same time, it's clear that it doesn't meet the bar for a new type in LLVM IR. The paths forward that currently seem contemplatable are to allow these flags to be provided on any operation that produces a float value (including things like `load` instructions), or to rely on function-level attributes and infer things from those; neither feels like a great model to rely on.
Fixing rewrite fast-math flags
These flags are the ones whose definitions need the most work, and I specifically intend to give a more thorough and precise definition of these semantics with this RFC.
For the purposes of giving semantics here, I am defining an expression tree as a set of LLVM instructions corresponding to the interior nodes in the computation tree. An expression tree has a given rewrite fast-math flag if and only if that flag is set on all instructions in that expression tree. In other words, the rewrite fast-math flags of a tree are the intersection of the fast-math flags on the operations that comprise the tree [4]. For example:
```llvm
define void @f(ptr %mem) {
  %a = load double, ptr %mem
  %sqrt = call contract arcp double @llvm.sqrt.f64(double %a)
  %val = fdiv arcp reassoc double 1.0, %sqrt
  store double %val, ptr %mem
  ret void
}
declare double @llvm.sqrt.f64(double)
```
I can describe an expression tree here as consisting of the `sqrt` and `fdiv` instructions, and the fast-math flags on this expression tree are just `arcp` (the only rewrite flag in common).
When rewriting an expression, all of the nodes in the rewritten expression will have the fast-math flags of the original expression tree (i.e., the intersection).
An expression is mathematically equivalent to a rewritten expression if the two expressions would produce the same result if arithmetic were done with infinite range and precision. This is almost the same as saying it is true if the numbers were real numbers and not IEEE 754 floating-point numbers, but the expressions must also arrive at the same value in cases for which the mathematical expression is not defined (e.g., `1.0/0.0`, which produces +∞). If value FMF are also present on instructions within an expression tree, it is not necessary that the rewritten expression produce the same result in the cases the flags would exclude. For example:
```llvm
define double @g(double %a, double %b) {
  %sqa = call nnan nsz double @llvm.sqrt.f64(double %a)
  %sqb = call nnan nsz double @llvm.sqrt.f64(double %b)
  %sq = fmul double %sqa, %sqb
  ret double %sq
}
define double @h(double %a, double %b) {
  %mul = fmul double %a, %b
  %sq = call nsz double @llvm.sqrt.f64(double %mul)
  ret double %sq
}
declare double @llvm.sqrt.f64(double)
```
It is mathematically equivalent to rewrite `g` into `h`, as the `nnan` and `nsz` on the `sqrt` calls allow us to infer that `a` and `b` are non-negative, and `sqrt(a * b) = sqrt(a) * sqrt(b)` when `a` and `b` are non-negative. But it is not mathematically equivalent to rewrite `h` into `g`, as the flags are not enough to conclude that `a` and `b` are non-negative.
[4] Note that this only applies to rewrite FMF. `nnan`, `ninf`, and `nsz` all continue to apply and impart semantics to individual instructions within an expression tree, and this can be sensibly done because they are defined with value semantics. Indeed, being able to apply different rules to these two categories is part of the reason I define these categories.
contract
The `contract` FMF corresponds to the intent of `#pragma STDC FP_CONTRACT ON`. Unfortunately, the C standard is not especially clear about the intent, and the current LLVM LangRef definition is even worse. My proposal for clarification of intent is as follows:
A FP expression tree with `contract` can be rewritten into another expression tree that is mathematically equivalent, where the resulting expression tree is implemented with full precision and no intermediate rounding. In general, the resulting expression tree will be a single operation, but multiple operations are permissible if no intermediate rounding is done (e.g., an `fneg` operation may be a second operation).
For example, and most importantly, this allows the transformation of `(x * y) + z` into `fma(x, y, z)`. The transformation of `(x * y) - z` into `fma(x, y, -z)` is also legal, even though this creates two operations in the resulting tree.
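A before/after sketch of the two-operation case (my own illustration):

```llvm
define double @mulsub(double %x, double %y, double %z) {
  %mul = fmul contract double %x, %y
  %sub = fsub contract double %mul, %z
  ret double %sub
}
; Negation is exact, so fneg + fma still performs no intermediate rounding:
define double @mulsub.contracted(double %x, double %y, double %z) {
  %negz = fneg contract double %z
  %r = call contract double @llvm.fma.f64(double %x, double %y, double %negz)
  ret double %r
}
declare double @llvm.fma.f64(double, double, double)
```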
Another example of an allowable transformation under `contract` is that `cr_sin(x)/cr_cos(x)` may be transformed into `cr_tan(x)` [5]. The provision that the operations are correctly rounded is mandatory here; if, as is usually the case, we do not know that the intrinsics are correctly rounded, then it is not legal to perform this transformation with the `contract` flag.
[5] This transformation does make me feel a little uneasy. But this is based on my reading of C2x §6.5¶8 and another example of `FP_CONTRACT` allowing `(float)sqrt(x)` to be converted to `fsqrt(x)`, which suggests that `<math.h>` functions should be considered as contractible "operations". If people disagree, I do not object to changing the rules here.
arcp
The existing Clang command-line documentation says "Allow division operations to be reassociated" (which, incidentally, isn't actually correct); the LLVM language reference says "Allows optimizations to use the reciprocal of an argument rather than perform division" (and the GCC documentation uses similar language). In expression form, this specifically permits the expression `x / y` to be replaced with `x * (1.0 / y)`.
It's not clear which other rewrites are possible under this flag. However, in the spirit of the flag, I think it is also reasonable to allow `a / (b / c)` to be converted to `a * (c / b)`. At present, LLVM requires both `arcp` and `reassoc` to make this transformation, but I feel that `reassoc` is not necessary. GCC, with just `arcp`, will convert `a / (b / c)` to `(a / b) * c`, but I'm not convinced that's the correct non-reassociating rewrite.
Another note is that I think these rewrite possibilities should be bidirectional. On the face of it this may seem to be a bad idea, but allowing the reverse direction would allow `arcp reassoc` to permit reassociation of division (i.e., `a * (b / c) <=> (a * b) / c`) without the need for any additional rule to combine flags; a worked derivation is sketched after the proposal below.
In short, the proposal I have for `arcp` is as follows:

A FP expression tree with `arcp` allows the following rewrites to be performed, in both directions:

- `x / y` ↔ `x * (1.0 / y)`
- `a / (b / c)` ↔ `a * (c / b)`
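Here is the worked derivation behind the division-reassociation claim above (my own sketch; each step uses only a rule already stated):

```llvm
define double @div_reassoc(double %a, double %b, double %c) {
  %div = fdiv arcp reassoc double %b, %c
  %mul = fmul arcp reassoc double %a, %div    ; a * (b / c)
  ret double %mul
}
; Step 1 (arcp):            a * (b / c)         → a * (b * (1.0 / c))
; Step 2 (reassoc):         a * (b * (1.0 / c)) → (a * b) * (1.0 / c)
; Step 3 (arcp, reversed):  (a * b) * (1.0 / c) → (a * b) / c
define double @div_reassoc.after(double %a, double %b, double %c) {
  %mul = fmul arcp reassoc double %a, %b
  %div = fdiv arcp reassoc double %mul, %c    ; (a * b) / c
  ret double %div
}
```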
reassoc
The purpose of this flag is to permit the application of the associativity and distributivity laws to FP math. That it permits distributivity is not clear from the current definition, which only talks about “reassociation”, but it is performed by InstCombine already, and is generally agreed to be part of the definition.
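For concreteness, the two rewrites I consider clearly in scope, sketched in IR (my own examples):

```llvm
; Reassociation: (a + b) + c → a + (b + c) when the whole tree has reassoc.
define double @assoc(double %a, double %b, double %c) {
  %t = fadd reassoc double %a, %b
  %r = fadd reassoc double %t, %c
  ret double %r
}
; Distribution: a * (b + c) → (a * b) + (a * c); InstCombine already
; performs this under reassoc.
define double @distrib(double %a, double %b, double %c) {
  %s = fadd reassoc double %b, %c
  %r = fmul reassoc double %a, %s
  ret double %r
}
```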
Existing optimizations have, in my opinion, abused the `reassoc` flag to handle optimizations such as `pow(x, y)/x` → `pow(x, y - 1)` or `sin(x)/cos(x)` → `tan(x)`, in essence arguing that reassociation applies to the definition of the underlying elementary functions. While the former case does have some logic to it, I am hesitant to endorse optimizations on the not-correctly-rounded libm functions using only this flag. There is a further complication when you introduce correctly-rounded functions like `ldexp` or `sqrt`.
Is `ldexp(ldexp(x, a), b)` → `ldexp(x, a + b)` valid with `reassoc`? (Note that it is not always valid, since one of the `ldexp`s may produce a denormal number, creating inaccuracy, and there are other assumptions about overflow that may apply.)
Or how about `sqrt(a) * sqrt(b)` → `sqrt(a * b)` if `a` and `b` are both non-negative (or the `nnan` and `nsz` flags are both present)?
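Returning to the `ldexp` question, here is a concrete demonstration of the denormal hazard (my own example, using the `llvm.ldexp` intrinsic):

```llvm
; ldexp(1.5, -1074) is 1.5 * 2^-1074, exactly halfway between the two
; smallest subnormals (1 and 2 times 2^-1074); ties-to-even rounds up.
define double @two_step() {
  %t = call double @llvm.ldexp.f64.i32(double 1.5, i32 -1074)
  ; Scaling back up yields 2.0, whereas ldexp(1.5, -1074 + 1074) is 1.5,
  ; so folding the two calls into one changes the result.
  %r = call double @llvm.ldexp.f64.i32(double %t, i32 1074)
  ret double %r
}
declare double @llvm.ldexp.f64.i32(double, i32)
```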
In my own opinion, `reassoc` should be limited to fewer operations. The core operations of addition, multiplication, and `fma` are clearly permitted. Division and subtraction (when converted to multiplication by the reciprocal and addition of the negation, respectively) are additionally permitted when other flags or knowledge permit the necessary conversions. I have not come to a firm decision for `ldexp` or `sqrt`: `ldexp` is an exact-but-for-underflow operation, which does feel like it should be treated similarly to addition or multiplication for associativity purposes. Meanwhile, I'm more uncomfortable about `sqrt`, but it is a core IEEE 754 operation, in the same section as the 4 basic math operations, `fma`, and the integer conversions, and it is correctly rounded.
afn
The existing definition for `afn` is "Allow substitution of approximate calculations for functions (sin, log, sqrt, etc)". As a semantic definition, this absolutely terrifies me: how approximate is approximate? Is the optimizer permitted to approximate `sin(x)` → `x`, the famous small-angle sine approximation from physics?
I’ve identified the following optimizations enabled by the flag in the current optimizer code:
- AMDGPU converts `fdiv x, y` to `fmul x, (rcp y)` instructions
- AMDGPU will convert some libm functions to intrinsics
- PPC will convert, e.g., `acos` to `__xl_acos` (added by D101759, [PowerPC] Scalar IBM MASS library conversion pass)
- `pow(x, y)` → `exp2(log2(x) * y)` if x is a positive finite non-zero value
- `pow(x, 0.5)` → `sqrt(x)` when either `afn` or `reassoc` is present
- `pow(x, n)` → `powi(x, n)` if n is an integer (also `powi(x, n) * sqrt(x)` if n is an integer + 0.5)
- `1.0 / sqrt(b)` → `rsqrt(b)` when both `afn` and `contract` are present
Overall, I believe this flag has two slightly different definitions. In the first definition, it permits a function to be replaced with a lower-accuracy version (mostly in line with the stated definition in the LangRef); this mostly comes from targets that have partial-accuracy instructions that could be selected. The other definition is largely about converting `pow` to hopefully-cheaper functions that might do the same thing in certain special cases, which seems somewhat more generalizable into allowing transformations on libm functions that would be mathematically correct if floats were real numbers.
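The second flavor in IR terms, using the existing `pow(x, 0.5)` special case (my own sketch):

```llvm
define double @pow_half(double %x) {
  %r = call afn double @llvm.pow.f64(double %x, double 5.000000e-01)
  ret double %r
}
; With afn this is rewritten to sqrt, which agrees for positive finite x
; but not at the edges: pow(-0.0, 0.5) is +0.0 and pow(-inf, 0.5) is +inf,
; while sqrt(-0.0) is -0.0 and sqrt(-inf) is NaN.
;   %r = call afn double @llvm.sqrt.f64(double %x)
declare double @llvm.pow.f64(double, double)
```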
My belief is that a vague use-lower-accuracy definition is a poor definition for IR. I would prefer to see a proposal like Andy’s precision attributes to be able to choose between different implementations on the performance-accuracy optimization frontier. If such a feature is added, then we don’t need this flag anymore, allowing its bit to be reclaimed.
bikeshed
As alluded to in the discussions of `reassoc` and `afn`, there is a need for another flag which does not currently exist, one that would allow generic mathematical rewrites that are not otherwise covered by the other flags. I do not have a good name for this flag, so I title this section `bikeshed` instead. I would map this flag to the `-funsafe-math-optimizations` command-line option (which currently maps instead to `fast`).
Unlike `contract`, this new flag would permit the rewrite of an expression tree into another expression tree with more operations. It would also permit the optimization of non-correctly-rounded libm functions (such as `sin` or `pow`).
As for interactions with other flags, I cannot come up with any good semantic definition of `bikeshed` that would not make `bikeshed` imply the other rewrite flags. Rather than attempting to do so, I would simply make it a requirement that any instruction with this flag include all other rewrite flags. It is, however, possible to come up with semantics for `bikeshed` with and without the value flags: `bikeshed nnan ninf nsz` would require that the two expressions be equivalent for all real numbers, while `bikeshed nnan nsz` would require that they be equivalent for all extended real numbers. Similarly, `bikeshed` without `nsz` would require that `-0.0` and `+0.0` work appropriately in both expressions, while `bikeshed` without `nnan` would require that the expressions be equivalent should NaNs be involved. This can even be extended to expression trees which do not consistently have flags, if the analysis is careful enough.
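To preview what this might look like, a hypothetical IR sketch (the `bikeshed` keyword is a placeholder and does not exist; note that it carries all the other rewrite flags, per the requirement above):

```llvm
define double @tan_ish(double %x) {
  %s = call bikeshed contract reassoc arcp afn double @llvm.sin.f64(double %x)
  %c = call bikeshed contract reassoc arcp afn double @llvm.cos.f64(double %x)
  ; The whole tree carries the flag, so it could be rewritten into a single
  ; (not necessarily correctly-rounded) computation of tan(%x).
  %r = fdiv bikeshed contract reassoc arcp afn double %s, %c
  ret double %r
}
declare double @llvm.sin.f64(double)
declare double @llvm.cos.f64(double)
```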
Questions for discussion
- If we need more FMF, where do we get the bits from?
- Are rewrite-based semantics for FMF acceptable? Are they avoidable? Can we develop something like Alive2 to validate FMF-fueled optimizations with rewrite-based semantics?
- What assumptions can we make about libm intrinsics/libcalls?
  - Do even/odd properties apply?
  - Can we assume special cases?
  - Can we assume exact values that are not special cases?
  - Would correctly-rounded versions change any of these answers?
  - Which fast-math flags change these answers?
- What operations other than `fma`(-like) are contractable with `contract`?
  - Specifically, does `pow(x, 0.5)` → `sqrt(x)` constitute a valid contraction?
- Should `arcp` without `reassoc` permit `a / (b / c)` → `a * (c / b)`?
- Which operations does `reassoc` apply to beyond addition/multiplication? `sqrt`? `ldexp`? `pow`?
- What flag should be necessary to convert `sqrt(fpext x to double)` → `fpext sqrtf(x) to double`?
- What flag should be used to enable generic algebraic transforms along the lines of GCC's `-funsafe-math-optimizations`, e.g., `sin(x)/cos(x)` → `tan(x)` or `log(exp(x))` → `x`? If a new flag, what should its name be?
- Should `afn` be retired in favor of better precision control for functions?
- What should be done to handle the `nsz` semantics weakness?
- Should expression rewrites that might introduce overflows (such as `sqrt(a) * sqrt(b)` → `sqrt(a * b)`) be permissible?