RFC: Deprecate -Ofast
April 30, 2024, 10:28pm 1
Currently, -Ofast means effectively -ffast-math -O3. I think we should “soft-deprecate” the option: emit no deprecation diagnostic, but have it simply become an alias for -O3. An alternative would be to fully deprecate, and potentially later remove -Ofast – but IMO just making it an alias will be less impactful, and achieve the goal.
Why should we do this? In my experience, people neither expect nor understand that -Ofast means “non-conformant math”: on its face it appears to be an optimization option. It’s weird for an optimization option to have such a side-effect. Additionally, I see no valid justification for having this “optimization level”: if you want “fast-math”, then simply use -O3 -ffast-math.
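To make “non-conformant math” concrete, here is a minimal sketch (not from the original post; the function names are hypothetical) of code that relies on IEEE 754 NaN semantics. With plain -O3 it behaves as written. With -ffast-math, which implies -ffinite-math-only in both GCC and Clang, the compiler is allowed to assume NaNs never occur, so the comparison may be folded away and the checks below may no longer hold.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical helper: returns 1 if x is NaN, relying on the IEEE 754
   rule that NaN compares unequal to itself. */
int is_nan_by_compare(double x) {
    return x != x;
}

/* Hypothetical helper: divides, mapping a NaN result (e.g. 0.0/0.0) to 0.0. */
double safe_ratio(double num, double den) {
    double r = num / den;
    return is_nan_by_compare(r) ? 0.0 : r;
}

/* These checks hold with conforming floating-point semantics (e.g. -O3).
   Under -ffast-math they are not guaranteed to hold. */
void check_nan_handling(void) {
    assert(is_nan_by_compare(NAN));
    assert(!is_nan_by_compare(1.0));
    assert(safe_ratio(0.0, 0.0) == 0.0);
    assert(safe_ratio(1.0, 2.0) == 0.5);
}
```

Calling check_nan_handling() from a build that uses -O3 should pass silently; the same call in an -Ofast build is exactly the kind of place where users get surprised.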
-Ofast seems to have been initially added to GCC only in 2010, with the intent that it contain options which are prohibited by the standard, but still compile SPEC benchmark programs correctly. I don’t know if the SPEC rules were different back then, but from my reading of the rules today, there is no magical legitimacy given only to options starting with -O. That is: if building with -Ofast is legitimate, then building with -O3 -ffast-math is just as legitimate. So, I’d say that anyone who cares specifically about SPEC benchmark numbers may perfectly well run the benchmark with both options, and we can eliminate this particular footgun for all other users.
Clang: consensus called in this message
Clang: updated consensus called in this message
I think this is less of a Clang-CFE RFC, and more of one for the whole project including LLVM, but I agree with your assessment. -ffast-math is a complete footgun, and having -Ofast include it only proliferates its use without proper understanding of the effects of it.
I think if we documented this ‘deprecate-the-meaning-not-the-flag’ (that is, your ‘soft deprecate’) we should be able to do this easily enough with sufficient release notes/alerts to downstreams.
MaskRay May 1, 2024, 12:47am 3
Removing the -ffast-math effect from -Ofast looks good to me.
In GCC, -Ofast also implies the following two options, per default_options_table in gcc/opts.cc:
- -fallow-store-data-races
- -fno-semantic-interposition
What shall we do with them? I am happy with aliasing -Ofast to -O3 and forgetting about these differences.
(
Some GCC users specify -fno-semantic-interposition for -fpic/-fPIC to enable IPO on default visibility external linkage definitions. There is much less effect on Clang. In Clang cc1, there are three modes:
- -fsemantic-interposition: this represents -fpic -fsemantic-interposition. Don’t set dso_local on default visibility external linkage definitions. Emit a module flag metadata SemanticInterposition to disallow interprocedural optimizations.
- -fhalf-no-semantic-interposition: this represents -fpic without a semantic interposition option. Don’t set dso_local on default visibility external linkage definitions. However, interprocedural optimizations on such definitions are allowed.
- (default): this represents any of -fno-pic, -fpie, or -fpic -fno-semantic-interposition. Set dso_local on default visibility external linkage definitions. Interprocedural optimizations on such definitions are allowed.
)
I’m in favour. The behaviour of -Ofast definitely catches people out, and can have surprising non-local effects:
https://trofi.github.io/posts/302-Ofast-and-ffast-math-non-local-effects.html
In this case libsodium thought it was best to build with -Ofast (it doesn’t use any floating point, so where’s the harm?), but this had the effect of subtly breaking downstream dynamically linked software by changing all floating point handling via crtfastmath’s startup code.
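For context, the startup code in question turns on flush-to-zero behaviour for the whole process (on x86, crtfastmath sets the FTZ and DAZ bits in MXCSR), which is why code that was never compiled with -Ofast is affected. A minimal sketch of a runtime probe, assuming an IEEE 754 double and using hypothetical function names:

```c
#include <assert.h>
#include <float.h>

/* Hypothetical probe: returns 1 if subnormal results are being flushed
   to zero in the calling thread, 0 if gradual underflow is in effect. */
int subnormals_flushed(void) {
    /* volatile forces the division to happen at run time, so the result
       reflects the current floating-point environment rather than
       compile-time constant folding. */
    volatile double smallest_normal = DBL_MIN;
    volatile double half = smallest_normal / 2.0; /* subnormal under IEEE 754 */
    return half == 0.0;
}

/* Holds as long as nothing in the process has enabled flush-to-zero. */
void check_gradual_underflow(void) {
    assert(!subnormals_flushed());
}
```

A downstream application could call something like subnormals_flushed() after loading its shared libraries to find out whether one of them changed the floating-point mode behind its back.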
I would also be in favor. It effectively matches what our legacy code generator does. We’ve always had a separate option to toggle safe/unsafe math and not combine that with the optimization level.
Deprecating an option because of user error feels kind of ridiculous.
Why are you using -Ofast if you don’t know what it does? I will add that it might help to actually document the option in some meaningful way; see the Clang command line argument reference (Clang 19.0.0git documentation).
Note that gcc properly documents it, in Optimize Options (Using the GNU Compiler Collection (GCC)), and maybe that’s why there’s not an RFC to remove it there?
As for “It’s weird for an optimization option to have such a side-effect” – well, that’s pretty subjective. Does that make GCC weird then? The HPC community uses -Ofast on plenty of real-world applications that aren’t SPEC.
I can’t downvote this enough!
GNU is the de facto standard with respect to -Ofast. To borrow my colleague’s words, “Deviating from GCC must have a huge justification”.
Countless apps and their makefiles depend on -Ofast’s current behavior. Not only will those need to be updated, but there will be confusion when benchmarks mysteriously slow down comparing “clang -Ofast” versus “gcc -Ofast”.
It also makes benchmark reporting, like SPEC’s, needlessly confusing. Anyone not following along will immediately question why GNU is using “-Ofast” where Clang is using “-O3 -ffast-math”. That’s assuming the user knows that Clang’s -Ofast is really mapped to -O3. Otherwise, Clang will just look slower than GNU with the same options.
GCC’s -fallow-store-data-races is another “yikes”! I don’t see how that option is even valid to use for SPEC benchmarks, never mind a good idea to use in regular code. It seems to me to clearly violate SPEC’s rules against options which cause non-conformance for the benefit of performance, other than certain exceptions such as “IEEE-754 is not required” and “Floating point reordering allowed”. I note that bug 97309 (“Improve documentation of -fallow-store-data-races”) also mentions that concern, though I don’t see how the doc clarification resolves it…
In any case, we don’t currently implement that option, and IMO shouldn’t.
“Match gcc” is certainly a very strong argument even if gcc is user-unfriendly.
As for “not understanding that -Ofast also relaxes floating-point ordering”, well, I’ll admit to falling into that hole. My brain is used to my legacy code generator, which has always separated optimizations from floating-point re-ordering rules.
Our default optimization (think of it as -O3) is: “Run all optimization transformations that can make the program execute faster and won’t make things go noticeably slower”. We have one level higher (think of it as -Ofast), which is: “Run additional optimizations that might really help, but there might be cases where it makes it much worse. You should do some testing/timing on the image”. That’s where I made the mistake of equating gcc/clang’s -Ofast with my mental definition of -Ofast.
Would -Ofast documentation have helped you?
I think rscottmanley is correct that Clang not documenting -Ofast is the source of this RFC. It literally just says “-Ofast” in the docs and that’s it.
GNU is much more clear about it:
"-Ofast
Disregard strict standards compliance. -Ofast enables all -O3 optimizations. It also enables optimizations that are not valid for all standard-compliant programs. It turns on -ffast-math, -fallow-store-data-races and the Fortran-specific -fstack-arrays, unless -fmax-stack-var-size is specified, and -fno-protect-parens. It turns off -fsemantic-interposition."
If documenting -Ofast saves $MILLIONS by sparing people around the world from updating their makefiles, then let’s try the documentation change first.
@jyknight I suggest that the Clang community contacts the GNU community to develop this change together. This issue has too large an impact to be done without many eyes on the potential fallout.
Not just a user error, but (in my experience) pervasive user error. Users use it because it sounds like the right optimization level to use for peak performance optimization. I do not believe this is uniquely Clang users. IMO GCC should also drop this option; I proposed it for Clang because I’m not really involved in GCC development.
Yes, better docs (and reading them) would help. The GNU docs are ok with me. They tell me that -Ofast is really -Ofast-with-possibly-unsafe-transformations.
Endill May 1, 2024, 6:42pm 14
I think we definitely need to change the status quo here. I’d expect many users to be surprised by the fact that -std=c++17 -pedantic -Ofast is actually not a conforming mode, despite no arguments suggesting otherwise.
Given the opposition expressed in this thread, the path forward might be a warning that -Ofast enables non-conforming mode, despite -std=c++NN or -pedantic/-pedantic-errors supplied in the arguments.
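Until the compiler itself warns, a project can detect the situation on its own: GCC and Clang both predefine the __FAST_MATH__ macro when -ffast-math (and therefore, today, -Ofast) is in effect. A minimal sketch, where BUILT_WITH_FAST_MATH and the function names are hypothetical and not part of any compiler or library:

```c
#include <assert.h>

/* __FAST_MATH__ is predefined by GCC and Clang under -ffast-math.
   BUILT_WITH_FAST_MATH is a hypothetical project-local macro. */
#ifdef __FAST_MATH__
#define BUILT_WITH_FAST_MATH 1
#else
#define BUILT_WITH_FAST_MATH 0
#endif

/* Reports whether this translation unit was compiled in fast-math mode. */
int built_with_fast_math(void) {
    return BUILT_WITH_FAST_MATH;
}

/* Hypothetical guard for code that depends on conforming floating-point
   semantics: aborts (in builds with assertions enabled) if the
   translation unit was compiled with fast-math. */
void require_conforming_math(void) {
    assert(!built_with_fast_math());
}
```

A stricter variant would turn the same condition into a compile-time #error, so that -std=c++17 -pedantic -Ofast fails loudly instead of silently building in a non-conforming mode.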
If I have to pick a side, I support the RFC. Matching GCC should be an argument for us, but not necessarily a design principle.
GCC is in a different position here, because they have been documenting the behavior for years. We, on the other hand, are not bound by such a public contract for this flag. This wouldn’t be the first time Clang deploys less conservative changes than GCC. They might even follow, like (if I remember correctly) in the case of issuing diagnostics for ancient C features.
“In this case libsodium thought it was best to build with -Ofast (it doesn’t use any floating point, so where’s the harm?), but this had the effect of subtly breaking downstream dynamically linked software by changing all floating point handling via crtfastmath’s startup code.”
As of earlier this week, Clang on trunk no longer links in crtfastmath.o in shared libraries unless explicitly requested, and gcc has not been doing so for about a year.
MaskRay May 1, 2024, 6:54pm 16
Thanks for the points.
One nuance I’d like to add is that I believe people are more concerned with an umbrella option adding -ffast-math than the opposite. A warning about -Ofast might trigger more complaints than -Ofast dropping -ffast-math.
Therefore, I’m leaning slightly towards @jyknight’s suggestion of aliasing without a warning. However, I’m open to the possibility of including a warning as well.
The GCC doc about -fallow-store-data-races is still unclear. I agree we should not add this non-conforming behavior.
Endill May 1, 2024, 7:42pm 17
I’m uncomfortable that we’re going to silently change the behavior of a user-facing option in a significant way. I’d like to see a transition path. On top of making the change less abrupt, it should be natural to introduce some visibility for users there. In this case visibility corresponds to user education, including the implications of the GCC option. I think this would benefit the whole C++ community out there.
To be clear, the transition path I’m speaking of can be quite fast: in 19 we make -Ofast an alias for -O3, issuing a diagnostic that this option is now conforming, and that -ffast-math has to be enabled manually; then in 20 the diagnostic is disabled or removed entirely.
FCLC May 1, 2024, 8:28pm 18
The HPC community does, but (AFAIK and have been told) it was adopted as an easier way of enabling vectorization when the auto-vec tooling wasn’t quite as strong. The advice I see in most HPC/RSE circles these days is to stick with either -O2 or -O3 while also supplying other flags such as your cluster-specific -march/-mtune pairs, and, on a case-by-case basis, certain sub-options of -ffast-math (be it finite math, certain reduction operations, etc.).
I regularly find myself removing -Ofast as a cause of bugs by HPC users/researchers who simply didn’t know any better.
Would documentation have helped? Probably. But I regularly see the same mistakes/footguns from GCC users, and GCC does have documentation on how -Ofast breaks conformance (not only with the language standard, but also with related standards like IEEE 754).
I’m concerned about significantly changing the meaning of existing flags. -Ofast has always implied -ffast-math, on both gcc and clang. And changing it will break user code, in the sense that vectorized loops will have very different performance characteristics.
If we think the name -Ofast is so treacherous that we don’t want to expose the semantics under that name, we should deprecate/remove it. Otherwise, we should leave it.
Not just performance but potentially significant numerical differences for reductions. It’s not just vectorization either. Silently turning off something as broad as fp reassociation for a top level option is a wild consideration.
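To illustrate the reduction point, here is a small sketch (function names are hypothetical) that writes out by hand the two summation orders involved. The second function mimics what a 2-lane vectorized reduction computes, an ordering the compiler may only pick on its own when reassociation is allowed. It assumes IEEE 754 single precision with no excess precision (e.g. x86-64 or AArch64).

```c
#include <assert.h>
#include <stddef.h>

/* Left-to-right sum: the evaluation order the source code specifies. */
float sum_in_order(const float *x, size_t n) {
    float acc = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        acc += x[i];
    }
    return acc;
}

/* Two interleaved partial sums combined at the end, mimicking a 2-lane
   vectorized reduction. Requires an even element count. */
float sum_two_lanes(const float *x, size_t n) {
    assert(n % 2 == 0);
    float lane0 = 0.0f;
    float lane1 = 0.0f;
    for (size_t i = 0; i < n; i += 2) {
        lane0 += x[i];     /* even-indexed elements */
        lane1 += x[i + 1]; /* odd-indexed elements */
    }
    return lane0 + lane1;
}
```

For the input {1e8f, 1.0f, -1e8f, 1.0f}, sum_in_order returns 1.0f (the first 1.0f is absorbed when added to 1e8f), while sum_two_lanes returns 2.0f. Same data, same "sum", different answer, which is why dropping or adding reassociation is visible in results and not only in timings.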
I agree that it should either be removed completely or be left alone. And I don’t think it should be removed as long as gcc has -Ofast. I also don’t think gcc should remove it, but that’s another forum.