Add "algebraic" fast-math intrinsics, based on fast-math ops that cannot return poison by saethlin · Pull Request #120718 · rust-lang/rust (original) (raw)
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service andprivacy statement. We’ll occasionally send you account related emails.
Already on GitHub?Sign in to your account
Conversation38 Commits1 Checks0 Files changed
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.Learn more about bidirectional Unicode characters
[ Show hidden characters]({{ revealButtonHref }})
Setting all of LLVM's fast-math flags makes our fast-math intrinsics very dangerous, because some inputs are UB. This set of flags permits common algebraic transformations, but according to the LangRef, only the flags nnan
(no nans) and ninf
(no infs) can produce poison.
And this uses the algebraic float ops to fix #120720
cc @orlp
r? @cuviper
(rustbot has picked a reviewer for you, use r? to override)
rustbot added S-waiting-on-review
Status: Awaiting review from the assignee but also interested parties.
Relevant to the compiler team, which will review and decide on the PR/issue.
labels
Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues.
label
saethlin changed the title
Set only fast-math flags which don't produce poison Add "algebraic" fast-math intrinsics, based on fast-math ops that cannot return poison
I'd like to support this (I mean, I suggested it to @saethlin), it seems that with just the current safe LLVM flags available a lot of optimizations are already possible (and more algebraically justified optimizations could be added in the future). Most notably, autovectorization and FMA conversion both work out of the box. From a quick test of this PR,
fn sum(arr: &[f32]) -> f32 { arr.iter().fold(0.0, |a, b| fadd_algebraic(a, *b)) }
generated excellent autovectorized code, completely safely.
The current f*_fast
intrinsics are essentially unusable in generic code because they are instant UB on infinities/NaNs. It is for any non-trivial algorithm completely infeasible to verify that no infinities/NaNs exist in the input, let alone are produced as (intermediate) results. In a quick search through Github I've found every place that used f*_fast
to be unsound. The safe algebraically optimizeable float operations are desperately needed.
saethlin marked this pull request as ready for review
Relevant to the library team, which will review and decide on the PR/issue.
label
Some changes occurred in compiler/rustc_codegen_cranelift
cc @bjorn3
I think this needs more compiler review than libs...
r? compiler
I'm on vacation.
r? compiler
I->setHasAllowReassoc(true); |
---|
I->setHasAllowContract(true); |
I->setHasAllowReciprocal(true); |
I->setHasNoSignedZeros(true); |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about afn
(Approximate functions)? It doesn't poison but isn't mentioned here.
Does the word "algebraic" have a specific meaning here?
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Never mind, I see from other places that it does.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about
afn
(Approximate functions)? It doesn't poison but isn't mentioned here.
Sure, why not. I'll add it.
Does the word "algebraic" have a specific meaning here?
I didn't want to just defang the existing intrinsics because there are some optimizations which rely on assuming that NaN/Inf do not occur. So I needed to come up with a new name, and the way I think about these intrinsics is that unlike IEEE float operations, they permit the usual algebraic transformations. Things like a + (b + b) = (a + b) + c
and a / b = a * (1 / b)
.
I don't think that the name is perfect, and I'd be happy to see someone suggest a better name, but @orlp seems perfectly happy calling them "algebraic".
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
they permit the usual algebraic transformations. Things like
a + (b + b) = (a + b) + c
anda / b = a * (1 / b)
That would be a great explanation to put in a comment somewhere :)
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about afn (Approximate functions)? It doesn't poison but isn't mentioned here.
Sure, why not. I'll add it.
Please don't. Replacing functions by their approximations isn't an algebraically justified optimization.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤔 And in any case, I don't think it would matter for these intrinsics.
@@ -1882,6 +1882,46 @@ extern "rust-intrinsic" { |
---|
#[rustc_nounwind] |
pub fn frem_fast<T: Copy>(a: T, b: T) -> T; |
/// Float addition that allows optimizations based on algebraic rules. |
/// |
/// This intrinsic does not have a stable counterpart. |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which meaning of "stable" does this use?
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only way to call this intrinsic is to use the core_intrinsics
feature. We do not have a wrapper for these like the atomic intrinsics.
@@ -417,8 +417,7 @@ extern "C" LLVMAttributeRef LLVMRustCreateMemoryEffectsAttr(LLVMContextRef C, |
---|
report_fatal_error("bad MemoryEffects."); |
} |
} |
// Enable a fast-math flag |
// Enable all fast-math flags |
// |
// https://llvm.org/docs/LangRef.html#fast-math-flags |
extern "C" void LLVMRustSetFastMath(LLVMValueRef V) { |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we still need the fast
intrinsics?
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd prefer to keep them for now and let people play with both variants. I hope that based on experience we can make an argument that the unsafety of the unsafe ones is not worth the optimizations that they unlock, but I have no data to back that up.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
I am very much the opposite of a floating point expert, but you've been bounced around various reviewers so I'll do my best to review this and allow progress.
In general, adding safer variants of FP intrinsics seems fine, as does converting some existing intrinsics to use them when there are known bugs (#120720) with the less safe variants. I've asked some questions above just to give myself a bit more certainty about this change.
My final questions here are about the exact meaning of "intrinsics". I think these new algebraic intrinsics are internal only? And they are used to implement simd_reduce_{add,mul}_unordered
, which are also internal-only intrinsics? And they are used to implement the externally visible _mm512_reduce_add_pd
intrinsic mentioned in #120720? Though I see no mention of _mm512_reduce_add_pd
anywhere within the rust-lang/rust
repository, so I'm confused by that.
📌 Commit 41fddb5 has been approved by nnethercote
It is now in the queue for this repository.
bors added S-waiting-on-bors
Status: Waiting on bors to run and complete tests. Bors will change the label on completion.
and removed S-waiting-on-review
Status: Awaiting review from the assignee but also interested parties.
labels
saethlin added a commit to saethlin/rust that referenced this pull request
…nethercote
Add "algebraic" fast-math intrinsics, based on fast-math ops that cannot return poison
Setting all of LLVM's fast-math flags makes our fast-math intrinsics very dangerous, because some inputs are UB. This set of flags permits common algebraic transformations, but according to the LangRef, only the flags nnan
(no nans) and ninf
(no infs) can produce poison.
And this uses the algebraic float ops to fix rust-lang#120720
cc @orlp
bors added a commit to rust-lang-ci/rust that referenced this pull request
Rollup of 8 pull requests
Successful merges:
- rust-lang#120718 (Add "algebraic" fast-math intrinsics, based on fast-math ops that cannot return poison)
- rust-lang#121195 (unstable-book: Separate testing and production sanitizers)
- rust-lang#121205 (Merge
CompilerError::CompilationFailed
andCompilerError::ICE
.) - rust-lang#121233 (Move the extra directives for
Mode::CoverageRun
intoiter_header
) - rust-lang#121256 (Allow AST and HIR visitors to return
ControlFlow
) - rust-lang#121307 (Drive-by
DUMMY_SP
->Span
and fmt changes) - rust-lang#121310 (Remove an old hack for rustdoc)
- rust-lang#121311 (Make
is_nonoverlapping
#[inline]
)
r? @ghost
@rustbot
modify labels: rollup
bors added S-waiting-on-author
Status: This is awaiting some action (such as code changes or more information) from the author.
and removed S-waiting-on-bors
Status: Waiting on bors to run and complete tests. Bors will change the label on completion.
labels
📌 Commit cc73b71 has been approved by nnethercote
It is now in the queue for this repository.
bors added S-waiting-on-bors
Status: Waiting on bors to run and complete tests. Bors will change the label on completion.
and removed S-waiting-on-author
Status: This is awaiting some action (such as code changes or more information) from the author.
labels
bors mentioned this pull request
Finished benchmarking commit (bb8b11e): comparison URL.
Overall result: ❌✅ regressions and improvements - no action needed
@rustbot label: -perf-regression
Instruction count
This is a highly reliable metric that was used to determine the overall result at the top of this comment.
mean | range | count | |
---|---|---|---|
Regressions ❌ (primary) | - | - | 0 |
Regressions ❌ (secondary) | 1.4% | [0.2%, 3.4%] | 3 |
Improvements ✅ (primary) | - | - | 0 |
Improvements ✅ (secondary) | -0.2% | [-0.2%, -0.2%] | 1 |
All ❌✅ (primary) | - | - | 0 |
Max RSS (memory usage)
Results
This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
mean | range | count | |
---|---|---|---|
Regressions ❌ (primary) | 2.8% | [2.8%, 2.8%] | 1 |
Regressions ❌ (secondary) | 3.4% | [1.1%, 6.5%] | 3 |
Improvements ✅ (primary) | -2.4% | [-3.8%, -1.0%] | 2 |
Improvements ✅ (secondary) | -2.7% | [-5.3%, -0.9%] | 9 |
All ❌✅ (primary) | -0.7% | [-3.8%, 2.8%] | 3 |
Cycles
This benchmark run did not return any relevant results for this metric.
Binary size
This benchmark run did not return any relevant results for this metric.
Bootstrap: 651.363s -> 651.642s (0.04%)
Artifact size: 310.99 MiB -> 311.03 MiB (0.01%)
GuillaumeGomez pushed a commit to GuillaumeGomez/rust that referenced this pull request
…thercote
Add "algebraic" fast-math intrinsics, based on fast-math ops that cannot return poison
Setting all of LLVM's fast-math flags makes our fast-math intrinsics very dangerous, because some inputs are UB. This set of flags permits common algebraic transformations, but according to the LangRef, only the flags nnan
(no nans) and ninf
(no infs) can produce poison.
And this uses the algebraic float ops to fix rust-lang#120720
cc @orlp
bjorn3 pushed a commit to bjorn3/rust that referenced this pull request
…thercote
Add "algebraic" fast-math intrinsics, based on fast-math ops that cannot return poison
Setting all of LLVM's fast-math flags makes our fast-math intrinsics very dangerous, because some inputs are UB. This set of flags permits common algebraic transformations, but according to the LangRef, only the flags nnan
(no nans) and ninf
(no infs) can produce poison.
And this uses the algebraic float ops to fix rust-lang#120720
cc @orlp
saethlin deleted the reasonable-fast-math branch
This was referenced
Feb 4, 2025
Labels
Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues.
This PR was explicitly merged by bors.
Status: Waiting on bors to run and complete tests. Bors will change the label on completion.
Relevant to the compiler team, which will review and decide on the PR/issue.
Relevant to the library team, which will review and decide on the PR/issue.