Add "algebraic" fast-math intrinsics, based on fast-math ops that cannot return poison by saethlin · Pull Request #120718 · rust-lang/rust (original) (raw)

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Conversation38 Commits1 Checks0 Files changed

Conversation

This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.Learn more about bidirectional Unicode characters

[ Show hidden characters]({{ revealButtonHref }})

Setting all of LLVM's fast-math flags makes our fast-math intrinsics very dangerous, because some inputs are UB. This set of flags permits common algebraic transformations, but according to the LangRef, only the flags nnan (no nans) and ninf (no infs) can produce poison.

And this uses the algebraic float ops to fix #120720

cc @orlp

r? @cuviper

(rustbot has picked a reviewer for you, use r? to override)

rustbot added S-waiting-on-review

Status: Awaiting review from the assignee but also interested parties.

T-compiler

Relevant to the compiler team, which will review and decide on the PR/issue.

labels

Feb 6, 2024

saethlin added the A-LLVM

Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues.

label

Feb 6, 2024

saethlin changed the title~~Set only fast-math flags which don't produce poison~~ Add "algebraic" fast-math intrinsics, based on fast-math ops that cannot return poison

Feb 6, 2024

I'd like to support this (I mean, I suggested it to @saethlin), it seems that with just the current safe LLVM flags available a lot of optimizations are already possible (and more algebraically justified optimizations could be added in the future). Most notably, autovectorization and FMA conversion both work out of the box. From a quick test of this PR,

fn sum(arr: &[f32]) -> f32 { arr.iter().fold(0.0, |a, b| fadd_algebraic(a, *b)) }

generated excellent autovectorized code, completely safely.

The current f*_fast intrinsics are essentially unusable in generic code because they are instant UB on infinities/NaNs. It is for any non-trivial algorithm completely infeasible to verify that no infinities/NaNs exist in the input, let alone are produced as (intermediate) results. In a quick search through Github I've found every place that used f*_fast to be unsound. The safe algebraically optimizeable float operations are desperately needed.

saethlin marked this pull request as ready for review

February 7, 2024 02:09

saethlin added the T-libs

Relevant to the library team, which will review and decide on the PR/issue.

label

Feb 7, 2024

Some changes occurred in compiler/rustc_codegen_cranelift

cc @bjorn3

I think this needs more compiler review than libs...

r? compiler

I'm on vacation.
r? compiler

I->setHasAllowReassoc(true);
I->setHasAllowContract(true);
I->setHasAllowReciprocal(true);
I->setHasNoSignedZeros(true);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about afn (Approximate functions)? It doesn't poison but isn't mentioned here.

Does the word "algebraic" have a specific meaning here?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Never mind, I see from other places that it does.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about afn (Approximate functions)? It doesn't poison but isn't mentioned here.

Sure, why not. I'll add it.

Does the word "algebraic" have a specific meaning here?

I didn't want to just defang the existing intrinsics because there are some optimizations which rely on assuming that NaN/Inf do not occur. So I needed to come up with a new name, and the way I think about these intrinsics is that unlike IEEE float operations, they permit the usual algebraic transformations. Things like a + (b + b) = (a + b) + c and a / b = a * (1 / b).

I don't think that the name is perfect, and I'd be happy to see someone suggest a better name, but @orlp seems perfectly happy calling them "algebraic".

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

they permit the usual algebraic transformations. Things like a + (b + b) = (a + b) + c and a / b = a * (1 / b)

That would be a great explanation to put in a comment somewhere :)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about afn (Approximate functions)? It doesn't poison but isn't mentioned here.

Sure, why not. I'll add it.

Please don't. Replacing functions by their approximations isn't an algebraically justified optimization.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔 And in any case, I don't think it would matter for these intrinsics.

@@ -1882,6 +1882,46 @@ extern "rust-intrinsic" {
#[rustc_nounwind]
pub fn frem_fast<T: Copy>(a: T, b: T) -> T;

/// Float addition that allows optimizations based on algebraic rules.
///
/// This intrinsic does not have a stable counterpart.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which meaning of "stable" does this use?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only way to call this intrinsic is to use the core_intrinsics feature. We do not have a wrapper for these like the atomic intrinsics.

@@ -417,8 +417,7 @@ extern "C" LLVMAttributeRef LLVMRustCreateMemoryEffectsAttr(LLVMContextRef C,
report_fatal_error("bad MemoryEffects.");
}
}

// Enable a fast-math flag
// Enable all fast-math flags
//
// https://llvm.org/docs/LangRef.html#fast-math-flags
extern "C" void LLVMRustSetFastMath(LLVMValueRef V) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we still need the fast intrinsics?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer to keep them for now and let people play with both variants. I hope that based on experience we can make an argument that the unsafety of the unsafe ones is not worth the optimizations that they unlock, but I have no data to back that up.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

I am very much the opposite of a floating point expert, but you've been bounced around various reviewers so I'll do my best to review this and allow progress.

In general, adding safer variants of FP intrinsics seems fine, as does converting some existing intrinsics to use them when there are known bugs (#120720) with the less safe variants. I've asked some questions above just to give myself a bit more certainty about this change.

My final questions here are about the exact meaning of "intrinsics". I think these new algebraic intrinsics are internal only? And they are used to implement simd_reduce_{add,mul}_unordered, which are also internal-only intrinsics? And they are used to implement the externally visible _mm512_reduce_add_pd intrinsic mentioned in #120720? Though I see no mention of _mm512_reduce_add_pd anywhere within the rust-lang/rust repository, so I'm confused by that.

📌 Commit 41fddb5 has been approved by nnethercote

It is now in the queue for this repository.

bors added S-waiting-on-bors

Status: Waiting on bors to run and complete tests. Bors will change the label on completion.

and removed S-waiting-on-review

Status: Awaiting review from the assignee but also interested parties.

labels

Feb 19, 2024

saethlin added a commit to saethlin/rust that referenced this pull request

Feb 20, 2024

…nethercote

Add "algebraic" fast-math intrinsics, based on fast-math ops that cannot return poison

And this uses the algebraic float ops to fix rust-lang#120720

cc @orlp

bors added a commit to rust-lang-ci/rust that referenced this pull request

Feb 20, 2024

Rollup of 8 pull requests

Successful merges:

rust-lang#120718 (Add "algebraic" fast-math intrinsics, based on fast-math ops that cannot return poison)
rust-lang#121195 (unstable-book: Separate testing and production sanitizers)
rust-lang#121205 (Merge CompilerError::CompilationFailed and CompilerError::ICE.)
rust-lang#121233 (Move the extra directives for Mode::CoverageRun into iter_header)
rust-lang#121256 (Allow AST and HIR visitors to return ControlFlow)
rust-lang#121307 (Drive-by DUMMY_SP -> Span and fmt changes)
rust-lang#121310 (Remove an old hack for rustdoc)
rust-lang#121311 (Make is_nonoverlapping #[inline])

r? @ghost @rustbot modify labels: rollup

bors added S-waiting-on-author

Status: This is awaiting some action (such as code changes or more information) from the author.

and removed S-waiting-on-bors

Status: Waiting on bors to run and complete tests. Bors will change the label on completion.

labels

Feb 20, 2024

📌 Commit cc73b71 has been approved by nnethercote

It is now in the queue for this repository.

bors added S-waiting-on-bors

Status: Waiting on bors to run and complete tests. Bors will change the label on completion.

and removed S-waiting-on-author

Status: This is awaiting some action (such as code changes or more information) from the author.

labels

Feb 20, 2024

bors mentioned this pull request

Feb 21, 2024

Finished benchmarking commit (bb8b11e): comparison URL.

Overall result: ❌✅ regressions and improvements - no action needed

@rustbot label: -perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	1.4%	[0.2%, 3.4%]	3
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-0.2%	[-0.2%, -0.2%]	1
All ❌✅ (primary)	-	-	0

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean	range	count
Regressions ❌ (primary)	2.8%	[2.8%, 2.8%]	1
Regressions ❌ (secondary)	3.4%	[1.1%, 6.5%]	3
Improvements ✅ (primary)	-2.4%	[-3.8%, -1.0%]	2
Improvements ✅ (secondary)	-2.7%	[-5.3%, -0.9%]	9
All ❌✅ (primary)	-0.7%	[-3.8%, 2.8%]	3

Cycles

This benchmark run did not return any relevant results for this metric.

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 651.363s -> 651.642s (0.04%)
Artifact size: 310.99 MiB -> 311.03 MiB (0.01%)

GuillaumeGomez pushed a commit to GuillaumeGomez/rust that referenced this pull request

Mar 5, 2024

…thercote

Add "algebraic" fast-math intrinsics, based on fast-math ops that cannot return poison

And this uses the algebraic float ops to fix rust-lang#120720

cc @orlp

bjorn3 pushed a commit to bjorn3/rust that referenced this pull request

Mar 8, 2024

…thercote

Add "algebraic" fast-math intrinsics, based on fast-math ops that cannot return poison

And this uses the algebraic float ops to fix rust-lang#120720

cc @orlp

saethlin deleted the reasonable-fast-math branch

May 3, 2024 22:33

This was referenced

Feb 4, 2025

Labels

A-LLVM

Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues.

merged-by-bors

This PR was explicitly merged by bors.

S-waiting-on-bors

Status: Waiting on bors to run and complete tests. Bors will change the label on completion.

T-compiler

Relevant to the compiler team, which will review and decide on the PR/issue.

T-libs

Relevant to the library team, which will review and decide on the PR/issue.