Optimize integer `pow` by removing the exit branch by mzabaluev · Pull Request #122884 · rust-lang/rust
The branch at the end of the `pow` implementations is redundant with multiplication code already present in the loop. By rotating the exit check, this branch can be largely removed, improving code size and reducing instruction cache misses.
Testing on my machine (`x86_64`, 11th Gen Intel Core i5-1135G7 @ 2.40GHz), the `num::int_pow` benchmarks improve by some 40% for the unchecked operations and show some slight improvement for the checked operations as well.
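A minimal sketch of what rotating the exit check could look like, written for `i32` with wrapping arithmetic so that it runs on stable Rust. The name `pow_rotated` is made up for illustration, and this is one reading of the description above, not the code from the PR.

```rust
/// Exponentiation by squaring with the exit check moved inside the loop.
/// `pow_rotated` is a hypothetical name, not a standard library function.
fn pow_rotated(mut base: i32, mut exp: u32) -> i32 {
    if exp == 0 {
        return 1;
    }
    let mut acc: i32 = 1;
    loop {
        if exp & 1 == 1 {
            acc = acc.wrapping_mul(base);
            // `exp` started non-zero, so its last remaining bit is always 1.
            // The final multiplication therefore happens here, and no
            // trailing `acc * base` is needed after the loop.
            if exp == 1 {
                return acc;
            }
        }
        exp /= 2;
        base = base.wrapping_mul(base);
    }
}
```

For example, `pow_rotated(3, 4)` walks the bits of 4 (binary 100), squares the base twice, and returns from inside the loop on the last set bit.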
r? @Amanieu
rustbot has assigned @Amanieu.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.
Use `r?` to explicitly pick a reviewer
rustbot added labels:
- S-waiting-on-review (Status: Awaiting review from the assignee but also interested parties.)
- Relevant to the library team, which will review and decide on the PR/issue.
📌 Commit 76d2530 has been approved by Amanieu
It is now in the queue for this repository.
bors added the S-waiting-on-bors label and removed S-waiting-on-review.
S-waiting-on-bors: Waiting on bors to run and complete tests. Bors will change the label on completion.
bors added a commit to rust-lang-ci/rust that referenced this pull request
bors added the S-waiting-on-review label and removed S-waiting-on-bors.
The job `x86_64-gnu-llvm-18` failed! Check out the build log: (web) (plain)
Possible cause of the failure (guessed by this bot):
failures:
---- [codegen] tests/codegen/issues/issue-34947-pow-i32.rs stdout ----
error: verification with 'FileCheck' failed
status: exit status: 1
command: "/usr/lib/llvm-18/bin/FileCheck" "--input-file" "/checkout/obj/build/x86_64-unknown-linux-gnu/test/codegen/issues/issue-34947-pow-i32/issue-34947-pow-i32.ll" "/checkout/tests/codegen/issues/issue-34947-pow-i32.rs" "--check-prefix=CHECK" "--check-prefix" "NONMSVC" "--allow-unused-prefixes" "--dump-input-context" "100"
--- stderr -------------------------------
/checkout/tests/codegen/issues/issue-34947-pow-i32.rs:9:17: error: CHECK-NEXT: is not on the line after the previous match
// CHECK-NEXT: mul
I'm not familiar with this check, so I don't understand what's failing here or what the fix should be.
It seems that your PR has introduced a regression: LLVM is no longer able to optimize `pow(5)` down to just 3 multiply instructions.
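For reference, a fifth power can be written by hand with exactly three multiplications, which is the shape the codegen test expects the optimizer to reach for a constant exponent of 5. The sketch below uses a made-up helper name, `pow5`, and wrapping arithmetic to keep it panic-free on overflow.

```rust
/// Computes x^5 with three multiplications: x^2, then x^4, then x^4 * x.
/// `pow5` is a hypothetical helper used only to illustrate the target shape.
fn pow5(x: i32) -> i32 {
    let x2 = x.wrapping_mul(x); // multiplication 1: x^2
    let x4 = x2.wrapping_mul(x2); // multiplication 2: x^4
    x4.wrapping_mul(x) // multiplication 3: x^5
}
```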
> It seems that your PR has introduced a regression: LLVM is no longer able to optimize `pow(5)` down to just 3 multiply instructions.
Does this mean the modified code performs worse in this specific case?
Yes, it will perform much worse in that specific case since LLVM is unable to optimize the loop away. See https://godbolt.org/z/nMY79Gn8r for the current code that is being generated. It might be possible to re-arrange the code so that you still get the performance benefit of this PR while still letting LLVM optimize the loop, but I'm not sure.
@bors r-
(bors sync fixup)
bors added the S-waiting-on-author label and removed S-waiting-on-review.
S-waiting-on-author: This is awaiting some action (such as code changes or more information) from the author.
The lack of optimization in case of a small const argument value is unfortunate.
I briefly tried to salvage it by giving the optimizer an easier time without re-introducing redundancy in the dynamic case, but didn't come up with any good ideas.
Maybe an unrolled fast path for argument values in the 0..=6 range? This would feel like an exercise in tricking the optimizer and placating the benchmarks.
The branch at the end of the `pow` implementations is redundant with multiplication code already present in the loop. By rotating the exit check, this branch can be largely removed, improving code size and instruction cache coherence.
If I understand correctly, you don't know how to fix the regression in a satisfactory manner, and you're not going to argue that the regression is tolerable?
If I'm right you should probably close this. You can always reopen if you get some new inspiration or can find guidance.
It might be worth trying something with `is_val_statically_known` to have 2 different paths depending on whether the input argument is a constant.
The newly optimized loop has introduced a regression in the case when pow is called with a small constant exponent. LLVM is no longer able to unroll the loop and the generated code is larger and slower than what's expected in tests.
Match and handle small exponent values separately by branching out to an explicit multiplication sequence for that exponent. Powers larger than 6 need more than three multiplications, so these cases are less likely to benefit from this optimization; such constant exponents are also less likely to be used in practice. For uses with a non-constant exponent, this might also provide a performance benefit if the exponent is small and does not vary between successive calls, so that the same match arm tends to be taken as a predicted branch.
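A rough sketch of the approach this commit message describes, assuming `i32` and wrapping arithmetic: exponents 0 through 6 each get an explicit multiplication sequence of at most three multiplications, and everything else falls through to a generic loop. The names `pow_with_small_cases` and `pow_loop` are invented for this example and do not come from the PR.

```rust
/// Handles exponents 0..=6 with explicit multiplication sequences and
/// defers larger exponents to a generic square-and-multiply loop.
/// Both function names are hypothetical.
fn pow_with_small_cases(base: i32, exp: u32) -> i32 {
    match exp {
        0 => 1,
        1 => base,
        2 => base.wrapping_mul(base),
        3 => base.wrapping_mul(base).wrapping_mul(base),
        4 => {
            let sq = base.wrapping_mul(base);
            sq.wrapping_mul(sq)
        }
        5 => {
            let sq = base.wrapping_mul(base);
            sq.wrapping_mul(sq).wrapping_mul(base)
        }
        6 => {
            let cube = base.wrapping_mul(base).wrapping_mul(base);
            cube.wrapping_mul(cube)
        }
        _ => pow_loop(base, exp),
    }
}

/// Plain exponentiation by squaring, used as the fallback path.
fn pow_loop(mut base: i32, mut exp: u32) -> i32 {
    let mut acc: i32 = 1;
    while exp > 0 {
        if exp & 1 == 1 {
            acc = acc.wrapping_mul(base);
        }
        exp /= 2;
        base = base.wrapping_mul(base);
    }
    acc
}
```

When the exponent is a constant at the call site and the function is inlined, the `match` can fold down to a single arm, which is the property the commit message relies on.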
> It might be worth trying something with `is_val_statically_known` to have 2 different paths depending on whether the input argument is a constant.
I will combine this with my suggestion for the statically known case, thanks for the tip!
@oskgo it looks like we've found a way to resolve the regression, don't close this yet.
I get this error when trying to use `is_val_statically_known` inside `pow` methods:
error: `is_val_statically_known` is not yet stable as a const fn
It is what it says on the tin: `pow` is annotated as const-stable, so it cannot call the const-unstable `is_val_statically_known`.
Your playground examples don't (can't) use stability attributes.
Right, in that case maybe it's best to go back to the version with the unrolled loop.
Oh, I get it: `rustc_allow_const_fn_unstable` is an item attribute that is enabled by the feature.
In the dynamic exponent case, it's preferable not to increase code size, so only the loop-based implementation is used there. This shows a penalty of about 4% in the variable exponent benchmarks on x86_64.
pinging @rust-lang/wg-const-eval due to new usage of `rustc_allow_const_fn_unstable`. It should be fine since this PR is purely an optimization and can always be reverted.
`is_val_statically_known` is a very harmless intrinsic from a const-eval perspective, so this seems fine to me.
oskgo added the S-waiting-on-review label and removed S-waiting-on-author.
// This gives the optimizer a way to efficiently inline call sites
// for the most common use cases with constant exponents.
// Currently, LLVM is unable to unroll the loop below.
match exp {
Rather than this special casing, could we instead just have the original loop (which LLVM knows how to unroll) for the `is_val_statically_known` case and your new loop for the non-constant case?
And do the same for all the other `pow` functions.
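The two-path idea suggested here might be sketched as follows. Since `is_val_statically_known` is an unstable compiler intrinsic, this example replaces it with a plain `bool` parameter, `exp_statically_known`, which is purely a stand-in so the code runs on stable Rust. The shape of the "original" loop is an assumption based on the PR description (a loop followed by a final multiplication), and all function names are hypothetical.

```rust
/// Assumed shape of the original loop: it stops at exp == 1 and performs
/// the last multiplication after the loop (the "exit branch" in the PR).
fn pow_original_loop(mut base: i32, mut exp: u32) -> i32 {
    if exp == 0 {
        return 1;
    }
    let mut acc: i32 = 1;
    while exp > 1 {
        if exp & 1 == 1 {
            acc = acc.wrapping_mul(base);
        }
        exp /= 2;
        base = base.wrapping_mul(base);
    }
    acc.wrapping_mul(base)
}

/// Rotated loop: the exit check sits next to the in-loop multiplication.
fn pow_rotated_loop(mut base: i32, mut exp: u32) -> i32 {
    if exp == 0 {
        return 1;
    }
    let mut acc: i32 = 1;
    loop {
        if exp & 1 == 1 {
            acc = acc.wrapping_mul(base);
            if exp == 1 {
                return acc;
            }
        }
        exp /= 2;
        base = base.wrapping_mul(base);
    }
}

/// `exp_statically_known` stands in for the compiler's answer; in the real
/// implementation the caller would not pass this flag.
fn pow_dispatch(base: i32, exp: u32, exp_statically_known: bool) -> i32 {
    if exp_statically_known {
        pow_original_loop(base, exp) // the form LLVM is said to unroll
    } else {
        pow_rotated_loop(base, exp) // the form tuned for runtime exponents
    }
}
```

Both paths compute the same value; the flag only selects which loop shape the optimizer gets to work with.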
I have updated `pow` and `wrapping_pow` as suggested.
I'm not sure the extra complication is justified for the checked operations, but I guess the optimizer will have better opportunities with the original loop there as well. I will try to make a macro so that uniform code is used everywhere without repetition.
Amanieu added the S-waiting-on-author label and removed S-waiting-on-review.
Give LLVM the original, optimizable loop in the `pow` and `wrapping_pow` functions when the exponent is statically known.
Dylan-DPC added the S-waiting-on-review label and removed S-waiting-on-author.
📌 Commit ac88b33 has been approved by Amanieu
It is now in the queue for this repository.
bors added the S-waiting-on-bors label and removed S-waiting-on-review.
bors added a commit to rust-lang-ci/rust that referenced this pull request
…iaskrgr
Rollup of 7 pull requests
Successful merges:
- rust-lang#122884 (Optimize integer `pow` by removing the exit branch)
- rust-lang#127857 (Allow to customize `// TODO:` comment for deprecated safe autofix)
- rust-lang#129034 (Add `#[must_use]` attribute to `Coroutine` trait)
- rust-lang#129049 (compiletest: Don't panic on unknown JSON-like output lines)
- rust-lang#129050 (Emit a warning instead of an error if `--generate-link-to-definition` is used with other output formats than HTML)
- rust-lang#129056 (Fix one usage of target triple in bootstrap)
- rust-lang#129058 (Add mw back to review rotation)
r? @ghost
@rustbot modify labels: rollup
rust-timer added a commit to rust-lang-ci/rust that referenced this pull request
Rollup merge of rust-lang#122884 - mzabaluev:pow-remove-exit-branch, r=Amanieu