directly expose copy and copy_nonoverlapping intrinsics by RalfJung · Pull Request #81238 · rust-lang/rust (original) (raw)
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service andprivacy statement. We’ll occasionally send you account related emails.
Already on GitHub?Sign in to your account
Conversation27 Commits2 Checks0 Files changed
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.Learn more about bidirectional Unicode characters
[ Show hidden characters]({{ revealButtonHref }})
This effectively un-does #57997. That should help with ptr::read
codegen in debug builds (and any other of these low-level functions that bottoms out at copy
/copy_nonoverlapping
), where the wrapper function will not get inlined. See the discussion in #80290 and #81163.
r? @cramertj
(rust-highfive has picked a reviewer for you, use r? to override)
This comment has been minimized.
This loses the (currently disabled until they can be constified) debug assertions. What're the upsides to this approach over marking the wrapper functions #[inline(always)]
, aside from the cg_clif issue mentioned here #81163 (comment)? Is there a known non-negligible build time downside to #[inline(always)]
wrapper functions?
This is not the first time those debug assertions came under criticism; e.g., that's why they were turned into an abort
previously. They might even have contributed significantly to the slowdown that prompted debug-assertions-std
to be split from debug-assertions
(in config.toml
), though I do not think this was ever benchmarked. At the same time, these debug assertions are not actually helping users since their libstd is a release build (even for cargo build --debug
).
So together with that const
problem and the discussion about function call overhead, I figured maybe those debug assertions are simply not worth the trouble. I don't know what one would have to measure to evaluate if inline(always)
is a viable alternative; if someone wants to take over and do those experiments, I'd be happy to go with that approach instead.
@alecmocatta if you are comparing the removal of the wrappers with an addition of #[inline(always)]
to them, there are a few extra costs associated with the latter. First, the wrappers have to be monomorphized and generated in each code generation unit. Then LLVM will have to inline calls to them, and at debuginfo > 0 produce additional debug information describing the inlining that took place. On the other hand, simply removing the wrappers has none of those costs.
If MIR inlining were to be enabled by default, the inlining could take place at MIR level when code is still generic, amortizing some of those costs, but we aren't at this point yet.
@tmiasko Thanks, that's helpful. My thinking is that #[inline(always)]
on the wrapper functions might in practise have a statistically insignificant effect on build times vs this PR, ~100% of its codegen win, while also preserving the possibility of the debug assertions being re-enabled when they can be constified.
To be honest I thought MIR inlining was happening for Edit: Ahah! There's a special LLVM pass that's enabled even in debug builds specifically to handle #[inline(always)]
, but I think there's something I'm missing. What is it that's inlining foo
here if you check the LLVM IR?#[inline(always)]
functions.
Another advantage of this PR is that @tmiasko @wesleywiser's copy_nonoverlapping(_, _, 1)
-lowering-pass will kick in -- this runs in rustc, pre-inlining.
Awaiting bors try build completion.
⌛ Trying commit 18d12ad with merge 375082d2e33538dc4a882ee01db94cd61563372c...
☀️ Try build successful - checks-actions
Build commit: 375082d2e33538dc4a882ee01db94cd61563372c (375082d2e33538dc4a882ee01db94cd61563372c
)
Finished benchmarking try commit (375082d2e33538dc4a882ee01db94cd61563372c): comparison url.
Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. Please note that if the perf results are neutral, you should likely undo the rollup=never given below by specifying rollup-
to bors.
Importantly, though, if the results of this run are non-neutral do not roll this PR up -- it will mask other regressions or improvements in the roll up.
@bors rollup=never
@rustbot label: +S-waiting-on-review -S-waiting-on-perf
Huge improvements of up to 2.3% with a few regressions of up to 0.8%.
Relevant to the library team, which will review and decide on the PR/issue.
label
Looks good to me.
The only downside is that
// FIXME: Perform these checks only at run time
now disappears without a place to fix it. But since debug assertions in std
are pretty much useless, I don't think that matters much. I don't think this FIXME would've been resolved anyway, other than by removing it, which is what this PR does.
@bors r+
📌 Commit 1a80635 has been approved by m-ou-se
bors added S-waiting-on-bors
Status: Waiting on bors to run and complete tests. Bors will change the label on completion.
and removed S-waiting-on-review
Status: Awaiting review from the assignee but also interested parties.
labels
bors mentioned this pull request
flip1995 pushed a commit to flip1995/rust that referenced this pull request
directly expose copy and copy_nonoverlapping intrinsics
This effectively un-does rust-lang#57997. That should help with ptr::read
codegen in debug builds (and any other of these low-level functions that bottoms out at copy
/copy_nonoverlapping
), where the wrapper function will not get inlined. See the discussion in rust-lang#80290 and rust-lang#81163.
Cc @bjorn3
@therealprof
This was referenced
Apr 30, 2021
bors added a commit to rust-lang-ci/rust that referenced this pull request
bors added a commit to rust-lang-ci/rust that referenced this pull request
flip1995 pushed a commit to flip1995/rust that referenced this pull request
eddyb mentioned this pull request
eddyb mentioned this pull request
bors added a commit to rust-lang-ci/rust that referenced this pull request
Avoid using the copy_nonoverlapping
wrapper through mem::replace
.
This is a much simpler way to achieve the pre-rust-lang#86003 behavior of mem::replace
not needing dynamically-sized memcpy
s (at least before inlining), than re-doing rust-lang#81238 (which needs rust-lang#86699 or something similar).
I didn't notice it until recently, but ptr::write
already explicitly avoided using the wrapper, while ptr::read
just called the wrapper (and was the reason for us observing any behavior change from rust-lang#86003 in Rust-GPU).
The codegen test I've added fails without the change to core::ptr::read
like this (ignore the v0
mangling, I was using a worktree with it turned on by default, for this):
13: ; core::intrinsics::copy_nonoverlapping::<u8>
14: ; Function Attrs: inlinehint nonlazybind uwtable
15: define internal void `@_RINvNtCscK5tvALCJol_4core10intrinsics19copy_nonoverlappinghECsaS4X3EinRE8_25mem_replace_direct_memcpy(i8*` %src, i8* %dst, i64 %count) unnamed_addr #0 {
16: start:
17: %0 = mul i64 %count, 1
18: call void `@llvm.memcpy.p0i8.p0i8.i64(i8*` align 1 %dst, i8* align 1 %src, i64 %0, i1 false)
not:17 !~~~~~~~~~~~~~~~~~~~~~ error: no match expected
19: ret void
20: }
With the core::ptr::read
change, core::intrinsics::copy_nonoverlapping
doesn't get instantiated and the test passes.
r? @m-ou-se
cc @nagisa
(codegen test) @oli-obk
/ @RalfJung
(miri diagnostic changes)
FYI, this got reverted shortly after landing due to being a breaking change, and the PR that would have made this possible got closed due to inactivity (#86699). However, #87827 should still help with code quality for read
and functions using that (such as replace
). I don't have plans myself to pursue this further. If someone thinks something should still be done here, please open an issue.
Labels
Area: Intrinsics
This PR was explicitly merged by bors.
Status: Waiting on bors to run and complete tests. Bors will change the label on completion.
Relevant to the library team, which will review and decide on the PR/issue.