directly expose copy and copy_nonoverlapping intrinsics by RalfJung · Pull Request #81238 · rust-lang/rust (original) (raw)

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Conversation27 Commits2 Checks0 Files changed

Conversation

This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.Learn more about bidirectional Unicode characters

[ Show hidden characters]({{ revealButtonHref }})

RalfJung

This effectively un-does #57997. That should help with ptr::read codegen in debug builds (and any other of these low-level functions that bottoms out at copy/copy_nonoverlapping), where the wrapper function will not get inlined. See the discussion in #80290 and #81163.

Cc @bjorn3 @therealprof

@rust-highfive

r? @cramertj

(rust-highfive has picked a reviewer for you, use r? to override)

@rust-log-analyzer

This comment has been minimized.

@alecmocatta

This loses the (currently disabled until they can be constified) debug assertions. What're the upsides to this approach over marking the wrapper functions #[inline(always)], aside from the cg_clif issue mentioned here #81163 (comment)? Is there a known non-negligible build time downside to #[inline(always)] wrapper functions?

@RalfJung

@RalfJung

This is not the first time those debug assertions came under criticism; e.g., that's why they were turned into an abort previously. They might even have contributed significantly to the slowdown that prompted debug-assertions-std to be split from debug-assertions (in config.toml), though I do not think this was ever benchmarked. At the same time, these debug assertions are not actually helping users since their libstd is a release build (even for cargo build --debug).

So together with that const problem and the discussion about function call overhead, I figured maybe those debug assertions are simply not worth the trouble. I don't know what one would have to measure to evaluate if inline(always) is a viable alternative; if someone wants to take over and do those experiments, I'd be happy to go with that approach instead.

@tmiasko

@alecmocatta if you are comparing the removal of the wrappers with an addition of #[inline(always)] to them, there are a few extra costs associated with the latter. First, the wrappers have to be monomorphized and generated in each code generation unit. Then LLVM will have to inline calls to them, and at debuginfo > 0 produce additional debug information describing the inlining that took place. On the other hand, simply removing the wrappers has none of those costs.

If MIR inlining were to be enabled by default, the inlining could take place at MIR level when code is still generic, amortizing some of those costs, but we aren't at this point yet.

@alecmocatta

@tmiasko Thanks, that's helpful. My thinking is that #[inline(always)] on the wrapper functions might in practise have a statistically insignificant effect on build times vs this PR, ~100% of its codegen win, while also preserving the possibility of the debug assertions being re-enabled when they can be constified.

To be honest I thought MIR inlining was happening for #[inline(always)], but I think there's something I'm missing. What is it that's inlining foo here if you check the LLVM IR? Edit: Ahah! There's a special LLVM pass that's enabled even in debug builds specifically to handle #[inline(always)] functions.

@RalfJung

Another advantage of this PR is that @tmiasko @wesleywiser's copy_nonoverlapping(_, _, 1)-lowering-pass will kick in -- this runs in rustc, pre-inlining.

@tmiasko

@rust-timer

Awaiting bors try build completion.

@bors

⌛ Trying commit 18d12ad with merge 375082d2e33538dc4a882ee01db94cd61563372c...

@bors

☀️ Try build successful - checks-actions
Build commit: 375082d2e33538dc4a882ee01db94cd61563372c (375082d2e33538dc4a882ee01db94cd61563372c)

@rust-timer

@rust-timer

Finished benchmarking try commit (375082d2e33538dc4a882ee01db94cd61563372c): comparison url.

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. Please note that if the perf results are neutral, you should likely undo the rollup=never given below by specifying rollup- to bors.

Importantly, though, if the results of this run are non-neutral do not roll this PR up -- it will mask other regressions or improvements in the roll up.

@bors rollup=never
@rustbot label: +S-waiting-on-review -S-waiting-on-perf

@bjorn3

Huge improvements of up to 2.3% with a few regressions of up to 0.8%.

@RalfJung

@m-ou-se m-ou-se added the T-libs

Relevant to the library team, which will review and decide on the PR/issue.

label

Feb 13, 2021

@m-ou-se

Looks good to me.

The only downside is that

// FIXME: Perform these checks only at run time

now disappears without a place to fix it. But since debug assertions in std are pretty much useless, I don't think that matters much. I don't think this FIXME would've been resolved anyway, other than by removing it, which is what this PR does.

@bors r+

@bors

📌 Commit 1a80635 has been approved by m-ou-se

@bors bors added S-waiting-on-bors

Status: Waiting on bors to run and complete tests. Bors will change the label on completion.

and removed S-waiting-on-review

Status: Awaiting review from the assignee but also interested parties.

labels

Feb 13, 2021

@bors

@bors

@bors bors mentioned this pull request

Feb 13, 2021

flip1995 pushed a commit to flip1995/rust that referenced this pull request

Feb 25, 2021

@bors

directly expose copy and copy_nonoverlapping intrinsics

This effectively un-does rust-lang#57997. That should help with ptr::read codegen in debug builds (and any other of these low-level functions that bottoms out at copy/copy_nonoverlapping), where the wrapper function will not get inlined. See the discussion in rust-lang#80290 and rust-lang#81163.

Cc @bjorn3 @therealprof

This was referenced

Apr 30, 2021

bors added a commit to rust-lang-ci/rust that referenced this pull request

May 3, 2021

@bors

bors added a commit to rust-lang-ci/rust that referenced this pull request

Jun 9, 2021

@bors

flip1995 pushed a commit to flip1995/rust that referenced this pull request

Jun 17, 2021

@bors

@eddyb eddyb mentioned this pull request

Jun 28, 2021

@eddyb eddyb mentioned this pull request

Aug 6, 2021

bors added a commit to rust-lang-ci/rust that referenced this pull request

Aug 8, 2021

@bors

Avoid using the copy_nonoverlapping wrapper through mem::replace.

This is a much simpler way to achieve the pre-rust-lang#86003 behavior of mem::replace not needing dynamically-sized memcpys (at least before inlining), than re-doing rust-lang#81238 (which needs rust-lang#86699 or something similar).

I didn't notice it until recently, but ptr::write already explicitly avoided using the wrapper, while ptr::read just called the wrapper (and was the reason for us observing any behavior change from rust-lang#86003 in Rust-GPU).


The codegen test I've added fails without the change to core::ptr::read like this (ignore the v0 mangling, I was using a worktree with it turned on by default, for this):

       13: ; core::intrinsics::copy_nonoverlapping::<u8>
       14: ; Function Attrs: inlinehint nonlazybind uwtable
       15: define internal void `@_RINvNtCscK5tvALCJol_4core10intrinsics19copy_nonoverlappinghECsaS4X3EinRE8_25mem_replace_direct_memcpy(i8*` %src, i8* %dst, i64 %count) unnamed_addr #0 {
       16: start:
       17:  %0 = mul i64 %count, 1
       18:  call void `@llvm.memcpy.p0i8.p0i8.i64(i8*` align 1 %dst, i8* align 1 %src, i64 %0, i1 false)
not:17      !~~~~~~~~~~~~~~~~~~~~~                                                                     error: no match expected
       19:  ret void
       20: }

With the core::ptr::read change, core::intrinsics::copy_nonoverlapping doesn't get instantiated and the test passes.


r? @m-ou-se cc @nagisa (codegen test) @oli-obk / @RalfJung (miri diagnostic changes)

@RalfJung

FYI, this got reverted shortly after landing due to being a breaking change, and the PR that would have made this possible got closed due to inactivity (#86699). However, #87827 should still help with code quality for read and functions using that (such as replace). I don't have plans myself to pursue this further. If someone thinks something should still be done here, please open an issue.

Labels

A-intrinsics

Area: Intrinsics

merged-by-bors

This PR was explicitly merged by bors.

S-waiting-on-bors

Status: Waiting on bors to run and complete tests. Bors will change the label on completion.

T-libs

Relevant to the library team, which will review and decide on the PR/issue.