Distinguish between library and lang UB in assert_unsafe_precondition by saethlin · Pull Request #121662 · rust-lang/rust
Conversation
As described in #121583 (comment), `assert_unsafe_precondition` now explicitly distinguishes between language UB (conditions we explicitly optimize on) and library UB (things we document you shouldn't do, and maybe some library internals assume you don't do).

`debug_assert_nounwind` was originally added to avoid the "only at runtime" aspect of `assert_unsafe_precondition`. Since then the difference between the macros has gotten muddied. This totally revamps the situation.

Now all preconditions shall be checked with `assert_unsafe_precondition`. If you have a precondition that's only checkable at runtime, do a `const_eval_select` hack, as done in this PR.

r? RalfJung
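To make the distinction concrete, here is a minimal, self-contained sketch of the idea. It is not the libcore implementation; the names `UbKind` and `precondition_check` are made up for illustration, and a real runtime-only check would additionally use the `const_eval_select` hack mentioned above to skip the check during const evaluation.

```rust
// Not the libcore implementation; `UbKind` and `precondition_check` are
// illustrative names only.
#[derive(Clone, Copy, Debug)]
enum UbKind {
    /// Language UB: conditions the compiler itself optimizes on
    /// (e.g. references being non-null and aligned).
    Language,
    /// Library UB: documented contract violations that library internals may
    /// rely on, but that the compiler does not exploit directly.
    Library,
}

#[inline]
#[track_caller]
fn precondition_check(kind: UbKind, ok: bool, msg: &str) {
    // Both kinds are gated on debug assertions in this sketch; the value of
    // the distinction is that a tool like Miri can always enforce language UB
    // while treating library UB checks as an optional extra.
    if cfg!(debug_assertions) && !ok {
        panic!("unsafe precondition(s) violated ({kind:?} UB): {msg}");
    }
}

/// Example caller: a thin wrapper around an unchecked slice read.
unsafe fn get_unchecked_demo(v: &[u8], i: usize) -> u8 {
    // Passing an out-of-bounds index is library UB for this wrapper; the raw
    // pointer read below turns it into language UB.
    precondition_check(UbKind::Library, i < v.len(), "index must be in bounds");
    unsafe { *v.as_ptr().add(i) }
}

fn main() {
    let v = [10u8, 20, 30];
    // A language-UB-style check on the pointer itself.
    precondition_check(UbKind::Language, !v.as_ptr().is_null(), "pointer must be non-null");
    // Sound call: the precondition holds, so the check is a no-op.
    assert_eq!(unsafe { get_unchecked_demo(&v, 1) }, 20);
}
```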
rustbot added the S-waiting-on-review label (Status: Awaiting review from the assignee but also interested parties) and labels marking the PR as relevant to the compiler team and the library team.
Some changes occurred in src/tools/clippy
cc @rust-lang/clippy
bors added a commit to rust-lang-ci/rust that referenced this pull request: Distinguish between library and lang UB in assert_unsafe_precondition
☀️ Try build successful - checks-actions
Build commit: c492ab1 (c492ab1d2d6c77e2d02bf9d0a998a6aee430fd48)
Finished benchmarking commit (c492ab1): comparison URL.
Overall result: ❌✅ regressions and improvements - ACTION NEEDED
Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.
Next Steps: If you can justify the regressions found in this try perf run, please indicate this with `@rustbot label: +perf-regression-triaged` along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.
@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression
Instruction count
This is a highly reliable metric that was used to determine the overall result at the top of this comment.
 | mean | range | count |
---|---|---|---|
Regressions ❌ (primary) | 0.5% | [0.2%, 0.9%] | 5 |
Regressions ❌ (secondary) | 0.5% | [0.1%, 1.3%] | 7 |
Improvements ✅ (primary) | -0.8% | [-2.0%, -0.2%] | 9 |
Improvements ✅ (secondary) | - | - | 0 |
All ❌✅ (primary) | -0.3% | [-2.0%, 0.9%] | 14 |
Max RSS (memory usage)
Results
This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
 | mean | range | count |
---|---|---|---|
Regressions ❌ (primary) | 6.9% | [1.2%, 12.1%] | 4 |
Regressions ❌ (secondary) | 3.1% | [1.4%, 4.0%] | 3 |
Improvements ✅ (primary) | -4.7% | [-11.6%, -0.5%] | 8 |
Improvements ✅ (secondary) | -2.4% | [-2.4%, -2.4%] | 1 |
All ❌✅ (primary) | -0.9% | [-11.6%, 12.1%] | 12 |
Cycles
Results
This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
 | mean | range | count |
---|---|---|---|
Regressions ❌ (primary) | 1.0% | [0.9%, 1.2%] | 2 |
Regressions ❌ (secondary) | - | - | 0 |
Improvements ✅ (primary) | -1.2% | [-1.2%, -1.2%] | 1 |
Improvements ✅ (secondary) | -1.0% | [-1.0%, -1.0%] | 1 |
All ❌✅ (primary) | 0.3% | [-1.2%, 1.2%] | 3 |
Binary size
Results
This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
 | mean | range | count |
---|---|---|---|
Regressions ❌ (primary) | 0.2% | [0.1%, 0.5%] | 16 |
Regressions ❌ (secondary) | 0.9% | [0.0%, 1.5%] | 12 |
Improvements ✅ (primary) | -0.7% | [-2.7%, -0.1%] | 70 |
Improvements ✅ (secondary) | -0.7% | [-2.4%, -0.1%] | 4 |
All ❌✅ (primary) | -0.5% | [-2.7%, 0.5%] | 86 |
Bootstrap: 649.3s -> 650.779s (0.23%)
Artifact size: 311.08 MiB -> 311.03 MiB (-0.02%)
/// implement debug assertions that are included in the precompiled standard library, but can
/// be optimized out by builds that monomorphize the standard library code with debug

/// Whether we should check for library UB. This evaluates to the value of `cfg!(debug_assertions)`
/// in codegen, and `true` in const-eval/Miri.
Maybe const-eval/Miri should be consistent with runtime here? Every single time we made Miri diverge from that people were caught off-guard by it...
Does const-eval usually behave differently when you pass --release? Granted it's just UB checking... but still I expect const to always work the same regardless of the flags I pass. Even if that's wrong, it feels more like it's part of the type system than the runtime part of Rust.

I'll swap this to work the same because I think it's sufficiently less surprising for Miri and kind of a wash in const-eval.
> Does const-eval usually behave differently when you pass --release?

const-eval usually honors `#[cfg(debug_assertions)]`/`cfg!(debug_assertions)` the same way everything else does. So, yes. It feels strange to do anything different here.
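A tiny demonstration of the behavior described here, assuming an ordinary binary crate: `cfg!(debug_assertions)` is expanded at compile time, so const evaluation sees the same value as runtime code, and both flip with `--release`.

```rust
// With `cargo run` this prints true twice; with `cargo run --release` it
// prints false twice, because the cfg! is resolved before const evaluation
// ever runs.
const SEEN_BY_CONST_EVAL: bool = cfg!(debug_assertions);

fn main() {
    println!("const-eval saw: {SEEN_BY_CONST_EVAL}");
    println!("runtime sees:   {}", cfg!(debug_assertions));
}
```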
This is still open: I think library UB checks during const-eval and Miri should match runtime behavior.
bors added the S-waiting-on-bors label (Status: Waiting on bors to run and complete tests; bors will change the label on completion) and removed the S-waiting-on-review label.
bors mentioned this pull request
saethlin deleted the precondition-unification branch
Finished benchmarking commit (768408a): comparison URL.
Overall result: ❌✅ regressions and improvements - ACTION NEEDED
Next Steps: If you can justify the regressions found in this perf run, please indicate this with `@rustbot label: +perf-regression-triaged` along with sufficient written justification. If you cannot justify the regressions please open an issue or create a new PR that fixes the regressions, add a comment linking to the newly created issue or PR, and then add the `perf-regression-triaged` label to this PR.
@rustbot label: +perf-regression
cc @rust-lang/wg-compiler-performance
Instruction count
This is a highly reliable metric that was used to determine the overall result at the top of this comment.
 | mean | range | count |
---|---|---|---|
Regressions ❌ (primary) | 1.1% | [0.3%, 1.7%] | 4 |
Regressions ❌ (secondary) | 1.1% | [1.1%, 1.1%] | 1 |
Improvements ✅ (primary) | -0.9% | [-1.7%, -0.4%] | 4 |
Improvements ✅ (secondary) | -0.2% | [-0.2%, -0.2%] | 1 |
All ❌✅ (primary) | 0.1% | [-1.7%, 1.7%] | 8 |
Max RSS (memory usage)
Results
This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
 | mean | range | count |
---|---|---|---|
Regressions ❌ (primary) | 3.8% | [1.0%, 7.1%] | 4 |
Regressions ❌ (secondary) | 2.5% | [0.9%, 4.8%] | 10 |
Improvements ✅ (primary) | -4.3% | [-6.8%, -0.3%] | 8 |
Improvements ✅ (secondary) | - | - | 0 |
All ❌✅ (primary) | -1.6% | [-6.8%, 7.1%] | 12 |
Cycles
Results
This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
 | mean | range | count |
---|---|---|---|
Regressions ❌ (primary) | 1.2% | [1.1%, 1.3%] | 2 |
Regressions ❌ (secondary) | - | - | 0 |
Improvements ✅ (primary) | -1.3% | [-1.5%, -1.1%] | 2 |
Improvements ✅ (secondary) | - | - | 0 |
All ❌✅ (primary) | -0.0% | [-1.5%, 1.3%] | 4 |
Binary size
Results
This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
 | mean | range | count |
---|---|---|---|
Regressions ❌ (primary) | 0.3% | [0.1%, 0.5%] | 18 |
Regressions ❌ (secondary) | 0.9% | [0.0%, 1.5%] | 12 |
Improvements ✅ (primary) | -0.7% | [-2.5%, -0.1%] | 70 |
Improvements ✅ (secondary) | -0.6% | [-2.4%, -0.1%] | 4 |
All ❌✅ (primary) | -0.5% | [-2.5%, 0.5%] | 88 |
Bootstrap: 647.204s -> 645.087s (-0.33%)
Artifact size: 310.08 MiB -> 309.96 MiB (-0.04%)
celinval pushed a commit to model-checking/kani that referenced this pull request
Relevant upstream changes:
- rust-lang/rust#120675: An intrinsic `Symbol` is now wrapped in an `IntrinsicDef` struct, so the relevant part of the code needed to be updated.
- rust-lang/rust#121464: The second argument of the `create_wrapper_file` function changed from a vector to a string.
- rust-lang/rust#121662: `NullOp::DebugAssertions` was renamed to `NullOp::UbCheck` and it now has data (currently unused by Kani).
- rust-lang/rust#121728: Introduces `F16` and `F128`, so stubs needed to be added for them.
- rust-lang/rust#121969: `parse_sess` was renamed to `psess`, so the relevant code was updated.
- rust-lang/rust#122059: The `is_val_statically_known` intrinsic is now used in some `core::fmt` code, so it had to be handled (codegen'ed to `false`).
- rust-lang/rust#122233: This added a new `retag_box_to_raw` intrinsic. This is an operation that is primarily relevant for stacked borrows. For Kani, we just return the pointer.

Resolves #3057
Visiting for weekly rustc-perf triage
- primary regressions are to cranelift-codegen opt-full (1.74%), cargo opt-full (1.33%), clap opt-full (1.18%), and exa debug-incr-unchanged (0.28%).
- the rust-timer run was "only" expected to regress 5 benchmarks, largely a different set, by a mean of 0.5%, not the 1.1% reported above. (I do not know what to take away from that observation, especially the fact that a different set of benchmarks was impacted...)
- not marking as triaged.
The changes above are chaotic, small, about-same-magnitude changes, and almost all benchmarks that were changed are `opt` builds. To me, that means that they are primarily if not entirely due to perturbing the MIR inliner's decisions.
Perhaps that's not the usual kind of perf report you're used to seeing in perf triage. I think most other kinds of changes produce perf reports where it is educational to look into each benchmark, because the fact that one crate was regressed or not means that there is a sort of code pattern that is now pessimized or handled better. I have certainly done such careful case-by-case analysis myself on many PRs and found it helpful. I firmly believe that such analysis is not useful in these perf reports that are primarily MIR inliner chaos (not noise!!). We could make rather subtle tweaks to the relevant standard library code, and completely change the set of benchmarks that get better or worse.
And in addition, for PRs like this where the perf reports are MIR inliner chaos, one should not expect the pre-merge perf report to look the same as the post-merge one. PRs that land between those try builds will provide the perturbations to cause a different chaotic outcome.
The exact same logic applies to #122282. It contains some regressions, but that PR is good because it improves our ability to optimize MIR, not because "improvements outweigh regressions". The latter framing suggests that the regressions in such PRs could be recouped by sufficient re-engineering of the change, but I believe they cannot, because they reflect imprecision in the MIR inliner's cost model, as opposed to anything about the changes in the PR.
The same logic applies to any improvements that are reported in a PR that adds code to the standard library; such improvements reflect that the MIR inliner previously made wrong decisions and now happens (by sheer luck) to in a few cases make better ones.
If anyone wants to try to do case-by-case analysis of what changed in the generated code or analyze the MIR inliner's decisions they are welcome to attempt it. I will not be doing so. We already know what the next step is to improve the compile-time overhead of these checks, and that is #122282 as @dianqk indicated.
Binary size results looking good though, which is an expected result of this change
> Binary size results looking good though, which is an expected result of this change
No. I think you're making exactly the mistake I tried to warn about above:
> The same logic applies to any improvements that are reported in a PR that adds code to the standard library; such improvements reflect that the MIR inliner previously made wrong decisions and now happens (by sheer luck) to in a few cases make better ones.
In all cases the relevant code is behind an `if false` in an `opt` build, so LLVM should be just as capable of optimizing these checks away before as it is now. I strongly suspect that what we've done is just nudge the MIR inliner to inline less, and in terms of binary size, less inlining tends to be a win.

I do not understand exactly why the MIR inliner tends to do inlinings that LLVM doesn't, but I've done a fair bit of analysis in that area before and it does. For some reason, a lot of drop-related symbols appear or disappear from binaries when inlining is tweaked.
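To make the "behind an `if false`" point concrete, here is a hedged sketch (illustrative code, not std's): in an optimized build the gate is compile-time false, so LLVM can delete the whole check, and any binary-size movement has to come from changed inlining decisions rather than from residual checks.

```rust
// `UB_CHECKS` stands in for the real gating (plain cfg!(debug_assertions)
// here); in an optimized build it is `false` at compile time.
const UB_CHECKS: bool = cfg!(debug_assertions);

#[inline]
unsafe fn read_byte(ptr: *const u8) -> u8 {
    if UB_CHECKS {
        // In an opt build this arm is statically dead: LLVM drops the branch
        // and the panic path entirely.
        assert!(!ptr.is_null(), "unsafe precondition(s) violated: ptr must be non-null");
    }
    unsafe { *ptr }
}

fn main() {
    let x = 7u8;
    assert_eq!(unsafe { read_byte(&x) }, 7);
}
```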
Thanks for the explanations @saethlin, I really appreciate it, and I totally buy the explanation you have provided here.
These are really good thoughts. I am wondering if there's any place less transient we could put them, e.g. in some sort of guide/document that would help people doing the rustc-perf triage with the kinds of things they should be considering.
@rustbot label: +perf-regression-triaged
Not just perf triage, we should have a manual for doing perf report analysis in general.
bjorn3 pushed a commit to bjorn3/rust that referenced this pull request: Distinguish between library and lang UB in assert_unsafe_precondition
matthiaskrgr pushed a commit to matthiaskrgr/rust that referenced this pull request: Distinguish between library and lang UB in assert_unsafe_precondition
CAD97 mentioned this pull request
I think I stumbled onto why this is causing "chaotic, small, about-same-magnitude changes" by "perturbing the MIR inliner's decisions" in #123174: the newly introduced branch is blocking MIR inlining of `#[inline]` functions like `Vec::deref`, whose bodies would otherwise be a single straight-line basic block and obvious candidates for inlining.
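For concreteness, a hedged sketch of the shape being described, with made-up function names rather than the real `Vec::deref`: adding the precondition check turns a straight-line body into one with a branch and a panic path, which changes what the MIR inliner's cost model sees.

```rust
use core::slice;

// Before the added checks: a single straight-line basic block, an easy MIR inline.
unsafe fn as_slice_unchecked<'a, T>(ptr: *const T, len: usize) -> &'a [T] {
    unsafe { slice::from_raw_parts(ptr, len) }
}

// With a debug-assertion precondition check: the `if` adds a branch and a
// panic path, so the MIR body is no longer a single basic block.
unsafe fn as_slice_checked<'a, T>(ptr: *const T, len: usize) -> &'a [T] {
    if cfg!(debug_assertions) {
        assert!(!ptr.is_null(), "unsafe precondition(s) violated: ptr must be non-null");
    }
    unsafe { slice::from_raw_parts(ptr, len) }
}

fn main() {
    let data = [1u32, 2, 3];
    let a = unsafe { as_slice_unchecked(data.as_ptr(), data.len()) };
    let b = unsafe { as_slice_checked(data.as_ptr(), data.len()) };
    assert_eq!(a, b);
}
```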
Labels
- This PR was explicitly merged by bors.
- Performance regression.
- The performance regression has been triaged.
- Status: Waiting on bors to run and complete tests. Bors will change the label on completion.
- Relevant to the compiler team, which will review and decide on the PR/issue.
- Relevant to the library team, which will review and decide on the PR/issue.