Distinguish between library and lang UB in assert_unsafe_precondition by saethlin · Pull Request #121662 · rust-lang/rust
Conversation
As described in #121583 (comment), `assert_unsafe_precondition` now explicitly distinguishes between language UB (conditions we explicitly optimize on) and library UB (things we document you shouldn't do, and maybe some library internals assume you don't do).

`debug_assert_nounwind` was originally added to avoid the "only at runtime" aspect of `assert_unsafe_precondition`. Since then the difference between the macros has gotten muddied. This totally revamps the situation.

Now all preconditions shall be checked with `assert_unsafe_precondition`. If you have a precondition that's only checkable at runtime, do a `const_eval_select` hack, as done in this PR.

r? RalfJung
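To make the distinction concrete, here is a minimal, self-contained sketch of the idea. It is not the libcore implementation; the names `UbKind` and `precondition_check` are made up for illustration, and a real runtime-only check would additionally use the `const_eval_select` hack mentioned above to skip the check during const evaluation.

```rust
// Not the libcore implementation; `UbKind` and `precondition_check` are
// illustrative names only.
#[derive(Clone, Copy, Debug)]
enum UbKind {
    /// Language UB: conditions the compiler itself optimizes on
    /// (e.g. references being non-null and aligned).
    Language,
    /// Library UB: documented contract violations that library internals may
    /// rely on, but that the compiler does not exploit directly.
    Library,
}

#[inline]
#[track_caller]
fn precondition_check(kind: UbKind, ok: bool, msg: &str) {
    // Both kinds are gated on debug assertions in this sketch; the value of
    // the distinction is that a tool like Miri can always enforce language UB
    // while treating library UB checks as an optional extra.
    if cfg!(debug_assertions) && !ok {
        panic!("unsafe precondition(s) violated ({kind:?} UB): {msg}");
    }
}

/// Example caller: a thin wrapper around an unchecked slice read.
unsafe fn get_unchecked_demo(v: &[u8], i: usize) -> u8 {
    // Passing an out-of-bounds index is library UB for this wrapper; the raw
    // pointer read below turns it into language UB.
    precondition_check(UbKind::Library, i < v.len(), "index must be in bounds");
    unsafe { *v.as_ptr().add(i) }
}

fn main() {
    let v = [10u8, 20, 30];
    // A language-UB-style check on the pointer itself.
    precondition_check(UbKind::Language, !v.as_ptr().is_null(), "pointer must be non-null");
    // Sound call: the precondition holds, so the check is a no-op.
    assert_eq!(unsafe { get_unchecked_demo(&v, 1) }, 20);
}
```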
rustbot added the S-waiting-on-review label (Status: Awaiting review from the assignee but also interested parties) and labels marking the PR as relevant to the compiler team and the library team.
Some changes occurred in src/tools/clippy
cc @rust-lang/clippy
bors added a commit to rust-lang-ci/rust that referenced this pull request: Distinguish between library and lang UB in assert_unsafe_precondition
☀️ Try build successful - checks-actions
Build commit: c492ab1 (c492ab1d2d6c77e2d02bf9d0a998a6aee430fd48)
Finished benchmarking commit (c492ab1): comparison URL.
Overall result: ❌✅ regressions and improvements - ACTION NEEDED
Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.
Next Steps: If you can justify the regressions found in this try perf run, please indicate this with `@rustbot label: +perf-regression-triaged` along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.
@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression
Instruction count
This is a highly reliable metric that was used to determine the overall result at the top of this comment.
 | mean | range | count |
---|---|---|---|
Regressions ❌ (primary) | 0.5% | [0.2%, 0.9%] | 5 |
Regressions ❌ (secondary) | 0.5% | [0.1%, 1.3%] | 7 |
Improvements ✅ (primary) | -0.8% | [-2.0%, -0.2%] | 9 |
Improvements ✅ (secondary) | - | - | 0 |
All ❌✅ (primary) | -0.3% | [-2.0%, 0.9%] | 14 |
Max RSS (memory usage)
Results
This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
 | mean | range | count |
---|---|---|---|
Regressions ❌ (primary) | 6.9% | [1.2%, 12.1%] | 4 |
Regressions ❌ (secondary) | 3.1% | [1.4%, 4.0%] | 3 |
Improvements ✅ (primary) | -4.7% | [-11.6%, -0.5%] | 8 |
Improvements ✅ (secondary) | -2.4% | [-2.4%, -2.4%] | 1 |
All ❌✅ (primary) | -0.9% | [-11.6%, 12.1%] | 12 |
Cycles
Results
This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
 | mean | range | count |
---|---|---|---|
Regressions ❌ (primary) | 1.0% | [0.9%, 1.2%] | 2 |
Regressions ❌ (secondary) | - | - | 0 |
Improvements ✅ (primary) | -1.2% | [-1.2%, -1.2%] | 1 |
Improvements ✅ (secondary) | -1.0% | [-1.0%, -1.0%] | 1 |
All ❌✅ (primary) | 0.3% | [-1.2%, 1.2%] | 3 |
Binary size
Results
This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
 | mean | range | count |
---|---|---|---|
Regressions ❌ (primary) | 0.2% | [0.1%, 0.5%] | 16 |
Regressions ❌ (secondary) | 0.9% | [0.0%, 1.5%] | 12 |
Improvements ✅ (primary) | -0.7% | [-2.7%, -0.1%] | 70 |
Improvements ✅ (secondary) | -0.7% | [-2.4%, -0.1%] | 4 |
All ❌✅ (primary) | -0.5% | [-2.7%, 0.5%] | 86 |
Bootstrap: 649.3s -> 650.779s (0.23%)
Artifact size: 311.08 MiB -> 311.03 MiB (-0.02%)
/// implement debug assertions that are included in the precompiled standard library, but can
/// be optimized out by builds that monomorphize the standard library code with debug

/// Whether we should check for library UB. This evaluates to the value of `cfg!(debug_assertions)`
/// in codegen, and `true` in const-eval/Miri.
Maybe const-eval/Miri should be consistent with runtime here? Every single time we made Miri diverge from that people were caught off-guard by it...
Does const-eval usually behave differently when you pass --release? Granted it's just UB checking... but still I expect const to always work the same regardless of the flags I pass. Even if that's wrong, it feels more like it's part of the type system than the runtime part of Rust.

I'll swap this to work the same because I think it's sufficiently less surprising for Miri and kind of a wash in const-eval.
> Does const-eval usually behave differently when you pass --release?

const-eval usually honors `#[cfg(debug_assertions)]`/`cfg!(debug_assertions)` the same way everything else does. So, yes. It feels strange to do anything different here.
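A tiny demonstration of the behavior described here, assuming an ordinary binary crate: `cfg!(debug_assertions)` is expanded at compile time, so const evaluation sees the same value as runtime code, and both flip with `--release`.

```rust
// With `cargo run` this prints true twice; with `cargo run --release` it
// prints false twice, because the cfg! is resolved before const evaluation
// ever runs.
const SEEN_BY_CONST_EVAL: bool = cfg!(debug_assertions);

fn main() {
    println!("const-eval saw: {SEEN_BY_CONST_EVAL}");
    println!("runtime sees:   {}", cfg!(debug_assertions));
}
```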
This is still open: I think library UB checks during const-eval and Miri should match runtime behavior.
bors added the S-waiting-on-bors label (Status: Waiting on bors to run and complete tests; bors will change the label on completion) and removed the S-waiting-on-review label.
bors mentioned this pull request
saethlin deleted the precondition-unification branch
Finished benchmarking commit (768408a): comparison URL.
Overall result: ❌✅ regressions and improvements - ACTION NEEDED
Next Steps: If you can justify the regressions found in this perf run, please indicate this with `@rustbot label: +perf-regression-triaged` along with sufficient written justification. If you cannot justify the regressions please open an issue or create a new PR that fixes the regressions, add a comment linking to the newly created issue or PR, and then add the `perf-regression-triaged` label to this PR.
@rustbot label: +perf-regression
cc @rust-lang/wg-compiler-performance
Instruction count
This is a highly reliable metric that was used to determine the overall result at the top of this comment.
 | mean | range | count |
---|---|---|---|
Regressions ❌ (primary) | 1.1% | [0.3%, 1.7%] | 4 |
Regressions ❌ (secondary) | 1.1% | [1.1%, 1.1%] | 1 |
Improvements ✅ (primary) | -0.9% | [-1.7%, -0.4%] | 4 |
Improvements ✅ (secondary) | -0.2% | [-0.2%, -0.2%] | 1 |
All ❌✅ (primary) | 0.1% | [-1.7%, 1.7%] | 8 |
Max RSS (memory usage)
Results
This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
 | mean | range | count |
---|---|---|---|
Regressions ❌ (primary) | 3.8% | [1.0%, 7.1%] | 4 |
Regressions ❌ (secondary) | 2.5% | [0.9%, 4.8%] | 10 |
Improvements ✅ (primary) | -4.3% | [-6.8%, -0.3%] | 8 |
Improvements ✅ (secondary) | - | - | 0 |
All ❌✅ (primary) | -1.6% | [-6.8%, 7.1%] | 12 |
Cycles
Results
This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
 | mean | range | count |
---|---|---|---|
Regressions ❌ (primary) | 1.2% | [1.1%, 1.3%] | 2 |
Regressions ❌ (secondary) | - | - | 0 |
Improvements ✅ (primary) | -1.3% | [-1.5%, -1.1%] | 2 |
Improvements ✅ (secondary) | - | - | 0 |
All ❌✅ (primary) | -0.0% | [-1.5%, 1.3%] | 4 |
Binary size
Results
This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
 | mean | range | count |
---|---|---|---|
Regressions ❌ (primary) | 0.3% | [0.1%, 0.5%] | 18 |
Regressions ❌ (secondary) | 0.9% | [0.0%, 1.5%] | 12 |
Improvements ✅ (primary) | -0.7% | [-2.5%, -0.1%] | 70 |
Improvements ✅ (secondary) | -0.6% | [-2.4%, -0.1%] | 4 |
All ❌✅ (primary) | -0.5% | [-2.5%, 0.5%] | 88 |
Bootstrap: 647.204s -> 645.087s (-0.33%)
Artifact size: 310.08 MiB -> 309.96 MiB (-0.04%)
celinval pushed a commit to model-checking/kani that referenced this pull request
Relevant upstream changes:
- rust-lang/rust#120675: An intrinsic `Symbol` is now wrapped in an `IntrinsicDef` struct, so the relevant part of the code needed to be updated.
- rust-lang/rust#121464: The second argument of the `create_wrapper_file` function changed from a vector to a string.
- rust-lang/rust#121662: `NullOp::DebugAssertions` was renamed to `NullOp::UbCheck` and it now has data (currently unused by Kani).
- rust-lang/rust#121728: Introduces `F16` and `F128`, so stubs needed to be added for them.
- rust-lang/rust#121969: `parse_sess` was renamed to `psess`, so the relevant code was updated.
- rust-lang/rust#122059: The `is_val_statically_known` intrinsic is now used in some `core::fmt` code, so it had to be handled (codegen'ed to `false`).
- rust-lang/rust#122233: This added a new `retag_box_to_raw` intrinsic. This is an operation that is primarily relevant for stacked borrows. For Kani, we just return the pointer.

Resolves #3057
Visiting for weekly rustc-perf triage
- primary regressions are to cranelift-codegen opt-full (1.74%), cargo opt-full (1.33%), clap opt-full (1.18%), and exa debug-incr-unchanged (0.28%).
- the rust-timer run was "only" expected to regress 5 benchmarks, largely a different set, by a mean of 0.5%, not the 1.1% reported above. (I do not know what to take away from that observation, especially the fact that a different set of benchmarks was impacted...)
- not marking as triaged.
The changes above are chaotic, small, about-same-magnitude changes, and almost all benchmarks that were changed are `opt` builds. To me, that means that they are primarily if not entirely due to perturbing the MIR inliner's decisions.
Perhaps that's not the usual kind of perf report you're used to seeing in perf triage. I think most other kinds of changes produce perf reports where it is educational to look into each benchmark, because the fact that one crate was regressed or not means that there is a sort of code pattern that is now pessimized or handled better. I have certainly done such careful case-by-case analysis myself on many PRs and found it helpful. I firmly believe that such analysis is not useful in these perf reports that are primarily MIR inliner chaos (not noise!!). We could make rather subtle tweaks to the relevant standard library code, and completely change the set of benchmarks that get better or worse.
And in addition, for PRs like this where the perf reports are MIR inliner chaos, one should not expect the pre-merge perf report to look the same as the post-merge one. PRs that land between those try builds will provide the perturbations to cause a different chaotic outcome.
The exact same logic applies to #122282. It contains some regressions, but that PR is good because it improves our ability to optimize MIR, not because "improvements outweigh regressions". The latter framing suggests that the regressions in such PRs could be recouped by sufficient re-engineering of the change, but I believe they cannot, because they reflect imprecision in the MIR inliner's cost model, as opposed to anything about the changes in the PR.
The same logic applies to any improvements that are reported in a PR that adds code to the standard library; such improvements reflect that the MIR inliner previously made wrong decisions and now happens (by sheer luck) to in a few cases make better ones.
If anyone wants to try to do case-by-case analysis of what changed in the generated code or analyze the MIR inliner's decisions they are welcome to attempt it. I will not be doing so. We already know what the next step is to improve the compile-time overhead of these checks, and that is #122282 as @dianqk indicated.
Binary size results looking good though, which is an expected result of this change
> Binary size results looking good though, which is an expected result of this change
No. I think you're making exactly the mistake I tried to warn about above:
> The same logic applies to any improvements that are reported in a PR that adds code to the standard library; such improvements reflect that the MIR inliner previously made wrong decisions and now happens (by sheer luck) to in a few cases make better ones.
In all cases the relevant code is behind an `if false` in an `opt` build, so LLVM should be just as capable of optimizing these checks away before as it is now. I strongly suspect that what we've done is just nudge the MIR inliner to inline less, and in terms of binary size, less inlining tends to be a win.

I do not understand exactly why the MIR inliner tends to do inlinings that LLVM doesn't, but I've done a fair bit of analysis in that area before and it does. For some reason, a lot of drop-related symbols appear or disappear from binaries when inlining is tweaked.
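To make the "behind an `if false`" point concrete, here is a hedged sketch (illustrative code, not std's): in an optimized build the gate is compile-time false, so LLVM can delete the whole check, and any binary-size movement has to come from changed inlining decisions rather than from residual checks.

```rust
// `UB_CHECKS` stands in for the real gating (plain cfg!(debug_assertions)
// here); in an optimized build it is `false` at compile time.
const UB_CHECKS: bool = cfg!(debug_assertions);

#[inline]
unsafe fn read_byte(ptr: *const u8) -> u8 {
    if UB_CHECKS {
        // In an opt build this arm is statically dead: LLVM drops the branch
        // and the panic path entirely.
        assert!(!ptr.is_null(), "unsafe precondition(s) violated: ptr must be non-null");
    }
    unsafe { *ptr }
}

fn main() {
    let x = 7u8;
    assert_eq!(unsafe { read_byte(&x) }, 7);
}
```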
Thanks for the explanations @saethlin, I really appreciate it, and I totally buy the explanation you have provided here.
These are really good thoughts. I am wondering if there's any place less transient we could put them, e.g. in some sort of guide/document that would help people doing the rustc-perf triage with the kinds of things they should be considering.
@rustbot label: +perf-regression-triaged
Not just perf triage, we should have a manual for doing perf report analysis in general.
bjorn3 pushed a commit to bjorn3/rust that referenced this pull request: Distinguish between library and lang UB in assert_unsafe_precondition
matthiaskrgr pushed a commit to matthiaskrgr/rust that referenced this pull request: Distinguish between library and lang UB in assert_unsafe_precondition
CAD97 mentioned this pull request
I think I stumbled onto why this is causing "chaotic, small, about-same-magnitude changes" by "perturbing the MIR inliner's decisions" in #123174: the newly introduced branch is blocking MIR inlining of `#[inline]` functions like `Vec::deref`, whose bodies would otherwise be a single straight-line basic block and obvious candidates for inlining.
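For concreteness, a hedged sketch of the shape being described, with made-up function names rather than the real `Vec::deref`: adding the precondition check turns a straight-line body into one with a branch and a panic path, which changes what the MIR inliner's cost model sees.

```rust
use core::slice;

// Before the added checks: a single straight-line basic block, an easy MIR inline.
unsafe fn as_slice_unchecked<'a, T>(ptr: *const T, len: usize) -> &'a [T] {
    unsafe { slice::from_raw_parts(ptr, len) }
}

// With a debug-assertion precondition check: the `if` adds a branch and a
// panic path, so the MIR body is no longer a single basic block.
unsafe fn as_slice_checked<'a, T>(ptr: *const T, len: usize) -> &'a [T] {
    if cfg!(debug_assertions) {
        assert!(!ptr.is_null(), "unsafe precondition(s) violated: ptr must be non-null");
    }
    unsafe { slice::from_raw_parts(ptr, len) }
}

fn main() {
    let data = [1u32, 2, 3];
    let a = unsafe { as_slice_unchecked(data.as_ptr(), data.len()) };
    let b = unsafe { as_slice_checked(data.as_ptr(), data.len()) };
    assert_eq!(a, b);
}
```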
Labels
- This PR was explicitly merged by bors.
- Performance regression.
- The performance regression has been triaged.
- Status: Waiting on bors to run and complete tests. Bors will change the label on completion.
- Relevant to the compiler team, which will review and decide on the PR/issue.
- Relevant to the library team, which will review and decide on the PR/issue.