Avoid lowering code under dead SwitchInt targets by saethlin · Pull Request #121421 · rust-lang/rust (original) (raw)

The objective of this PR is to detect and eliminate code which is guarded by an if false, even if that false is a constant which is not known until monomorphization, or is intrinsics::debug_assertions().

The effect of this is that we generate no LLVM IR the standard library's unsafe preconditions, when they are compiled in a build where they should be immediately optimized out. This mono-time optimization ensures that builds which disable debug assertions do not grow a linkage requirement against core, which compiler-builtins currently needs: #121552

This revives the codegen side of #91222 as planned in #120848.

rustbot added S-waiting-on-review

Status: Awaiting review from the assignee but also interested parties.

T-compiler

Relevant to the compiler team, which will review and decide on the PR/issue.

labels

Feb 22, 2024

This comment has been minimized.

bors added a commit to rust-lang-ci/rust that referenced this pull request

Feb 22, 2024

This comment has been minimized.

bors added the S-waiting-on-author

Status: This is awaiting some action (such as code changes or more information) from the author.

label

Feb 22, 2024

This comment has been minimized.

bors added a commit to rust-lang-ci/rust that referenced this pull request

Feb 22, 2024

This comment has been minimized.

bors added a commit to rust-lang-ci/rust that referenced this pull request

Feb 22, 2024

This comment has been minimized.

☀️ Try build successful - checks-actions
Build commit: b183b7a (b183b7a248fcc8f8102bd2d19782bea9de6e3546)

I just checked to see if this PR optimizes out all the panic calls from a compiler-builtins build with debug assertions off and optimizations off, and it looks to me like since #121662 landed, it does not. It isn't entirely clear to me how to fix this. I'll put up a PR in a bit with some suggestions, but I do not like them.

Finished benchmarking commit (5a6c1aa): comparison URL.

Overall result: ✅ improvements - no action needed

@rustbot label: -perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-0.8%	[-0.8%, -0.8%]	1
Improvements ✅ (secondary)	-0.4%	[-0.7%, -0.3%]	5
All ❌✅ (primary)	-0.8%	[-0.8%, -0.8%]	1

Max RSS (memory usage)

This benchmark run did not return any relevant results for this metric.

Cycles

This benchmark run did not return any relevant results for this metric.

Binary size

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-0.0%	[-0.0%, -0.0%]	20
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	-0.0%	[-0.0%, -0.0%]	20

Bootstrap: 674.552s -> 676.304s (0.26%)
Artifact size: 310.77 MiB -> 310.79 MiB (0.01%)

if reachable_blocks.contains(bb) {
fx.codegen_block(bb);
} else {
// This may have references to things we didn't monomorphize, so we

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may have references to things we didn't monomorphize

How is that possible? The collector should ensure that we monomorphize everything referenced in the MIR, even in unreachable parts of the MIR.

However it could be interesting to make use of reachable_blocks_in_mono in the collector, and thus avoid collecting the functions in dead basic blocks.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤦 I didn't fully update the comment from scottmcm's PR that I stole this from

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just saw a comment was never sent.

let mut set = BitSet::new_empty(self.basic_blocks.len());
self.reachable_blocks_in_mono_from(tcx, instance, &mut set, START_BLOCK);
set
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you move this to traversal.rs, and refactor the computation to use the standard worklist/visited set algo?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bors added a commit to rust-lang-ci/rust that referenced this pull request

Mar 31, 2024

bors added a commit to rust-lang-ci/rust that referenced this pull request

Apr 1, 2024

bors added a commit to rust-lang-ci/rust that referenced this pull request

Apr 1, 2024

bors added a commit to rust-lang-ci/rust that referenced this pull request

Apr 2, 2024

bors added a commit to rust-lang-ci/rust that referenced this pull request

Apr 2, 2024

bors added a commit to rust-lang-ci/rust that referenced this pull request

Apr 2, 2024

bors added a commit to rust-lang-ci/rust that referenced this pull request

Apr 7, 2024

bors added a commit to rust-lang-ci/rust that referenced this pull request

Apr 9, 2024

bors added a commit to rust-lang-ci/rust that referenced this pull request

Apr 26, 2025

Add post-mono MIR optimizations

Before this PR, all MIR passes had to operate on polymorphic MIR. Thus any MIR transform maybe unable to determine the type of an argument or local (because it's still generic) or it may be unable to determine which function a Call terminator is calling (because it's still generic).

MIR transforms are a highly maintainable solution to a number of compiler problems, but this polymorphic limitation means that they are cannot solve some of our problems that we'd like them to; the most recent examples that come to mind are rust-lang#134082 which has extra limitations because of the polymorphic inliner, and rust-lang#139088 which is explicitly waiting for post-mono MIR passes to happen.

In addition, the lack of post-mono MIR optimizations means that MIR optimizations just miss out on profitable optimizations, which are so valuable that we've added kludges like rust-lang#121421 (a MIR traversal that you better only run at mono-time).

In addition, rustc_codegen_ssa is riddled with on-the-fly monomorphization and optimization; the logic for these trick that we do in codegen in my experience are hard to maintain, and I would much rather have those implemented in a MIR transform.

So this PR adds a new query codegen_mir (the MIR for codegen, not that I like the name). I've then replaced some of the kludges in rustc_codegen_ssa with PostMono variants of existing MIR transforms.

I've also un-querified check_mono_item and put it at the end of the post-mono pass list. Those checks should be post-mono passes too, but I've tried to keep this PR to a reviewable size. It's easy to imagine lots of other places to use post-mono MIR opts and I want the usefulness of this to be clear while the diff is also manageable.

This PR has a perf regression. I've hammered on the perf in a number of ways to get it down to what it is. incr-full builds suffer the most because they need to clone, intern, and cache a monomorphized copy of every MIR body. Things are mixed for every other build scenario. In almost all cases, binary sizes improve.

bors added a commit to rust-lang-ci/rust that referenced this pull request

May 1, 2025

Add post-mono MIR optimizations

bors added a commit to rust-lang-ci/rust that referenced this pull request

May 1, 2025

Add post-mono MIR optimizations

bors added a commit to rust-lang-ci/rust that referenced this pull request

May 1, 2025

Add post-mono MIR optimizations

bors added a commit to rust-lang-ci/rust that referenced this pull request

May 3, 2025

Add post-mono MIR optimizations

In addition, rustc_codegen_ssa is riddled with on-the-fly monomorphization and optimization; the logic for these tricks that we do during codegen in my experience are hard to maintain, and I would much rather have those implemented in a MIR transform.