Build rustc
with a single CGU on x64 Linux by Kobzol · Pull Request #115554 · rust-lang/rust (original) (raw)
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service andprivacy statement. We’ll occasionally send you account related emails.
Already on GitHub?Sign in to your account
Conversation77 Commits1 Checks0 Files changed
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.Learn more about bidirectional Unicode characters
[ Show hidden characters]({{ revealButtonHref }})
This PR adds the rust.codegen-units=1
setting when compiling the 64-bit Linux rustc
artifact (the one used for try builds and Linux rustup distribution). This had mixed results in the past, however after the bump to LLVM 17, the results now seem pretty incredible. Instruction counts, cycles, wall time, max RSS and even artifact sizes see large improvements.
The last try build with this setting took 1h 8m, which is basically the same duration for try builds that we have seen recently. So there shouldn't be any large hit to CI/build time.
I hope that this could potentially also reduce codegen noise of rustc
a little bit, since small changes within a single rustc
crate should no longer perturb optimizations because of CGU movement. We still do cross-crate LTO, so it won't eliminate it though.
rustbot added A-testsuite
Area: The testsuite used to check the correctness of rustc
Status: Awaiting review from the assignee but also interested parties.
Relevant to the infrastructure team, which will review and decide on the PR/issue.
labels
This comment has been minimized.
⌛ Trying commit 1bffa88 with merge 2d8544092e7b47541698b5a6425f9f1aa131b994...
Why only on the Linux builder? This should be done on every builder.
Sure, but we don't have to do that in a single step. We can easily gauge the CI build time and compile performance effects of this on Linux, thanks to try builds and perf.RLO, but it's not that easy for other targets. It's possible that on Windows/macOS this might be a hit to CI times, so we should first investigate that (along with checking how it affects their performance, although that should hopefully be an improvement).
It's also possible that this might cause some issues for nightly users, so I would like to test it for a few weeks on nightly on Linux first, before we enable it globally across the board.
It's possible that on Windows/macOS this might be a hit to CI times, so we should first investigate that (along with checking how it affects their performance, although that should hopefully be an improvement)
Are we expecting LVM17 to be different from our previous measurements of CI times and performance?
Well, some things might have changed in the meantime. Apple CI is now much faster (so that's a good thing). And performance has changed a lot on Linux (but again, towards improvements, so hopefully this will be true on other platforms too). It looks like things are improving in general, but I'd still like to do this on Linux first to see if there are no unexpected issues with 1 CGU.
☀️ Try build successful - checks-actions
Build commit: 2d8544092e7b47541698b5a6425f9f1aa131b994 (2d8544092e7b47541698b5a6425f9f1aa131b994
)
This comment has been minimized.
I'm only talking about the comment about other builders, not this PR or the landing strategy: do we need to re-run the 1 CGU PRs on the other builders to gather up-to-date timings on LLVM17 and the new apple builder?
I think that it would be good, yes. It also depends how we want to enable it for other builders. We can allowlist it for Windows and macOS and test these two separately, or just enable it across the board for everything and then take a look at the total CI time.
Finished benchmarking commit (2d8544092e7b47541698b5a6425f9f1aa131b994): comparison URL.
Overall result: ❌✅ regressions and improvements - ACTION NEEDED
Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.
Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged
along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.
@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression
Instruction count
This is a highly reliable metric that was used to determine the overall result at the top of this comment.
mean | range | count | |
---|---|---|---|
Regressions ❌ (primary) | 0.8% | [0.3%, 2.2%] | 36 |
Regressions ❌ (secondary) | 1.4% | [0.3%, 2.4%] | 21 |
Improvements ✅ (primary) | -1.2% | [-4.1%, -0.3%] | 89 |
Improvements ✅ (secondary) | -1.7% | [-3.6%, -0.4%] | 114 |
All ❌✅ (primary) | -0.6% | [-4.1%, 2.2%] | 125 |
Max RSS (memory usage)
Results
This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
mean | range | count | |
---|---|---|---|
Regressions ❌ (primary) | - | - | 0 |
Regressions ❌ (secondary) | - | - | 0 |
Improvements ✅ (primary) | -2.1% | [-4.9%, -0.6%] | 74 |
Improvements ✅ (secondary) | -3.0% | [-7.6%, -0.8%] | 109 |
All ❌✅ (primary) | -2.1% | [-4.9%, -0.6%] | 74 |
Cycles
Results
This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
mean | range | count | |
---|---|---|---|
Regressions ❌ (primary) | - | - | 0 |
Regressions ❌ (secondary) | 2.7% | [2.4%, 3.1%] | 4 |
Improvements ✅ (primary) | -2.6% | [-6.9%, -0.9%] | 101 |
Improvements ✅ (secondary) | -4.1% | [-15.4%, -0.9%] | 121 |
All ❌✅ (primary) | -2.6% | [-6.9%, -0.9%] | 101 |
Binary size
This benchmark run did not return any relevant results for this metric.
Bootstrap: 627.535s -> 623.723s (-0.61%)
Artifact size: 316.30 MiB -> 271.45 MiB (-14.18%)
📌 Commit 1bffa88 has been approved by Mark-Simulacrum
It is now in the queue for this repository.
bors added S-waiting-on-bors
Status: Waiting on bors to run and complete tests. Bors will change the label on completion.
and removed S-waiting-on-review
Status: Awaiting review from the assignee but also interested parties.
labels
bors added a commit to rust-lang-ci/rust that referenced this pull request
Build rustc
with a single CGU on x64 Linux
This PR adds the rust.codegen-units=1
setting when compiling the 64-bit Linux rustc
artifact (the one used for try builds and Linux rustup distribution). This had mixed results in the past, however after the bump to LLVM 17, the results now seem pretty [incredible](rust-lang#115554 (comment)). Instruction counts, cycles, wall time, max RSS and even artifact sizes see large improvements.
The last try build with this setting took 1h 8m, which is basically the same duration for try builds that we have seen recently. So there shouldn't be any large hit to CI/build time.
I hope that this could potentially also reduce codegen noise of rustc
a little bit, since small changes within a single rustc
crate should no longer perturn optimizations because of CGU movement. We still do cross-crate LTO, so it won't eliminate it though.
r? @Mark-Simulacrum
But at the same time, libstd.so size didn't changed at all, strange.
This comment has been minimized.
📌 Commit ca59652 has been approved by Mark-Simulacrum
It is now in the queue for this repository.
bors added S-waiting-on-bors
Status: Waiting on bors to run and complete tests. Bors will change the label on completion.
and removed S-waiting-on-author
Status: This is awaiting some action (such as code changes or more information) from the author.
labels
lqd mentioned this pull request
Finished benchmarking commit (871407a): comparison URL.
Overall result: ❌✅ regressions and improvements - ACTION NEEDED
Next Steps: If you can justify the regressions found in this perf run, please indicate this with @rustbot label: +perf-regression-triaged
along with sufficient written justification. If you cannot justify the regressions please open an issue or create a new PR that fixes the regressions, add a comment linking to the newly created issue or PR, and then add the perf-regression-triaged
label to this PR.
@rustbot label: +perf-regression
cc @rust-lang/wg-compiler-performance
Instruction count
This is a highly reliable metric that was used to determine the overall result at the top of this comment.
mean | range | count | |
---|---|---|---|
Regressions ❌ (primary) | 0.8% | [0.3%, 2.6%] | 39 |
Regressions ❌ (secondary) | 1.6% | [0.7%, 2.5%] | 23 |
Improvements ✅ (primary) | -1.2% | [-3.9%, -0.2%] | 89 |
Improvements ✅ (secondary) | -1.6% | [-3.5%, -0.4%] | 124 |
All ❌✅ (primary) | -0.6% | [-3.9%, 2.6%] | 128 |
Max RSS (memory usage)
Results
This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
mean | range | count | |
---|---|---|---|
Regressions ❌ (primary) | - | - | 0 |
Regressions ❌ (secondary) | 2.9% | [2.9%, 2.9%] | 1 |
Improvements ✅ (primary) | -2.5% | [-4.6%, -0.5%] | 90 |
Improvements ✅ (secondary) | -3.0% | [-6.7%, -0.6%] | 118 |
All ❌✅ (primary) | -2.5% | [-4.6%, -0.5%] | 90 |
Cycles
Results
This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
mean | range | count | |
---|---|---|---|
Regressions ❌ (primary) | - | - | 0 |
Regressions ❌ (secondary) | 6.0% | [4.7%, 7.3%] | 6 |
Improvements ✅ (primary) | -2.2% | [-5.9%, -0.6%] | 134 |
Improvements ✅ (secondary) | -3.7% | [-13.9%, -1.0%] | 140 |
All ❌✅ (primary) | -2.2% | [-5.9%, -0.6%] | 134 |
Binary size
This benchmark run did not return any relevant results for this metric.
Bootstrap: 630.664s -> 627.128s (-0.56%)
Artifact size: 317.36 MiB -> 273.31 MiB (-13.88%)
Cycle count, Max RSS and artifact size is overwhelmingly positive.
@rustbot label: +perf-regression-triaged
Summary of perf results:
- icounts: -0.49% mean change across all measurements
- cycles: -1.91% (ditto)
- wall-time: -1.57% (ditto)
- max-rss: -1.96% (ditto)
- librustc_driver size: -31.6%
- rustdoc artifact size: -20.7%
- cargo artifact size: -19.9%
Fantastic results!
The bootstrap time also went down a tiny bit (-0.56%) but that is misleading. The bootstrap benchmark doesn't use the same configuration as the shipped compiler: no codegen-units=1
and no PGO/BOLT/LTO. @Kobzol told me that try build times have gone from something like 1h8m to 1h14m.
This was referenced
Oct 1, 2023
pvdrz mentioned this pull request
bors-ferrocene bot added a commit to ferrocene/ferrocene that referenced this pull request
RalfJung pushed a commit to RalfJung/miri that referenced this pull request
Build rustc
with a single CGU on x64 Linux
This PR adds the rust.codegen-units=1
setting when compiling the 64-bit Linux rustc
artifact (the one used for try builds and Linux rustup distribution). This had mixed results in the past, however after the bump to LLVM 17, the results now seem pretty [incredible](rust-lang/rust#115554 (comment)). Instruction counts, cycles, wall time, max RSS and even artifact sizes see large improvements.
The last try build with this setting took 1h 8m, which is basically the same duration for try builds that we have seen recently. So there shouldn't be any large hit to CI/build time.
I hope that this could potentially also reduce codegen noise of rustc
a little bit, since small changes within a single rustc
crate should no longer perturb optimizations because of CGU movement. We still do cross-crate LTO, so it won't eliminate it though.
r? @Mark-Simulacrum
bors-ferrocene bot added a commit to ferrocene/ferrocene that referenced this pull request
bors-ferrocene bot added a commit to ferrocene/ferrocene that referenced this pull request
bors-ferrocene bot added a commit to ferrocene/ferrocene that referenced this pull request
bors added a commit to rust-lang-ci/rust that referenced this pull request
Optimize librustc_driver.so
with BOLT
This PR optimizes librustc_driver.so
on 64-bit Linux CI with BOLT.
Code
One thing that's not clear yet to me how to resolve is how to best pass a linker flag that we need for BOLT (the second commit). It is currently passed unconditionally, which is not a good idea. We somehow have to:
- Only pass it when we actually plan to use BOLT. How to best do that?
config.toml
entry? Environment variable? CLI flag for bootstrap? BOLT optimization is done byopt-dist
, therefore bootstrap doesn't know about it by default. - Only pass it to
librustc_driver.so
(see performance below).
Some discussion of this flag already happened on Zulip.
Performance
Latest perf. results can be found [here](rust-lang#102487 (comment)). Note that instruction counts are not very interesting here, there are only regressions on hello world programs. Probably caused by a larger C++ libstd (?).
Summary:
- ✔️
-1.8%
mean improvement in cycle counts across many primary benchmarks. - ✔️
-1.8%
mean Max-RSS improvement. - ✖️ 34 MiB (+48%) artifact size regression of
librustc_driver.so
.- This is caused by building
librustc_driver.so
with relocations (which are required for BOLT). Hopefully, it will be fixed in the future with BOLT improvements, but now trying to reduce this size increase is tricky. - Note that the size of this file was recently reduced in rust-lang#115554 by pretty much the same amount (33 MiB). So the size after this PR is basically the same as it was for the last ~year.
- This is caused by building
- ✖️ 1.4 MiB (+53%) artifact size regression of
rustc
.- This is annoying and pretty much unnecessary. It is caused by the way relocations are currently applied in this PR, because they are applied both to
librustc_driver.so
(where they are needed) and forrustc
(where they aren't needed), since both are built with a single cargo invocation. We might need e.g. some tricks in the bootstraprustc
shim to only apply the relocation flag for the shared library and not forrustc
.
- This is annoying and pretty much unnecessary. It is caused by the way relocations are currently applied in this PR, because they are applied both to
CI time
CI (try build) got slower by ~5 minutes, which is fine, IMO. It can be further reduced by running LLVM and librustc_driver
BOLT profile gathering at the same time (now they are gathered separately for LLVM and librustc_driver
).
r? @Mark-Simulacrum
Also CC @onur-ozkan,
primarily for the bootstrap linker flag issue.
bors added a commit to rust-lang-ci/rust that referenced this pull request
Optimize librustc_driver.so
with BOLT
This PR optimizes librustc_driver.so
on 64-bit Linux CI with BOLT.
Code
One thing that's not clear yet to me how to resolve is how to best pass a linker flag that we need for BOLT (the second commit). It is currently passed unconditionally, which is not a good idea. We somehow have to:
- Only pass it when we actually plan to use BOLT. How to best do that?
config.toml
entry? Environment variable? CLI flag for bootstrap? BOLT optimization is done byopt-dist
, therefore bootstrap doesn't know about it by default. - Only pass it to
librustc_driver.so
(see performance below).
Some discussion of this flag already happened on Zulip.
Performance
Latest perf. results can be found [here](rust-lang#102487 (comment)). Note that instruction counts are not very interesting here, there are only regressions on hello world programs. Probably caused by a larger C++ libstd (?).
Summary:
- ✔️
-1.8%
mean improvement in cycle counts across many primary benchmarks. - ✔️
-1.8%
mean Max-RSS improvement. - ✖️ 34 MiB (+48%) artifact size regression of
librustc_driver.so
.- This is caused by building
librustc_driver.so
with relocations (which are required for BOLT). Hopefully, it will be fixed in the future with BOLT improvements, but now trying to reduce this size increase is tricky. - Note that the size of this file was recently reduced in rust-lang#115554 by pretty much the same amount (33 MiB). So the size after this PR is basically the same as it was for the last ~year.
- This is caused by building
- ✖️ 1.4 MiB (+53%) artifact size regression of
rustc
.- This is annoying and pretty much unnecessary. It is caused by the way relocations are currently applied in this PR, because they are applied both to
librustc_driver.so
(where they are needed) and forrustc
(where they aren't needed), since both are built with a single cargo invocation. We might need e.g. some tricks in the bootstraprustc
shim to only apply the relocation flag for the shared library and not forrustc
.
- This is annoying and pretty much unnecessary. It is caused by the way relocations are currently applied in this PR, because they are applied both to
CI time
CI (try build) got slower by ~5 minutes, which is fine, IMO. It can be further reduced by running LLVM and librustc_driver
BOLT profile gathering at the same time (now they are gathered separately for LLVM and librustc_driver
).
r? @Mark-Simulacrum
Also CC @onur-ozkan,
primarily for the bootstrap linker flag issue.
bors added a commit to rust-lang-ci/rust that referenced this pull request
…ulacrum
Optimize librustc_driver.so
with BOLT
This PR optimizes librustc_driver.so
on 64-bit Linux CI with BOLT.
Code
One thing that's not clear yet to me how to resolve is how to best pass a linker flag that we need for BOLT (the second commit). It is currently passed unconditionally, which is not a good idea. We somehow have to:
- Only pass it when we actually plan to use BOLT. How to best do that?
config.toml
entry? Environment variable? CLI flag for bootstrap? BOLT optimization is done byopt-dist
, therefore bootstrap doesn't know about it by default. - Only pass it to
librustc_driver.so
(see performance below).
Some discussion of this flag already happened on Zulip.
Performance
Latest perf. results can be found [here](rust-lang#102487 (comment)). Note that instruction counts are not very interesting here, there are only regressions on hello world programs. Probably caused by a larger C++ libstd (?).
Summary:
- ✔️
-1.8%
mean improvement in cycle counts across many primary benchmarks. - ✔️
-1.8%
mean Max-RSS improvement. - ✖️ 34 MiB (+48%) artifact size regression of
librustc_driver.so
.- This is caused by building
librustc_driver.so
with relocations (which are required for BOLT). Hopefully, it will be fixed in the future with BOLT improvements, but now trying to reduce this size increase is tricky. - Note that the size of this file was recently reduced in rust-lang#115554 by pretty much the same amount (33 MiB). So the size after this PR is basically the same as it was for the last ~year.
- This is caused by building
- ✖️ 1.4 MiB (+53%) artifact size regression of
rustc
.- This is annoying and pretty much unnecessary. It is caused by the way relocations are currently applied in this PR, because they are applied both to
librustc_driver.so
(where they are needed) and forrustc
(where they aren't needed), since both are built with a single cargo invocation. We might need e.g. some tricks in the bootstraprustc
shim to only apply the relocation flag for the shared library and not forrustc
.
- This is annoying and pretty much unnecessary. It is caused by the way relocations are currently applied in this PR, because they are applied both to
CI time
CI (try build) got slower by ~5 minutes, which is fine, IMO. It can be further reduced by running LLVM and librustc_driver
BOLT profile gathering at the same time (now they are gathered separately for LLVM and librustc_driver
).
r? @Mark-Simulacrum
Also CC @onur-ozkan,
primarily for the bootstrap linker flag issue.
github-actions bot pushed a commit to rust-lang/miri that referenced this pull request
Optimize librustc_driver.so
with BOLT
This PR optimizes librustc_driver.so
on 64-bit Linux CI with BOLT.
Code
One thing that's not clear yet to me how to resolve is how to best pass a linker flag that we need for BOLT (the second commit). It is currently passed unconditionally, which is not a good idea. We somehow have to:
- Only pass it when we actually plan to use BOLT. How to best do that?
config.toml
entry? Environment variable? CLI flag for bootstrap? BOLT optimization is done byopt-dist
, therefore bootstrap doesn't know about it by default. - Only pass it to
librustc_driver.so
(see performance below).
Some discussion of this flag already happened on Zulip.
Performance
Latest perf. results can be found [here](rust-lang/rust#102487 (comment)). Note that instruction counts are not very interesting here, there are only regressions on hello world programs. Probably caused by a larger C++ libstd (?).
Summary:
- ✔️
-1.8%
mean improvement in cycle counts across many primary benchmarks. - ✔️
-1.8%
mean Max-RSS improvement. - ✖️ 34 MiB (+48%) artifact size regression of
librustc_driver.so
.- This is caused by building
librustc_driver.so
with relocations (which are required for BOLT). Hopefully, it will be fixed in the future with BOLT improvements, but now trying to reduce this size increase is tricky. - Note that the size of this file was recently reduced in rust-lang/rust#115554 by pretty much the same amount (33 MiB). So the size after this PR is basically the same as it was for the last ~year.
- This is caused by building
- ✖️ 1.4 MiB (+53%) artifact size regression of
rustc
.- This is annoying and pretty much unnecessary. It is caused by the way relocations are currently applied in this PR, because they are applied both to
librustc_driver.so
(where they are needed) and forrustc
(where they aren't needed), since both are built with a single cargo invocation. We might need e.g. some tricks in the bootstraprustc
shim to only apply the relocation flag for the shared library and not forrustc
.
- This is annoying and pretty much unnecessary. It is caused by the way relocations are currently applied in this PR, because they are applied both to
CI time
CI (try build) got slower by ~5 minutes, which is fine, IMO. It can be further reduced by running LLVM and librustc_driver
BOLT profile gathering at the same time (now they are gathered separately for LLVM and librustc_driver
).
r? @Mark-Simulacrum
Also CC @onur-ozkan,
primarily for the bootstrap linker flag issue.
lnicola pushed a commit to lnicola/rust-analyzer that referenced this pull request
Refactor opt-dist
to simplify local building
This PR refactors the opt-dist
tool to make it easier to invoke it locally, outside of CI, and thus simplify building PGO/BOLT optimized rustc
builds e.g. for distro maintainers. It should also make it easier to run the PGO/BOLT workflow locally e.g. to profile performance or debug issues (looking at you, rust-lang/rust#115554).
RalfJung pushed a commit to RalfJung/rust-analyzer that referenced this pull request
Refactor opt-dist
to simplify local building
This PR refactors the opt-dist
tool to make it easier to invoke it locally, outside of CI, and thus simplify building PGO/BOLT optimized rustc
builds e.g. for distro maintainers. It should also make it easier to run the PGO/BOLT workflow locally e.g. to profile performance or debug issues (looking at you, rust-lang/rust#115554).
Labels
Area: The testsuite used to check the correctness of rustc
This PR was explicitly merged by bors.
Performance regression.
The performance regression has been triaged.
Status: Waiting on bors to run and complete tests. Bors will change the label on completion.
Relevant to the infrastructure team, which will review and decide on the PR/issue.