Enable ThinLTO for rustc on x86_64-apple-darwin
by lqd · Pull Request #103647 · rust-lang/rust
Member
lqd commented
Local measurements seemed to show an improvement on a couple of benchmarks, so I'd like to test real CI builds and see whether the builder times out with the expected slight increase in build times.
Let's start with x64 rustc ThinLTO, and then figure out the file structure to configure LLVM ThinLTO. Maybe we'll then try aarch64 builds, since that also looked good locally.
rustbot added the following labels: A-testsuite (Area: The testsuite used to check the correctness of rustc), plus two labels described as "Status: Awaiting review from the assignee but also interested parties." and "Relevant to the infrastructure team, which will review and decide on the PR/issue."
⌛ Trying commit 39bf1507ecf95e4f5b066de65cf9929a7650e1c3 with merge b4d23f6f6fd8602735600f2aa54e884005c277e8...
☀️ Try build successful - checks-actions
Build commit: b4d23f6f6fd8602735600f2aa54e884005c277e8
The job completed in 2h57, which is maybe too close to a timeout? Looking at the first couple of pages of successful CI jobs, the range was from 2h16 to 2h45.
(I've removed the CI hacks)
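To put those durations side by side, here is a minimal Python sketch that converts the quoted times to minutes and computes how much longer the try build ran than the fastest and slowest of the recent successful jobs. The helper names (`to_minutes`, `overhead_percent`) are made up for illustration, and the actual CI timeout is not stated in this thread, so the sketch only compares against the observed range.

```python
def to_minutes(duration: str) -> int:
    """Convert a duration written like '2h57' into whole minutes."""
    hours, minutes = duration.split("h")
    return int(hours) * 60 + int(minutes)


def overhead_percent(job: str, reference: str) -> float:
    """Return how much longer `job` ran than `reference`, in percent."""
    job_min = to_minutes(job)
    ref_min = to_minutes(reference)
    return (job_min - ref_min) / ref_min * 100.0


# Durations quoted in the comment above.
thinlto_job = "2h57"
fastest, slowest = "2h16", "2h45"

print(f"vs fastest job: +{overhead_percent(thinlto_job, fastest):.1f}%")  # +30.1%
print(f"vs slowest job: +{overhead_percent(thinlto_job, slowest):.1f}%")  # +7.3%
```

By that measure the ThinLTO try build ran roughly 7% longer than the slowest recent job and roughly 30% longer than the fastest one.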
The only x86 darwin system I have access to is pretty noisy (old iMac), so most of the rustc-perf benchmarks are best avoided there.
So I've tried the longer benchmarks, on full clean builds (not only the leaf crate like the perf collector does). On this machine, the ThinLTOed rustc b4d23f6f6fd8602735600f2aa54e884005c277e8 is generally faster than the baseline 0da281b6068a7d889ae89a9bd8991284cc9b7535 by 1-2% (sometimes more, e.g. diesel-1.4.8, image-0.24.1 or keccak on check builds).
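For a noisy machine like this, one way to compare two compilers is to repeat each full build several times and compare medians rather than single runs. The sketch below assumes that approach. The timings are hypothetical placeholders, not the measurements from this PR, and `relative_change` is an illustrative helper, not part of rustc-perf.

```python
from statistics import median


def relative_change(baseline_runs, candidate_runs):
    """Percent change of the candidate's median time versus the baseline's.

    Negative values mean the candidate is faster.
    """
    base = median(baseline_runs)
    cand = median(candidate_runs)
    return (cand - base) / base * 100.0


# Hypothetical wall-clock times in seconds for repeated clean builds.
baseline_runs = [100.0, 101.5, 99.0, 104.0, 99.5]  # baseline rustc
thinlto_runs = [98.5, 99.0, 97.5, 103.0, 98.0]     # ThinLTOed rustc

change = relative_change(baseline_runs, thinlto_runs)
print(f"median change: {change:+.1f}%")  # -1.5% with these placeholder numbers
```

The median keeps a single outlier run (such as the 104.0 and 103.0 above) from dominating the comparison, which a plain mean would not.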
Two things would be interesting to know:
- whether CI can afford this change on the macOS builder?
- whether such changes to rustc/llvm on the x86 or aarch64 builders should wait for us to add mac support to rustc-perf, or whether a benchmark in a less noisy environment would be enough, assuming the results were positive there? I've wanted to ask for @thomcc's expertise and guidance on how we should support macs in rustc-perf.
For the two questions above, I'll r? @Mark-Simulacrum to have your thoughts if that's OK.
lqd marked this pull request as ready for review
lqd changed the title from "[perf] Enable ThinLTO for rustc on x86_64-apple-darwin" to "Enable ThinLTO for rustc on x86_64-apple-darwin"
lqd mentioned this pull request
30 tasks
I'm not too worried about claiming this is a win for Macs if it was a win for Linux; that seems like a pretty reasonable claim in my eyes. Especially if we can have some validation on noisier machines.
That said I think making this switch will probably bring too high a cost at this time, given that these runners dominate our runtimes. It's a little bit annoying in the sense that making the change for just beta or similar might drive down our runtimes on nightly and give more room for this kind of optimization... I think if we don't see headroom through other ways (e.g., different CI runners, optimizations to the things running in them, etc) then it might be worth trying that. It feels relatively safe as a beta-exclusive (i.e. not on nightly or stable), since beta sees very little usage anyway.
That makes sense, thanks a bunch.
I'll mark this blocked for now rather than close it outright (but feel free to do so if you prefer): so that we can revisit in the future and decide which of the two paths to take then.
Labels changed: added S-blocked (Status: Blocked on something else such as an RFC or other implementation work.) and removed S-waiting-on-author (Status: This is awaiting some action (such as code changes or more information) from the author.)
With the new osx builders from #105212, I believe we should now have the capacity to enable this change: @rustbot label -S-blocked
What do you think @Mark-Simulacrum, can we land this?
@rustbot ready
Label removed: S-blocked (Status: Blocked on something else such as an RFC or other implementation work.)
Let's merge this, mostly so that we can get some data gathered on what the hit is. We'll reconsider if it is a significant increase that begins to drive the slowest builds again. For the record:
So looks like we're around 1 hour per apple dist builder right now.
@bors r+
📌 Commit 3a085f7 has been approved by Mark-Simulacrum
It is now in the queue for this repository.
bors added S-waiting-on-bors (Status: Waiting on bors to run and complete tests. Bors will change the label on completion.) and removed S-waiting-on-review (Status: Awaiting review from the assignee but also interested parties.)
⌛ Testing commit 3a085f7 with merge 32d70cc436f06f18c510b2e964db6f7c21b5bbf3...
bors added S-waiting-on-review (Status: Awaiting review from the assignee but also interested parties.) and removed S-waiting-on-bors (Status: Waiting on bors to run and complete tests. Bors will change the label on completion.)
@bors retry
Failed to update toolstate
(has been fixed)
bors added S-waiting-on-bors (Status: Waiting on bors to run and complete tests. Bors will change the label on completion.) and removed S-waiting-on-review (Status: Awaiting review from the assignee but also interested parties.)
Finished benchmarking commit (bdb07a8): comparison URL.
Overall result: no relevant changes - no action needed
@rustbot label: -perf-regression
Instruction count
This benchmark run did not return any relevant results for this metric.
Max RSS (memory usage)
Results
This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | 3.0% | [3.0%, 3.0%] | 1 |
| Improvements ✅ (primary) | -0.9% | [-0.9%, -0.9%] | 1 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | -0.9% | [-0.9%, -0.9%] | 1 |
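As a reading aid for the table, the sketch below shows one plausible way each row's mean, range and count could be derived from a list of per-benchmark percent changes. This is an assumption for illustration only: the `summarize` helper is hypothetical and is not rustc-perf's actual code, and the input lists simply restate the single data points shown in the table.

```python
def summarize(deltas):
    """Summarize percent changes as mean, (min, max) range, and count.

    An empty list yields no mean or range, shown as '-' in the table.
    """
    if not deltas:
        return {"mean": None, "range": None, "count": 0}
    return {
        "mean": sum(deltas) / len(deltas),
        "range": (min(deltas), max(deltas)),
        "count": len(deltas),
    }


# Values restated from the table: one secondary regression, one primary improvement.
primary_regressions = []
secondary_regressions = [3.0]
primary_improvements = [-0.9]

print(summarize(primary_regressions))
print(summarize(secondary_regressions))
# The "All (primary)" row pools primary regressions and improvements together.
print(summarize(primary_regressions + primary_improvements))
```

With a single data point per non-empty row, the mean and both ends of the range coincide, which is why each populated row repeats the same percentage three times.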
Cycles
This benchmark run did not return any relevant results for this metric.
lqd deleted the osx-x64-lto branch
lqd mentioned this pull request
Dylan-DPC added a commit to Dylan-DPC/rust that referenced this pull request
Revert "enable ThinLTO for rustc on x86_64-apple-darwin dist builds"
Apparently ThinLTO on x64 mac can regress some of the ICEs' output. This reverts rust-lang#103647 to allow for investigation, and fixes rust-lang#105637 in the meantime.
bors added a commit to rust-lang-ci/rust that referenced this pull request
Revert "enable ThinLTO for rustc on x86_64-apple-darwin dist builds"
Apparently ThinLTO on x64 mac can regress some of the ICEs' output. This reverts rust-lang#103647 to allow for investigation, and helps with rust-lang#105637 in the meantime.
lqd mentioned this pull request
Aaron1011 pushed a commit to Aaron1011/rust that referenced this pull request
wip-sync pushed a commit to NetBSD/pkgsrc-wip that referenced this pull request
netbsd-srcmastr pushed a commit to NetBSD/pkgsrc that referenced this pull request
Pkgsrc changes:
- Adjust patches (add & remove) and cargo checksums to new versions.
- It's conceivable that the workaround for LLVM based NetBSD works even less in this version (ref. PKGSRC_HAVE_LIBCPP not having a corresponding patch anymore).
Upstream changes:
Version 1.68.2 (2023-03-28)
- [Update the GitHub RSA host key bundled within Cargo] (rust-lang/cargo#11883). The key was [rotated by GitHub] (https://github.blog/2023-03-23-we-updated-our-rsa-ssh-host-key/) on 2023-03-24 after the old one leaked.
- [Mark the old GitHub RSA host key as revoked] (rust-lang/cargo#11889). This will prevent Cargo from accepting the leaked key even when trusted by the system.
- [Add support for `@revoked` and a better error message for `@cert-authority` in Cargo's SSH host key verification] (rust-lang/cargo#11635)
Version 1.68.1 (2023-03-23)
- [Fix miscompilation in produced Windows MSVC artifacts] (rust-lang/rust#109094) This was introduced by enabling ThinLTO for the distributed rustc which led to miscompilations in the resulting binary. Currently this is believed to be limited to the -Zdylib-lto flag used for rustc compilation, rather than a general bug in ThinLTO, so only rustc artifacts should be affected.
- [Fix --enable-local-rust builds] (rust-lang/rust#109111)
- [Treat `$prefix-clang` as `clang` in linker detection code] (rust-lang/rust#109156)
- [Fix panic in compiler code] (rust-lang/rust#108162)
Version 1.68.0 (2023-03-09)
Language
- [Stabilize default_alloc_error_handler] (rust-lang/rust#102318) This allows usage of `alloc` on stable without requiring the definition of a handler for allocation failure. Defining custom handlers is still unstable.
- [Stabilize `efiapi` calling convention.] (rust-lang/rust#105795)
- [Remove implicit promotion for types with drop glue] (rust-lang/rust#105085)
Compiler
- [Change `bindings_with_variant_name` to deny-by-default] (rust-lang/rust#104154)
- [Allow .. to be parsed as let initializer] (rust-lang/rust#105701)
- [Add `armv7-sony-vita-newlibeabihf` as a tier 3 target] (rust-lang/rust#105712)
- [Always check alignment during compile-time const evaluation] (rust-lang/rust#104616)
- [Disable "split dwarf inlining" by default.] (rust-lang/rust#106709)
- [Add vendor to Fuchsia's target triple] (rust-lang/rust#106429)
- [Enable sanitizers for s390x-linux] (rust-lang/rust#107127)
Libraries
- [Loosen the bound on the Debug implementation of Weak.] (rust-lang/rust#90291)
- [Make `std::task::Context` !Send and !Sync] (rust-lang/rust#95985)
- [PhantomData layout guarantees] (rust-lang/rust#104081)
- [Don't derive Debug for `OnceWith` & `RepeatWith`] (rust-lang/rust#104163)
- [Implement DerefMut for PathBuf] (rust-lang/rust#105018)
- [Add O(1) `Vec -> VecDeque` conversion guarantee] (rust-lang/rust#105128)
- [Leak amplification for peek_mut() to ensure BinaryHeap's invariant is always met] (rust-lang/rust#105851)
Stabilized APIs
- [`{core,std}::pin::pin!`] (https://doc.rust-lang.org/stable/std/pin/macro.pin.html)
- [`impl From<bool> for {f32,f64}`] (https://doc.rust-lang.org/stable/std/primitive.f32.html#impl-From%3Cbool%3E-for-f32)
- [`std::path::MAIN_SEPARATOR_STR`] (https://doc.rust-lang.org/stable/std/path/constant.MAIN_SEPARATOR_STR.html)
- [`impl DerefMut for PathBuf`] (https://doc.rust-lang.org/stable/std/path/struct.PathBuf.html#impl-DerefMut-for-PathBuf)
These APIs are now stable in const contexts:
Cargo
- [Stabilize sparse registry support for crates.io] (rust-lang/cargo#11224)
- [`cargo build --verbose` tells you more about why it recompiles.] (rust-lang/cargo#11407)
- [Show progress of crates.io index update even `net.git-fetch-with-cli` option enabled] (rust-lang/cargo#11579)
Misc
Compatibility Notes
- [Add `SEMICOLON_IN_EXPRESSIONS_FROM_MACROS` to future-incompat report] (rust-lang/rust#103418)
- [Only specify `--target` by default for `-Zgcc-ld=lld` on wasm] (rust-lang/rust#101792)
- [Bump `IMPLIED_BOUNDS_ENTAILMENT` to Deny + ReportNow] (rust-lang/rust#106465)
- [`std::task::Context` no longer implements Send and Sync] (rust-lang/rust#95985)
Internal Changes
These changes do not affect any public interfaces of Rust, but they represent significant improvements to the performance or internals of rustc and related tools.
- [Encode spans relative to the enclosing item] (rust-lang/rust#84762)
- [Don't normalize in AstConv] (rust-lang/rust#101947)
- [Find the right lower bound region in the scenario of partial order relations] (rust-lang/rust#104765)
- [Fix impl block in const expr] (rust-lang/rust#104889)
- [Check ADT fields for copy implementations considering regions] (rust-lang/rust#105102)
- [rustdoc: simplify JS search routine by not messing with lev distance] (rust-lang/rust#105796)
- [Enable ThinLTO for rustc on `x86_64-pc-windows-msvc`] (rust-lang/rust#103591)
- [Enable ThinLTO for rustc on `x86_64-apple-darwin`] (rust-lang/rust#103647)
Version 1.67.0 (2023-01-26)
Language
- [Make `Sized` predicates coinductive, allowing cycles.] (rust-lang/rust#100386)
- [`#[must_use]` annotations on `async fn` also affect the `Future::Output`.] (rust-lang/rust#100633)
- [Elaborate supertrait obligations when deducing closure signatures.] (rust-lang/rust#101834)
- [Invalid literals are no longer an error under `cfg(FALSE)`.] (rust-lang/rust#102944)
- [Unreserve braced enum variants in value namespace.] (rust-lang/rust#103578)
Compiler
- [Enable varargs support for calling conventions other than `C` or `cdecl`.] (rust-lang/rust#97971)
- [Add new MIR constant propagation based on dataflow analysis.] (rust-lang/rust#101168)
- [Optimize field ordering by grouping m*2^n-sized fields with equivalently aligned ones.] (rust-lang/rust#102750)
- [Stabilize native library modifier `verbatim`.] (rust-lang/rust#104360)
Added and removed targets:
- [Add a tier 3 target for PowerPC on AIX] (rust-lang/rust#102293), `powerpc64-ibm-aix`.
- [Add a tier 3 target for the Sony PlayStation 1] (rust-lang/rust#102689), `mipsel-sony-psx`.
- [Add tier 3 `no_std` targets for the QNX Neutrino RTOS] (rust-lang/rust#102701), `aarch64-unknown-nto-qnx710` and `x86_64-pc-nto-qnx710`.
- [Remove tier 3 `linuxkernel` targets] (rust-lang/rust#104015) (not used by the actual kernel).
Refer to Rust's [platform support page][platform-support-doc] for more information on Rust's tiered platform support.
Libraries
- [Merge `crossbeam-channel` into `std::sync::mpsc`.] (rust-lang/rust#93563)
- [Fix inconsistent rounding of 0.5 when formatted to 0 decimal places.] (rust-lang/rust#102935)
- [Derive `Eq` and `Hash` for `ControlFlow`.] (rust-lang/rust#103084)
- [Don't build `compiler_builtins` with `-C panic=abort`.] (rust-lang/rust#103786)
Stabilized APIs
- [`{integer}::checked_ilog`] (https://doc.rust-lang.org/stable/std/primitive.i32.html#method.checked_ilog)
- [`{integer}::checked_ilog2`] (https://doc.rust-lang.org/stable/std/primitive.i32.html#method.checked_ilog2)
- [`{integer}::checked_ilog10`] (https://doc.rust-lang.org/stable/std/primitive.i32.html#method.checked_ilog10)
- [`{integer}::ilog`] (https://doc.rust-lang.org/stable/std/primitive.i32.html#method.ilog)
- [`{integer}::ilog2`] (https://doc.rust-lang.org/stable/std/primitive.i32.html#method.ilog2)
- [`{integer}::ilog10`] (https://doc.rust-lang.org/stable/std/primitive.i32.html#method.ilog10)
- [`NonZeroU*::ilog2`] (https://doc.rust-lang.org/stable/std/num/struct.NonZeroU32.html#method.ilog2)
- [`NonZeroU*::ilog10`] (https://doc.rust-lang.org/stable/std/num/struct.NonZeroU32.html#method.ilog10)
- [`NonZero*::BITS`] (https://doc.rust-lang.org/stable/std/num/struct.NonZeroU32.html#associatedconstant.BITS)
These APIs are now stable in const contexts:
- [`char::from_u32`] (https://doc.rust-lang.org/stable/std/primitive.char.html#method.from_u32)
- [`char::from_digit`] (https://doc.rust-lang.org/stable/std/primitive.char.html#method.from_digit)
- [`char::to_digit`] (https://doc.rust-lang.org/stable/std/primitive.char.html#method.to_digit)
- [`core::char::from_u32`] (https://doc.rust-lang.org/stable/core/char/fn.from_u32.html)
- [`core::char::from_digit`] (https://doc.rust-lang.org/stable/core/char/fn.from_digit.html)
Compatibility Notes
- [The layout of `repr(Rust)` types now groups m*2^n-sized fields with equivalently aligned ones.] (rust-lang/rust#102750) This is intended to be an optimization, but it is also known to increase type sizes in a few cases for the placement of enum tags. As a reminder, the layout of `repr(Rust)` types is an implementation detail, subject to change.
- [0.5 now rounds to 0 when formatted to 0 decimal places.] (rust-lang/rust#102935) This makes it consistent with the rest of floating point formatting that rounds ties toward even digits.
- [Chains of `&&` and `||` will now drop temporaries from their sub-expressions in evaluation order, left-to-right.] (rust-lang/rust#103293) Previously, it was "twisted" such that the first expression dropped its temporaries last, after all of the other expressions dropped in order.
- [Underscore suffixes on string literals are now a hard error.] (rust-lang/rust#103914) This has been a future-compatibility warning since 1.20.0.
- [Stop passing `-export-dynamic` to `wasm-ld`.] (rust-lang/rust#105405)
- [`main` is now mangled as `__main_void` on `wasm32-wasi`.] (rust-lang/rust#105468)
- [Cargo now emits an error if there are multiple registries in the configuration with the same index URL.] (rust-lang/cargo#10592)
Internal Changes
These changes do not affect any public interfaces of Rust, but they represent significant improvements to the performance or internals of rustc and related tools.
- [Rewrite LLVM's archive writer in Rust.] (rust-lang/rust#97485)
Labels
Area: The testsuite used to check the correctness of rustc
This PR was explicitly merged by bors.
Status: Waiting on bors to run and complete tests. Bors will change the label on completion.
Relevant to the infrastructure team, which will review and decide on the PR/issue.