ci: Enable opt-dist for dist-aarch64-linux builds by mrkajetanp · Pull Request #133807 · rust-lang/rust

mrkajetanp

Enable optimised AArch64 dist builds with the opt-dist pipeline.

For the time being, disable bolt on aarch64 due to upstream bolt bugs.

r? @Kobzol
cc @lqd

try-job: dist-aarch64-linux

@rustbot rustbot added A-testsuite

Area: The testsuite used to check the correctness of rustc

S-waiting-on-review

Status: Awaiting review from the assignee but also interested parties.

T-infra

Relevant to the infrastructure team, which will review and decide on the PR/issue.

labels

Dec 3, 2024

@rustbot

Some changes occurred in src/tools/opt-dist

cc @Kobzol

@Kobzol

Hi! Could you please split the part that moves the job to the aarch64 runner and the PGO/LTO part? So that we can evaluate the CI cost of these two actions separately. Thanks!

mrkajetanp

Comment on lines +48 to +99

ENV SCRIPT python3 ../x.py build --set rust.debug=true opt-dist && \
    ./build/$HOSTS/stage0-tools-bin/opt-dist linux-ci -- python3 ../x.py dist \
    --host $HOSTS --target $HOSTS --include-default-paths build-manifest bootstrap

Either way this is a completely new dockerfile, so do you mean just replace this with a simple ./x dist call and then wrap it with opt-dist separately? Just in separate commits or separate PRs altogether?

I meant a separate PR, so that we can land these two changes (move to aarch64 host first, and then enable optimizations) separately :)

Makes sense, just wanted to make sure - no problem :)

@lqd

What improvements are you seeing with this PR, over the current artifacts?

@mrkajetanp

I've not yet benchmarked the changes, and I'm not sure how they compare to the artifacts from cross-compilation because I was only doing aarch64 runs. Specifically, though, adding opt-dist with LTO and PGO seems to increase the binary sizes of the main artifacts as follows:

librustc_driver-8c547d00f2ea16d5.so - 84M -> 94M
libLLVM.so.19.1-rust-1.85.0-nightly - 106M -> 108M
rustc -> 2.7M for both, +100 bytes
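
For a sense of scale, the size changes quoted above can be expressed as percentages. This is only a minimal sketch: the helper `percent_change` is a hypothetical name introduced here, and the inputs are the rounded figures from the comment, so the results are approximate.

```rust
/// Relative change in percent going from `before` to `after`.
/// Both values must use the same unit.
fn percent_change(before: f64, after: f64) -> f64 {
    (after - before) / before * 100.0
}

fn main() {
    // Rounded sizes as quoted in the comment above (the "M" figures).
    let artifacts = [
        ("librustc_driver", 84.0, 94.0),
        ("libLLVM", 106.0, 108.0),
    ];

    for (name, before, after) in artifacts {
        // `{:+.1}` prints an explicit sign and one decimal place.
        println!("{name}: {:+.1}%", percent_change(before, after));
    }
}
```

On these rounded inputs the growth is roughly 12% for `librustc_driver` and roughly 2% for `libLLVM`.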

@Kobzol

@bors try

Let's also see how long it takes with the optimizations.

bors added a commit to rust-lang-ci/rust that referenced this pull request

Dec 3, 2024

@bors

ci: Enable opt-dist for dist-aarch64-linux builds

Move the CI dist-aarch64-linux job to an aarch64 runner and enable optimised dist builds with the opt-dist pipeline.

For the time being, disable bolt on aarch64 due to upstream bolt bugs.

r? @Kobzol cc @lqd

@bors

@lqd

That’s not going to be the good try job Jakub :3

@bors

@bors bors added S-waiting-on-author

Status: This is awaiting some action (such as code changes or more information) from the author.

and removed S-waiting-on-review

Status: Awaiting review from the assignee but also interested parties.

labels

Dec 3, 2024

@Kobzol

Ah, crap. Thanks!

@bors try

bors added a commit to rust-lang-ci/rust that referenced this pull request

Dec 3, 2024

@bors

ci: Enable opt-dist for dist-aarch64-linux builds

Move the CI dist-aarch64-linux job to an aarch64 runner and enable optimised dist builds with the opt-dist pipeline.

For the time being, disable bolt on aarch64 due to upstream bolt bugs.

r? @Kobzol cc @lqd

try-job: dist-aarch64-linux

@bors

@bors

☀️ Try build successful - checks-actions
Build commit: d156b32 (d156b32c73dd32916f0ca83fb0104e600fbad49d)

@mrkajetanp

So that's an extra hour for LTO+PGO without the cache. 2h22 vs 3h22.

@lqd

bors added a commit to rust-lang-ci/rust that referenced this pull request

Dec 4, 2024

@bors

ci: Enable opt-dist for dist-aarch64-linux builds

Move the CI dist-aarch64-linux job to an aarch64 runner and enable optimised dist builds with the opt-dist pipeline.

For the time being, disable bolt on aarch64 due to upstream bolt bugs.

r? @Kobzol cc @lqd

try-job: dist-aarch64-linux

@bors

@bors

☀️ Try build successful - checks-actions
Build commit: ef33f24 (ef33f249830d94b7afdb529458aae4052f14ca98)

@mrkajetanp

1h54 cached, not so bad. Back to roughly the same time as the x86 cross build then.
Completely outside of the CI cost conversation, it's probably a nice bonus that these changes don't make the overall turnaround time for bors longer than it already is.
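
The three durations quoted in this thread (2h22 without LTO+PGO, 3h22 with LTO+PGO uncached, 1h54 with LTO+PGO cached) can be compared with a little arithmetic. The sketch below is illustrative only; the helper `hm` and the variable names are hypothetical and the numbers are the ones quoted in the comments.

```rust
use std::time::Duration;

/// Build a `Duration` from hours and minutes.
fn hm(hours: u64, minutes: u64) -> Duration {
    Duration::from_secs(hours * 3600 + minutes * 60)
}

fn main() {
    // Figures as quoted in the thread.
    let without_opt = hm(2, 22);
    let with_opt_uncached = hm(3, 22);
    let with_opt_cached = hm(1, 54);

    // Extra time attributed to LTO+PGO when nothing is cached.
    let extra = with_opt_uncached - without_opt;
    // Time the cache saves on the optimised build.
    let saved = with_opt_uncached - with_opt_cached;

    println!("LTO+PGO overhead (uncached): {} min", extra.as_secs() / 60);
    println!("saved by caching: {} min", saved.as_secs() / 60);
}
```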

@lqd

I assume good benchmark results can also help with the cost discussion.

@Kobzol

Indeed! You can download the CI artifacts e.g. using rustup-toolchain-install-master and benchmark it locally using rustc-perf. It would be nice to see the perf. diff. Let me know on Zulip if you want help with that.

@Kobzol

@bors

📌 Commit 3d54764 has been approved by Kobzol

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors

Status: Waiting on bors to run and complete tests. Bors will change the label on completion.

and removed S-waiting-on-review

Status: Awaiting review from the assignee but also interested parties.

labels

Jan 14, 2025

@Kobzol Kobzol added S-waiting-on-review

Status: Awaiting review from the assignee but also interested parties.

relnotes-perf

Performance improvements that should be mentioned in the release notes.

and removed S-waiting-on-bors

Status: Waiting on bors to run and complete tests. Bors will change the label on completion.

labels

Jan 14, 2025

bors added a commit to rust-lang-ci/rust that referenced this pull request

Jan 15, 2025

@bors

Rollup of 7 pull requests

Successful merges:

r? @ghost @rustbot modify labels: rollup

@klensy

r+ but removed S-waiting-on-bors?

@Kobzol Kobzol added S-waiting-on-bors

Status: Waiting on bors to run and complete tests. Bors will change the label on completion.

and removed S-waiting-on-review

Status: Awaiting review from the assignee but also interested parties.

labels

Jan 15, 2025

@Kobzol

Oops, looks like my manual issue modification has raced with bors :) Thanks!

@lqd

Maybe we should rollup=never big CI changes like these in the future

@Kobzol

Yes, I just thought that after I noticed that it was already included in a rollup 😆 I guess that usually we run perf. for similar changes, so it's done by default, but since we don't have perf. monitoring for ARM (yet!), it wasn't done here, so I forgot about it, sorry.

Just in case the currently running rollup fails, let's mark it as such.

@bors rollup=never

@bors

rust-timer added a commit to rust-lang-ci/rust that referenced this pull request

Jan 15, 2025

@rust-timer

Rollup merge of rust-lang#133807 - mrkajetanp:ci-aarch64-opt-dist, r=Kobzol

ci: Enable opt-dist for dist-aarch64-linux builds

Move the CI dist-aarch64-linux job to an aarch64 runner and enable optimised dist builds with the opt-dist pipeline.

For the time being, disable bolt on aarch64 due to upstream bolt bugs.

r? @Kobzol cc @lqd

try-job: dist-aarch64-linux

@Kobzol

^ is hopefully just a visual bug.

saethlin

Comment on lines +149 to +163

let is_aarch64 = target_triple.starts_with("aarch64");
let mut skip_tests = vec![
    // Fails because of linker errors, as of June 2023.
    "tests/ui/process/nofile-limit.rs".to_string(),
];
if is_aarch64 {
    skip_tests.extend([
        // Those tests fail only inside of Docker on aarch64, as of December 2024
        "tests/ui/consts/promoted_running_out_of_memory_issue-130687.rs".to_string(),
        "tests/ui/consts/large_const_alloc.rs".to_string(),
    ]);
}
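
To make the effect of this hunk easier to follow, here is a self-contained sketch of the same logic. The function names `skipped_tests` and `skip_args` are hypothetical, and the assumption that each skipped path ends up as a `--skip <path>` argument on the test command line is mine, not something shown in the diff.

```rust
/// Tests to skip for a given target triple, mirroring the hunk above.
fn skipped_tests(target_triple: &str) -> Vec<String> {
    let mut skip_tests = vec![
        // Skipped on every target.
        "tests/ui/process/nofile-limit.rs".to_string(),
    ];
    if target_triple.starts_with("aarch64") {
        // Skipped only on aarch64.
        skip_tests.extend([
            "tests/ui/consts/promoted_running_out_of_memory_issue-130687.rs".to_string(),
            "tests/ui/consts/large_const_alloc.rs".to_string(),
        ]);
    }
    skip_tests
}

/// Assumption: each skipped test is passed as a `--skip <path>` pair.
fn skip_args(tests: &[String]) -> Vec<String> {
    tests
        .iter()
        .flat_map(|test| ["--skip".to_string(), test.clone()])
        .collect()
}

fn main() {
    for triple in ["x86_64-unknown-linux-gnu", "aarch64-unknown-linux-gnu"] {
        let tests = skipped_tests(triple);
        println!("{triple}: {} skipped", tests.len());
        println!("  args: {}", skip_args(&tests).join(" "));
    }
}
```

Because the list lives in the CI tooling rather than in the tests, a developer running the same tests locally gets no hint that CI never runs them.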

Please, I beg of you, do not do this. This hack is causing CI to pass and local developer workflows to fail, and it is hiding this regression #135952.

To explain, we already skip some tests in a similar manner on dist x64 (not just the explicitly skipped ones, but also whole test suites, e.g. run-make).

Some tests fail on dist but work locally; these are candidates for being skipped on CI (after all, running the test suite on an extracted dist archive is already a hack).

But if the test also fails outside of this extracted dist setup, then it shouldn't be skipped, of course.

I have the same feedback about the two tests that were using this feature before. See #135961.

Okay, we could remove specific skipped tests, but do you also have an issue with skipping any tests in the dist tests? Because we currently don't run some parts of the test suite, so these are also effectively skipped, just without being explicitly enumerated.

Of the 4 tests that were individually skipped, 3 also did not work for me with x test --stage 2 in an aarch64 dev environment, and the 4th was for a platform I don't have a dev environment on.

Ignoring or skipping tests in the harness is fine if there is some fundamental incompatibility with the harness. Or not running them because nobody has gotten to wiring them into opt-dist yet.

I would have preferred these tests be ignored in the tests themselves via annotations; that would have at least not had me chasing my tail wondering how CI could be passing when the tests fail locally. And it would have made the problem more visible.

Yeah I agree with explicit skipping via annotations being the better approach for these kinds of problems, you're right.

I'll write up a patch to change this in a moment, let's see what people say

Good point with the annotations. Recently we added a test annotation for only running a given test in dist jobs, the same should be usable also for ignoring a test in a dist job (#135164 - ignore-dist).
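
As a rough illustration of the annotation approach discussed here, a test file could carry the directive itself instead of relying on a skip list in the CI tooling. This is a sketch under assumptions: the spelling `ignore-dist` is taken from the comment above and not verified here, and the test body is a hypothetical stand-in rather than one of the tests named in this thread.

```rust
//@ ignore-dist
// Assumed directive spelling, taken from the comment above. It is meant to
// tell the test harness not to run this file in dist jobs.

// Hypothetical stand-in for the code under test.
fn allocate(len: usize) -> Vec<u8> {
    vec![0u8; len]
}

fn main() {
    // A trivial check so the file behaves like a runnable test.
    assert_eq!(allocate(1024).len(), 1024);
}
```

With the directive at the top of the file, anyone opening the test can see that it is not run in dist jobs, which addresses the visibility concern raised earlier in the thread.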

I've posted a PR that I've linked above which fixes the two tests that aren't related to the aarch64 vs x86_64 allocation failure issue. I'm away from a keyboard for a few days, but it looks like it works, and I would recommend applying that change and ignoring the large const allocs failure on aarch64, pending further investigation.

@workingjubilee

Move the CI dist-aarch64-linux job to an aarch64 runner and enable optimised dist builds with the opt-dist pipeline.

We should ideally never simultaneously change the CI runner and also enable an optimized build.

@mrkajetanp

We should ideally never simultaneously change the CI runner and also enable an optimized build.

FWIW the description there is a leftover from before we split the PRs, the runner change was done separately here.

@workingjubilee

Ah, thank you. In that case ideally PRs should be updated so that their messages reflect their actual content, but I am happy to know things happened in a more bisectable fashion.

github-actions bot pushed a commit to tautschnig/verify-rust-std that referenced this pull request

Mar 11, 2025

@bors

Rollup of 7 pull requests

Successful merges:

r? @ghost @rustbot modify labels: rollup

wip-sync pushed a commit to NetBSD/pkgsrc-wip that referenced this pull request

Apr 9, 2025

@he32

Upstream changes relative to 1.85.1:

Version 1.86.0 (2025-04-03)

Labels

A-testsuite

Area: The testsuite used to check the correctness of rustc

relnotes-perf

Performance improvements that should be mentioned in the release notes.

S-waiting-on-bors

Status: Waiting on bors to run and complete tests. Bors will change the label on completion.

T-infra

Relevant to the infrastructure team, which will review and decide on the PR/issue.