Add profiler to bootstrap command by Shourya742 · Pull Request #143525 · rust-lang/rust
Conversation
This PR adds command profiling to bootstrap. It tracks the total execution time and records cache hits for each command, and it can export the execution results to a JSON file. Integrating this with Chrome tracing could further enhance observability.
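As a rough illustration of the kind of profiler described above, here is a minimal sketch in Rust. All names (`CommandProfiler`, `CommandStats`, `record_execution`, etc.) are hypothetical and do not reflect bootstrap's actual API; the sketch only shows the idea of keying stats by a command fingerprint and recording durations and cache hits.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical per-command profiler (illustrative names, not bootstrap's API).
#[derive(Default)]
struct CommandProfiler {
    // Keyed by a command "fingerprint" (e.g. program + arguments).
    entries: HashMap<String, CommandStats>,
}

#[derive(Default)]
struct CommandStats {
    durations: Vec<Duration>,
    cache_hits: u32,
}

impl CommandProfiler {
    /// Record one real execution of a command and how long it took.
    fn record_execution(&mut self, fingerprint: &str, duration: Duration) {
        self.entries
            .entry(fingerprint.to_string())
            .or_default()
            .durations
            .push(duration);
    }

    /// Record that a command was served from the cache instead of re-run.
    fn record_cache_hit(&mut self, fingerprint: &str) {
        self.entries.entry(fingerprint.to_string()).or_default().cache_hits += 1;
    }

    /// Sum of all recorded execution durations for one command.
    fn total_duration(&self, fingerprint: &str) -> Duration {
        self.entries
            .get(fingerprint)
            .map(|s| s.durations.iter().sum())
            .unwrap_or_default()
    }
}

fn main() {
    let mut profiler = CommandProfiler::default();
    let start = Instant::now();
    // ... run the command here ...
    profiler.record_execution("cargo build", start.elapsed());
    profiler.record_cache_hit("cargo build");
    println!("total: {:?}", profiler.total_duration("cargo build"));
}
```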
r? @Kobzol
rustbot added the T-bootstrap label (Relevant to the bootstrap subteam: Rust's build system (x.py and src/bootstrap))
Thank you! I don't think we'll need JSON output for now; I think the main output should be some lightly formatted text that is human-interpretable.
Gonna try it tomorrow.
Kobzol left a comment
Ok, I tried it locally. Good job, it seems to work as expected! I can already see this becoming quite useful when dealing with reports of bootstrap being slow.
I have some suggestions/feedback:
- Streaming Cargo commands (which are the most expensive ones) are not recorded. Could you also add support for profiling these?
- The execution and cache hit timestamps are not very useful. I thought they would be, but in this aggregated output they aren't. Also, outputting them as UNIX timestamps is not very useful, and we'd have to depend on e.g. `time` or `chrono` to improve that, which isn't worth it. You were right when you said that Chrome will be better for the trace itself; I agree now. Sorry! So let's just remove the timestamps and only remember the cache hit counts and the execution duration(s).
- Some aggregated stats at the end of the report would be nice. For example: the total number of unique commands (fingerprints) that were recorded, the total number of executions (and the aggregated duration of all executions), the total time of bootstrap spent outside of command executions (so that we can see how large a % of bootstrap time was spent on commands specifically: just take a timestamp at the start of bootstrap and, when writing the summary, compute the duration and subtract the sum of command execution durations), the total number of cache hits, and the estimated amount of time saved due to cache hits (take the avg. duration of each cached command, multiply it by the number of cache hits it had, and sum that up).
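The "time saved by caching" heuristic suggested above could be sketched like this in Rust. The function name and input shape (`(durations, cache_hits)` pairs) are illustrative assumptions, not bootstrap's code; the sketch just shows the arithmetic: average the recorded durations of each command, multiply by its cache-hit count, and sum.

```rust
use std::time::Duration;

/// Estimate time saved by caching: for each command, the average of its
/// recorded execution durations multiplied by its cache-hit count, summed
/// over all commands. (Sketch of the review's heuristic, not bootstrap code.)
fn estimated_time_saved(commands: &[(Vec<Duration>, u32)]) -> Duration {
    commands
        .iter()
        // Commands with no recorded execution have no average to speak of.
        .filter(|(durations, _)| !durations.is_empty())
        .map(|(durations, cache_hits)| {
            let total: Duration = durations.iter().sum();
            let avg = total / durations.len() as u32;
            avg * *cache_hits
        })
        .sum()
}

fn main() {
    // One command ran twice (2s and 4s, avg 3s) and had 3 cache hits,
    // so the estimate is roughly 9s saved.
    let commands = vec![(vec![Duration::from_secs(2), Duration::from_secs(4)], 3)];
    println!("estimated time saved: {:?}", estimated_time_saved(&commands));
}
```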
Taking this a bit further, it would be nice to add I/O operations tracing:
- Add `trace!` calls to I/O operations (there are a lot of methods that do file/dir copies, symlinking, network downloads, etc.), so that they can be visualized in Chrome. Also, ideally find places where I/O is done ad-hoc and port them to the I/O helper functions on `Builder`.
- Add these I/O operations to the profile summary, so that we can compare the total durations of I/O and command execution.
But that should be left for a follow-up PR.
Thanks! Now the output is much cleaner. Some more thoughts:
- In addition to `max_duration`, also print `total_duration` per command (in most cases, it will be the same as `max_duration`), and print the % of total bootstrap execution time. So e.g. if bootstrap ran for 5s, and a single command was executed three times and in total it took 3s, it should print 60%. E.g. `max_duration=2.06s, total_duration=4.5s (60% of total)`.
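The percentage suggested above is just the command's total duration over the bootstrap wall time. A minimal sketch (the helper name `percent_of_total` is an assumption, not bootstrap's code):

```rust
use std::time::Duration;

/// Percentage of total bootstrap wall time spent in one command.
/// (Illustrative helper, not the actual bootstrap implementation.)
fn percent_of_total(command_total: Duration, bootstrap_total: Duration) -> f64 {
    if bootstrap_total.is_zero() {
        // Avoid dividing by zero on a degenerate (empty) run.
        return 0.0;
    }
    command_total.as_secs_f64() / bootstrap_total.as_secs_f64() * 100.0
}

fn main() {
    // The review's example: 3s of command time out of a 5s bootstrap run -> 60%.
    let pct = percent_of_total(Duration::from_secs(3), Duration::from_secs(5));
    println!("total_duration=3s ({pct:.0}% of total)");
}
```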
Looks like some of our command parallelization can make the aggregated summary a bit weird :D I thought about this, but hadn't expected it to be an issue in practice.
Total time spent in command executions: 2.45s
Total bootstrap time: 2.33s
Time spent outside command executions: 0.00ns
I think it's fine though, no need to handle concurrency here.
Looks good! Please squash and we can merge this. Great work!
…d start time to deferred execution
…port under bootstrap profiling section
📌 Commit 7de1174 has been approved by Kobzol
It is now in the queue for this repository.
bors added the S-waiting-on-bors label (Status: Waiting on bors to run and complete tests. Bors will change the label on completion.) and removed the S-waiting-on-review label (Status: Awaiting review from the assignee but also interested parties.)
What is this? This is an experimental post-merge analysis report that shows differences in test outcomes between the merged PR and its parent PR.
Comparing 78a6e13 (parent) -> a9f2aad (this PR)
Test differences
2 doctest diffs were found. These are ignored, as they are noisy.
Test dashboard
Run `cargo run --manifest-path src/ci/citool/Cargo.toml -- test-dashboard a9f2aad0454ef1a06de6588d012517b534540765 --output-dir test-dashboard` and then open `test-dashboard/index.html` in your browser to see an overview of all executed tests.
Job duration changes
- x86_64-apple-2: 3638.3s -> 6118.2s (68.2%)
- pr-check-2: 2672.4s -> 2200.5s (-17.7%)
- pr-check-1: 1825.9s -> 1507.7s (-17.4%)
- x86_64-rust-for-linux: 2994.3s -> 2496.1s (-16.6%)
- x86_64-apple-1: 7582.6s -> 8762.2s (15.6%)
- x86_64-gnu-llvm-19-1: 3885.0s -> 3371.7s (-13.2%)
- dist-apple-various: 5976.0s -> 6751.4s (13.0%)
- i686-gnu-2: 6240.6s -> 5491.8s (-12.0%)
- i686-gnu-1: 8172.5s -> 7200.2s (-11.9%)
- x86_64-gnu-llvm-20-1: 3763.4s -> 3380.9s (-10.2%)

How to interpret the job duration changes?
Job durations can vary a lot, based on the actual runner instance
that executed the job, system noise, invalidated caches, etc. The table above is provided
mostly for t-infra members, for simpler debugging of potential CI slow-downs.
Finished benchmarking commit (a9f2aad): comparison URL.
Overall result: no relevant changes - no action needed
@rustbot label: -perf-regression
Instruction count
This benchmark run did not return any relevant results for this metric.
Max RSS (memory usage)
Results (primary 4.8%, secondary 5.1%)
A less reliable metric. May be of interest, but not used to determine the overall result above.
| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 4.8% | [4.8%, 4.8%] | 1 |
| Regressions ❌ (secondary) | 5.1% | [2.3%, 7.2%] | 4 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | 4.8% | [4.8%, 4.8%] | 1 |
Cycles
Results (secondary -2.3%)
A less reliable metric. May be of interest, but not used to determine the overall result above.
| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | -2.3% | [-2.4%, -2.2%] | 2 |
| All ❌✅ (primary) | - | - | 0 |
Binary size
This benchmark run did not return any relevant results for this metric.
Bootstrap: 465.766s -> 465.348s (-0.09%)
Artifact size: 374.54 MiB -> 374.58 MiB (0.01%)
Labels
- Area: rustc-dev-guide
- This PR was explicitly merged by bors.
- Status: Waiting on bors to run and complete tests. Bors will change the label on completion.
- Relevant to the bootstrap subteam: Rust's build system (x.py and src/bootstrap)