Help test parallel rustc!
December 18, 2019, 4:42pm 1
It's that time of year again and we've got a small gift for all y'all for the holidays! The parallel compiler working group has implemented a plan for you to test out a build of rustc which has far more parallelism than the current rustc does today. To cut straight to the chase, the perf improvements are looking great and we're curious to compare two nightly compilers against each other:
- nightly-2019-12-18 - this compiler has more parallelism
- nightly-2019-12-17 - this compiler has less parallelism
You can acquire, test and run these compilers with:
$ rustup update nightly-2019-12-18
$ rustup update nightly-2019-12-17
$ cargo +nightly-2019-12-18 build
$ cargo +nightly-2019-12-17 build
(etc)
What is parallel rustc?
But wait, you may be saying, isn't rustc already parallel? You're correct: rustc already has internal parallelism when it comes to codegen units and LLVM. The compiler, however, is not parallel at all when it's typechecking, borrow-checking, or running other static analyses on your crate. These frontend passes of the compiler are completely serial today. In development for quite some time now is a compiler that can run nearly every single step of the compiler in parallel.
Enabling parallelism in rustc, however, drastically changes internal data structures (think using Arc instead of Rc). For this reason previous builds of rustc do not have the ability to support frontend parallelism. A special nightly build, nightly-2019-12-18, has been prepared which has support compiled in for parallelism. This is experimental support we're still evaluating, though, so the commit has already been reverted and subsequent nightlies will be back to as they were previously.
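To give a flavor of the kind of change this implies, here is a minimal sketch with made-up types (not the compiler's actual data structures): sharing data across worker threads forces the switch from Rc to Arc.

use std::sync::Arc;
use std::thread;

// Hypothetical stand-in for some query result the compiler caches and shares.
#[derive(Debug)]
struct TypeInfo {
    name: String,
}

fn main() {
    // A single-threaded compiler could use `Rc<TypeInfo>`; to hand the same
    // data to several worker threads it has to become `Arc<TypeInfo>`, which
    // uses atomic reference counting (and costs a bit more even when unused).
    let shared = Arc::new(TypeInfo { name: "Foo".to_string() });

    let workers: Vec<_> = (0..4)
        .map(|i| {
            let shared = Arc::clone(&shared);
            thread::spawn(move || println!("worker {} sees {:?}", i, shared))
        })
        .collect();

    for w in workers {
        w.join().unwrap();
    }
}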
What information to gather?
The parallel compiler working group is keen to get widespread feedback on the parallel mode of the compiler. We're interested in basically any feedback you have to offer, but to help you get started, some specifics we're interested in are:
- Have you found a bug? Please report it!
  - For example, did rustc crash?
  - deadlock?
  - produce a nondeterministic result?
  - exhibit any other weirdness when compiling?
- Is parallel rustc faster?
  - When comparing, please compare nightly-2019-12-18 (parallel) and nightly-2019-12-17 (not parallel)
  - Is a full build faster?
  - Is a full release build faster?
  - Is a check build faster?
  - How about incremental builds?
  - Single-crate builds?
- How does parallelism look to you?
- Did rustc get slower from trying to be too parallel?
Time measuring tools like the time shell built-in as well as /usr/bin/time are extra useful here because they give insight into a number of statistics we're interested in watching: for example kernel time, user time, wall time, context switches, etc. If you've got info, we're happy to review it!
Some example commands to compare are:
# full build
$ cargo clean && time cargo +nightly-2019-12-18 build
$ cargo clean && time cargo +nightly-2019-12-17 build
# full release build
$ cargo clean && time cargo +nightly-2019-12-18 build --release
$ cargo clean && time cargo +nightly-2019-12-17 build --release
# full check
$ cargo clean && time cargo +nightly-2019-12-18 check
$ cargo clean && time cargo +nightly-2019-12-17 check
# ... (etc)
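If you want the richer statistics mentioned above (context switches, max resident set size, and so on), GNU time's verbose mode is one way to get them. A sketch (the -v flag is GNU time on Linux; macOS's /usr/bin/time uses -l instead):
# verbose time output includes max RSS, context switches, page faults, etc.
$ cargo clean && /usr/bin/time -v cargo +nightly-2019-12-18 build
$ cargo clean && /usr/bin/time -v cargo +nightly-2019-12-17 build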
When you report data it'd also be very helpful if you indicated what your system looks like. For example:
- What OS do you have? (Windows/Mac/Linux)
- How many CPUs do you have? (cores/threads/etc)
- How much memory do you have?
We've already seen some widely varying data across different system layouts; for example, 28-thread machines have shown very different performance characteristics than 8-thread machines. Most testing has happened on Linux so far, so we're very interested to get more platforms into the pipeline too!
Known issues
- The compiler will max out at 4 threads of parallelism. We've hit some issues with rustc scaling to many threads causing slowdowns for various reasons. We're working on a solution and have a number of ideas for how to solve this. If you've got a 128-core system and only 4 cores are in use, fear not! We'll soon be able to make use of everything.
- If you pass -jN (which defaults to the number of cores you have), Cargo may end up spawning more than N rustc processes. No more than N should actually be doing work, but it may be the case that more processes are spawned (one quick way to observe this is shown below). We plan to fix this before shipping parallel rustc.
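A rough way to watch the rustc process count during a build (a sketch, assuming a Linux shell with watch and pgrep available):
# refreshes the count of live rustc processes every second
$ watch -n1 'pgrep -c rustc'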
Thanks in advance for helping us out! We hope to turn at least some parallelism on by default early next year (think January) with full parallelism coming soon after. That all depends on the feedback we get from this thread, though, and we'd like to weed out any issues before we turn this on by default!
josh December 18, 2019, 5:01pm 3
I wouldn't call it "not much difference" for smaller crates; that was a solid ~7% improvement, which is quite welcome.
Building all of Servo (which includes a lot of C++) in release mode on a 14-core/28-thread Linux desktop went from 6m47s to 6m18s, improving by 7%.
Time for the script crate excluding codegen went from 64.8 to 42.9 seconds, improving by 34%! That crate is by far the largest, and the shape of the dependency graph is such that not a lot else is happening while it is being compiled. In the output of cargo build -Z timings, the CPU usage graph is very telling:
I think we see this happen in this CPU usage graph around the 240s mark. It sounds promising that even more wins are within reach!
There are still times where only one CPU thread is being used, though. Are some parts of the frontend still sequential?
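For anyone who wants to reproduce this kind of graph, Cargo's unstable timings output can be generated roughly like this (a sketch, assuming a nightly toolchain):
$ cargo clean && cargo +nightly-2019-12-18 build -Z timings
# writes a cargo-timing.html report with per-crate CPU usage over time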
josh December 18, 2019, 6:07pm 5
I think wasmtime has the same thing happen when compiling syn, which normally ends up in a single-threaded compile for a little while in the middle of the build. Now I see a few CPUs in use at that point.
skade December 18, 2019, 6:26pm 6
Timings for async-std 1.3.0:
[skade@Nostalgia-For-Infinity async-std]$ cargo clean && time cargo +nightly-2019-12-18 build
real 0m7,978s
user 0m31,223s
sys 0m2,658s
[skade@Nostalgia-For-Infinity async-std]$ cargo clean && time cargo +nightly-2019-12-17 build
real 0m8,853s
user 0m30,087s
sys 0m2,182s
[skade@Nostalgia-For-Infinity async-std]$ cargo clean && time cargo +nightly-2019-12-18 build --release
real 0m11,329s
user 1m4,118s
sys 0m2,573s
[skade@Nostalgia-For-Infinity async-std]$ cargo clean && time cargo +nightly-2019-12-17 build --release
real 0m12,421s
user 1m5,124s
sys 0m2,218s
[skade@Nostalgia-For-Infinity async-std]$ cargo clean && time cargo +nightly-2019-12-18 check
real 0m5,248s
user 0m18,615s
sys 0m2,014s
[skade@Nostalgia-For-Infinity async-std]$ cargo clean && time cargo +nightly-2019-12-17 check
real 0m6,084s
user 0m16,230s
sys 0m1,606s
It's roughly 10% on debug builds, slightly less in release, which is not a big surprise since we spend a lot of our time linking.
Edit: Sorry, forgot the machine info:
Carbon X1 6th gen, 4-core i7 @ 2GHz, 16GB RAM.
Yes, there are a number of portions of the compiler that are still sequential; others can speak more to specifics, but I think the high-level ones are:
- Codegen (sorta). Only one thread performs translation from MIR to LLVM IR so it takes some time to "ramp up" and get parallelism. Once parallelism is on-by-default we plan to refactor this to have truly parallel codegen.
- Parsing
- Name resolution
The compiler isn't perfectly parallel, and we've found it's increasingly difficult to land more parallelism unless it's all on by default. The thinking is that what we currently have is the next big step forward, but it's certainly not the end!
I also agree that the little bump in the middle of the graph you're looking at is the 4 cores getting active. Looks like that rate limiting is actually working! You can also experiment with the -Zthreads value (such as -Zthreads=28) if you'd like to test higher numbers. You may experience slowdowns at the beginning of the compilation but are likely to experience speedups for the script crate itself.
It may be worthwhile to try out just the script crate compilation with a high -Zthreads limit? You may also be able to get some mileage with measureme to see where the sequential bottlenecks are so we can plan to work on those too!
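For reference, a sketch of how one might try that: -Zthreads is a rustc flag, so it can be passed through RUSTFLAGS (the -p script and thread count below are just illustrative, adjust for your workspace):
$ export RUSTFLAGS="-Zthreads=28"
$ cargo clean && time cargo +nightly-2019-12-18 build -p script
# optionally add -Zself-profile to RUSTFLAGS to record data that measureme's tools can analyze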
Overall, this change is a great improvement! The stats below were obtained from building tokio-rs/tracing
at commit fc3ab4. The VM is c3.8xlarge (32 Virtual CPUs, 60.0 GiB Memory, 640 GiB SSD Storage) running Amazon Linux 2.
cargo build: A speedup of 33.6%.
- cargo clean && time cargo +nightly-2019-12-17 build: 21.42 seconds
- cargo clean && time cargo +nightly-2019-12-18 build: 14.23 seconds
131.11user 7.97system 0:21.42elapsed 649%CPU (0avgtext+0avgdata 587096maxresident)k
0inputs+1127952outputs (333major+1828747minor)pagefaults 0swaps
157.53user 9.38system 0:14.23elapsed 1172%CPU (0avgtext+0avgdata 656468maxresident)k
0inputs+1109600outputs (0major+2030716minor)pagefaults 0swaps
cargo build --release: A speedup of 11.7%.
- cargo clean && time cargo +nightly-2019-12-17 build --release: 34.29 seconds
- cargo clean && time cargo +nightly-2019-12-18 build --release: 30.28 seconds
421.71user 8.69system 0:34.30elapsed 1254%CPU (0avgtext+0avgdata 738148maxresident)k
0inputs+460136outputs (0major+1976122minor)pagefaults 0swaps
463.09user 10.84system 0:30.29elapsed 1564%CPU (0avgtext+0avgdata 758324maxresident)k
0inputs+459600outputs (0major+2191324minor)pagefaults 0swaps
cargo check: A speedup of 26.1%.
- cargo clean && cargo +nightly-2019-12-17 check: 18.56 seconds
- cargo clean && cargo +nightly-2019-12-18 check: 13.71 seconds
90.01user 5.22system 0:18.56elapsed 513%CPU (0avgtext+0avgdata 555520maxresident)k
0inputs+461128outputs (0major+1384224minor)pagefaults 0swaps
109.42user 6.85system 0:13.71elapsed 847%CPU (0avgtext+0avgdata 593012maxresident)k
0inputs+461072outputs (0major+1633770minor)pagefaults 0swaps
I’ll try to dig in further in a bit, but I got a very surprising result trying to build Volta: it has some small but noticeable improvements in debug builds… and crashed the laptop hard, twice in a row, when running the parallel release build on battery. The regular release build was fine. Once I’m on power again I’ll see what happens and figure out whether this was a fluke!
josh December 18, 2019, 7:01pm 10
Just tested wasmtime with -Zthreads=72, and the time went from 1m9s (parallel rustc without -Zthreads) to 2m1s (parallel rustc with -Zthreads=72). The user time massively increased, from 16m50s to 50m.
Watching htop, it looks like the jobserver bits between cargo and rustc aren't actually limiting to 72 jobs at a time, because I see a load average around 300-500 and many hundreds of running (not blocked) rustc processes across many crates.
Yes, this is a known bug we're working on; on my 28-thread system locally, -Zthreads=28 makes compile times quite bad. We'll be sure to reach out to you though when we think we have a fix for this; 72 is the highest number we've seen so far!
Basically, it's expected that -Zthreads=72 performs pretty badly right now.
mark-i-m December 18, 2019, 8:36pm 12
Results from compiling the spurs crate...
I ran each command twice.
Intel(R) Core(TM) i5-7500T CPU @ 2.70GHz
16GB RAM
# full build, serial compiler
real 0m10.321s
user 0m36.706s
sys 0m2.307s
real 0m10.215s
user 0m36.790s
sys 0m2.333s
# full build, parallel compiler
real 0m11.396s
user 0m38.184s
sys 0m2.716s
real 0m12.243s
user 0m38.848s
sys 0m2.724s
# full build, release, serial
real 0m23.591s
user 1m27.648s
sys 0m2.180s
real 0m23.220s
user 1m27.705s
sys 0m2.254s
# full build, release, parallel
real 0m23.931s
user 1m29.144s
sys 0m2.432s
real 0m24.661s
user 1m28.921s
sys 0m2.435s
# check, serial
real 0m7.703s
user 0m20.652s
sys 0m1.827s
real 0m7.712s
user 0m20.718s
sys 0m1.816s
# check, parallel
real 0m7.921s
user 0m22.267s
sys 0m1.986s
real 0m8.068s
user 0m22.192s
sys 0m1.987s
So in summary:
- full build: parallel is 15% (~1.4 seconds) slower on average
- full build release: parallel is 3% (<1 second) slower on average
- check: 3% (<1 second) slower on average
On crates.io's codebase at 1fc6bfa on macOS Catalina 10.15.1, 3.1 GHz Dual-Core Intel Core i7, 16 GB RAM:
Full build
$ cargo clean && time cargo +nightly-2019-12-18 build
real 10m1.042s
user 18m47.605s
sys 1m50.826s
$ cargo clean && time cargo +nightly-2019-12-17 build
real 8m33.007s
user 17m14.295s
sys 1m38.205s
Soooo parallelism made it slower?
Full release build
$ cargo clean && time cargo +nightly-2019-12-18 build --release
real 18m58.546s
user 44m1.749s
sys 1m48.858s
$ cargo clean && time cargo +nightly-2019-12-17 build --release
real 16m54.551s
user 43m2.278s
sys 1m38.579s
Also slower for me
Full check
$ cargo clean && time cargo +nightly-2019-12-18 check
real 6m35.014s
user 12m18.504s
sys 1m22.026s
$ cargo clean && time cargo +nightly-2019-12-17 check
real 5m16.316s
user 11m27.044s
sys 1m17.450s
I'm happy to run anything else that would be useful, or provide any other info, let me know what!
tmandry December 19, 2019, 4:01am 14
I tested on Fuchsia. Everything built and all tests passed correctly, so no problems there.
Here are three timings from builds that were run without parallel rustc enabled:
5284.35user 696.95system 3:00.15elapsed 3320%CPU (0avgtext+0avgdata 2963072maxresident)k
1704inputs+61728288outputs (173653major+94432200minor)pagefaults 0swaps
5337.72user 679.63system 3:07.12elapsed 3215%CPU (0avgtext+0avgdata 2962480maxresident)k
6217992inputs+61728224outputs (174415major+97968950minor)pagefaults 0swaps
5279.77user 705.58system 3:01.22elapsed 3302%CPU (0avgtext+0avgdata 2968028maxresident)k
14080inputs+61729144outputs (182088major+96475006minor)pagefaults 0swaps
And here are 3 runs with parallel rustc enabled:
5313.74user 689.57system 2:59.98elapsed 3335%CPU (0avgtext+0avgdata 2982788maxresident)k
1048inputs+60992440outputs (155345major+96491042minor)pagefaults 0swaps
5318.76user 692.16system 2:58.39elapsed 3369%CPU (0avgtext+0avgdata 2964200maxresident)k
1137304inputs+61014216outputs (160991major+98495705minor)pagefaults 0swaps
5330.31user 691.24system 2:57.67elapsed 3389%CPU (0avgtext+0avgdata 2965328maxresident)k
1040inputs+60995288outputs (159364major+97233184minor)pagefaults 0swaps
So it looks like we saw a small but significant improvement in build time. I suspect that enabling >4 threads would lead to further improvements.
Note that there are some non-Rust steps in those builds, so the gain might be higher percentage-wise. (Only the Rust targets were invalidated, but we have some targets that depend on everything at the very end.)
This wasn't very scientific, so should probably be taken with a grain of salt.
On the build of specifically our third party code (built with cargo; takes around 20s; included in the above builds) I did not notice any significant change.
Okay, after getting machine on power, results:
- cargo clean && time cargo +nightly-2019-12-18 build: 80.14 real 485.11 user 48.62 sys
- cargo clean && time cargo +nightly-2019-12-17 build: 75.14 real 456.43 user 37.09 sys
- cargo clean && time cargo +nightly-2019-12-18 build --release: 136.43 real 1227.13 user 43.18 sys
- cargo clean && time cargo +nightly-2019-12-17 build --release: 138.98 real 1214.59 user 33.23 sys
(Best guess: I hit some odd condition with maxing the cores on low battery earlier.)
josh December 19, 2019, 7:02am 16
Quick followup: I did some perf runs on -Zthreads=72 builds, and it looks like the substantial user-time overhead (going from 16m50s to 50m, or 40m with a kernel patch to improve pipe wakeup fairness) consists heavily of attempts to do work-stealing:
13.07% rustc librustc_driver-0d78d9a30be443c5.so [.] std::thread::local::LocalKey<T>::try_with
10.95% rustc librustc_driver-0d78d9a30be443c5.so [.] crossbeam_epoch::internal::Global::try_advance
6.93% rustc [unknown] [k] 0xffffffff91a00163
5.86% rustc librustc_driver-0d78d9a30be443c5.so [.] crossbeam_deque::Stealer<T>::steal
4.14% rustc librustc_driver-0d78d9a30be443c5.so [.] <core::iter::adapters::chain::Chain<A,B> as core::iter::traits::iterator::Iterator>::try_fold
3.02% rustc ld-2.29.so [.] _dl_update_slotinfo
2.18% rustc ld-2.29.so [.] __tls_get_addr
1.65% rustc librustc_driver-0d78d9a30be443c5.so [.] crossbeam_epoch::default::HANDLE::__getit
1.39% rustc librustc_driver-0d78d9a30be443c5.so [.] crossbeam_epoch::default::pin
1.14% rustc ld-2.29.so [.] update_get_addr
1.14% rustc libc-2.29.so [.] __memmove_sse2_unaligned_erms
1.10% rustc [unknown] [k] 0xffffffff91a00b27
1.05% rustc rustc [.] free
0.93% rustc rustc [.] malloc
0.70% rustc ld-2.29.so [.] __tls_get_addr_slow
0.66% rustc libLLVM-9-rust-1.41.0-nightly.so [.] combineInstructionsOverFunction
0.64% rustc librustc_driver-0d78d9a30be443c5.so [.] __tls_get_addr@plt
I dug further, and the calls to std::thread::local::LocalKey<T>::try_with come from crossbeam_deque::Stealer<T>::steal.
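For anyone wanting to collect a similar profile, the usual Linux perf workflow is roughly the following (a sketch; scope and flags are up to you):
# sample the whole build with DWARF call graphs, then inspect the report
$ perf record --call-graph dwarf -- cargo +nightly-2019-12-18 build
$ perf report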
I used my latest project multi_file_writer and compared it with stable too.
The CPUs look more occupied in the load graph when 2019-12-18 is running. However, running the benchmark three times shows me that the variance is bigger than the improvement. And for a laptop, cooling has a greater effect than the optimization.
- Hardware: AMD Ryzen 7 PRO 3700U (8 cores), 24GB RAM, 1TB SSD, Lenovo T495
- Software: Kubuntu Linux 19.10, Kernel 5.3.0-24-generic
# full build
cargo clean && time cargo +nightly-2019-12-18 build
real 0m17.877s 0m16.555s 0m18.536s
user 1m35.093s 1m32.151s 1m39.613s
sys 0m4.695s 0m4.615s 0m5.139s
cargo clean && time cargo +nightly-2019-12-17 build
real 0m18.620s 0m17.727s 0m18.014s
user 1m25.832s 1m23.918s 1m24.582s
sys 0m3.884s 0m3.891s 0m3.960s
cargo clean && time cargo +stable build
real 0m18.954s 0m18.702s 0m19.721s
user 1m31.474s 1m29.693s 1m31.224s
sys 0m4.240s 0m3.892s 0m4.166s
# full release build
cargo clean && time cargo +nightly-2019-12-18 build --release
real 0m45.455s 0m40.779s 0m41.285s
user 4m42.960s 4m27.842s 4m27.474s
sys 0m5.058s 0m4.540s 0m5.088s
cargo clean && time cargo +nightly-2019-12-17 build --release
real 0m43.404s 0m43.381s 0m41.122s
user 4m35.760s 4m30.659s 4m25.856s
sys 0m4.376s 0m4.173s 0m4.233s
cargo clean && time cargo +stable build --release
real 0m40.928s 0m40.848s 0m42.138s
user 4m22.785s 4m24.846s 4m28.641s
sys 0m4.290s 0m4.274s 0m4.358s
# full check
cargo clean && time cargo +nightly-2019-12-18 check
real 0m12.896s 0m13.526s 0m12.870s
user 0m59.298s 0m59.617s 0m58.271s
sys 0m3.352s 0m3.519s 0m3.672s
cargo clean && time cargo +nightly-2019-12-17 check
real 0m13.727s 0m14.880s 0m13.569s
user 0m49.721s 0m53.186s 0m51.119s
sys 0m2.922s 0m2.983s 0m3.023s
cargo clean && time cargo +stable check
real 0m14.428s 0m14.235s 0m14.488s
user 0m52.424s 0m52.405s 0m53.769s
sys 0m3.142s 0m2.962s 0m3.148s
My CPU layout:
If there is another benchmark, I would be happy to run that too.
Testing against reso_dd at commit 2d21506, a soon-to-be-released crate that is 2.2 MB of serde-annotated structs. The only functions it contains are custom serde serialization/deserialization. Dependencies are serde and chrono.
For this particular type of workload, the speedup seems quite significant.
Operating system: macOS 10.15.1. Hardware: MacBook Pro (16-inch, 2019), 2.4 GHz 8-Core Intel Core i9, 32 GB 2667 MHz DDR4
# full build
cargo +nightly-2019-12-18 build -p reso_dd 63.89s user 3.26s system 208% cpu 32.208 total
cargo +nightly-2019-12-17 build -p reso_dd 58.59s user 2.94s system 134% cpu 45.696 total
# full release build
cargo +nightly-2019-12-18 build -p reso_dd --release 136.12s user 3.42s system 390% cpu 35.720 total
cargo +nightly-2019-12-17 build -p reso_dd --release 129.22s user 3.11s system 289% cpu 45.679 total
# full check
cargo +nightly-2019-12-18 check -p reso_dd 53.91s user 2.51s system 212% cpu 26.513 total
cargo +nightly-2019-12-17 check -p reso_dd 49.01s user 2.22s system 134% cpu 38.152 total
aidanhs December 19, 2019, 12:40pm 19
Using commit 80066510b54f1ae05d51f65b52e18bdd5357016c of differential dataflow and compiling some of the examples, I found:
- a noticeable speedup when doing a full build, i.e. dependencies are being compiled (~10% for debug, ~5% for release)
- no speedup (or: lost in the noise) when just compiling final binaries (makes sense - rustc timings indicate most time is spent in LLVM, which also explains the debug/release difference) (I don't know how much type rechecking etc. the touch will cause)
# Parallel debug builds:
$ rm -rf target
$ /usr/bin/time cargo +nightly-2019-12-18 build --example progress
[...]
Finished dev [unoptimized + debuginfo] target(s) in 31.48s
110.99user 4.06system 0:31.50elapsed 365%CPU (0avgtext+0avgdata 920740maxresident)k
440inputs+1249848outputs (7major+1433125minor)pagefaults 0swaps
$ /usr/bin/time cargo +nightly-2019-12-18 build --example pagerank
[...]
Finished dev [unoptimized + debuginfo] target(s) in 11.17s
28.30user 0.69system 0:11.17elapsed 259%CPU (0avgtext+0avgdata 804092maxresident)k
0inputs+444360outputs (0major+307311minor)pagefaults 0swaps
$ /usr/bin/time cargo +nightly-2019-12-18 build --example monoid-bfs
[...]
Finished dev [unoptimized + debuginfo] target(s) in 10.13s
25.70user 0.81system 0:10.14elapsed 261%CPU (0avgtext+0avgdata 770656maxresident)k
0inputs+413664outputs (0major+290452minor)pagefaults 0swaps
# Non-parallel debug builds
$ rm -rf target
$ /usr/bin/time cargo +nightly-2019-12-17 build --example progress
[...]
Finished dev [unoptimized + debuginfo] target(s) in 36.61s
101.69user 3.60system 0:36.62elapsed 287%CPU (0avgtext+0avgdata 960348maxresident)k
32inputs+1249832outputs (0major+1290682minor)pagefaults 0swaps
$ /usr/bin/time cargo +nightly-2019-12-17 build --example pagerank
[...]
Finished dev [unoptimized + debuginfo] target(s) in 10.59s
28.35user 0.87system 0:10.60elapsed 275%CPU (0avgtext+0avgdata 845212maxresident)k
0inputs+444344outputs (0major+322688minor)pagefaults 0swaps
$ /usr/bin/time cargo +nightly-2019-12-17 build --example monoid-bfs
[...]
Finished dev [unoptimized + debuginfo] target(s) in 10.11s
26.29user 0.71system 0:10.12elapsed 266%CPU (0avgtext+0avgdata 755828maxresident)k
0inputs+413624outputs (0major+277129minor)pagefaults 0swaps
# Parallel release builds
$ rm -rf target
$ /usr/bin/time cargo +nightly-2019-12-18 build --example progress --release
[...]
Finished release [optimized + debuginfo] target(s) in 2m 11s
297.09user 4.98system 2:11.90elapsed 229%CPU (0avgtext+0avgdata 2827380maxresident)k
0inputs+1018456outputs (0major+2283912minor)pagefaults 0swaps
$ touch examples/progress.rs
$ /usr/bin/time cargo +nightly-2019-12-18 build --example progress --release
[...]
Finished release [optimized + debuginfo] target(s) in 1m 35s
132.45user 1.34system 1:35.33elapsed 140%CPU (0avgtext+0avgdata 2835180maxresident)k
0inputs+424168outputs (0major+993830minor)pagefaults 0swaps
# Non-parallel release builds
$ rm -rf target
$ /usr/bin/time cargo +nightly-2019-12-17 build --example progress --release
[...]
Finished release [optimized + debuginfo] target(s) in 2m 20s
289.44user 4.20system 2:20.92elapsed 208%CPU (0avgtext+0avgdata 2828940maxresident)k
0inputs+1042776outputs (0major+2153859minor)pagefaults 0swaps
$ touch examples/progress.rs
$ /usr/bin/time cargo +nightly-2019-12-17 build --example progress --release
[...]
Finished release [optimized + debuginfo] target(s) in 1m 37s
134.14user 1.39system 1:37.55elapsed 138%CPU (0avgtext+0avgdata 2845420maxresident)k
0inputs+448848outputs (0major+1003126minor)pagefaults 0swaps
December 19, 2019, 7:03pm 20
Timing data for the turtle crate.
This is using a fairly powerful CPU. Full system info at the end of this comment.
cargo clean && time cargo +nightly-2019-12-18 build
real 0m23.471s
user 4m30.381s
sys 0m17.490s
cargo clean && time cargo +nightly-2019-12-17 build
real 0m26.377s
user 4m3.021s
sys 0m13.118s
cargo clean && time cargo +nightly-2019-12-18 build --release
real 0m43.751s
user 10m41.284s
sys 0m17.831s
cargo clean && time cargo +nightly-2019-12-17 build --release
real 0m47.365s
user 10m21.766s
sys 0m13.473s
The turtle crate has a lot of examples, so I thought it would be worth it to run those too:
cargo clean && time cargo +nightly-2019-12-18 build --examples
real 0m33.796s
user 6m52.774s
sys 0m36.885s
cargo clean && time cargo +nightly-2019-12-17 build --examples
real 0m36.875s
user 6m27.881s
sys 0m31.876s
System info