Improved checked_isqrt and isqrt methods by ChaiTRex · Pull Request #128166 · rust-lang/rust
If you run benchmarks again, could you post your cpu model for a reference?
I was using aarch64-apple-darwin on a MacBook Air with an M1 processor under macOS 14.5.1 (now 14.6).
After I replaced the benchmarks with equivalents to ilog10's benchmarks (which use an exponential distribution for the random inputs), I get the following:
Benchmarks
Before this pull request
benchmarks:
num::int_sqrt::u128_sqrt_predictable 108945.82/iter +/- 1673.86
num::int_sqrt::u128_sqrt_random 16389.17/iter +/- 231.44
num::int_sqrt::u128_sqrt_random_small 1290.22/iter +/- 41.90
num::int_sqrt::u16_sqrt_predictable 1080.06/iter +/- 43.44
num::int_sqrt::u16_sqrt_random 1064.57/iter +/- 20.79
num::int_sqrt::u16_sqrt_random_small 579.85/iter +/- 13.81
num::int_sqrt::u32_sqrt_predictable 2979.65/iter +/- 29.79
num::int_sqrt::u32_sqrt_random 1718.20/iter +/- 58.86
num::int_sqrt::u32_sqrt_random_small 480.19/iter +/- 26.59
num::int_sqrt::u64_sqrt_predictable 13089.55/iter +/- 484.72
num::int_sqrt::u64_sqrt_random 3751.04/iter +/- 79.66
num::int_sqrt::u64_sqrt_random_small 498.58/iter +/- 3.60
num::int_sqrt::u8_sqrt_predictable 329.07/iter +/- 12.64
num::int_sqrt::u8_sqrt_random 576.37/iter +/- 47.12
num::int_sqrt::u8_sqrt_random_small 589.01/iter +/- 22.53
When I first made this pull request
benchmarks:
num::int_sqrt::u128_sqrt_predictable 30728.33/iter +/- 1093.31
num::int_sqrt::u128_sqrt_random 4392.30/iter +/- 149.35
num::int_sqrt::u128_sqrt_random_small 323.58/iter +/- 7.26
num::int_sqrt::u16_sqrt_predictable 304.50/iter +/- 10.60
num::int_sqrt::u16_sqrt_random 334.78/iter +/- 18.02
num::int_sqrt::u16_sqrt_random_small 139.06/iter +/- 2.50
num::int_sqrt::u32_sqrt_predictable 1585.35/iter +/- 59.03
num::int_sqrt::u32_sqrt_random 972.40/iter +/- 38.47
num::int_sqrt::u32_sqrt_random_small 149.02/iter +/- 4.50
num::int_sqrt::u64_sqrt_predictable 5641.28/iter +/- 136.16
num::int_sqrt::u64_sqrt_random 1703.97/iter +/- 43.75
num::int_sqrt::u64_sqrt_random_small 197.42/iter +/- 4.03
num::int_sqrt::u8_sqrt_predictable 42.90/iter +/- 1.39
num::int_sqrt::u8_sqrt_random 133.55/iter +/- 3.92
num::int_sqrt::u8_sqrt_random_small 133.85/iter +/- 2.36
With the changes I'm going to push soon
benchmarks:
num::int_sqrt::u128_sqrt_predictable 26622.63/iter +/- 386.54
num::int_sqrt::u128_sqrt_random 3584.59/iter +/- 131.42
num::int_sqrt::u128_sqrt_random_small 904.23/iter +/- 22.04
num::int_sqrt::u16_sqrt_predictable 388.01/iter +/- 7.85
num::int_sqrt::u16_sqrt_random 480.00/iter +/- 16.07
num::int_sqrt::u16_sqrt_random_small 332.05/iter +/- 7.46
num::int_sqrt::u32_sqrt_predictable 1317.23/iter +/- 25.71
num::int_sqrt::u32_sqrt_random 783.17/iter +/- 8.53
num::int_sqrt::u32_sqrt_random_small 415.10/iter +/- 14.06
num::int_sqrt::u64_sqrt_predictable 4563.42/iter +/- 129.48
num::int_sqrt::u64_sqrt_random 1378.32/iter +/- 60.46
num::int_sqrt::u64_sqrt_random_small 658.41/iter +/- 8.93
num::int_sqrt::u8_sqrt_predictable 42.90/iter +/- 0.82
num::int_sqrt::u8_sqrt_random 132.98/iter +/- 1.53
num::int_sqrt::u8_sqrt_random_small 132.94/iter +/- 2.94
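To make the last two result sets easier to compare, here is a minimal sketch that divides each "random" and "random_small" figure from the upcoming push by the matching figure from the initial pull request. The numbers are copied from the listings above; the `Row` struct and the `rows` and `slowdown` helpers are illustrative names, not part of the pull request.

```rust
/// One benchmark row: its name, the reported cost per iteration in the
/// initial pull request, and the cost per iteration with the upcoming push.
struct Row {
    name: &'static str,
    initial: f64,
    upcoming: f64,
}

/// Ratio of upcoming cost to initial cost. Above 1.0 means the upcoming
/// push is slower; below 1.0 means it is faster.
fn slowdown(row: &Row) -> f64 {
    row.upcoming / row.initial
}

/// The "random" and "random_small" rows, copied from the listings above.
fn rows() -> Vec<Row> {
    vec![
        Row { name: "u8_sqrt_random", initial: 133.55, upcoming: 132.98 },
        Row { name: "u8_sqrt_random_small", initial: 133.85, upcoming: 132.94 },
        Row { name: "u16_sqrt_random", initial: 334.78, upcoming: 480.00 },
        Row { name: "u16_sqrt_random_small", initial: 139.06, upcoming: 332.05 },
        Row { name: "u32_sqrt_random", initial: 972.40, upcoming: 783.17 },
        Row { name: "u32_sqrt_random_small", initial: 149.02, upcoming: 415.10 },
        Row { name: "u64_sqrt_random", initial: 1703.97, upcoming: 1378.32 },
        Row { name: "u64_sqrt_random_small", initial: 197.42, upcoming: 658.41 },
        Row { name: "u128_sqrt_random", initial: 4392.30, upcoming: 3584.59 },
        Row { name: "u128_sqrt_random_small", initial: 323.58, upcoming: 904.23 },
    ]
}

fn main() {
    // Print one ratio per row, e.g. to paste into a review comment.
    for row in rows() {
        println!("{:<24} {:.2}x", row.name, slowdown(&row));
    }
}
```

A ratio well above 1.0 on the "random_small" rows together with a ratio below 1.0 on the "random" rows is the tradeoff in question.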
There's a tradeoff here. In the push I'm going to make soon, inputs chosen randomly from the whole range of the input type get a decent speedup over the initial pull request's code for u32, u64, and u128 (roughly 18 to 20% less time per iteration), though u16 gets slower, and randomly chosen small inputs get about 3 times slower for every type except u8. Is that a good tradeoff? Is there another benchmark that should be added?
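For anyone weighing in on which inputs matter, here is a rough sketch of what "an exponential distribution for the random inputs" can look like in practice. It assumes the inputs are produced by shifting a uniformly random value right by a random number of bits, which spreads bit lengths roughly evenly. The `XorShift64` generator and the `exponential_inputs` helper are hypothetical and dependency-free; they are not the code used by the actual benchmarks.

```rust
/// Tiny xorshift64 generator, used only so this sketch needs no crates.
/// The seed must be nonzero.
struct XorShift64(u64);

impl XorShift64 {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// Hypothetical helper: draws `count` inputs by shifting a uniform u64
/// right by a random amount in 0..64, so small magnitudes show up about
/// as often as large ones.
fn exponential_inputs(rng: &mut XorShift64, count: usize) -> Vec<u64> {
    (0..count)
        .map(|_| {
            let value = rng.next_u64();
            let shift = (rng.next_u64() % u64::from(u64::BITS)) as u32;
            value >> shift
        })
        .collect()
}

/// Counts how many inputs fit in the low 32 bits.
fn count_small(inputs: &[u64]) -> usize {
    inputs.iter().filter(|&&n| n <= u64::from(u32::MAX)).count()
}

fn main() {
    let mut rng = XorShift64(0x9E37_79B9_7F4A_7C15);
    let inputs = exponential_inputs(&mut rng, 256);
    // With uniform u64 inputs almost none would fit in 32 bits; with the
    // random shift a large share should.
    println!("{} of {} inputs fit in 32 bits", count_small(&inputs), inputs.len());
}
```

Under a distribution like this, a "random" benchmark already exercises many small values, which is worth keeping in mind when comparing it against the "random_small" rows.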