Improved checked_isqrt and isqrt methods by ChaiTRex · Pull Request #128166 · rust-lang/rust

If you run benchmarks again, could you post your cpu model for a reference?

I was using aarch64-apple-darwin on a MacBook Air with an M1 processor under macOS 14.5.1 (now 14.6).

After replacing the benchmarks with equivalents of ilog10's benchmarks (which use an exponential distribution for the random inputs), I get the following:

Benchmarks

Before this pull request

benchmarks:
    num::int_sqrt::u128_sqrt_predictable  108945.82/iter +/- 1673.86
    num::int_sqrt::u128_sqrt_random        16389.17/iter  +/- 231.44
    num::int_sqrt::u128_sqrt_random_small   1290.22/iter   +/- 41.90
    num::int_sqrt::u16_sqrt_predictable     1080.06/iter   +/- 43.44
    num::int_sqrt::u16_sqrt_random          1064.57/iter   +/- 20.79
    num::int_sqrt::u16_sqrt_random_small     579.85/iter   +/- 13.81
    num::int_sqrt::u32_sqrt_predictable     2979.65/iter   +/- 29.79
    num::int_sqrt::u32_sqrt_random          1718.20/iter   +/- 58.86
    num::int_sqrt::u32_sqrt_random_small     480.19/iter   +/- 26.59
    num::int_sqrt::u64_sqrt_predictable    13089.55/iter  +/- 484.72
    num::int_sqrt::u64_sqrt_random          3751.04/iter   +/- 79.66
    num::int_sqrt::u64_sqrt_random_small     498.58/iter    +/- 3.60
    num::int_sqrt::u8_sqrt_predictable       329.07/iter   +/- 12.64
    num::int_sqrt::u8_sqrt_random            576.37/iter   +/- 47.12
    num::int_sqrt::u8_sqrt_random_small      589.01/iter   +/- 22.53

When I first made this pull request

benchmarks:
    num::int_sqrt::u128_sqrt_predictable  30728.33/iter +/- 1093.31
    num::int_sqrt::u128_sqrt_random        4392.30/iter  +/- 149.35
    num::int_sqrt::u128_sqrt_random_small   323.58/iter    +/- 7.26
    num::int_sqrt::u16_sqrt_predictable     304.50/iter   +/- 10.60
    num::int_sqrt::u16_sqrt_random          334.78/iter   +/- 18.02
    num::int_sqrt::u16_sqrt_random_small    139.06/iter    +/- 2.50
    num::int_sqrt::u32_sqrt_predictable    1585.35/iter   +/- 59.03
    num::int_sqrt::u32_sqrt_random          972.40/iter   +/- 38.47
    num::int_sqrt::u32_sqrt_random_small    149.02/iter    +/- 4.50
    num::int_sqrt::u64_sqrt_predictable    5641.28/iter  +/- 136.16
    num::int_sqrt::u64_sqrt_random         1703.97/iter   +/- 43.75
    num::int_sqrt::u64_sqrt_random_small    197.42/iter    +/- 4.03
    num::int_sqrt::u8_sqrt_predictable       42.90/iter    +/- 1.39
    num::int_sqrt::u8_sqrt_random           133.55/iter    +/- 3.92
    num::int_sqrt::u8_sqrt_random_small     133.85/iter    +/- 2.36

With the changes I'm going to push soon

benchmarks:
    num::int_sqrt::u128_sqrt_predictable  26622.63/iter +/- 386.54
    num::int_sqrt::u128_sqrt_random        3584.59/iter +/- 131.42
    num::int_sqrt::u128_sqrt_random_small   904.23/iter  +/- 22.04
    num::int_sqrt::u16_sqrt_predictable     388.01/iter   +/- 7.85
    num::int_sqrt::u16_sqrt_random          480.00/iter  +/- 16.07
    num::int_sqrt::u16_sqrt_random_small    332.05/iter   +/- 7.46
    num::int_sqrt::u32_sqrt_predictable    1317.23/iter  +/- 25.71
    num::int_sqrt::u32_sqrt_random          783.17/iter   +/- 8.53
    num::int_sqrt::u32_sqrt_random_small    415.10/iter  +/- 14.06
    num::int_sqrt::u64_sqrt_predictable    4563.42/iter +/- 129.48
    num::int_sqrt::u64_sqrt_random         1378.32/iter  +/- 60.46
    num::int_sqrt::u64_sqrt_random_small    658.41/iter   +/- 8.93
    num::int_sqrt::u8_sqrt_predictable       42.90/iter   +/- 0.82
    num::int_sqrt::u8_sqrt_random           132.98/iter   +/- 1.53
    num::int_sqrt::u8_sqrt_random_small     132.94/iter   +/- 2.94
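
For reference, the two random input distributions behind the `random` and `random_small` rows can be sketched roughly as follows. This is a minimal sketch, not the benchmark code itself: it assumes "exponential distribution" means a uniformly random value shifted right by a uniformly random number of bits, and `XorShift64`, `random_inputs`, and `random_small_inputs` are hypothetical names standing in for whatever RNG and helpers the real benchmarks use.

```rust
/// Hypothetical stand-in for the benchmark suite's RNG: a tiny xorshift64
/// generator, used here only so the sketch has no external dependencies.
/// The seed must be nonzero.
struct XorShift64(u64);

impl XorShift64 {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// Inputs from the whole `u64` range: a uniform value shifted right by a
/// uniform number of bits, so each bit length is roughly equally likely.
/// (Assumed interpretation of "exponential distribution".)
fn random_inputs(rng: &mut XorShift64, count: usize) -> Vec<u64> {
    (0..count)
        .map(|_| {
            let shift = (rng.next_u64() % u64::BITS as u64) as u32;
            rng.next_u64() >> shift
        })
        .collect()
}

/// Small inputs: the same construction applied to a single byte, then
/// widened, so every value is below 256.
fn random_small_inputs(rng: &mut XorShift64, count: usize) -> Vec<u64> {
    (0..count)
        .map(|_| {
            let shift = (rng.next_u64() % u8::BITS as u64) as u32;
            let byte = rng.next_u64() as u8; // keep only the low 8 bits
            (byte >> shift) as u64
        })
        .collect()
}

fn main() {
    let mut rng = XorShift64(0x9E37_79B9_7F4A_7C15);
    let whole = random_inputs(&mut rng, 256);
    let small = random_small_inputs(&mut rng, 256);
    println!("largest whole-range input: {}", whole.iter().max().unwrap());
    println!("largest small input: {}", small.iter().max().unwrap());
}
```

With inputs like these, small values are heavily represented in both sets, which is why a fast path for small inputs can matter even in the whole-range benchmark.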

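There's a tradeoff here. With the changes I'm about to push, randomly chosen inputs from the whole range of the input type take roughly 18 to 20% less time than with the initial pull request's code for u32, u64, and u128, but randomly chosen small inputs take about 3 times as long (between 2.4 and 3.3 times for u16 through u128). u16 also gets slower on whole-range random inputs (about 1.4 times as long), and u8 is essentially unchanged. Is that a good tradeoff? Is there another benchmark that should be added?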
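
The size of the tradeoff can be read directly off the "When I first made this pull request" and "With the changes I'm going to push soon" tables. Here is a minimal sketch that computes the per-benchmark ratios. `RESULTS` and `time_ratio` are illustrative names only, and the numbers are copied from the two tables above.

```rust
/// One row per benchmark: (name, initial pull request, upcoming push),
/// with per-iteration times copied from the two tables above.
const RESULTS: [(&str, f64, f64); 10] = [
    ("u128_sqrt_random", 4392.30, 3584.59),
    ("u64_sqrt_random", 1703.97, 1378.32),
    ("u32_sqrt_random", 972.40, 783.17),
    ("u16_sqrt_random", 334.78, 480.00),
    ("u8_sqrt_random", 133.55, 132.98),
    ("u128_sqrt_random_small", 323.58, 904.23),
    ("u64_sqrt_random_small", 197.42, 658.41),
    ("u32_sqrt_random_small", 149.02, 415.10),
    ("u16_sqrt_random_small", 139.06, 332.05),
    ("u8_sqrt_random_small", 133.85, 132.94),
];

/// Ratio of new time to old time: below 1.0 is a speedup,
/// above 1.0 is a slowdown.
fn time_ratio(old: f64, new: f64) -> f64 {
    new / old
}

fn main() {
    for (name, old, new) in RESULTS {
        println!("{:<24} {:.2}x", name, time_ratio(old, new));
    }
}
```

Ratios like these ignore the reported `+/-` spread, so small differences (for example the u8 rows) should be treated as noise.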