Vectorize adjacent_find
by AlexGuteniev · Pull Request #5331 · microsoft/STL
The approach is similar to the `find` vectorization, except for the shifted load, which is similar to `unique` in #5092. Nothing novel 🥱
This time I haven't tried to avoid the double load. I trust past experience that it is faster than blending with the previous value, or than using less than the whole vector as a step.
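For illustration, the shifted double load can be sketched roughly as below. This is not the code in this PR: it assumes SSE2 and 8-bit elements only, the name `adjacent_find_u8_sketch` is hypothetical, and the bit scan is a plain loop rather than an intrinsic.

```cpp
#include <cstddef>
#include <cstdint>
#include <emmintrin.h> // SSE2 intrinsics

// Hypothetical helper, not part of the STL: returns the first index i with
// data[i] == data[i + 1], or n if there is no such pair.
inline std::size_t adjacent_find_u8_sketch(const std::uint8_t* data, std::size_t n) {
    std::size_t i = 0;

    // Each step reads bytes [i, i + 17): 16 for the first load,
    // plus one more for the load shifted by one element.
    while (i + 17 <= n) {
        const __m128i cur  = _mm_loadu_si128(reinterpret_cast<const __m128i*>(data + i));
        const __m128i next = _mm_loadu_si128(reinterpret_cast<const __m128i*>(data + i + 1));

        // Bit k of the mask is set when data[i + k] == data[i + k + 1].
        const int mask = _mm_movemask_epi8(_mm_cmpeq_epi8(cur, next));
        if (mask != 0) {
            std::size_t bit = 0;
            while (((mask >> bit) & 1) == 0) {
                ++bit; // portable stand-in for a count-trailing-zeros instruction
            }
            return i + bit;
        }

        i += 16; // the whole vector is the step; the second load re-reads 15 bytes
    }

    // Scalar tail for the remaining pairs.
    for (; i + 1 < n; ++i) {
        if (data[i] == data[i + 1]) {
            return i;
        }
    }
    return n;
}
```

The second load overlaps the first by 15 bytes, which is the "double load" mentioned above. The alternatives would be to build the shifted vector from the previous iteration's data, or to advance by fewer elements per step.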
I calculated Speedup as Before divided by After, using spreadsheet software.
Benchmark | Before | After | Speedup |
---|---|---|---|
bm<AlgType::Std, char>/2525/1142 | 317 ns | 17.2 ns | 18.43 |
bm<AlgType::Std, short>/2525/1142 | 295 ns | 49.6 ns | 5.95 |
bm<AlgType::Std, int>/2525/1142 | 285 ns | 88.2 ns | 3.23 |
bm<AlgType::Std, long long>/2525/1142 | 284 ns | 161 ns | 1.76 |
bm<AlgType::Rng, char>/2525/1142 | 282 ns | 20.4 ns | 13.82 |
bm<AlgType::Rng, short>/2525/1142 | 283 ns | 47.1 ns | 6.01 |
bm<AlgType::Rng, int>/2525/1142 | 280 ns | 82.3 ns | 3.40 |
bm<AlgType::Rng, long long>/2525/1142 | 289 ns | 142 ns | 2.04 |