Benchmark count for vector<bool> by AlexGuteniev · Pull Request #5684 · microsoft/STL
There is a `count` optimization for `vector<bool>` that uses `popcnt` on the integer words of the `vector<bool>` internal representation, originally added in #1131. There's an open PR, #5640, that enhances this optimization further.
This PR adds a benchmark to measure the results.
I started by copying vector_bool_copy.cpp to mimic the existing style, then kept only the `_aligned` case, since alignment doesn't have a significant impact on `count` (unlike on copying). I still kept the `_aligned` suffix in the name, because plain `count` would just duplicate the STL algorithm name. I also added `DoNotOptimize` where necessary. The value to count alternates between iterations, so both the `true` and `false` paths are exercised without adding extra benchmarks.
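The alternating-value trick can be sketched without the benchmark library. The hypothetical `run_alternating_counts` below flips the value passed to `std::count` on every iteration, so a single loop covers both the `true` and `false` cases. It accumulates and returns the results so they stay observable, which is roughly the role `benchmark::DoNotOptimize` plays in the real Google Benchmark code; the function name and loop structure are assumptions, not the PR's code.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical driver: counts `value` in `v` once per iteration, flipping
// `value` each time so both the true and false cases get exercised.
std::size_t run_alternating_counts(const std::vector<bool>& v, std::size_t iterations) {
    bool value = false;
    std::size_t total = 0;
    for (std::size_t i = 0; i < iterations; ++i) {
        // Accumulate the result so the call cannot be treated as dead code.
        total += static_cast<std::size_t>(std::count(v.begin(), v.end(), value));
        value = !value; // alternate the value to count
    }
    return total;
}
```

Because counting `true` and counting `false` partition the sequence, an even number of iterations always totals `iterations / 2 * v.size()`, which makes a handy sanity check.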
The results for #5640 are mixed for me.
On the P-cores of an i5-1235U, I see no improvement:
| Benchmark | Before | After | Speedup |
|---|---|---|---|
| count_aligned/64 | 17.0 ns | 17.0 ns | 1.00 |
| count_aligned/4096 | 61.5 ns | 59.6 ns | 1.03 |
| count_aligned/65536 | 718 ns | 747 ns | 0.96 |
On the E-cores, I see a clear improvement for the larger sizes (1.26x at 4096 and 1.38x at 65536), which is quite good for such a small change:
| Benchmark | Before | After | Speedup |
|---|---|---|---|
| count_aligned/64 | 21.3 ns | 21.6 ns | 0.99 |
| count_aligned/4096 | 114 ns | 90.6 ns | 1.26 |
| count_aligned/65536 | 1505 ns | 1092 ns | 1.38 |
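The Speedup columns in the tables above are consistent with Before divided by After, so values above 1.00 mean the change made `count` faster. A small check of that arithmetic, with hypothetical helper names:

```cpp
#include <cmath>

// Speedup as used in the tables: old time divided by new time.
double speedup(double before_ns, double after_ns) {
    return before_ns / after_ns;
}

// Rounds to two decimals, matching the precision of the tables.
double round2(double x) {
    return std::round(x * 100.0) / 100.0;
}
```

For example, `round2(speedup(1505, 1092))` gives 1.38, matching the E-core row for 65536 elements.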