Benchmark: use not_highly_aligned_allocator in more places by AlexGuteniev · Pull Request #5443 · microsoft/STL
Resolves #5035
Instead of trying both aligned and unaligned allocators, try just the unaligned one. This ensures we're checking the worst case, where vectorization gives the least benefit.
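The idea behind an unaligned allocator can be sketched as follows. This is a minimal illustration, not the `not_highly_aligned_allocator` these benchmarks actually use, which may differ. `offset_allocator` and `skewed_vector` are hypothetical names, and treating 64 bytes as the "highly aligned" boundary is an assumption. Storage is obtained with 64-byte alignment and the returned pointer is shifted by `alignof(T)`, so elements stay correctly aligned for `T` but the buffer never starts on a 64-byte boundary.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <new>
#include <vector>

// Hypothetical sketch (not the STL benchmark code): allocate with 64-byte
// alignment, then hand out a pointer shifted by alignof(T) bytes.
template <class T>
struct offset_allocator {
    using value_type = T;

    static constexpr std::size_t base_alignment = 64;  // assumed "highly aligned" boundary
    static constexpr std::size_t offset = alignof(T);  // keeps T correctly aligned
    static_assert(offset < base_alignment, "T is already at least as aligned as the base");

    offset_allocator() noexcept = default;
    template <class U>
    offset_allocator(const offset_allocator<U>&) noexcept {}

    T* allocate(std::size_t n) {
        if (n > (std::numeric_limits<std::size_t>::max() - offset) / sizeof(T)) {
            throw std::bad_array_new_length();
        }
        void* base = ::operator new(n * sizeof(T) + offset, std::align_val_t{base_alignment});
        return static_cast<T*>(static_cast<void*>(static_cast<unsigned char*>(base) + offset));
    }

    void deallocate(T* p, std::size_t n) noexcept {
        // Undo the shift to recover the pointer returned by operator new.
        void* base = static_cast<unsigned char*>(static_cast<void*>(p)) - offset;
        ::operator delete(base, n * sizeof(T) + offset, std::align_val_t{base_alignment});
    }
};

// Stateless allocators: all instances are interchangeable.
template <class T, class U>
bool operator==(const offset_allocator<T>&, const offset_allocator<U>&) noexcept { return true; }
template <class T, class U>
bool operator!=(const offset_allocator<T>&, const offset_allocator<U>&) noexcept { return false; }

// A benchmark would declare its input as skewed_vector<T> instead of std::vector<T>.
template <class T>
using skewed_vector = std::vector<T, offset_allocator<T>>;
```

With `T = std::int32_t`, `data()` lands 4 bytes past a 64-byte boundary, so every 32-byte vector load over the buffer is unaligned.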
Only the containers in benchmarks of potentially vectorized algorithms are changed; for benchmarks like `random`, it does not make sense. `vector<bool>` is also vectorized, if GPR-based vectorization still counts as vectorization, but it is not sensitive to alignment.
Of the vectorized algorithms that could be sensitive to alignment, some really are, and others are barely affected or not at all. The sensitive ones are the simplest searches and data movement. `adjacent_find` is a good example of a sensitive one; here's how its results became worse:
Benchmark | Before (aligned) | After (unaligned) |
---|---:|---:|
bm<AlgType::Std, int8_t>/2525/1142 | 19.9 ns | 22.1 ns |
bm<AlgType::Std, int16_t>/2525/1142 | 33.2 ns | 51.5 ns |
bm<AlgType::Std, int32_t>/2525/1142 | 75.5 ns | 89.1 ns |
bm<AlgType::Std, int64_t>/2525/1142 | 139 ns | 163 ns |
bm<AlgType::Rng, int8_t>/2525/1142 | 16.7 ns | 20.1 ns |
bm<AlgType::Rng, int16_t>/2525/1142 | 33.0 ns | 50.5 ns |
bm<AlgType::Rng, int32_t>/2525/1142 | 76.9 ns | 89.0 ns |
bm<AlgType::Rng, int64_t>/2525/1142 | 141 ns | 163 ns |
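In relative terms, the slowdown for each row is the ratio of the two timings minus one. Worked from a few rows of the table (results rounded):

```math
\begin{aligned}
\text{slowdown} &= \frac{t_{\text{after}}}{t_{\text{before}}} - 1 \\
\text{int16\_t, Std:} &\quad \frac{51.5}{33.2} - 1 \approx 0.551 \;(55.1\%) \\
\text{int16\_t, Rng:} &\quad \frac{50.5}{33.0} - 1 \approx 0.530 \;(53.0\%) \\
\text{int64\_t, Std:} &\quad \frac{163}{139} - 1 \approx 0.173 \;(17.3\%) \\
\text{int8\_t, Std:} &\quad \frac{22.1}{19.9} - 1 \approx 0.111 \;(11.1\%)
\end{aligned}
```

By this measure, the 16-bit element cases degrade the most (about 53 to 55%), while the other element sizes lose roughly 11 to 20%.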
`adjacent_find` does two AVX loads of the same input data at a time, one element apart. With the aligned allocator, only one of the two loads is unaligned; with the unaligned allocator, both are. So it stresses the processor's ability to handle unaligned data.
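The two-load pattern can be modeled portably, without AVX intrinsics. In the sketch below, `adjacent_find_two_loads` and `misaligned_loads` are hypothetical names, not STL functions. The model assumes 32-byte AVX registers and stands in for each vector load with `std::memcpy`. The `static_assert`s show the alignment arithmetic: from a 32-byte-aligned start, only the load shifted by one element is misaligned; from a start skewed by one element, both are. Because each step advances by exactly 32 bytes, the same pattern repeats on every iteration.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical portable model of a vectorized adjacent_find (T must be
// trivially copyable). Each step "loads" Lanes elements at first and Lanes
// elements at first + 1, then compares them lane by lane. A real
// implementation would use two 32-byte loads and one vector compare.
template <class T, std::size_t Lanes = 32 / sizeof(T)>
const T* adjacent_find_two_loads(const T* first, const T* last) {
    static_assert(Lanes > 0, "element larger than a vector register");
    while (static_cast<std::size_t>(last - first) > Lanes) {
        T cur[Lanes];
        T next[Lanes];
        std::memcpy(cur, first, sizeof(cur));        // load at first
        std::memcpy(next, first + 1, sizeof(next));  // load at first + 1 element
        for (std::size_t i = 0; i != Lanes; ++i) {
            if (cur[i] == next[i]) {
                return first + i;
            }
        }
        first += Lanes;  // advance by one full register
    }
    // Scalar tail for the remaining (at most Lanes) elements.
    if (first == last) {
        return last;
    }
    for (const T* n = first + 1; n != last; ++first, ++n) {
        if (*first == *n) {
            return first;
        }
    }
    return last;
}

// Number of misaligned 32-byte loads in one step, given the address of the
// first load; the second load always starts elem_size bytes later.
constexpr int misaligned_loads(std::uintptr_t first_addr, std::size_t elem_size) {
    constexpr std::size_t vec_bytes = 32;  // AVX register width
    return static_cast<int>(first_addr % vec_bytes != 0) +
           static_cast<int>((first_addr + elem_size) % vec_bytes != 0);
}

static_assert(misaligned_loads(64, 4) == 1, "aligned buffer: only the shifted load is misaligned");
static_assert(misaligned_loads(68, 4) == 2, "skewed buffer: both loads are misaligned");
```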
The `bitset` benchmarks are skipped as well. They are harder to unalign, since they use stack containers with deduced types. I'm confident that `bitset` to/from string conversions are examples of algorithms that are not very sensitive to alignment.
Some recently added benchmarks are not changed in this PR, because they already use `not_highly_aligned_allocator`.
I also expected `replace_copy` to be one of the sensitive ones, but auto-vectorization is completely broken in the latest Preview 🐛.
Created DevCom-10895463
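The speedup for `replace_copy` is expected to come from the compiler's auto-vectorizer. The hypothetical `replace_copy_select` below is not the STL's implementation. It is written in the branch-free shape auto-vectorizers typically handle: one load, one compare, one select and one store per element, with no early exit. Whether a given compiler version actually vectorizes it depends on the compiler and flags; as noted above, the latest Preview did not.

```cpp
// Hypothetical sketch of an auto-vectorization-friendly replace_copy loop.
// Returns one past the last element written, like std::replace_copy.
template <class T>
T* replace_copy_select(const T* first, const T* last, T* dest,
                       const T& old_value, const T& new_value) {
    for (; first != last; ++first, ++dest) {
        const T value = *first;
        // Select instead of branch: maps onto vector compare + blend.
        *dest = value == old_value ? new_value : value;
    }
    return dest;
}
```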
Drive-by: add proper optimization barriers to the `replace` family benchmarks.