Benchmark: use `not_highly_aligned_allocator` in more places by AlexGuteniev · Pull Request #5443 · microsoft/STL

Resolves #5035

Rather than benchmarking with both aligned and unaligned allocators, benchmark with the unaligned one only. This makes sure we're measuring the worst case, where vectorization brings the least benefit.
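As a rough illustration of what an unaligned allocator does, here is a minimal sketch. It is not the helper from the benchmark sources: the name `skewed_allocator_sketch`, the helper `offset_within`, and the default `Alignment` and `Skew` values are all hypothetical. The idea is to over-allocate from a highly aligned block and hand out a pointer a few bytes past the boundary, so the data is never aligned to the vector width.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>
#include <vector>

// Hypothetical sketch: every allocation starts Skew bytes past an
// Alignment-byte boundary, so it is never Alignment-aligned.
template <class T, std::size_t Alignment = 64, std::size_t Skew = 8>
struct skewed_allocator_sketch {
    using value_type = T;

    static_assert(Skew != 0 && Skew < Alignment, "Skew must land strictly inside the aligned block");
    static_assert(Alignment % alignof(T) == 0, "Alignment must keep T objects properly aligned");
    static_assert(Skew % alignof(T) == 0, "Skew must keep T objects properly aligned");

    // Needed explicitly because of the non-type template parameters.
    template <class U>
    struct rebind {
        using other = skewed_allocator_sketch<U, Alignment, Skew>;
    };

    skewed_allocator_sketch() = default;
    template <class U>
    skewed_allocator_sketch(const skewed_allocator_sketch<U, Alignment, Skew>&) noexcept {}

    T* allocate(const std::size_t n) {
        if (n > (static_cast<std::size_t>(-1) - Skew) / sizeof(T)) {
            throw std::bad_array_new_length{};
        }
        // Get an aligned block with Skew extra bytes, then skip the first Skew bytes.
        void* const raw = ::operator new(n * sizeof(T) + Skew, std::align_val_t{Alignment});
        return reinterpret_cast<T*>(static_cast<unsigned char*>(raw) + Skew);
    }

    void deallocate(T* const p, std::size_t) noexcept {
        // Undo the skew to recover the pointer returned by operator new.
        ::operator delete(reinterpret_cast<unsigned char*>(p) - Skew, std::align_val_t{Alignment});
    }
};

template <class T, class U, std::size_t A, std::size_t S>
bool operator==(const skewed_allocator_sketch<T, A, S>&, const skewed_allocator_sketch<U, A, S>&) noexcept {
    return true;
}

template <class T, class U, std::size_t A, std::size_t S>
bool operator!=(const skewed_allocator_sketch<T, A, S>&, const skewed_allocator_sketch<U, A, S>&) noexcept {
    return false;
}

// Byte offset of p from the previous alignment-byte boundary.
inline std::size_t offset_within(const void* const p, const std::size_t alignment) {
    assert(alignment != 0);
    return static_cast<std::size_t>(reinterpret_cast<std::uintptr_t>(p) % alignment);
}
```

With this sketch, a `std::vector<int, skewed_allocator_sketch<int>>` always has `data()` at offset 8 from a 64-byte boundary, so a benchmark using it gets the same misalignment on every run instead of depending on what the default allocator happens to return.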

Only change the container in benchmarks of potentially vectorized algorithms. For something like random, it does not make sense. `vector<bool>` is also vectorized, if we count GPR-based vectorization as vectorization, but it is not sensitive to alignment.

Of the vectorized algorithms that could be sensitive to alignment, some actually are, and some are not, or almost not. The sensitive ones are the simplest searches and data movement. `adjacent_find` is a good example of a sensitive one; here's how its results became worse:

| Benchmark | Before | After |
|---|---|---|
| bm<AlgType::Std, int8_t>/2525/1142 | 19.9 ns | 22.1 ns |
| bm<AlgType::Std, int16_t>/2525/1142 | 33.2 ns | 51.5 ns |
| bm<AlgType::Std, int32_t>/2525/1142 | 75.5 ns | 89.1 ns |
| bm<AlgType::Std, int64_t>/2525/1142 | 139 ns | 163 ns |
| bm<AlgType::Rng, int8_t>/2525/1142 | 16.7 ns | 20.1 ns |
| bm<AlgType::Rng, int16_t>/2525/1142 | 33.0 ns | 50.5 ns |
| bm<AlgType::Rng, int32_t>/2525/1142 | 76.9 ns | 89.0 ns |
| bm<AlgType::Rng, int64_t>/2525/1142 | 141 ns | 163 ns |

`adjacent_find` does two AVX loads of the same input data at a time. With the aligned allocator only one of the loads is unaligned; with the unaligned allocator both of them are. So it stresses the processor's ability to deal with unaligned data.
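The alignment argument can be checked with plain address arithmetic. The sketch below is a model, not the STL implementation. It assumes 32-byte vector loads and assumes the second load starts exactly one element after the first, which is the natural way to compare each element with its neighbor. The names `load_alignment` and `classify_loads` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Which of the two loads in one step starts on a vector-width boundary.
struct load_alignment {
    bool first_aligned;
    bool second_aligned;
};

// Models one step of a vectorized neighbor comparison:
// the first load reads from addr, the second from addr + elem_size.
inline load_alignment classify_loads(
    const std::uintptr_t addr, const std::size_t elem_size, const std::size_t vector_bytes = 32) {
    assert(elem_size != 0 && vector_bytes != 0);
    return {addr % vector_bytes == 0, (addr + elem_size) % vector_bytes == 0};
}
```

Under these assumptions, a buffer starting at a 32-byte boundary (say address `0x1000` with 4-byte elements) gives one aligned and one unaligned load per step, while a buffer skewed by 8 bytes (`0x1008`) gives two unaligned loads. Each step advances by a whole vector width, so the pattern repeats for the entire range.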

Also skipped the `bitset` benchmarks. They are harder to unalign, since they use some stack containers with deduced types. I'm confident that `bitset` from/to string conversions are examples of algorithms that are not very sensitive to alignment.

Some of the recently added benchmarks are not changed in this PR, because they already use `not_highly_aligned_allocator`.

I also expected `replace_copy` to be one of the sensitive ones, but auto-vectorization is completely broken in the latest Preview 🐛.
Created DevCom-10895463

Drive-by: proper optimization barriers in the `replace` family benchmarks.
