STL (original) (raw)

Towards #625, specifically #625 (comment) items 1 and 2.

🦖 Optimization

When a standard functor, either transparent or integer-specialized, is passed to transform, along with all vector<bool> iterators, map that functor to a bitwise one to operate on the underlying type.

The mapping is done via template specialization, and not via if constexpr to make the dispatch working fine without <functional> included and functors defined.

Only do this for zero offset. Supporting all possible offset combination is much complexity for a little gain. Remember copy.

Extract pointers from iterators to help the compiler auto-vectorize. Yes, it does not auto-vectorize when using the whole iterators. Auto-vecotrization needs simplest ways of implementing loops.

Don't call transform again, to avoid unnecessary recursion, the operation is simple.

~~Don't process tails explicitly, yield to the existing loop for now.~~
Actually lets go for it, it is not that hard. Process tails with applying bit mask.

Don't do ranges yet. Other vector<bool> optimizations don't do them either. It is getting complicated, so instead of doing ranges separately, need to look into #1754 at last.

🏁 Benchmark

Feed the randomizer with some seed to make the inputs different 🐦

Since (auto-)vectorization is (expected to be) engaged, use alignment controlling allocator.

⏱️ Benchmark results

Benchmark	Before	After	Speedup
transform_two_inputs_aligned<logical_and<>>/64	108 ns	2.55 ns	42.4
transform_two_inputs_aligned<logical_and<>>/4096	13869 ns	9.44 ns	1470
transform_two_inputs_aligned<logical_and<>>/65536	416424 ns	115 ns	3620
transform_two_inputs_aligned<logical_or<>>/64	123 ns	2.59 ns	47.40
transform_two_inputs_aligned<logical_or<>>/4096	14377 ns	9.07 ns	1590
transform_two_inputs_aligned<logical_or<>>/65536	409012 ns	112 ns	3650
transform_one_input_aligned<logical_not<>>/64	83.7 ns	2.14 ns	39.10
transform_one_input_aligned<logical_not<>>/4096	6891 ns	7.28 ns	947
transform_one_input_aligned<logical_not<>>/65536	264957 ns	82.7 ns	3200

Optimize std::transform for vector<bool> by AlexGuteniev · Pull Request #5769 · microsoft/STL (original) (raw)

🦖 Optimization

🏁 Benchmark

⏱️ Benchmark results

Optimize `std::transform` for `vector<bool>` by AlexGuteniev · Pull Request #5769 · microsoft/STL (original) (raw)