Add algorithm-specific detection macros for vectorization by StephanTLavavej · Pull Request #5801 · microsoft/STL
No functional change for users; this keeps _USE_STD_VECTOR_ALGORITHMS as our documented control macro.
To prepare for adding ARM64 and eventually ARM64EC vectorized implementations, this PR begins by adding a layer of architecture-specific detection macros. This layer isn't strictly necessary, but it makes life easier because it (1) allows the control macro to shut everything down, (2) deals with how ARM64EC defines both _M_ARM64EC and _M_X64, and (3) centralizes the rule that x64 and x86 vectorization are always enabled together.
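Here is a minimal sketch of what such a layer could look like, covering the three points above. It assumes the control macro defaults to 1 when the user hasn't defined it, and _VECTORIZED_FOR_X64_X86 is an illustrative name, not necessarily the one the PR uses. _M_X64, _M_IX86, and _M_ARM64EC are MSVC's predefined architecture macros.

```cpp
// Hypothetical sketch of an architecture-specific detection layer.
// _VECTORIZED_FOR_X64_X86 is an illustrative name.

// Assumption: the documented control macro defaults to 1 if the user hasn't set it.
#ifndef _USE_STD_VECTOR_ALGORITHMS
#define _USE_STD_VECTOR_ALGORITHMS 1
#endif

#if !_USE_STD_VECTOR_ALGORITHMS
// (1) The control macro shuts down every architecture at once.
#define _VECTORIZED_FOR_X64_X86 0
#elif defined(_M_ARM64EC)
// (2) ARM64EC also defines _M_X64, so it must be ruled out before the x64 test.
#define _VECTORIZED_FOR_X64_X86 0
#elif defined(_M_X64) || defined(_M_IX86)
// (3) x64 and x86 vectorization are always enabled as a pair.
#define _VECTORIZED_FOR_X64_X86 1
#else
// Any other architecture: nothing is vectorized.
#define _VECTORIZED_FOR_X64_X86 0
#endif
```

In this sketch, compiling with /D_USE_STD_VECTOR_ALGORITHMS=0 forces the detection macro to 0 on every architecture, and an ARM64 branch could later be added as one more #elif without touching the x64/x86 logic.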
Then this PR introduces algorithm-specific detection macros. These make connections throughout the codebase easier to understand, because a given vectorized algorithm involves several pieces: declarations of separately compiled functions (__std_meow_4), a wrapper template (_Meow_vectorized), callsites, and occasionally helper type traits (_Vector_alg_in_meow_is_safe). The same macro now marks each of these places. This will also make it significantly easier to implement algorithms one at a time for ARM64, since this PR decomposes what was previously a monolithic mode.
The _VECTORIZED_MEOW macros are named to avoid confusion with any control macros (we usually use the _HAS_BLAH or _USE_BLAH patterns for control). They're deliberately the mirror image of the _Meow_vectorized wrappers, to further avoid confusion.
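The following sketch shows how one algorithm-specific macro could tie those pieces together, using the placeholder "meow" for a hypothetical algorithm that counts elements equal to a value. Everything here is illustrative: the condensed architecture layer, the signature of __std_meow_4, the trait's definition, and the scalar stand-in body (present only so the sketch links when the macro is active) are assumptions, not the PR's code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Condensed architecture layer (illustrative macro name).
#ifndef _USE_STD_VECTOR_ALGORITHMS
#define _USE_STD_VECTOR_ALGORITHMS 1
#endif
#if _USE_STD_VECTOR_ALGORITHMS && !defined(_M_ARM64EC) && (defined(_M_X64) || defined(_M_IX86))
#define _VECTORIZED_FOR_X64_X86 1
#else
#define _VECTORIZED_FOR_X64_X86 0
#endif

// Algorithm-specific detection macro; the name mirrors _Meow_vectorized.
#define _VECTORIZED_MEOW _VECTORIZED_FOR_X64_X86

#if _VECTORIZED_MEOW
extern "C" {
// 1. Declaration of a separately compiled function (hypothetical signature).
std::size_t __std_meow_4(const void* _First, const void* _Last, std::uint32_t _Val) noexcept;
} // extern "C"

// Scalar stand-in so the sketch links; a real implementation would use SIMD.
extern "C" std::size_t __std_meow_4(const void* _First, const void* _Last, std::uint32_t _Val) noexcept {
    std::size_t _Count = 0;
    for (auto _Ptr = static_cast<const std::uint32_t*>(_First); _Ptr != _Last; ++_Ptr) {
        _Count += (*_Ptr == _Val);
    }
    return _Count;
}

// 2. Helper trait: only 4-byte integers may use the 4-byte entry point.
template <class _Ty>
constexpr bool _Vector_alg_in_meow_is_safe = std::is_integral_v<_Ty> && sizeof(_Ty) == 4;

// 3. Wrapper template.
template <class _Ty>
std::size_t _Meow_vectorized(const _Ty* const _First, const _Ty* const _Last, const _Ty _Val) noexcept {
    return __std_meow_4(_First, _Last, static_cast<std::uint32_t>(_Val));
}
#endif // _VECTORIZED_MEOW

// 4. Callsite: counts elements equal to _Val.
template <class _Ty>
std::size_t meow(const _Ty* _First, const _Ty* _Last, const _Ty& _Val) {
#if _VECTORIZED_MEOW
    if constexpr (_Vector_alg_in_meow_is_safe<_Ty>) {
        return _Meow_vectorized(_First, _Last, _Val);
    }
#endif // _VECTORIZED_MEOW
    std::size_t _Count = 0; // scalar fallback
    for (; _First != _Last; ++_First) {
        if (*_First == _Val) {
            ++_Count;
        }
    }
    return _Count;
}
```

Searching for _VECTORIZED_MEOW then turns up the declaration, the trait, the wrapper, and the callsite together, and the macro name reads as the mirror image of _Meow_vectorized rather than as a _USE_ or _HAS_ control macro.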
In simple cases, the macros are 1:1 with the wrapper templates. However, there are several "algorithm families" implemented in vector_algorithms.cpp, like the minmax and minmax_element families and the various find variants. I tried to introduce macros that corresponded to these algorithm families, since that's how they'll be implemented (and there would be no benefit to introducing an even finer-grained system).
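A sketch of family-level granularity, under the assumption that min_element, max_element, and minmax_element are implemented together in the separately compiled code. The macro names, the struct, and the three declarations are hypothetical.

```cpp
// Condensed architecture layer (illustrative macro name).
#ifndef _USE_STD_VECTOR_ALGORITHMS
#define _USE_STD_VECTOR_ALGORITHMS 1
#endif
#if _USE_STD_VECTOR_ALGORITHMS && !defined(_M_ARM64EC) && (defined(_M_X64) || defined(_M_IX86))
#define _VECTORIZED_FOR_X64_X86 1
#else
#define _VECTORIZED_FOR_X64_X86 0
#endif

// One macro per family, not per algorithm: the three algorithms below are
// assumed to share an implementation, so they are enabled or disabled together.
#define _VECTORIZED_MINMAX_ELEMENT _VECTORIZED_FOR_X64_X86

#if _VECTORIZED_MINMAX_ELEMENT
extern "C" {
// Hypothetical 4-byte entry points, all guarded by the same family macro.
struct _Min_max_element_t {
    const void* _Min;
    const void* _Max;
};
const void* __std_min_element_4(const void* _First, const void* _Last, bool _Signed) noexcept;
const void* __std_max_element_4(const void* _First, const void* _Last, bool _Signed) noexcept;
_Min_max_element_t __std_minmax_element_4(const void* _First, const void* _Last, bool _Signed) noexcept;
} // extern "C"
#endif // _VECTORIZED_MINMAX_ELEMENT
```

Under this scheme, porting the whole family to ARM64 would mean changing a single #define (for example, OR-ing in a hypothetical ARM64 detection macro) without affecting any other family.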
There are a couple of places where I had to slightly adjust if constexpr logic to handle the minmax and minmax_element families potentially being active independently of each other.
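Here is one way such an adjustment might look, as a hedged sketch rather than the PR's actual change. The family macros are set by hand to simulate the minmax family being off while minmax_element is on, the wrappers are scalar stand-ins, and every name is hypothetical. The #if/#elif chain inside the if constexpr lets whichever family is active handle the call, instead of assuming both families are on or off together.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Assumption: set by hand to simulate "minmax off, minmax_element on".
#define _VECTORIZED_MINMAX 0
#define _VECTORIZED_MINMAX_ELEMENT 1

// Records which path produced the answer, so the dispatch is observable.
enum class _Path { _Scalar, _Minmax, _Minmax_element };

#if _VECTORIZED_MINMAX
// Stand-in for the minmax family's wrapper: returns the minimum value.
template <class _Ty>
_Ty _Min_vectorized(const _Ty* _First, const _Ty* _Last) noexcept {
    _Ty _Result = *_First;
    for (++_First; _First != _Last; ++_First) {
        if (*_First < _Result) {
            _Result = *_First;
        }
    }
    return _Result;
}
#endif // _VECTORIZED_MINMAX

#if _VECTORIZED_MINMAX_ELEMENT
// Stand-in for the minmax_element family's wrapper: returns a pointer to the minimum.
template <class _Ty>
const _Ty* _Min_element_vectorized(const _Ty* _First, const _Ty* _Last) noexcept {
    const _Ty* _Found = _First;
    for (; _First != _Last; ++_First) {
        if (*_First < *_Found) {
            _Found = _First;
        }
    }
    return _Found;
}
#endif // _VECTORIZED_MINMAX_ELEMENT

// Hypothetical safety trait: only integers take a vectorized path here.
template <class _Ty>
constexpr bool _Is_min_max_safe = std::is_integral_v<_Ty>;

// Minimum of a non-empty range, plus the path that computed it.
template <class _Ty>
std::pair<_Ty, _Path> _Min_of(const _Ty* _First, const _Ty* _Last) {
    if constexpr (_Is_min_max_safe<_Ty>) {
#if _VECTORIZED_MINMAX
        return {_Min_vectorized(_First, _Last), _Path::_Minmax};
#elif _VECTORIZED_MINMAX_ELEMENT
        // minmax is off but minmax_element is on: use it rather than giving up.
        return {*_Min_element_vectorized(_First, _Last), _Path::_Minmax_element};
#endif
    }
    _Ty _Result = *_First; // scalar fallback
    for (++_First; _First != _Last; ++_First) {
        if (*_First < _Result) {
            _Result = *_First;
        }
    }
    return {_Result, _Path::_Scalar};
}
```

With both macros at 0, the if constexpr body is empty and every type falls through to the scalar path.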
Finally, while removing the old monolithic macro, I allowed the code to expand to empty extern "C" { ... } blocks when no vectorization is active, because this is harmless and avoids cluttering up the code with even more verbosity.
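For reference, a sketch of the expansion being described, using two hypothetical family macros that are both 0 and hypothetical declarations. After preprocessing, what remains is extern "C" { }, an empty linkage specification, which C++ permits and which declares nothing.

```cpp
#include <cstddef>

// Hypothetical family macros, both inactive (as when vectorization is off).
#define _VECTORIZED_FIND 0
#define _VECTORIZED_COUNT 0

extern "C" {
#if _VECTORIZED_FIND
// Hypothetical declaration, visible only when the find family is active.
const void* __std_find_4(const void* _First, const void* _Last, unsigned int _Val) noexcept;
#endif // _VECTORIZED_FIND

#if _VECTORIZED_COUNT
// Hypothetical declaration, visible only when the count family is active.
std::size_t __std_count_4(const void* _First, const void* _Last, unsigned int _Val) noexcept;
#endif // _VECTORIZED_COUNT
} // extern "C": empty when no family is active, which is still valid C++
```

The more verbose alternative would wrap the whole block in #if _VECTORIZED_FIND || _VECTORIZED_COUNT, repeating every family macro that is already tested inside it.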