Vectorized MSVC STL Algorithms

Under specific conditions, algorithms in the MSVC Standard Template Library (STL) can process multiple elements simultaneously on a single CPU core, rather than handling each element individually. This optimization uses single instruction, multiple data (SIMD) instructions provided by the CPU, a technique called vectorization. When this optimization isn't applied, the implementation is referred to as scalar.

The conditions required for vectorization are:

Auto-vectorization in the MSVC STL

For more information about automatic vectorization, see Auto-Vectorizer and that article's discussion of the /arch switch. Auto-vectorization applies to the STL implementation code the same way it applies to user code.

Algorithms like transform, reduce, and accumulate benefit heavily from auto-vectorization.
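As a rough illustration of code that gives the auto-vectorizer something to work with, the sketch below applies transform and accumulate to contiguous std::vector storage of a simple integer type. The helper names double_into and sum_wide are hypothetical. Whether the compiler actually emits SIMD instructions is not guaranteed; it depends on the optimization level, the /arch setting, and the compiler version.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical helper: writes 2 * in[i] into out[i].
// A simple element-wise operation over contiguous storage is the kind of
// loop an auto-vectorizer can typically handle.
void double_into(const std::vector<std::int32_t>& in,
                 std::vector<std::int32_t>& out) {
    assert(in.size() == out.size());  // caller must size the output first
    std::transform(in.begin(), in.end(), out.begin(),
                   [](std::int32_t x) { return x * 2; });
}

// Hypothetical helper: sums the elements into a wider type so the total
// does not overflow for typical inputs.
std::int64_t sum_wide(const std::vector<std::int32_t>& values) {
    return std::accumulate(values.begin(), values.end(), std::int64_t{0});
}
```

Calling double_into on {1, 2, 3, 4} fills the output with {2, 4, 6, 8}, and sum_wide on that output returns 20, regardless of whether the loops were vectorized.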

Manual vectorization in the MSVC STL

Certain algorithms for x64 and x86 include manual vectorization. This implementation is separately compiled and relies on runtime CPU dispatch, so it applies only to suitable CPUs.
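The general idea of runtime CPU dispatch can be sketched as follows. This is not the STL's implementation: select_count, count_scalar, and count_blocked are hypothetical names, the cpu_supports_simd flag stands in for a real CPU feature query, and count_blocked uses plain code processed in blocks of four as a placeholder for a routine written with SIMD instructions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

using count_fn = std::size_t (*)(const std::int32_t*, std::size_t, std::int32_t);

// Scalar path: examines one element per iteration.
std::size_t count_scalar(const std::int32_t* data, std::size_t n,
                         std::int32_t value) {
    assert(data != nullptr || n == 0);
    std::size_t total = 0;
    for (std::size_t i = 0; i < n; ++i) {
        total += (data[i] == value) ? 1u : 0u;
    }
    return total;
}

// Placeholder for a SIMD path: handles four elements per iteration,
// then finishes the remaining tail one element at a time.
std::size_t count_blocked(const std::int32_t* data, std::size_t n,
                          std::int32_t value) {
    assert(data != nullptr || n == 0);
    std::size_t total = 0;
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        total += (data[i] == value ? 1u : 0u) + (data[i + 1] == value ? 1u : 0u)
               + (data[i + 2] == value ? 1u : 0u) + (data[i + 3] == value ? 1u : 0u);
    }
    for (; i < n; ++i) {
        total += (data[i] == value) ? 1u : 0u;
    }
    return total;
}

// Dispatch: choose the implementation once, based on what the CPU supports.
// The flag is an assumption; real code would query the CPU at run time.
count_fn select_count(bool cpu_supports_simd) {
    return cpu_supports_simd ? count_blocked : count_scalar;
}
```

Both paths must return the same result for the same input; only the speed is expected to differ. That property is what lets a dispatcher pick either one safely.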

Manually vectorized algorithms use template metaprogramming to detect if the element type is suitable for vectorization. As a result, they're only vectorized for simple types such as standard integer types.
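The compile-time selection described above might look roughly like the following sketch. The Path enumeration and choose_path function are hypothetical, and std::is_integral is only an approximation of "simple type"; the checks the STL actually performs are internal and more detailed.

```cpp
#include <string>
#include <type_traits>

// Hypothetical labels for the two implementation strategies.
enum class Path { scalar, vectorized_candidate };

// Hypothetical compile-time check: integer element types are treated as
// candidates for a vectorized implementation, everything else as scalar.
template <class T>
constexpr Path choose_path() {
    return std::is_integral<T>::value ? Path::vectorized_candidate
                                      : Path::scalar;
}

// The decision is made per element type at compile time, with no runtime cost.
static_assert(choose_path<int>() == Path::vectorized_candidate,
              "int is a simple integer type");
static_assert(choose_path<std::string>() == Path::scalar,
              "std::string is not a simple type");
```

Because the choice depends only on the element type, a call on a range of int and the same call on a range of std::string can end up in different implementations without any change at the call site.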

Programs either benefit in performance from manual vectorization or remain unaffected by it. To disable manual vectorization, define _USE_STD_VECTOR_ALGORITHMS=0 in your project. Manually vectorized algorithms are enabled by default on x64 and x86 because _USE_STD_VECTOR_ALGORITHMS defaults to 1 on those platforms.

Assign the same value to _USE_STD_VECTOR_ALGORITHMS for all linked translation units that use algorithms. Configure it in the project properties instead of in the source code for consistency. For more information about how to configure it, see /D (Preprocessor Definitions).
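One way to confirm which value a translation unit sees is a small check like the sketch below. The function name vector_algorithms_setting is hypothetical. The sketch assumes that including a standard header makes the default value visible on MSVC; with other standard libraries the macro is typically not defined at all, which the sketch reports as -1.

```cpp
#include <algorithm>  // any standard header; assumed to expose the MSVC default

// Hypothetical helper:
//    1  manual vectorization enabled
//    0  manual vectorization disabled
//   -1  macro not defined (for example, a non-MSVC standard library)
constexpr int vector_algorithms_setting() {
#if !defined(_USE_STD_VECTOR_ALGORITHMS)
    return -1;
#elif _USE_STD_VECTOR_ALGORITHMS
    return 1;
#else
    return 0;
#endif
}
```

On the command line, the macro can be set with the /D option, for example /D_USE_STD_VECTOR_ALGORITHMS=0. Checking the helper's result in each translation unit is one way to catch an accidental mismatch.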

The _USE_STD_VECTOR_ALGORITHMS macro controls the behavior of these manually vectorized algorithms:

The _USE_STD_VECTOR_ALGORITHMS macro also controls the manual vectorization of:

Manually vectorized algorithms for floating-point types

Vectorization of floating-point types involves specific considerations:

The STL addresses the first two considerations safely. For floating-point types, only max_element, min_element, minmax_element, max, min, minmax, is_sorted, and is_sorted_until are manually vectorized. These algorithms:

Use _USE_STD_VECTOR_FLOATING_ALGORITHMS to control the use of these vectorized algorithms for floating-point types. Set it to 0 to disable vectorization. _USE_STD_VECTOR_FLOATING_ALGORITHMS has no effect if _USE_STD_VECTOR_ALGORITHMS is set to 0.

The _USE_STD_VECTOR_FLOATING_ALGORITHMS macro defaults to 0 when /fp:except is set.

Assign the same value to _USE_STD_VECTOR_FLOATING_ALGORITHMS for all linked translation units that use algorithms. Configure it in the project properties instead of in the source code for consistency. For more information about how to configure it, see /D (Preprocessor Definitions).
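For illustration, the sketch below calls minmax_element, one of the algorithms named above, on a contiguous range of float. The helper name range_of is hypothetical. Whether the call takes a vectorized path depends on the platform, the CPU, and the two macros; the code itself is ordinary standard C++ and doesn't change with those settings.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical helper: returns {smallest, largest} of a non-empty range.
// The call site is identical whether or not the library vectorizes it.
std::pair<float, float> range_of(const std::vector<float>& samples) {
    assert(!samples.empty());  // dereferencing the result needs an element
    auto mm = std::minmax_element(samples.begin(), samples.end());
    return {*mm.first, *mm.second};
}
```

For the input {3.5f, -1.0f, 8.25f}, range_of returns the pair (-1.0f, 8.25f).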

See also

Auto-Vectorizer