26.3 Parallel algorithms [algorithms.parallel]
26.3.1 Preamble [algorithms.parallel.defns]
26.3.2 Requirements on user-provided function objects [algorithms.parallel.user]
26.3.3 Effect of execution policies on algorithm execution [algorithms.parallel.exec]
26.3.4 Parallel algorithm exceptions [algorithms.parallel.exceptions]
26.3.5 Parallel algorithm overloads [algorithms.parallel.overloads]
26.3.6 Execution policies [execpol]
26.3.6.1 General [execpol.general]
26.3.6.2 Execution policy type trait [execpol.type]
26.3.6.3 Sequenced execution policy [execpol.seq]
26.3.6.4 Parallel execution policy [execpol.par]
26.3.6.5 Parallel and unsequenced execution policy [execpol.parunseq]
26.3.6.6 Unsequenced execution policy [execpol.unseq]
26.3.6.7 Execution policy objects [execpol.objects]
26.3.1 Preamble [algorithms.parallel.defns]
Subclause [algorithms.parallel] describes components that C++ programs may use to perform operations on containers and other sequences in parallel.
A parallel algorithm is a function template listed in this document with a template parameter named ExecutionPolicy or constrained by the following exposition-only concept:

template<class Ep>
  concept execution-policy = is_execution_policy_v<remove_cvref_t<Ep>>;
Such a template parameter is termed an execution policy template parameter.
A parallel algorithm accesses objects indirectly accessible via its arguments by invoking the following functions:
- All operations of the categories of the iterators, sentinels, or mdspan types that the algorithm is instantiated with.
- Operations on those sequence elements that are required by its specification.
- User-provided invocable objects to be applied during the execution of the algorithm, if required by the specification.
- Operations on those invocable objects required by the specification.
These functions are herein called element access functions.
[Example 1:
The sort function may invoke the following element access functions:
- Operations of the random-access iterator of the actual template argument (as per [random.access.iterators]), as implied by the name of the template parameter RandomAccessIterator.
- The swap function on the elements of the sequence (as per the preconditions specified in [sort]).
- The user-provided Compare function object.
— _end example_]
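A minimal sketch (not part of the standard wording) of such a call; the comments list the element access functions it implies, and the container contents are illustrative:

#include <algorithm>
#include <execution>
#include <vector>

int main() {
  std::vector<int> v = {4, 2, 7, 1};

  // Element access functions of this call include: the random access iterator
  // operations on v.begin() and v.end(), swap on the int elements, and the
  // user-provided comparator passed as the last argument.
  std::sort(std::execution::par, v.begin(), v.end(),
            [](int a, int b) { return a > b; });
}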
A standard library function is vectorization-unsafe if it is specified to synchronize with another function invocation, or another function invocation is specified to synchronize with it, and if it is not a memory allocation or deallocation function or a lock-free atomic read-modify-write operation ([atomics.order]).
[Note 2:
Implementations must ensure that internal synchronization inside standard library functions does not prevent forward progress when those functions are executed by threads of execution with weakly parallel forward progress guarantees.
— _end note_]
[Example 2:
int x = 0;
std::mutex m;

void f() {
  int a[] = {1, 2};
  std::for_each(std::execution::par_unseq, std::begin(a), std::end(a), [&](int) {
    std::lock_guard<std::mutex> guard(m);
    ++x;
  });
}
The above program may result in two consecutive calls to m.lock() on the same thread of execution (which may deadlock), because the applications of the function object are not guaranteed to run on different threads of execution.
— _end example_]
26.3.2 Requirements on user-provided function objects [algorithms.parallel.user]
Unless otherwise specified, invocable objects passed into parallel algorithms as objects of a type denoted by a template parameter named Predicate, BinaryPredicate, Compare, UnaryOperation, BinaryOperation, BinaryOperation1, BinaryOperation2, BinaryDivideOp, or constrained by a concept that subsumes regular_invocable, and the operators used by the analogous overloads to these parallel algorithms that are formed by an invocation with the specified default predicate or operation (where applicable) shall not directly or indirectly modify objects via their arguments, nor shall they rely on the identity of the provided objects.
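A minimal sketch (not part of the standard wording) contrasting a comparator that satisfies this requirement with one that violates it; the data is illustrative:

#include <algorithm>
#include <execution>
#include <vector>

int main() {
  std::vector<int> v = {3, 1, 2};

  // OK: the comparator only reads through its arguments and does not depend
  // on which objects (originals or copies) it is given.
  std::sort(std::execution::par, v.begin(), v.end(),
            [](const int& a, const int& b) { return a < b; });

  // Not allowed: modifying an object via a predicate argument.
  // std::sort(std::execution::par, v.begin(), v.end(),
  //           [](int& a, const int& b) { ++a; return a < b; });
}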
26.3.3 Effect of execution policies on algorithm execution [algorithms.parallel.exec]
An execution policy template parameter describes the manner in which the execution of a parallel algorithm may be parallelized and the manner in which it applies the element access functions.
If an object is modified by an element access function, the algorithm will perform no other unsynchronized accesses to that object.
The modifying element access functions are those which are specified as modifying the object.
[Note 1:
For example, swap, ++, --, @=, and assignments modify the object.
For the assignment and @= operators, only the left argument is modified.
— _end note_]
Unless otherwise stated, implementations may make arbitrary copies of elements (with type T) from sequences where is_trivially_copy_constructible_v<T> and is_trivially_destructible_v<T> are true.
[Note 2:
This implies that user-supplied function objects cannot rely on object identity of arguments for such input sequences.
If object identity of the arguments to these function objects is important, a wrapping iterator that returns a non-copied implementation object such as reference_wrapper<T> ([refwrap]), or some equivalent solution, can be used.
— _end note_]
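A minimal sketch (not part of the standard wording) of the workaround suggested in the note: because the trivially copyable int elements may be passed as copies, code that needs the identity of the elements can iterate over indices instead; the container and values are illustrative:

#include <algorithm>
#include <cstddef>
#include <execution>
#include <numeric>
#include <vector>

int main() {
  std::vector<int> v(8, 1);

  // The int elements are trivially copyable, so the lambda below may be
  // invoked with copies; &x need not point into v.
  std::for_each(std::execution::par_unseq, v.begin(), v.end(),
                [](const int& x) { /* do not assume &x points into v */ (void)x; });

  // If the identity of the elements matters, iterate over indices and
  // access v directly.
  std::vector<std::size_t> indices(v.size());
  std::iota(indices.begin(), indices.end(), std::size_t{0});
  std::for_each(std::execution::par_unseq, indices.begin(), indices.end(),
                [&v](std::size_t i) { v[i] += static_cast<int>(i); });
}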
The invocations of element access functions in parallel algorithms invoked with an execution policy object of type execution::sequenced_policy all occur in the calling thread of execution.
[Note 3:
The invocations are not interleaved; see [intro.execution].
— _end note_]
The invocations of element access functions in parallel algorithms invoked with an execution policy object of type execution::unsequenced_policy are permitted to execute in an unordered fashion in the calling thread of execution, unsequenced with respect to one another in the calling thread of execution.
[Note 4:
This means that multiple function object invocations can be interleaved on a single thread of execution, which overrides the usual guarantee from [intro.execution] that function executions do not overlap with one another.
— _end note_]
The behavior of a program is undefined if it invokes a vectorization-unsafe standard library function from user code called from an execution::unsequenced_policy algorithm.
[Note 5:
Because execution::unsequenced_policy allows the execution of element access functions to be interleaved on a single thread of execution, blocking synchronization, including the use of mutexes, risks deadlock.
— _end note_]
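A minimal sketch (not part of the standard wording) of a counter that stays within these rules under execution::unseq: a lock-free atomic read-modify-write operation is excluded from the vectorization-unsafe functions defined in [algorithms.parallel.defns], whereas locking a mutex is not; it assumes std::atomic<int> is lock-free on the target:

#include <algorithm>
#include <atomic>
#include <execution>
#include <iterator>

int main() {
  std::atomic<int> x{0};
  int a[] = {1, 2, 3, 4};

  // fetch_add is an atomic read-modify-write operation; assuming it is
  // lock-free here, it may be called from an element access function even
  // though the invocations can be interleaved on one thread of execution.
  std::for_each(std::execution::unseq, std::begin(a), std::end(a),
                [&x](int) { x.fetch_add(1, std::memory_order::relaxed); });
}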
The invocations of element access functions in parallel algorithms invoked with an execution policy object of type execution::parallel_policy are permitted to execute either in the invoking thread of execution or in a thread of execution implicitly created by the library to support parallel algorithm execution.
If the threads of execution created by thread ([thread.thread.class]) or jthread ([thread.jthread.class]) provide concurrent forward progress guarantees ([intro.progress]), then a thread of execution implicitly created by the library will provide parallel forward progress guarantees; otherwise, the provided forward progress guarantee is implementation-defined.
Any such invocations executing in the same thread of execution are indeterminately sequenced with respect to each other.
[Note 6:
It is the caller's responsibility to ensure that the invocation does not introduce data races or deadlocks.
— _end note_]
[Example 1:
int a[] = {0, 1};
std::vector<int> v;
std::for_each(std::execution::par, std::begin(a), std::end(a), [&](int i) {
  v.push_back(i*2+1);
});
The program above has a data race because of the unsynchronized access to the container v.
— _end example_]
[Example 2:
std::atomic<int> x{0};
int a[] = {1, 2};
std::for_each(std::execution::par, std::begin(a), std::end(a), [&](int) {
  x.fetch_add(1, std::memory_order::relaxed);
  while (x.load(std::memory_order::relaxed) == 1) { }
});
The above example depends on the order of execution of the iterations, and will not terminate if both iterations are executed sequentially on the same thread of execution.
— _end example_]
[Example 3:
int x = 0;
std::mutex m;
int a[] = {1, 2};
std::for_each(std::execution::par, std::begin(a), std::end(a), [&](int) {
  std::lock_guard<std::mutex> guard(m);
  ++x;
});
The above example synchronizes access to object x, ensuring that it is incremented correctly.
— _end example_]
The invocations of element access functions in parallel algorithms invoked with an execution policy object of type execution::parallel_unsequenced_policy are permitted to execute in an unordered fashion in unspecified threads of execution, and unsequenced with respect to one another within each thread of execution.
These threads of execution are either the invoking thread of execution or threads of execution implicitly created by the library; the latter will provide weakly parallel forward progress guarantees.
[Note 7:
This means that multiple function object invocations can be interleaved on a single thread of execution, which overrides the usual guarantee from [intro.execution] that function executions do not overlap with one another.
— _end note_]
The behavior of a program is undefined if it invokes a vectorization-unsafe standard library function from user code called from an execution::parallel_unsequenced_policy algorithm.
[Note 8:
Because execution::parallel_unsequenced_policy allows the execution of element access functions to be interleaved on a single thread of execution, blocking synchronization, including the use of mutexes, risks deadlock.
— _end note_]
[Note 9:
The semantics of invocation with execution::unsequenced_policy, execution::parallel_policy, or execution::parallel_unsequenced_policy allow the implementation to fall back to sequential execution if the system cannot parallelize an algorithm invocation, e.g., due to lack of resources.
— _end note_]
If an invocation of a parallel algorithm uses threads of execution implicitly created by the library, then the invoking thread of execution will either
- temporarily block with forward progress guarantee delegation ([intro.progress]) on the completion of these library-managed threads of execution, or
- eventually execute an element access function;
the thread of execution will continue to do so until the algorithm is finished.
[Note 10:
In blocking with forward progress guarantee delegation in this context, a thread of execution created by the library is considered to have finished execution as soon as it has finished the execution of the particular element access function that the invoking thread of execution logically depends on.
— _end note_]
The semantics of parallel algorithms invoked with an execution policy object of implementation-defined type are implementation-defined.
26.3.4 Parallel algorithm exceptions [algorithms.parallel.exceptions]
During the execution of a parallel algorithm, if temporary memory resources are required for parallelization and none are available, the algorithm throws a bad_alloc exception.
During the execution of a parallel algorithm, if the invocation of an element access function exits via an uncaught exception, the behavior is determined by the execution policy.
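A minimal sketch (not part of the standard wording) of a caller handling the first case; falling back to the sequential overload is one possible response:

#include <algorithm>
#include <execution>
#include <new>
#include <vector>

void sort_values(std::vector<int>& v) {
  try {
    std::sort(std::execution::par, v.begin(), v.end());
  } catch (const std::bad_alloc&) {
    // No temporary memory was available for parallelization; fall back to
    // the ordinary sequential overload.
    std::sort(v.begin(), v.end());
  }
}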
26.3.5 Parallel algorithm overloads [algorithms.parallel.overloads]
Parallel algorithms are algorithm overloads.
Each parallel algorithm overload has an additional function parameter P of type T&& as the first function parameter, where T is the execution policy template parameter.
[Note 1:
Not all algorithms have parallel algorithm overloads.
— _end note_]
Unless otherwise specified, the semantics of calling a parallel algorithm overload are identical to calling the corresponding algorithm overload without the parameter P, using all but the first argument.
Unless otherwise specified, the complexity requirements of a parallel algorithm overload are relaxed from the complexity requirements of the corresponding overload without the parameter P as follows: when the guarantee says “at most _expr_” or “exactly _expr_” and does not specify the number of assignments or swaps, and _expr_ is not already expressed with O() notation, the complexity of the algorithm shall be O(_expr_).
A parallel algorithm with a template parameter named ExecutionPolicy shall not participate in overload resolution unless that template parameter satisfies execution-policy.
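A minimal sketch (not part of the standard wording) of the relationship between the two overloads: the parallel algorithm overload takes the same arguments, preceded by the execution policy object; the data is illustrative:

#include <execution>
#include <numeric>
#include <vector>

int main() {
  std::vector<long> v(1000, 1);

  // Corresponding overload without the execution policy parameter.
  long s1 = std::reduce(v.begin(), v.end(), 0L);

  // Parallel algorithm overload: same arguments, preceded by the policy.
  long s2 = std::reduce(std::execution::par, v.begin(), v.end(), 0L);

  return s1 == s2 ? 0 : 1;
}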
26.3.6 Execution policies [execpol]
26.3.6.1 General [execpol.general]
Subclause [execpol] describes classes that are execution policy types.
An object of an execution policy type indicates the kinds of parallelism allowed in the execution of an algorithm and expresses the consequent requirements on the element access functions.
Execution policy types are declared in header <execution>.
[Example 1:
using namespace std;
vector<int> v = /* ... */;

// standard sequential sort
sort(v.begin(), v.end());

// explicitly sequential sort
sort(execution::seq, v.begin(), v.end());

// permitting parallel execution
sort(execution::par, v.begin(), v.end());

// permitting vectorization as well
sort(execution::par_unseq, v.begin(), v.end());
— _end example_]
[Note 1:
Implementations can provide additional execution policies to those described in this document as extensions to address parallel architectures that require idiosyncratic parameters for efficient execution.
— _end note_]
26.3.6.2 Execution policy type trait [execpol.type]
template<class T> struct is_execution_policy;
is_execution_policy can be used to detect execution policies for the purpose of excluding function signatures from otherwise ambiguous overload resolution participation.
is_execution_policy<T> is a Cpp17UnaryTypeTrait with a base characteristic of true_type if T is the type of a standard or implementation-defined execution policy, otherwise false_type.
[Note 1:
This provision reserves the privilege of creating non-standard execution policies to the library implementation.
— _end note_]
The behavior of a program that adds specializations for is_execution_policy is undefined.
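A minimal sketch (not part of the standard wording) of how user code can apply the same technique: the hypothetical clamp_all function template below uses is_execution_policy_v to remove its policy-taking overload from overload resolution when the first argument is not an execution policy:

#include <algorithm>
#include <execution>
#include <type_traits>
#include <vector>

// Overload selected for ordinary calls without a policy.
template <class T>
void clamp_all(std::vector<T>& v, const T& lo, const T& hi) {
  std::transform(v.begin(), v.end(), v.begin(),
                 [&](const T& x) { return std::clamp(x, lo, hi); });
}

// Overload that participates only when the first argument is an execution policy.
template <class ExecutionPolicy, class T>
  requires std::is_execution_policy_v<std::remove_cvref_t<ExecutionPolicy>>
void clamp_all(ExecutionPolicy&& policy, std::vector<T>& v, const T& lo, const T& hi) {
  std::transform(policy, v.begin(), v.end(), v.begin(),
                 [&](const T& x) { return std::clamp(x, lo, hi); });
}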
26.3.6.3 Sequenced execution policy [execpol.seq]
class execution::sequenced_policy { _unspecified_ };
The class execution::sequenced_policy is an execution policy type used as a unique type to disambiguate parallel algorithm overloading and require that a parallel algorithm's execution may not be parallelized.
During the execution of a parallel algorithm with the execution::sequenced_policy policy, if the invocation of an element access function exits via an exception, terminate is invoked ([except.terminate]).
26.3.6.4 Parallel execution policy [execpol.par]
class execution::parallel_policy { _unspecified_ };
The class execution::parallel_policy is an execution policy type used as a unique type to disambiguate parallel algorithm overloading and indicate that a parallel algorithm's execution may be parallelized.
During the execution of a parallel algorithm with the execution::parallel_policy policy, if the invocation of an element access function exits via an exception, terminate is invoked ([except.terminate]).
26.3.6.5 Parallel and unsequenced execution policy [execpol.parunseq]
class execution::parallel_unsequenced_policy { _unspecified_ };
The class execution::parallel_unsequenced_policy is an execution policy type used as a unique type to disambiguate parallel algorithm overloading and indicate that a parallel algorithm's execution may be parallelized and vectorized.
During the execution of a parallel algorithm with the execution::parallel_unsequenced_policy policy, if the invocation of an element access function exits via an exception, terminate is invoked ([except.terminate]).
26.3.6.6 Unsequenced execution policy [execpol.unseq]
class execution::unsequenced_policy { _unspecified_ };
The class unsequenced_policy is an execution policy type used as a unique type to disambiguate parallel algorithm overloading and indicate that a parallel algorithm's execution may be vectorized, e.g., executed on a single thread using instructions that operate on multiple data items.
During the execution of a parallel algorithm with the execution::unsequenced_policy policy, if the invocation of an element access function exits via an exception, terminate is invoked ([except.terminate]).
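A minimal sketch (not part of the standard wording) of the consequence shared by all four standard policies: potentially throwing work is handled inside the element access function, because an exception that escapes it would cause terminate to be invoked; the failure marker value is illustrative:

#include <algorithm>
#include <execution>
#include <stdexcept>
#include <vector>

int main() {
  std::vector<int> v = {1, 2, 3};

  std::for_each(std::execution::par, v.begin(), v.end(), [](int& x) {
    try {
      if (x == 2) throw std::runtime_error("reject");
      x *= 10;
    } catch (const std::runtime_error&) {
      x = -1;  // record the failure locally instead of letting it escape
    }
  });
}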
26.3.6.7 Execution policy objects [execpol.objects]
inline constexpr execution::sequenced_policy execution::seq{ _unspecified_ };
inline constexpr execution::parallel_policy execution::par{ _unspecified_ };
inline constexpr execution::parallel_unsequenced_policy execution::par_unseq{ _unspecified_ };
inline constexpr execution::unsequenced_policy execution::unseq{ _unspecified_ };
The header <execution> declares global objects associated with each type of execution policy.