Vectorize `unique` by AlexGuteniev · Pull Request #5092 · microsoft/STL

I've discovered that something is missing, but I think it can wait for a follow-up PR.

Unlike the `remove` algorithm, this one doesn't search for the first duplicate before entering the main vectorized loop. The scalar implementations in the headers have that step; the vectorized one currently doesn't.
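To make the missing step concrete, here is a minimal sketch of a scalar `unique` that has such a search. It is a simplified illustration, not the actual header code, and the name `unique_sketch` is hypothetical. Phase 1 only reads until it finds the first pair of equal neighbors, so a range without adjacent duplicates is never written to.

```cpp
#include <cassert>  // only needed by the usage check
#include <utility>
#include <vector>   // only needed by the usage check

// Hypothetical scalar unique in two phases (illustration, not the MSVC STL source).
template <class FwdIt>
FwdIt unique_sketch(FwdIt first, FwdIt last) {
    if (first == last) {
        return last;
    }
    // Phase 1: look for the first adjacent duplicate; nothing is written here.
    FwdIt prev = first;
    FwdIt cur  = first;
    while (++cur != last) {
        if (*prev == *cur) {
            break;
        }
        prev = cur;
    }
    if (cur == last) {
        return last; // no adjacent duplicates: the range is left untouched
    }
    // Phase 2: compact the remainder; `prev` is the last kept element.
    while (++cur != last) {
        if (!(*prev == *cur)) {
            *++prev = std::move(*cur);
        }
    }
    return ++prev;
}
```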

For performance, this is clearly a missed opportunity, though the gain from vectorization should outweigh the cost of the extra writes.

As for correctness, I'm not sure. [algorithms.requirements]/3 says:

> For purposes of determining the existence of data races, algorithms shall not modify objects referenced through an iterator argument unless the specification requires such modification.

However, since these writes store integer values equal to the ones already there, they are not observable by concurrent reads, even if the container violates alignment requirements and the writes are not atomic.

The only case where the extra writes could be observable is running this algorithm on read-only data with no adjacent duplicates, which is a very silly use case.

It is easily fixable with `adjacent_find`, which is vectorized in another PR here.
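A minimal sketch of that fix, assuming a kernel that, like a branchless SIMD compaction, stores every element it visits whether or not it is kept. `compact_unconditional` and `unique_with_presearch` are hypothetical names; the former is a scalar stand-in for such a kernel, not the STL's vectorized code, and the `writes` counter exists only to make the stores visible. `std::adjacent_find` skips the duplicate-free prefix without writing, so the kernel only runs from the first duplicate onward.

```cpp
#include <algorithm>
#include <cassert>  // only needed by the usage check
#include <cstddef>
#include <vector>   // only needed by the usage check

// Hypothetical stand-in for a vectorized kernel: every visited element is stored
// into the slot after the last kept one, and `out` advances only if it differs.
template <class T>
T* compact_unconditional(T* first, T* last, std::size_t& writes) {
    if (first == last) {
        return last;
    }
    T* out = first; // last kept element; out + 1 <= in always holds
    for (T* in = first + 1; in != last; ++in) {
        const T value = *in;
        out[1] = value; // unconditional store, even when nothing moves
        ++writes;
        if (value != *out) {
            ++out; // keep the value just stored
        }
    }
    return out + 1;
}

// The fix: skip the prefix with no adjacent duplicates, then run the kernel.
template <class T>
T* unique_with_presearch(T* first, T* last, std::size_t& writes) {
    T* dup = std::adjacent_find(first, last);
    if (dup == last) {
        return last; // no duplicates: zero stores
    }
    // *dup == *(dup + 1); everything up to and including dup is already unique.
    return compact_unconditional(dup, last, writes);
}
```

On `{1, 2, 3, 4}` the kernel alone performs 3 stores, while the version with the pre-search performs none.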
