Encounter order: take 2 (original) (raw)

Paul Sandoz paul.sandoz at oracle.com
Fri Feb 1 01:22:02 PST 2013


On Feb 1, 2013, at 12:27 AM, Mike Duigou <mike.duigou at oracle.com> wrote:

An intermediate operation may clear encounter order so that the output stream and corresponding input stream to the next intermediate operation or terminal operation does not have encounter order. There are no such operations implemented. (Previously the unordered() operation cleared encounter order.) Otherwise an intermediate operation must preserve encounter order if required to do so (see next paragraphs). An intermediate operation may choose to apply a different algorithm if encounter order of input stream must be preserved or not. The distinct() operation will, when evaluating in parallel, use a ConcurrentHashMap to store unique elements if encounter order does not need to be preserved, otherwise if encounter order needs to be preserved a fold will be performed (equivalent of, in parallel, map each element to a singleton linked set then associatively reduce, left-to-right, the linked sets to one linked set). Without unordered() how is the CHM version accessed if the source is an ArrayList?

If the source has encounter order the distinct operation will choose whether to preserve encounter order or not as per clause b.2. i.e. the properties of the terminal operation are a factor.

Implementation wise the op checks if the ORDERED flag is on the bit set of flags passed to it so it is not as complicated as it sounds.

An intermediate operation must preserve encounter order of output stream if:

a.1) the input stream to the intermediate operation has encounter order (either because the stream source has encounter order or because a previous intermediate operation injects encounter order); and a.2) the terminal operation preserves encounter order. An intermediate operation may not preserve encounter order of the output stream if: b.1) the input stream to the intermediate operation does not have encounter order (either because the stream source does not have encounter order or because a previous intermediate operation clears encounter order); or b.2) the terminal operation does not preserve encounter order and the intermediate operation is in a sequence of operations, to be computed, where the last operation in the sequence is the terminal operation and all operations in the sequence are computed in parallel. Rule b.2 above ensures that encounter order is preserved for the following pipeline on the sequential().forEach(): list.parallelStream().distinct().sequential().forEach() i.e. the distinct() intermediate operation will preserve the encounter order of the list stream source. I find this result surprising (and disappointing). Users are going to be surprised by the poor performance of using parallelStream in this case.

The sequential() op is currently implemented as a full barrier to ensure elements are reported sequentially downstream in the same thread that created the stream.

The following will produce the same output:

list.stream().distinct().forEach(...) list.parallelStream().distinct().sequential().forEach(...)

Which i think conforms to the principle of least surprise.

If performance is a concern and order is not then one should not use sequential e.g. do:

list.parallelStream().distinct().forEach(e -> { synchronizied(this) { ... } } )

or a concurrent collect.

Paul.



More information about the lambda-libs-spec-observers mailing list