parallel() (original) (raw)

stream() / parallel()

Remi Forax forax at univ-mlv.fr
Tue Nov 27 14:39:43 PST 2012


On 11/27/2012 07:39 PM, Brian Goetz wrote:

The "bun" operations of stream() and parallel() on Collections and friends (which currently live in Streamable, but I am not sure that Streamable carries its weight) work acceptably well for collections:

c.stream().ops(...) and c.parallel().ops(...) The same thing works OK with the stream factories in Streams (which is intended for use by libraries, not end-users): Stream.stream(spliterator, flags) Stream.parallel(spliterator, flags) Arrays.stream(array) Arrays.parallel(array) But we start to see friction in the other places where we surface streams: IntStream String.chars() IntStream intRange(int start, int limit) Stream BufferedReader.lines() Stream Streams.iterate(T seed, UnaryOperator op) All of these are, currently, sequential streams. But it seems an unfortunate limitation that they are forced to be sequential. Duplicating these methods with parallel versions (i.e., String.charsAsParallelStream) is possible but seems kind of yucky.

Recent improvements in the Stream implementation (including the strictures against "re-use") have created another possibility: an explicit parallel() "op" on Streams. If this appears at the head of the stream, the performance cost is almost zero (a few extra garbage objects at setup time), and on a stream that is already parallel, it would be a no-op ("if (isParallel) return this"). At the use site, it would look like: String.chars().parallel().forEach(...); or int[] mapped = Streams.intRange(0, 1000000).parallel().map(...).toArray(); Collections and other things that currently implement Streamable could additionally offer a convenience method to fuse stream+parallel, which is both slightly more compact and slightly more efficient: c.stream()... c.parallelStream()... (The goal here is not to encourage fine-grained "transitions" between parallel and sequential; this is likely to be a performance loss.)

The main issue is that you force all API developpers that want to implement a non parallel stream to have a good answer when user will call parallel(). Given that not all sequential stream can be parallelized, what is the semantics of Stream.parallel().

Rémi



More information about the lambda-libs-spec-observers mailing list