Constructing parallel streams

Brian Goetz brian.goetz at oracle.com
Mon Dec 10 08:08:18 PST 2012


The only reason is that it may not perform as well as the user expects.

The reason for this is that one of the big performance tricks we use is "jamming". When you do

foos.filter(...).map(...).reduce(...)

we can do the filtering, mapping, and reducing in a single pass (serial or parallel). If you do

foos.sequential().filter(...).parallel().map(...).sequential().reduce(...)

then you may be introducing "barriers" in the computation, where something has to stop and collect the results before proceeding. This is giving up a lot of the performance benefit of the streams model. (Stateful ops, like sorting or limit, generally have a similar effect.)
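To make the difference between a jammed pipeline and one with a barrier concrete, here is a minimal sketch. It is written against the java.util.stream API as it later shipped in Java 8, which may differ from the prototype under discussion here. The class name JammingDemo and the trace helper are made up for illustration. It records the order in which each stage sees each element, first for a plain filter/map/reduce and then with a stateful sorted() in the middle.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical demo class; not part of any library.
class JammingDemo {
    // Returns the order in which the filter and map stages see each element,
    // followed by the reduced result.
    static List<String> trace(boolean withSort) {
        List<String> events = new ArrayList<>();
        Stream<Integer> s = Stream.of(4, 1, 2, 3)
                .filter(x -> { events.add("filter " + x); return x % 2 == 0; });
        if (withSort) {
            // Stateful op: has to see every upstream element before emitting any.
            s = s.sorted();
        }
        int sum = s.map(x -> { events.add("map " + x); return x * 10; })
                   .reduce(0, Integer::sum);
        events.add("sum " + sum);
        return events;
    }

    public static void main(String[] args) {
        System.out.println(trace(false)); // filter and map events interleave
        System.out.println(trace(true));  // all filter events come before any map event
    }
}
```

In the sequential case without sorted(), each element travels through filter and map before the next element is touched. With sorted() in the pipe, filtering has to finish for every element before mapping can begin, which is the kind of barrier described above.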

However, since we don't know anything about what the user is doing in those lambdas, it is conceivable that it is still a win.

We do elide sequential/parallel calls if the stream already has that orientation (e.g., parallel() on an already parallel stream is a no-op).
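As a rough illustration of querying the orientation, the sketch below uses isParallel(), parallel(), and sequential() from the java.util.stream API as it later shipped in Java 8. That is an assumption: the prototype discussed here may expose this differently. The class name OrientationDemo is hypothetical. It shows only the observable flag, not how or whether a redundant call is elided internally.

```java
import java.util.stream.Stream;

// Hypothetical demo class; not part of any library.
class OrientationDemo {
    // Returns the orientation observed after each call.
    static boolean[] orientations() {
        boolean[] seen = new boolean[3];
        Stream<Integer> s = Stream.of(1, 2, 3).parallel();
        seen[0] = s.isParallel();   // parallel after the first call
        s = s.parallel();           // redundant call on an already parallel stream
        seen[1] = s.isParallel();   // orientation unchanged
        s = s.sequential();
        seen[2] = s.isParallel();   // now sequential
        return seen;
    }

    public static void main(String[] args) {
        for (boolean b : orientations()) {
            System.out.println(b);
        }
    }
}
```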

Overall I'm mostly in the "don't try to save the user from themselves" camp here. We should document how the model works and let performance-sensitive users measure for themselves. So while it is most effective to put the parallel() at the head of the pipe, my distaste for having it in the middle is merely mild and overall I can live with it.
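For users who want to measure, a sketch along these lines may be a starting point. It assumes the java.util.stream API as it later shipped in Java 8. The class HeadParallelDemo and the method sumOfEvenSquares are invented for the example. The System.nanoTime timing is deliberately crude, and a harness such as JMH would give more trustworthy numbers. The parallel() call sits at the head of the pipe, before any filtering or mapping.

```java
import java.util.stream.LongStream;

// Hypothetical demo class; not part of any library.
class HeadParallelDemo {
    // Sums the squares of the even numbers in 1..n, optionally in parallel.
    static long sumOfEvenSquares(boolean parallel, long n) {
        LongStream s = LongStream.rangeClosed(1, n);
        if (parallel) {
            s = s.parallel(); // chosen once, at the head of the pipeline
        }
        return s.filter(x -> x % 2 == 0)
                .map(x -> x * x)
                .sum();
    }

    public static void main(String[] args) {
        long n = 1_000_000L;
        for (boolean parallel : new boolean[] {false, true}) {
            long start = System.nanoTime();
            long result = sumOfEvenSquares(parallel, n);
            long micros = (System.nanoTime() - start) / 1_000;
            // Timings vary by machine and JIT warm-up; treat them as a hint only.
            System.out.println((parallel ? "parallel:   " : "sequential: ")
                    + result + " in " + micros + " us");
        }
    }
}
```

Both runs should produce the same sum. Which one is faster depends on the data size, the cost of the lambdas, and the machine, which is the reason to measure rather than guess.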

On 12/10/2012 11:01 AM, Joe Bowbeer wrote:

I can easily imagine a pipeline that has alternating sequential/parallel/sequential segments. Is there any reason to discourage a programmer from using the parallel/sequential methods to express this?

On Dec 10, 2012 7:50 AM, "Brian Goetz" <brian.goetz at oracle.com> wrote:

> I don't like users being able to call parallel in the middle of the stream construction.

I don't love it either. The semantics are perfectly tractable, and the implementation is perfectly straightforward, but the performance is unlikely to be a win in most cases. (I mentioned earlier we would doc that this really should only be done at the head of the pipeline.)

> I propose to have an interface ParallelizableStream that allows the user to choose the sequential or the parallel stream upfront.

Yeah, we investigated this direction first. Combinatorial explosion: IntParallelizableStream, etc. However, this could trivially become a dynamic property of streams (fits easily into the existing stream flags mechanism). Then only the head streams would have the property, and if you tried to do parallel() farther down the stream, we could ignore it or even throw ISE.



More information about the lambda-libs-spec-experts mailing list