Reducing reduce (original) (raw)

Brian Goetz brian.goetz at oracle.com
Mon Feb 18 12:20:23 PST 2013


Circling back to this (i.e., "reducing reduce", redux):

There are a lot of considerations here, many mostly accidental (e.g., consequences of erasure and the primitive/reference divide).

The three-arg functional reduce form is functionally equivalent to the two-arg form, except that there are some constructions that are more efficient to handle in the three arg form. However, the best example we came up with, Joe's string compare, suffers because he had to use boxing. So we're currently in a place where the best example to support this form has other defects that make the form hard to support. And, any form of functional reduce on a reference would likely result in a lot of object creation, so the optimization of eliding some of the mapping would have to overcome that.

Further, one can still handle this without boxing using collect() and an explicit mutable result holder. On the other hand, if/when the language acquires tuples, it will be a very different story, and this form would become infinitely more useful.

So I think the evidence weighs slightly in favor of ditching this form for now (though I'd feel better if people didn't have to use either an ad-hoc class or a single-element array as the data box when using the collect() form.)

Secondarily, ditching the three-arg form from Stream would remove one element of support for naming reduce and collect differently; part of the motivation for a different name was that the three-arg collect and three-arg reduce overloaded very poorly if they had the same name. However, I think we should resist the temptation to act on this. I think (a) there is pedagogical value in separating function and mutable reduce forms, and (b) if we do this, we slam the door on the more flexible version, which will badly bite us in a tupled future.

We might still consider the three-arg version for IntStream. That's the case where Joe's example works.

On 2/11/2013 11:41 AM, Brian Goetz wrote:

Now that we've added all the shapes of map() to Stream (map to ref/int/long/double), and we've separated functional reduce (currently called reduce) from mutable reduce (currently called collect), I think that leaves room for taking out one of the reduce methods from Stream:

U reduce(U identity, BiFunction<U, ? super T, U> accumulator, BinaryOperator reducer); This is the one that confuses everyone anyway, and I don't think we need it any more. The argument for having this form instead of discrete map+reduce are: - fused map+reduce reduces boxing - this three-arg form can also fold filtering into the accumulation However, since we now have primitive-bearing map methods, and we can do filtering before and after the map, is this form really carrying its weight? Specifically because people find it counterintuitive, we should consider dropping it and guiding people towards map+reduce. For example, "sum of pages" over a stream of Documents is better written as: docs.map(Document::getPageCount).sum() rather than docs.reduce(0, (count, d) -> count + d.getPageCount(), Integer::sum) The big place where we need three-arg reduce is when we're folding into a mutable store. But that's now handled by collect(). Have I missed any use cases that would justify keeping this form?



More information about the lambda-libs-spec-observers mailing list