Forms for reduce() -- part 1 (original) (raw)
Brian Goetz brian.goetz at oracle.com
Mon Dec 10 12:54:05 PST 2012
- Previous message: Combiner -> BiFunction?
- Next message: Forms for reduce() -- part 1
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
I have been doing some brainstorming on forms for "fold". My primary goals for revisiting this include:
As mentioned in an earlier note, I want to get Map and Collection out of the Streams API (groupBy and reduceBy currently intrude these). This message lays the groundwork for this and I will follow up on these in a separate note. As I noted, there are many things currently wrong with the current groupBy/reduceBy that I want to fix.
Support "mutable fold" cases better, where the "seed" is really a mutable container (like a StringBuffer.)
I'll start with use cases. There are some that fit purely into a traditional functional model, and others that fit better into a mutable model. While one can wedge one into the other, I think it may be better to be explicit about both.
I am not suggesting naming right now, they could all be called reduce, though we may want to use different names to describe the functional vs mutable cases.
Use cases -- purely functional
- Homogeneous operations on monoid (e.g., sum). Here, there is a monoid with a known zero.
T reduce(T zero, BinaryOperator reducer)
- Homogeneous operations on non-monoids (e.g., min). Here, there is no sensible zero, so we use Optional to reflect "nothing there". Ideally we would like to delay boxing to Optional until the very last operation (in other words, use (boolean, T) as the internal state and box to Optional at the very end.)
Optional reduce(BinaryOperator reducer)
- Nonhomogeneous operations (aka foldl, such as "sum of weights"). This requires an additional combiner function for this to work in parallel.
U reduce(U zero, (U,T) -> U reducer, (U,U -> U) combiner) Optional reduce(T->U first, (U,T) -> U reducer, (U,U -> U) combiner)
Note that most cases where we might be inclined to return Optional can be written as stream.map(T->U).reduce(BinaryOperator).
Doug points out: if we went with "null means nothing", we wouldn't need the optional forms.
This is basically what we have now, though we're currently calling the last form "fold". Doug has suggested we call them all reduce.
Sub-question: people are constantly pointing out "but you don't need the combiner for the serial case." My orientation here is that the serial case is a special case, and while we want to ensure that those cases are well-served, we don't necessarily want to distort the API to include things that only work in the serial case.
Use cases -- mutable
Many fold-like operations are better expressed with mutable state. We could easily simulate them with the foldl form, but it may well be better to call this form out specially. In these cases, there is also often a distinct internal and external representation. I'll give them the deliberately stupid name mReduce for now.
The general form is:
<I, R> mReduce(Supplier makeEmpty, BiBlock<I, T> addElement, BiBlock<I, I> combineResults, Function<I, E> getFinalResult)
Here, I is the intermediate form, and E is the result. There are many cases where computations with an intermediate form is more efficient, so we want to maintain the intermediate form for as long as possible -- ideally until the last possible minute (when the whole reduction is done.)
The analogue of reducer/combiner in the functional forms is "accept a new element" (addElement) and "combine one intermediate form with another" (combineResults).
Examples:
- Average. Here, we use an array of two ints to hold length and count. (Alternately we could use a custom tuple class.) Our intermediate form is int[2] and our final form is Double.
Double average = integers.mReduce(() -> new int[2], (a, i) -> { a[0] += i; a[1]++ }, (a, b) -> { a[0] += b[0]; a[1] += b[1] }, a -> (double) a[0] / a[1]);
Here, we maintain the int[2] form all the way throughout the computation, including as we combine up the tree, and only convert to double at the last minute.
- String concatenation
The signatures of the SAMs in mReduce were chosen to work with existing builder-y classes such as StringBuffer or ArrayList. We can do string concatenation using the functional form using String::concat, but it is inefficient -- lots of copying as we go up the tree. We can still use a mutable fold to do a concatenation with StringBuilder and mReduce. It has the nice property that all the arguments already have methods that have the right signature, so we can do it all with method refs.
String s = strings.mReduce(StringBuilder::new, StringBuilder::append, StringBuilder::append, StringBuilder::toString);
In this example, the two append method refs are targeting different versions of StringBuilder.append; the first is append(String) and the second is append(StringBuilder). But the compiler will figure this out.
- toArray
We can express "toArray" as a mutable fold using ArrayList to accumulate values and converting to an array at the end, just as with StringBuilder:
Object[] array = foos.reduce(ArrayList::new, ArrayList::add, ArrayList::addAll, ArrayList::toArray);
There are other mutable reduction use cases too. For example, sort can be implemented by providing a "insert in order" and a "merge sorted lists" method. While these are not necessarily the most efficient implementation, they may well make reasonable last-ditch defaults.
Both of these examples use separate internal forms (StringBuffer, ArrayList) and external forms (String, array).
Finally, for reasons that may become clearer in the next message, I think we should consider having an abstraction for "Reducer" or "Reduction" that captures all the bits needed for a reduction. This would allow the averager above to be reused:
double average = integers.reduce(Reducers.INT_AVERAGER);
This turns into a win when we try to recast groupBy/reduceBy into being general reductions (next message).
So, summary:
Functional forms:
public U reduce(final U seed, final BinaryOperator<U> op) {
public Optional<U> reduce(BinaryOperator<U> op) {
public <R> R reduce(R base, Combiner<R, U, R> reducer,
BinaryOperator combiner) {
Mutable form:
public <I, R> R reduce(Supplier<I> baseFactory,
BiBlock<I, U> reducer,
BiBlock<I, I> combiner,
Function<I, R> finalResultMapper) {
(and possibly a mutable form for special case where I=R)
Possibly a form for a canned Reducer:
public<R> R reduce(Reducer<T,R> reducer);
- Previous message: Combiner -> BiFunction?
- Next message: Forms for reduce() -- part 1
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
More information about the lambda-libs-spec-experts mailing list