Yet another run at reduce
Brian Goetz brian.goetz at oracle.com
Mon Jan 7 17:31:42 PST 2013
- Previous message: Function type naming conventions
- Next message: Yet another run at reduce
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
I think Doug and I made some progress on reduce forms today.
Recap: there are two ways to do a reduction to a mutable data structure like a Map; a classic fold (where you create lots of little maps, one per leaf of the computation tree, accumulate leaf data into them in a thread-confined manner, then merge their contents up the tree into one big map), and a concurrent bombardment (where you create one big concurrent-safe map, and blast elements in from many threads.) Call these "A" and "B".
The requirement for A is that:
- your combining functions are associative
- you can create multiple containers
- you can incorporate a new element into a container
- you can merge containers in a way that satisfies the obvious distributive requirement
If you meet these requirements, you can do a parallel A reduce that respects encounter order.
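As a concrete sketch of the "A" shape (names hypothetical; this is not the actual Streams implementation, just an illustration of makeResult/accumulate/combine run over two leaves), the fold creates one thread-confined container per chunk and merges them in encounter order:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of an "A" reduction: each leaf of the computation
// tree gets its own thread-confined container; containers are merged up
// the tree in encounter order.
public class FoldSketch {
    // makeResult: create a fresh container per leaf
    static List<Integer> makeResult() { return new ArrayList<>(); }

    // accumulate: incorporate one element into a container
    static void accumulate(List<Integer> result, int value) { result.add(value); }

    // combine: merge two containers, preserving encounter order
    static List<Integer> combine(List<Integer> left, List<Integer> right) {
        left.addAll(right);
        return left;
    }

    static List<Integer> fold(int[] chunk1, int[] chunk2) {
        // In a real framework each leaf runs on its own worker thread;
        // the two leaves are shown sequentially here for clarity.
        List<Integer> r1 = makeResult();
        for (int v : chunk1) accumulate(r1, v);
        List<Integer> r2 = makeResult();
        for (int v : chunk2) accumulate(r2, v);
        return combine(r1, r2);   // merge up the tree, left before right
    }

    public static void main(String[] args) {
        System.out.println(fold(new int[]{1, 2}, new int[]{3, 4}));
    }
}
```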
The requirement for B is that:
- your combining functions are associative and commutative
- you can incorporate a new element into a container
- your container supports concurrent incorporation
If you meet these requirements, you can do a parallel B reduce that does NOT respect order, but may be faster (or slower, if contention is high enough.)
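The "B" shape can be sketched with one shared concurrent-safe container bombarded from many threads (again hypothetical names, not library code; accumulation here is commutative, since insertion order is lost):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;
import java.util.stream.IntStream;

// Hypothetical sketch of a "B" reduction: one big concurrent-safe map,
// with many threads blasting elements in. No encounter order is kept,
// so the accumulation must be commutative as well as associative.
public class BombardSketch {
    static Map<Integer, LongAdder> countByParity(int n) {
        Map<Integer, LongAdder> counts = new ConcurrentHashMap<>();
        IntStream.range(0, n).parallel().forEach(i ->
            // computeIfAbsent is atomic on ConcurrentHashMap;
            // LongAdder.increment is thread-safe and commutative.
            counts.computeIfAbsent(i % 2, k -> new LongAdder()).increment());
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(countByParity(100));
    }
}
```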
Key observation: A "B" reducer can become an "A" reducer simply by adding the merge ability, which is no harder than for regular A reducers.
So rather than have A reducers and B reducers, we can have A reducers and A+B reducers. It's only marginally more work for B reducer writers.
So...
public interface Reducer<T, R> {
    boolean isConcurrent();
    R makeResult();
    void accumulate(R result, T value);
    R combine(R result, R other);
}
A reducers return 'false' for isConcurrent; B reducers return 'true'.
What was Concurrent{Reducer,Tabulator,Accumulator} goes away.
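For illustration, an A+B reducer under this interface might look like the following sketch (the grouping example and all class names are mine, not part of the proposal): it returns true from isConcurrent(), so it is usable for a B reduction, and it also implements combine(), so it remains a valid A reducer.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

// The interface from the message.
interface Reducer<T, R> {
    boolean isConcurrent();
    R makeResult();
    void accumulate(R result, T value);
    R combine(R result, R other);
}

// Hypothetical A+B reducer: groups strings by length into a
// concurrent-safe map, and also knows how to merge two such maps.
class GroupByLength implements Reducer<String, Map<Integer, List<String>>> {
    public boolean isConcurrent() { return true; }

    public Map<Integer, List<String>> makeResult() {
        return new ConcurrentHashMap<>();
    }

    public void accumulate(Map<Integer, List<String>> result, String value) {
        result.computeIfAbsent(value.length(), k -> new CopyOnWriteArrayList<>())
              .add(value);
    }

    // The extra merge ability that turns a B reducer into an A+B reducer.
    public Map<Integer, List<String>> combine(Map<Integer, List<String>> result,
                                              Map<Integer, List<String>> other) {
        other.forEach((k, v) ->
            result.computeIfAbsent(k, x -> new CopyOnWriteArrayList<>()).addAll(v));
        return result;
    }
}

public class ReducerDemo {
    public static void main(String[] args) {
        GroupByLength r = new GroupByLength();
        Map<Integer, List<String>> m = r.makeResult();
        r.accumulate(m, "a");
        r.accumulate(m, "bb");
        r.accumulate(m, "cc");
        System.out.println(m);
    }
}
```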
Now, in addition to the various forms of reduce(), we add overloaded:
reduce(Reducer)
reduceUnordered(Reducer)
You will get a B reduction if (a) you selected reduceUnordered and (b) your reducer is concurrent. Otherwise you will get an A reduction.
This is nice because the knowledge of properties of "user doesn't care about order" and "target supports concurrent insertion" naturally live in different places; this separates them properly. The latter is a property of the reducer implementation; the former only the user knows about and therefore should live at the call site. Each contributes their bit and can mostly remain ignorant of the other; the library combines these bits, and if both are present, you get a B reduction.
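The library-side selection then reduces to a single conjunction; a sketch (method name and return encoding are mine, purely for illustration):

```java
// Hypothetical sketch of the library-side dispatch: the call site
// contributes "user doesn't care about order" (the unordered flag), the
// reducer contributes "target supports concurrent insertion"
// (isConcurrent()), and a B reduction happens only when both hold.
public class DispatchSketch {
    static String select(boolean unordered, boolean concurrentReducer) {
        if (unordered && concurrentReducer)
            return "B";   // concurrent bombardment, no encounter order
        return "A";       // ordered fold, always safe
    }

    public static void main(String[] args) {
        System.out.println(select(true, true));
        System.out.println(select(true, false));
        System.out.println(select(false, true));
    }
}
```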
The reduceUnordered() method can be viewed as an optimization of reduce(); it is always correct to use reduce(), but it may be faster to use reduceUnordered(), as long as you are willing to forgo encounter order and the reducer is cooperative.
In neither case (assuming a properly implemented reducer) will you ever get a thread-unsafe result; if you ask for an unordered reduction with a nonconcurrent reducer, you get a safe ordered reduction instead, which might be slower (or faster.)
Pros:
- ConcurrentReducer goes away
- .unordered() goes away
- Properly separates order-sensitivity from concurrency
Cons:
- Two variations on various aspects of reducing are surfaced in the API (unordered and concurrent) that will make users say "huh?" (but if you get it wrong / choose to stay ignorant, you still get the right answer.)
Implementation coming.
More information about the lambda-libs-spec-observers mailing list