Into
Brian Goetz brian.goetz at oracle.com
Wed Dec 26 10:38:54 PST 2012
Let's try to separate some things here.
There's been a lot of defending of into() because it is (a) useful and (b) safe. That's all good. But let's see if we can think of these more as functional requirements than as a mandate for a specific API -- whether one that happens to be already implemented, like into(), or a newly proposed one like toEveryKindOfCollection().
Into as currently implemented has many negatives, including:
- Adds conceptual and API surface area -- destinations have to implement Destination, the semantics of into are weird and unique to into
- Will likely parallelize terribly
- Doesn't provide the user enough control over how the into'ing is done (seq vs par, order-sensitive vs not)
So let's step back and talk requirements.
I think the only clear functional requirement is that it should be easy to accumulate the result of a stream into a collection or similar container. It should be easy to customize what kind of collection, but also easy to say "give me a reasonable default." Additionally, the performance characteristics should be transparent; users should be able to figure out what's going to happen.
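To make the requirement concrete, here is a minimal sketch of the two usage modes -- "reasonable default" and "customized destination" -- written against the Collectors API that eventually shipped in Java 8 (which postdates this message; the API names are the real JDK ones, but treating them as the answer to this requirement is hindsight, not part of the proposal under discussion):

```java
import java.util.List;
import java.util.TreeSet;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class AccumulateDemo {
    public static void main(String[] args) {
        // "Give me a reasonable default": the library chooses the List implementation.
        List<String> defaults = Stream.of("b", "a", "c")
                .collect(Collectors.toList());

        // "Easy to customize": the user names the destination explicitly,
        // so the performance characteristics are transparent.
        TreeSet<String> sorted = Stream.of("b", "a", "c")
                .collect(Collectors.toCollection(TreeSet::new));

        System.out.println(defaults);  // [b, a, c]
        System.out.println(sorted);    // [a, b, c]
    }
}
```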
There are lots of other nice-to-haves, such as:
- Minimize impact on Collection implementations
- Minimize magic/guessing about the user's intent
- Support destinations that aren't collections
- Minimize tight coupling of Stream API to existing Collection APIs
The current into() fails on nearly all of these.
At the risk of being a broken record, there are really two cases here:
Reduce-like. Aggregate values into groups at the leaves of the tree, and then combine the groups somehow. This preserves encounter order, but has merging overhead. Merging overhead ranges from small (build a conc-tree) to large (add the elements of the right subresult individually to the left subresult) depending on the chosen data structure.
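A sketch of the reduce-like shape, using the three-argument collect() form that later shipped in Java 8 (again, hindsight relative to this discussion). Each leaf builds its own ArrayList, and subresults are merged with addAll -- i.e., the "add the elements of the right subresult to the left subresult" end of the merge-overhead spectrum -- while encounter order is preserved even in parallel:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.IntStream;

public class ReduceLikeDemo {
    public static void main(String[] args) {
        // supplier: per-leaf container; accumulator: add one element;
        // combiner: merge the right subresult into the left one.
        List<Integer> result = IntStream.range(0, 100)
                .parallel()
                .boxed()
                .collect(ArrayList::new, List::add, List::addAll);

        // Encounter order survives the parallel merge.
        System.out.println(result.get(0) + ".." + result.get(99)); // 0..99
    }
}
```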
Foreach-like. Have each leaf shovel its values into a single shared concurrent container (imagine a ConcurrentVector class). This ignores encounter order, but a well-written concurrent destination might be able to outperform the merging behavior.
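The foreach-like shape can be sketched as follows, with ConcurrentLinkedQueue standing in for the hypothetical ConcurrentVector mentioned above. There is no merge step at all; the trade-off is that the container's final order depends on thread scheduling:

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.stream.IntStream;

public class ForeachLikeDemo {
    public static void main(String[] args) {
        // One shared concurrent container; every worker thread adds to it directly.
        Queue<Integer> shared = new ConcurrentLinkedQueue<>();
        IntStream.range(0, 100).parallel().forEach(shared::add);

        // All 100 elements arrive, but in no guaranteed encounter order.
        System.out.println(shared.size()); // 100
    }
}
```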
In earlier implementations we tried to guess between the two modes based on the ordering characteristics of the source and the order-preserving characteristics of the intermediate ops. This is both risky and harder for the user to control (hence hacks like .unordered()). I think this general approach is a loser for all but the most special cases.
Since we can't read the user's mind about whether they care about encounter order or not (e.g., they may use a List because there's no Multiset implementation handy), I think we need to provide ways of aggregating that let the user explicitly choose between order-preserving aggregation and concurrent aggregation. I think having the word "concurrent" in the code somewhere isn't a bad clue.
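For what it's worth, the Collectors API that Java 8 eventually shipped follows exactly this naming convention -- groupingBy vs. groupingByConcurrent -- so the word "concurrent" in the code is the user's explicit opt-in to unordered shared-container aggregation. A hedged after-the-fact illustration:

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentMap;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class ExplicitChoiceDemo {
    public static void main(String[] args) {
        // Order-preserving aggregation: per-thread maps, merged in encounter order.
        Map<Integer, List<String>> ordered =
                Stream.of("a", "bb", "cc", "d").parallel()
                      .collect(Collectors.groupingBy(String::length));

        // Concurrent aggregation: one shared ConcurrentMap, no merge step;
        // order within each group is not guaranteed. The name says "concurrent".
        ConcurrentMap<Integer, List<String>> concurrent =
                Stream.of("a", "bb", "cc", "d").parallel()
                      .collect(Collectors.groupingByConcurrent(String::length));

        System.out.println(ordered.get(2));           // [bb, cc]
        System.out.println(concurrent.get(2).size()); // 2
    }
}
```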
On 12/26/2012 1:02 PM, Remi Forax wrote:
> On 12/26/2012 06:07 PM, Doug Lea wrote:
>> On 12/26/12 11:52, Remi Forax wrote:
>>> No, I think it's better to have only toList() and toSet(); the result of
>>> stream.sorted().toSet() will return a NavigableSet/SortedSet. The idea is
>>> that the to* methods will choose the best implementation using the
>>> properties of the pipeline. If you want a specific implementation, then
>>> use into().
>>
>> Sorry, I still don't buy it. If you want a specific implementation, then my
>> sense is that you will end up writing something like the following anyway:
>>
>>   Stream s = ...;
>>   SomeCollection dest = ...
>>   // add s to dest via (par/seq) forEach or loop or whatever
>
> Again, letting people do the copy themselves will create a lot of
> non-thread-safe code. I see forEach as a necessary evil, not as something
> that people should use every day.
>
>> So why bother adding all the support code that people will probably not use
>> anyway in custom situations -- because, well, they are custom situations.
>> So to me, Reducers etc. are in the maybe-nice-to-have category.
>
> While I agree that custom reducers have to fly by themselves, we need to
> provide an operation that pulls all the elements from a parallel stream and
> puts them into any collection in a thread-safe manner that doesn't require
> 10 eyeballs to look at the code.
>
>> -Doug
>
> Rémi
More information about the lambda-libs-spec-experts mailing list