Spliterator (original) (raw)

Brian Goetz brian.goetz at oracle.com
Mon Nov 19 14:11:59 PST 2012


For example, getSizeIfKnown(). Is it required to promise that the size will not change after return? Is it required to provide a result if one is computationally possible rather than just handy?

This is in part a question for whoever is on the other side of the Spliterator rather than for Spliterator itself. For the Streams library, we have strong non-interference requirements, so it is assuming that the thing being described the Spliterator will not change in size during processing. On the other hand, if you are using Spliterator within CHM, you are likely willing to tolerate changes in size. Either can be supported by the Spliterator API itself.

Is it required to lie and return Integer.MAXVALUE if it overflows int?

Good question, and related to the concurrent discussion about Sized. If the Sized/LongSized strategy seems viable, it becomes more practical to make all the sizes here long. I'd prefer to make all the sizes long, but only if we can make some progress on a migration path from 32 to 64 bit collections.

And I'm left wondering (as always) just how much good these methods will do compared to an ultra-small/simple API, considering that most collection developers would rather put more time into writing custom versions of forEach, reduce, etc rather than tuning their Spliterator hinting methods so that the lambda-libs default versions work well?

I guess I didn't think these would be burdensome to write? I have a hard time seeing how implementing getNaturalSplits would require more than a minute of thought. Which ones are you concerned about?

Or, considering that each of these added methods seems targeted at a particular data structure/shape (array, tree, bounded list/seq), why not make subinterfaces for them?

interface Spliterator interface SizedSpliterator extends Spliterator... interface RandomAccessSpliterator extends SizedSpliterator ... etc On a little thought, I can't think of a reason not to do this? Can you?

Yeah, we actually went down a road very much like that and then retreated. If you compare the current API to the Iteration 2 draft we had in the summer, we were contending with "modifier explosion": Stream, ParallelStream, SizedStream, SizedParallelStream -- and that was just the beginning. When primitive specializations come in, you get Int/Long/Double variants of each; when we had MapStream, you had bi-value variants of each. So we've very aggressively pared back the number of stream-related abstractions -- there's one stream, one sink, there's one spliterator (StreamAccessor is gone), etc, some of which unfortunately have to be specialized for primitive type. But at least now the only axis of proliferation is element type.

(Where to get this started, the minimal base version of Spliterator has a split() method returning one with fewer elements or null if it can't, and either is a or returns an Iterator.)

Right. So the additional methods that got added since then are:

Heuristics: getSizeIfKnown // default = -1 estimateSize // default -1 getNaturalSplits // can always return 1 (means binary split) isPredictableSplits // can always return false

In most cases I can think of, these are either things you know or you don't; there's not a lot of "tuning" work going on. (The size-related ones get harder if they have to track size during iteration, but I am perfectly content to say they only have meaning in the splitting phase.) If you take the default, you may lose something in some cases, or not (e.g., if you always return getNaturalSplits=1, but you have a non-binary tree, you may get a lopsided computation tree, but you'll still get the right answer; if you don't know the size, you may pay for an extra copy for pipelines ending in toArray.)

There's also some extra methods for element access: iterator forEach

In most of our implementations, the internal class implements both Spliterator and Iterator, so the iterator() implementation is just "return this". IMO this gets us a more cleanly factored API with little or no additional performance cost.

forEach is optional (there's an obvious default) but it was a design goal to eliminate use of Iterator in the cases where we can, so providing a better forEach is often worth it (and usually not hard.)

So yes, there are a lot of "extra" methods, but most are trivial to implement (or have reasonable defaults.)



More information about the lambda-libs-spec-observers mailing list