Streams and Spliterator characteristics confusion
Kasper Nielsen kasperni at gmail.com
Sat Jun 28 15:40:13 UTC 2014
Thanks,
follow-up questions inline.
On Fri, Jun 27, 2014 at 11:43 AM, Paul Sandoz <paul.sandoz at oracle.com> wrote:
> Internally in the stream pipeline we keep track of certain characteristics for optimization purposes, and those are conveniently used to determine the characteristics of the Spliterator, so there are some idiosyncrasies poking through.
>
> > s.sorted().spliterator() -> Spliterator.SORTED = true
> > But if I specify a comparator the stream is not sorted
> > s.sorted((a,b) -> 1).spliterator() -> Spliterator.SORTED = false
>
> Right, there is an optimization internally that ensures that if the upstream stream is already sorted then the sort operation becomes a no-op, e.g. s.sorted().sorted(). This optimization cannot apply when a comparator is passed in, since we don't know whether two comparators are identical in their behaviour, e.g.:
> s.sorted((a, b) -> a.compareTo(b)).sorted(Comparator.naturalOrder()).sorted()

What initially made me wonder was the javadoc of Spliterator#getComparator(), which says "If this Spliterator's source is SORTED by a Comparator, returns that Comparator."
So I assumed s.sorted((a,b) -> 1).spliterator().getComparator() would return said comparator.
It just feels a bit inconsistent compared to, for example, new TreeMap(comparator).keySet().spliterator(), which reports Spliterator.SORTED = true and returns the comparator.
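A small probe of the sorted/comparator behaviour discussed above may help. It is a sketch assuming the java.util.stream implementation of this era (later JDKs may differ), and the class and method names are hypothetical. Per the behaviour described in this thread, only the natural-order sorted() should report SORTED, while the TreeMap key-set spliterator reports SORTED and returns the map's comparator from getComparator().

```java
import java.util.Comparator;
import java.util.Spliterator;
import java.util.TreeMap;
import java.util.stream.Stream;

public class SortedFlags {
    // s.sorted(): natural-order sort, tracked by the pipeline as SORTED
    static boolean naturalSortReportsSorted() {
        Spliterator<String> sp = Stream.of("b", "a", "c").sorted().spliterator();
        return sp.hasCharacteristics(Spliterator.SORTED);
    }

    // s.sorted(cmp): the pipeline marks the result as not sorted
    static boolean comparatorSortReportsSorted(Comparator<String> cmp) {
        Spliterator<String> sp = Stream.of("b", "a", "c").sorted(cmp).spliterator();
        return sp.hasCharacteristics(Spliterator.SORTED);
    }

    // TreeMap key set: reports SORTED and hands back the map's own comparator
    static boolean treeMapReturnsComparator(Comparator<String> cmp) {
        TreeMap<String, Integer> map = new TreeMap<>(cmp);
        map.put("b", 1);
        map.put("a", 2);
        Spliterator<String> sp = map.keySet().spliterator();
        return sp.hasCharacteristics(Spliterator.SORTED) && sp.getComparator() == cmp;
    }

    public static void main(String[] args) {
        Comparator<String> reverse = Comparator.reverseOrder();
        System.out.println("sorted():        " + naturalSortReportsSorted());
        System.out.println("sorted(cmp):     " + comparatorSortReportsSorted(reverse));
        System.out.println("TreeMap keySet:  " + treeMapReturnsComparator(reverse));
    }
}
```

Per the Spliterator contract, calling getComparator() on the sorted(cmp) spliterator, which does not report SORTED, would throw IllegalStateException rather than return the comparator.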
> > s.distinct().spliterator() -> Spliterator.DISTINCT = true
> > but limiting the number of distinct elements makes the stream non-distinct
> > s.distinct().limit(10).spliterator() -> Spliterator.DISTINCT = false
>
> I don't observe that (see program below).
Right, that was an error on my part.
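A quick way to check the corrected observation, as a hypothetical sketch (class and method names are illustrative): limit() is not expected to clear DISTINCT, so both checks should report it on the implementation discussed here.

```java
import java.util.Spliterator;
import java.util.stream.Stream;

public class DistinctLimit {
    // distinct() on a reference stream: the result is tracked as DISTINCT
    static boolean distinctReportsDistinct() {
        return Stream.of(1, 2, 2, 3).distinct().spliterator()
                .hasCharacteristics(Spliterator.DISTINCT);
    }

    // limit(n) only truncates, so the DISTINCT flag is carried through
    static boolean distinctLimitReportsDistinct(long n) {
        return Stream.of(1, 2, 2, 3).distinct().limit(n).spliterator()
                .hasCharacteristics(Spliterator.DISTINCT);
    }

    public static void main(String[] args) {
        System.out.println("distinct():          " + distinctReportsDistinct());
        System.out.println("distinct().limit(10): " + distinctLimitReportsDistinct(10));
    }
}
```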
But still, I think there are some cases where the flag should be maintained. For example, I think the following program should print four 'true' values, but it prints only one. The second one especially puzzles me: invoking distinct() makes the stream non-distinct?
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.stream.IntStream;
import java.util.stream.StreamSupport;

// Source whose spliterator reports DISTINCT (Spliterators.spliterator also adds SIZED | SUBSIZED)
static IntStream s() {
    return StreamSupport.intStream(Spliterators.spliterator(new int[] { 12, 34 }, Spliterator.DISTINCT), false);
}
public static void main(String[] args) {
    System.out.println(s().spliterator().hasCharacteristics(Spliterator.DISTINCT));
    System.out.println(s().distinct().spliterator().hasCharacteristics(Spliterator.DISTINCT));
    System.out.println(s().boxed().spliterator().hasCharacteristics(Spliterator.DISTINCT));
    System.out.println(s().asDoubleStream().spliterator().hasCharacteristics(Spliterator.DISTINCT));
}
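One possible explanation for the second result, offered as an assumption about the JDK 8 sources rather than specified behaviour: IntStream.distinct() there is implemented as boxed().distinct().mapToInt(i -> i), and the trailing mapToInt step clears DISTINCT. The hypothetical sketch below compares that with calling distinct() on the boxed reference stream, whose output the pipeline marks as DISTINCT.

```java
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.stream.IntStream;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

public class BoxedDistinct {
    // Same DISTINCT-reporting source as in the program above
    static IntStream s() {
        return StreamSupport.intStream(
                Spliterators.spliterator(new int[] { 12, 34 }, Spliterator.DISTINCT), false);
    }

    // Reference Stream.distinct(): the result is flagged DISTINCT in the pipeline
    static boolean boxedThenDistinct() {
        Stream<Integer> st = s().boxed().distinct();
        return st.spliterator().hasCharacteristics(Spliterator.DISTINCT);
    }

    // IntStream.distinct(): whether DISTINCT survives depends on how the JDK implements it
    static boolean primitiveDistinct() {
        return s().distinct().spliterator().hasCharacteristics(Spliterator.DISTINCT);
    }

    public static void main(String[] args) {
        System.out.println("boxed().distinct(): " + boxedThenDistinct());
        System.out.println("distinct():         " + primitiveDistinct());
    }
}
```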
> > On the other hand something like Spliterator.SORTED is maintained when I
> > invoke limit
> > s.sorted().limit(10).spliterator() -> Spliterator.SORTED = true
> >
> > A flag such as Spliterator.NONNULL is also cleared in situations where it
> > should not be.
>
> That is because it is not tracked in the pipeline, as there is no gain optimisation-wise (if it were, it would be cleared for map/flatMap operations and preserved for other ops like filter, as you say below). It's not that difficult to support this, and it should add no measurable performance cost (we deliberately left space in the bit fields); however, since spliterator() is an escape hatch for doing stuff that cannot be done by other operations, I think the value of supporting NONNULL is marginal.

I am trying to implement the stream interfaces, and I want to make sure that my implementation has similar behaviour to the default implementation in java.util.stream. The interoperability between streams and Spliterator characteristics is the only thing I'm having serious issues with. I feel the current state is more a result of how streams happen to be implemented at the moment than part of a public API.
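The NONNULL point can be reproduced with a short sketch (hypothetical class and method names; the source spliterator is built with Spliterators.spliterator and explicitly reports NONNULL). Since NONNULL is not one of the flags tracked in the pipeline, a plain filter() is expected to drop it, whereas SORTED, which is tracked, survives limit().

```java
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

public class NonNullFlag {
    // Source spliterator explicitly reporting NONNULL | ORDERED (plus SIZED | SUBSIZED)
    static Stream<String> source() {
        return StreamSupport.stream(
                Spliterators.spliterator(new String[] { "a", "b", "c" },
                        Spliterator.NONNULL | Spliterator.ORDERED), false);
    }

    // No intermediate ops: the source spliterator is handed back as-is
    static boolean sourceReportsNonNull() {
        return source().spliterator().hasCharacteristics(Spliterator.NONNULL);
    }

    // filter() cannot introduce nulls, but the pipeline does not track NONNULL
    static boolean filterReportsNonNull() {
        return source().filter(x -> !x.isEmpty()).spliterator()
                .hasCharacteristics(Spliterator.NONNULL);
    }

    // SORTED, by contrast, is a tracked flag and survives limit()
    static boolean sortedLimitReportsSorted() {
        return source().sorted().limit(2).spliterator()
                .hasCharacteristics(Spliterator.SORTED);
    }

    public static void main(String[] args) {
        System.out.println("source NONNULL:          " + sourceReportsNonNull());
        System.out.println("filter() NONNULL:        " + filterReportsNonNull());
        System.out.println("sorted().limit() SORTED: " + sortedLimitReportsSorted());
    }
}
```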
I think something like a table, with intermediate (non-terminal) stream operations as rows and characteristics as columns, where each cell is either "cleared", "set" or "maintained", would make sense.
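Such a table could at least be generated empirically, documenting what a given JDK does today. The sketch below is a hypothetical harness, not an existing API: it builds a source reporting a chosen set of characteristics, applies each intermediate operation, and classifies every characteristic as "maintained", "cleared" or "set". The operations and characteristics listed are an arbitrary subset, and the printed table reflects whichever JDK runs it, so no output is shown here.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.function.UnaryOperator;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

public class CharacteristicsTable {
    static final int[] FLAGS = { Spliterator.ORDERED, Spliterator.DISTINCT, Spliterator.SORTED,
            Spliterator.SIZED, Spliterator.NONNULL, Spliterator.IMMUTABLE };
    static final String[] NAMES = { "ORDERED", "DISTINCT", "SORTED", "SIZED", "NONNULL", "IMMUTABLE" };

    // Source reporting every characteristic above; the data is already in natural order,
    // so claiming SORTED (with a null comparator) is honest
    static Stream<Integer> source() {
        Integer[] data = { 1, 2, 3, 4 };
        return StreamSupport.stream(Spliterators.spliterator(data,
                Spliterator.ORDERED | Spliterator.DISTINCT | Spliterator.SORTED
                        | Spliterator.NONNULL | Spliterator.IMMUTABLE), false);
    }

    // One table cell: compare a characteristic before and after the operation
    static String classify(boolean before, boolean after) {
        if (before && after) return "maintained";
        if (before) return "cleared";
        if (after) return "set";
        return "-";
    }

    // One table row: every tracked characteristic for a single operation
    static Map<String, String> row(UnaryOperator<Stream<Integer>> op) {
        int before = source().spliterator().characteristics();
        int after = op.apply(source()).spliterator().characteristics();
        Map<String, String> cells = new LinkedHashMap<>();
        for (int i = 0; i < FLAGS.length; i++) {
            cells.put(NAMES[i], classify((before & FLAGS[i]) != 0, (after & FLAGS[i]) != 0));
        }
        return cells;
    }

    public static void main(String[] args) {
        Map<String, UnaryOperator<Stream<Integer>>> ops = new LinkedHashMap<>();
        ops.put("filter", s -> s.filter(x -> x > 1));
        ops.put("map", s -> s.map(x -> x * 2));
        ops.put("distinct", Stream::distinct);
        ops.put("sorted", Stream::sorted);
        ops.put("limit", s -> s.limit(2));
        ops.put("skip", s -> s.skip(1));
        ops.put("unordered", Stream::unordered);
        ops.forEach((name, op) -> System.out.println(name + " " + row(op)));
    }
}
```

Keeping the source fixed and varying one operation per row keeps each cell attributable to that single operation; chained operations would need their own rows.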
Cheers,
Kasper