Fwd: Re: Stream.limit() - puzzler/bug/feature

Remi Forax forax at univ-mlv.fr
Fri Nov 16 04:50:15 PST 2012


On 11/15/2012 08:45 PM, Brian Goetz wrote:

So, there's a couple of questions here.

- Linear stream chain, but multiple use. For example:

    Stream s = ...
    T first = s.findFirst();
    T second = s.findFirst();

  Should this be legal? It seems harmless in this example, and the serial implementation is trivial, but making guarantees about what state a stream is left in after a terminal operation is a rat's nest for very little value. (If you want to get at the elements in a way that the built-in ops can't support, the escape hatch is "iterator" or "spliterator".)

  Preference: calling a terminal op on a stream, or asking for an iterator/spliterator, "seals" the stream and later use of that stream is an error. Not clear we can enforce this at acceptable cost, so the fallback is "results are undefined and implementations are encouraged to free any resources held."

I fully agree, and it should be enforced by throwing a runtime exception; otherwise we will have a stream :) of bug reports.
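For illustration only, here is a minimal sketch of the "seal on terminal op" behaviour, written against the java.util.stream API as it later shipped in Java 8 (an assumption relative to the prototype discussed in this thread). The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Stream;

class SealedStreamDemo {
    // Returns true if a second terminal operation on the same stream is rejected.
    static boolean secondTerminalOpFails(List<String> source) {
        Stream<String> s = source.stream();
        s.findFirst();                 // terminal op: the stream is now consumed
        try {
            s.findFirst();             // reuse of an already consumed stream
            return false;
        } catch (IllegalStateException expected) {
            return true;               // the shipped API reports reuse this way
        }
    }

    public static void main(String[] args) {
        System.out.println(secondTerminalOpFails(Arrays.asList("a", "b", "c")));
    }
}
```

In that API the second findFirst() fails fast rather than returning an undefined result, which matches the preference stated above.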

There is also another good reason not to allow that: otherwise, the Stream interface could be used in place of Collection, which would add a burden for all API designers (should I take a Collection, an Iterator or a Stream?).

A Stream is a weak version of an Iterator: when you start to pull values from it, that invalidates the whole stream chain. BTW, I'm still not sure we should not provide an iterator() method to be able to 'upgrade' to iterator semantics (for the parallel implementation, iterator() would be equivalent to sequential().iterator()).
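As a rough sketch of the 'upgrade to iterator' idea, the following uses Stream.iterator() from the java.util.stream API that eventually shipped in Java 8 (again an assumption relative to the prototype under discussion). The helper name firstTwo is hypothetical.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.stream.Stream;

class IteratorEscapeHatch {
    // Pulls the first two elements through one Iterator
    // instead of calling findFirst() twice on the same stream.
    static String firstTwo(List<String> source) {
        Stream<String> s = source.stream().filter(x -> !x.isEmpty());
        Iterator<String> it = s.iterator();   // terminal op: s itself is now consumed
        String first = it.hasNext() ? it.next() : null;
        String second = it.hasNext() ? it.next() : null;
        return first + "," + second;
    }

    public static void main(String[] args) {
        System.out.println(firstTwo(Arrays.asList("", "a", "b", "c")));  // prints a,b
    }
}
```

The iterator keeps its position between calls, so "first" and "second" have a well-defined meaning without any guarantee about the state of the stream after a terminal operation.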

- Nonlinear stream graph. For example:

    Stream s = ...
    Stream a = s.filter(...);
    Stream b = s.map(...);
    // use a and b

  We don't want to go out of our way to support this as it constrains our ability to optimize using lookahead and such in the common case. Again, ideally the above should probably be an error, but not clear if we can enforce that perfectly/economically.

It should be enforced: calling a terminal method should seal the whole stream chain and throw an exception if a stream of the chain is already sealed. From the implementation perspective, when you add an operation to a stream, you should seal it, so streams are mutable but ops are not. It seems doable (from my planet :)
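One way the sealing scheme could look is sketched below. OneShot is a hypothetical toy class, not the actual implementation: each stage carries one mutable "sealed" flag that is set when an operation is attached or when elements are pulled, so a second use of the same stage is rejected.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.function.Function;
import java.util.function.Predicate;

// Hypothetical toy pipeline stage (illustration only).
final class OneShot<T> {
    private final Iterator<T> source;
    private boolean sealed;   // the only mutable state of a stage

    OneShot(Iterator<T> source) { this.source = source; }

    // Marks this stage as used; a second use is an error.
    private Iterator<T> seal() {
        if (sealed) throw new IllegalStateException("stream already used");
        sealed = true;
        return source;
    }

    OneShot<T> filter(Predicate<? super T> p) {
        Iterator<T> in = seal();
        return new OneShot<>(new Iterator<T>() {
            private T pending;
            private boolean ready;
            public boolean hasNext() {
                while (!ready && in.hasNext()) {
                    T t = in.next();
                    if (p.test(t)) { pending = t; ready = true; }
                }
                return ready;
            }
            public T next() {
                if (!hasNext()) throw new NoSuchElementException();
                ready = false;
                return pending;
            }
        });
    }

    <R> OneShot<R> map(Function<? super T, ? extends R> f) {
        Iterator<T> in = seal();
        return new OneShot<>(new Iterator<R>() {
            public boolean hasNext() { return in.hasNext(); }
            public R next() { return f.apply(in.next()); }
        });
    }

    // Terminal op: seals the last stage of the chain.
    T findFirst() {
        Iterator<T> in = seal();
        return in.hasNext() ? in.next() : null;
    }
}

class OneShotDemo {
    // Returns true if branching from s twice is rejected while the first branch still works.
    static boolean branchingRejected() {
        OneShot<Integer> s = new OneShot<>(Arrays.asList(1, 2, 3).iterator());
        OneShot<Integer> a = s.filter(x -> x > 1);   // attaching an op seals s
        try {
            s.map(x -> x * 2);                       // second use of s: nonlinear graph
            return false;
        } catch (IllegalStateException e) {
            return a.findFirst() == 2;               // a can still be consumed once
        }
    }
}
```

In this simplification every stage ends up sealed as soon as it gains a successor, so the check costs one boolean test per attached operation; whether that is cheap enough in a real implementation is the open question raised above.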

cheers, Rémi

On 11/15/2012 2:16 PM, Remi Forax wrote:

This sprang up on the lambda-dev list and Brian asked me to transfer it to the EG list. Given that the last messages only involved members of this EG, I have copied and pasted them here.

The problem is how to deal with streams that are created from IO objects. Should the implementation throw a runtime exception if such a stream is iterated twice, by a forEach for example?

Rémi

Brian wrote:

> A related question is what should happen in this case:
>
>     T first = stream.findFirst();
>     T second = stream.findFirst();
>
> This "accidentally" works in the current serial impl, but is in general a nightmare. Terminal ops should probably "close" the stream.

Sam wrote:

This was my thinking when I read the example. Not sure if that is practical but it might reduce errors such as the one described.

Sam

On Nov 15, 2012, at 9:36 AM, Remi Forax <forax at univ-mlv.fr> wrote:

On 11/15/2012 06:22 PM, Brian Goetz wrote:

The best way to think about it is that a Stream is more like an Iterator than a data structure. There is some abstract source of data somewhere (it might be in a data structure, or might be generated from a function, or read from a network), and a series of transformations applied to the data between the source and the consumer. Streams can additionally execute using parallelism, if requested.

Stream constructs like:

    Stream s = people.stream()
                     .filter(p -> p.getLastName().equals("Smith"))

do not do any filtering on construction. Each simply says "there's a stream source, the collection 'people', and when you consume from the stream s, you'll get the results of filtering the source values." The confusion in Dmitry's example is akin to multiple activities reading from the same IO channel -- they might interfere with each other over who gets the next value, and any buffering that any consumer does may confuse other consumers.

Maybe the implementation should protect users from using two aliases of a non-replayable stream. Using Dmitry's example, if the stream is an IO channel, the second call to limit() or to any method of 's' should throw an IllegalStateException.

Rémi
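The point that no filtering happens on construction can be checked with a small sketch. It assumes the java.util.stream API as it later shipped in Java 8; the class name, the method name and the counter are hypothetical, and plain strings stand in for the 'people' collection.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Stream;

class LazyFilterDemo {
    // Returns { predicate calls right after construction, predicate calls after consumption }.
    static int[] predicateCalls(List<String> lastNames) {
        AtomicInteger calls = new AtomicInteger();
        Stream<String> s = lastNames.stream()
                .filter(n -> { calls.incrementAndGet(); return n.equals("Smith"); });
        int afterConstruction = calls.get();   // the predicate has not run yet
        s.forEach(n -> { });                   // consuming the stream drives the filter
        return new int[] { afterConstruction, calls.get() };
    }

    public static void main(String[] args) {
        int[] r = predicateCalls(Arrays.asList("Smith", "Jones", "Smith"));
        System.out.println(r[0] + " then " + r[1]);   // prints 0 then 3
    }
}
```

The predicate runs only while a consumer pulls from s, which is why two consumers sharing s would compete for the same underlying elements.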



More information about the lambda-libs-spec-observers mailing list