Fwd: Re: Stream.limit() - puzzler/bug/feature

Brian Goetz brian.goetz at oracle.com
Thu Nov 15 11:45:14 PST 2012


So, there are a couple of questions here.

Should this be legal? It seems harmless in this example, and the serial implementation is trivial, but making guarantees about what state a stream is left in after a terminal operation is a rat's nest for very little value. (If you want to get at the elements in a way that the built-in ops can't support, the escape hatch is "iterator" or "spliterator".)

Preference: calling a terminal op on a stream, or asking for an iterator/spliterator, "seals" the stream and later use of that stream is an error. Not clear we can enforce this at acceptable cost, so the fallback is "results are undefined and implementations are encouraged to free any resources held."

We don't want to go out of our way to support this as it constrains our ability to optimize using lookahead and such in the common case. Again, ideally the above should probably be an error, but not clear if we can enforce that perfectly/economically.
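To make the "seal" preference above concrete, here is a minimal sketch, assuming a hypothetical one-shot wrapper around an Iterator. The names OneShot, seal and SealDemo are invented for illustration and are not part of any library; the sketch only shows the shape of the idea, not how the real implementation would enforce it.

```java
import java.util.Arrays;
import java.util.Iterator;

class SealDemo {
    /** Hypothetical one-shot source (not a JDK class): the first consuming call seals it. */
    static final class OneShot<T> {
        private Iterator<T> source; // set to null once sealed, releasing the reference

        OneShot(Iterator<T> source) {
            this.source = source;
        }

        // Hands out the underlying iterator exactly once; any later call is an error.
        private Iterator<T> seal() {
            if (source == null) {
                throw new IllegalStateException("source already consumed");
            }
            Iterator<T> it = source;
            source = null;
            return it;
        }

        /** Terminal operation: returns the first element, or null if there is none. */
        T findFirst() {
            Iterator<T> it = seal();
            return it.hasNext() ? it.next() : null;
        }

        /** Escape hatch: asking for the iterator also seals the source. */
        Iterator<T> iterator() {
            return seal();
        }
    }

    /** Reports whether a second findFirst() on the same source is rejected. */
    static boolean secondUseRejected() {
        OneShot<String> s = new OneShot<>(Arrays.asList("a", "b").iterator());
        s.findFirst();
        try {
            s.findFirst();
            return false;
        } catch (IllegalStateException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(secondUseRejected()); // prints "true"
    }
}
```

The check here is a single null test per consuming call, which is cheap in the serial case; whether the same guarantee can be made at acceptable cost for every stream implementation is exactly the open question raised above.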

On 11/15/2012 2:16 PM, Remi Forax wrote:

This sprang up on the lambda-dev list and Brian asked me to transfer it to the EG list. Given that the last messages only involved people of this EG, I have copied and pasted these messages.

The problem is how to deal with streams that are created from IO objects. Should the implementation throw a runtime exception if such a stream is iterated twice, by a forEach for example?

Rémi

Brian wrote:
> A related question is what should happen in this case:
>
>   T first = stream.findFirst();
>   T second = stream.findFirst();
>
> This "accidentally" works in the current serial impl, but is in general a nightmare. Terminal ops should probably "close" the stream.

Sam wrote:
This was my thinking when I read the example. Not sure if that is practical but it might reduce errors such as the one described.

Sam

On Nov 15, 2012, at 9:36 AM, Remi Forax <forax at univ-mlv.fr> wrote:

On 11/15/2012 06:22 PM, Brian Goetz wrote:

The best way to think about it is that a Stream is more like an Iterator than a data structure. There is some abstract source of data somewhere (it might be in a data structure, or might be generated from a function, or read from a network), and a series of transformations applied to the data between the source and the consumer. Streams can additionally execute using parallelism, if requested.

Stream constructs like:

  Stream s = people.stream()
                   .filter(p -> p.getLastName().equals("Smith"))

do not do any filtering on construction. They simply say "there's a stream source, the collection 'people', and when you consume from the stream s, you'll get the results of filtering the source values." The confusion in Dmitry's example is akin to multiple activities reading from the same IO channel -- they might interfere with each other over who gets the next value, and any buffering that any consumer does may confuse other consumers.

Maybe the implementation should protect users from using two aliases of a non-replayable stream. Using Dmitry's example, if the stream is an IO channel, the second call to limit() or to any method of 's' should throw an IllegalStateException.

Rémi
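The two points above (no filtering happens on construction, and a second use of the same stream reference is rejected) can be sketched against java.util.stream as it was later released in Java 8. That API postdates this thread and differs from the prototype under discussion, so read this as an illustration of the idea rather than of the code being debated. The class name LazyStreamDemo and the list of last names standing in for 'people' are assumptions made for the example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class LazyStreamDemo {
    /** Returns {predicate calls before the terminal op, predicate calls after it}. */
    static int[] predicateCalls() {
        // Stand-in for the 'people' collection: just the last names.
        List<String> lastNames = Arrays.asList("Smith", "Jones", "Smith");
        int[] calls = {0};
        Stream<String> s = lastNames.stream()
                .filter(n -> { calls[0]++; return n.equals("Smith"); });
        int before = calls[0];              // pipeline built, nothing filtered yet
        s.collect(Collectors.toList());     // terminal op pulls elements through the filter
        return new int[] {before, calls[0]};
    }

    /** Reports whether a second limit() on the same stream reference is rejected. */
    static boolean secondLimitRejected() {
        Stream<String> s = Stream.of("a", "b", "c");
        s.limit(1);                         // first pipeline stage linked onto s
        try {
            s.limit(1);                     // second pipeline on the same source stage
            return false;
        } catch (IllegalStateException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        int[] calls = predicateCalls();
        System.out.println(calls[0] + " " + calls[1]); // prints "0 3"
        System.out.println(secondLimitRejected());     // prints "true"
    }
}
```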


