Issue 774414: Add a stream type and a merge type to itertoolsmodule.c
This patch adds a stream type to itertoolsmodule.c, which provides a way to cache results from a generator. This is useful if you want to iterate over the generator more than once. It also lets potentially infinite generators be indexed like lists or tuples: stream(some_generator())[10] produces the first 11 values from the generator and returns the last one, but doesn't produce any more.
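A minimal pure-Python sketch of the caching behavior described above might look like the following. The class name stream mirrors the patch, but everything else is an assumption: this is not the C patch's code, it uses Python 3 syntax, and it deliberately supports only non-negative integer indexes (no slicing, no negative indexes, no len()).

```python
import operator
from itertools import islice


class stream:
    """Hypothetical sketch: a lazily filled, cached view of an iterator.

    Values are pulled from the underlying iterator only when first needed
    and are then kept in a list, so they can be indexed or re-iterated
    without re-running the generator.
    """

    def __init__(self, iterable):
        self._it = iter(iterable)
        self._cache = []

    def _fill(self, n):
        # Pull just enough new values so that len(self._cache) >= n,
        # stopping early if the underlying iterator runs out.
        needed = n - len(self._cache)
        if needed > 0:
            self._cache.extend(islice(self._it, needed))

    def __getitem__(self, index):
        index = operator.index(index)  # rejects slices and other non-integers
        if index < 0:
            # A negative index would require exhausting a possibly infinite source.
            raise IndexError("negative indexes are not supported")
        self._fill(index + 1)
        if index >= len(self._cache):
            raise IndexError("stream index out of range")
        return self._cache[index]

    def __iter__(self):
        # Replay cached values first, then keep pulling from the source.
        i = 0
        while True:
            self._fill(i + 1)
            if i >= len(self._cache):
                return
            yield self._cache[i]
            i += 1
```

With an infinite source such as itertools.count(), stream(count())[10] pulls exactly 11 values and returns 10; later lookups at or below index 10 are served from the cache, and iterating over the same stream object twice replays the cached values.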
The other type added can be used to merge the output of two already-sorted iterables into a single iterable.
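The merge behavior can be sketched in pure Python as a lazy two-way merge. The function name merge and its two-argument signature are assumptions for illustration (the patch's actual C interface is not shown here); the sketch assumes both inputs are sorted in ascending order and uses Python 3 syntax.

```python
def merge(a, b):
    """Hypothetical sketch: lazily merge two ascending iterables into one."""
    a, b = iter(a), iter(b)
    sentinel = object()  # marks an exhausted input
    x = next(a, sentinel)
    y = next(b, sentinel)
    while x is not sentinel and y is not sentinel:
        if y < x:
            yield y
            y = next(b, sentinel)
        else:
            # Ties take from `a` first, which keeps the merge stable.
            yield x
            x = next(a, sentinel)
    # At most one input still has values left; drain it.
    if x is not sentinel:
        yield x
        yield from a
    if y is not sentinel:
        yield y
        yield from b
```

Because it only ever holds one pending value per input, this works on infinite iterables too, e.g. merging the even and odd numbers produced by two itertools.count() objects.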
I assume documentation would also need to be updated if this patch gets accepted, but since I imagine that won't be an open-and-shut case I haven't written any yet.
Logged In: YES user_id=80475
Originally, I looked at implementing all of itertools as a single object supporting various methods but rejected it after working through the use cases. It may be time to take another look.
At first glance, this object does not fit well with the other itertools:
- it returns an object that supports more than iter() and next()
- it consumes memory (apart from cycle(), the other itertools do not require auxiliary storage)
- it does not support a functional style (to take advantage of the cache, a cache object must be created first and all further accesses go through it). The one example you supplied is better accomplished with islice(); see the documentation example for nth().
- it doesn't play nicely with the other itertools, which would need to be modified to take advantage of the cache: s=stream(some_gen()).
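For reference, the nth() recipe in the current itertools documentation is essentially the following (Python 3 spelling; the 2003-era documentation may have phrased it differently). It needs no cache: islice() lazily skips the first n items and next() takes one more.

```python
from itertools import islice


def nth(iterable, n, default=None):
    """Return the item at index n of iterable, or default if it is too short."""
    # islice skips the first n items lazily; next() takes exactly one more.
    return next(islice(iterable, n, None), default)
```

For example, nth(range(100), 10) returns 10, and nth(iter([1, 2]), 5) returns None.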
I can see some need for caching behavior but would like to see compelling use cases that cannot easily be met with list() and islice(). Create a few examples like the ones in the itertools documentation. These will demonstrate the use cases and show that the new function can play nicely with the other building blocks. Try implementing window() with the stream tool.
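As a baseline for that exercise, here is one way to write window() using only existing tools (collections.deque plus islice()), without any stream type. The name window and its (iterable, n) signature are assumptions for illustration; a stream-based version would need to be simpler or more capable than this to make the case.

```python
from collections import deque
from itertools import islice


def window(iterable, n=2):
    """Hypothetical sketch: yield overlapping tuples of n consecutive items.

    window('abcd', 2) -> ('a', 'b'), ('b', 'c'), ('c', 'd')
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    it = iter(iterable)
    # Prime the window with the first n items; maxlen makes the deque
    # drop the oldest item automatically on each append.
    win = deque(islice(it, n), maxlen=n)
    if len(win) < n:
        return  # fewer than n items: no complete window
    yield tuple(win)
    for item in it:
        win.append(item)
        yield tuple(win)
```

The only auxiliary storage is the n-item deque, which stays bounded even for an infinite input.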
Also, I'm concerned about the len() method on a potentially infinite generator, since computing it would require exhausting the generator.
See if you can find a better name for it than stream(). Ideally, the name should suggest caching, sequence-like behavior, and lazy evaluation.
Try coding a pure Python version and submitting it to the newsgroup to build support for the idea, to see if others can refine it, and to tease out use cases.
The merge() function is not sufficiently general purpose to warrant inclusion in itertools. Also, it would be best to allow custom comparison so as not to lock in ascending order behavior.
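For comparison, the standard library later gained heapq.merge() (Python 2.6), which merges any number of sorted inputs; since Python 3.5 it also accepts key and reverse arguments, which addresses the custom-comparison point. The sample data below is illustrative only.

```python
import heapq

# Inputs already sorted in descending order: reverse=True keeps that order.
descending = list(heapq.merge([9, 5, 1], [8, 6, 2], reverse=True))

# Inputs sorted by length: key=len merges on that same ordering.
by_length = list(heapq.merge(["a", "ccc"], ["bb", "dddd"], key=len))

print(descending)  # [9, 8, 6, 5, 2, 1]
print(by_length)   # ['a', 'bb', 'ccc', 'dddd']
```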
Your assumption about documentation is correct. You would also need unit tests, examples, and a pure-Python version.
Overall, the patch looks nicely done.
Logged In: YES user_id=80475
The use cases for iterating more than once are better served by iterator splitting. See http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/213027.
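Iterator splitting is available in the standard library as itertools.tee() (added in Python 2.4). A small sketch, using a hypothetical squares() generator as the source: tee() returns independent iterators and buffers only the values one copy has seen and the other has not yet consumed.

```python
from itertools import tee


def squares():
    # Hypothetical infinite generator standing in for some_generator().
    n = 0
    while True:
        yield n * n
        n += 1


# Split one generator into two independent iterators.
# The original generator should not be advanced directly after this.
a, b = tee(squares(), 2)
first = [next(a) for _ in range(3)]  # pulled from the source
again = [next(b) for _ in range(3)]  # replayed from tee's internal buffer
print(first, again)  # [0, 1, 4] [0, 1, 4]
```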
For the given example, stream(some_generator())[10], the need is already met by islice(some_generator(), 10, 11).next().
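To check that islice() also produces no extra values, the following sketch wraps an infinite source in a hypothetical logging generator named counting(); itertools.count() stands in for some_generator(), and next(...) is the Python 3 form of the .next() method call.

```python
from itertools import count, islice


def counting(iterable, log):
    # Hypothetical helper: records every value the wrapped iterator produces.
    for item in iterable:
        log.append(item)
        yield item


log = []
value = next(islice(counting(count(), log), 10, 11))
print(value, len(log))  # 10 11
```

Exactly 11 values are produced and the last one is returned, matching the stream(some_generator())[10] example without any cache.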
There may yet be an opportunity to develop a lazylist type backed by an underlying iterator. It would belong outside the scope of itertools and would first need to demonstrate its usefulness by being released into the wild for a while.