msg290106
Author: Michael Seifert (MSeifert)
Date: 2017-03-24 19:14
When using `copy.copy` to copy an `itertools.chain` instance, the results can be weird. For example:

>>> from itertools import chain
>>> from copy import copy
>>> a = chain([1,2,3], [4,5,6])
>>> b = copy(a)
>>> next(a)  # looks okay
1
>>> next(b)  # jumps to the second iterable, not okay?
4
>>> tuple(a)
(2, 3)
>>> tuple(b)
(5, 6)

I don't really want to "copy.copy" such an iterator (I would either use `a, b = itertools.tee(a, 2)` or `b = a`, depending on the use case). This just came up because I investigated how Python's iterators behave when copied, deepcopied, or pickled, because I want to make the iterators in my extension module behave similarly.
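A quick illustration of the tee() workaround mentioned above, as a minimal sketch using the same chain as in the report; tee() hands back independent iterators that each see all remaining items:

>>> from itertools import chain, tee
>>> a = chain([1, 2, 3], [4, 5, 6])
>>> a, b = tee(a, 2)
>>> next(a), next(b)
(1, 1)
>>> tuple(a)
(2, 3, 4, 5, 6)
>>> tuple(b)
(2, 3, 4, 5, 6)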
|
|
msg290143
Author: Raymond Hettinger (rhettinger)
Date: 2017-03-24 21:19
Humph, that is definitely not the expected result. The itertools copy/reduce support has been a never-ending source of bugs and headaches. It looks like the problem is that __reduce__ is returning the existing tuple iterator rather than a new one:

>>> a = chain([1,2,3], [4,5,6])
>>> b = copy(a)
>>> next(a)
1
>>> a.__reduce__()
(<class 'itertools.chain'>, (), (<tuple_iterator object at 0x104ee78d0>, <list_iterator object at 0x104f81b70>))
>>> b.__reduce__()
(<class 'itertools.chain'>, (), (<tuple_iterator object at 0x104ee78d0>,))
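A rough sketch of why that matters for copy.copy() (my reading: without a __copy__ method, copy.copy() falls back to the __reduce_ex__ protocol, so the copy is rebuilt around the very same sub-iterator objects):

    from itertools import chain

    a = chain([1, 2, 3], [4, 5, 6])
    cls, args, state = a.__reduce__()   # state holds the live source iterator

    # Rebuilding from the __reduce__ result reuses that iterator, which is
    # roughly what copy.copy() ends up doing here:
    b = cls(*args)
    b.__setstate__(state)

    print(next(a))   # 1
    print(next(b))   # 4 -- b shares a's source iterator, so it skips ahead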
|
|
msg290146
Author: Serhiy Storchaka (serhiy.storchaka)
Date: 2017-03-24 21:30
chain(*iterables) is essentially a shortcut for chain.from_iterable(iter(iterables)). Neither copy.copy() nor __reduce__ has any particular relation to this. Consider the following example:

>>> from itertools import chain
>>> i = iter([[1, 2, 3], [4, 5, 6]])
>>> a = chain.from_iterable(i)
>>> b = chain.from_iterable(i)
>>> next(a)
1
>>> next(b)
4
>>> tuple(a)
(2, 3)
>>> tuple(b)
(5, 6)
|
|
msg290916
Author: Serhiy Storchaka (serhiy.storchaka)
Date: 2017-03-31 15:43
This issue is related to the behavior of other composite iterators.

>>> from copy import copy
>>> it = map(ord, 'abc')
>>> list(copy(it))
[97, 98, 99]
>>> list(copy(it))
[]
>>> it = filter(None, 'abc')
>>> list(copy(it))
['a', 'b', 'c']
>>> list(copy(it))
[]

The copy is too shallow. If you consume an item from one copy, it disappears from the original. Compare with the behavior of iterators of builtin sequences:

>>> it = iter('abc')
>>> list(copy(it))
['a', 'b', 'c']
>>> list(copy(it))
['a', 'b', 'c']
>>> it = iter(list('abc'))
>>> list(copy(it))
['a', 'b', 'c']
>>> list(copy(it))
['a', 'b', 'c']
|
|
msg290917
Author: Michael Seifert (MSeifert)
Date: 2017-03-31 15:59
Just an update on what doesn't work: simply overriding the `__copy__` method. I tried it, but it somewhat breaks `itertools.tee`: if the passed iterable has a `__copy__` method, `tee` copies the iterator (resulting in a lot of unnecessary memory overhead, or breakage if a generator is "inside") instead of using its memory-efficient internals.
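A small sketch of the interaction described here, assuming (as observed above) that tee() prefers an iterator's own __copy__ over its internal buffering; the class name is made up for illustration:

    from itertools import tee

    class CopyableIter:
        """Toy iterator that advertises a __copy__ method."""
        def __init__(self, data, pos=0):
            self.data = list(data)
            self.pos = pos
        def __iter__(self):
            return self
        def __next__(self):
            if self.pos >= len(self.data):
                raise StopIteration
            value = self.data[self.pos]
            self.pos += 1
            return value
        def __copy__(self):
            # Each copy carries its own copy of the data; duplicating large
            # or lazy sources this way is the overhead/breakage noted above.
            return CopyableIter(self.data, self.pos)

    a, b = tee(CopyableIter("abc"), 2)
    # Where tee() takes the __copy__ shortcut, a and b are CopyableIter
    # instances rather than itertools._tee objects:
    print(type(a).__name__, type(b).__name__)
    print(list(a), list(b))    # ['a', 'b', 'c'] ['a', 'b', 'c']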
|
|
msg290918
Author: Serhiy Storchaka (serhiy.storchaka)
Date: 2017-03-31 16:03
Just for example, here is a patch that implements deeper copying for itertools.chain objects in Python. I don't mean to push it; it is too complicated. I have also written a slightly simpler implementation, but it doesn't work due to the behavior of copied map objects.
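As a rough sketch of the "deeper copying" idea (not the attached patch), assuming the sub-iterators found in the chain's __reduce__ state are themselves copyable, as list and tuple iterators are:

    from copy import copy
    from itertools import chain

    def deeper_copy_chain(ch):
        # Hypothetical helper: rebuild a chain from its __reduce__ state,
        # copying each sub-iterator instead of sharing it.
        cls, args, state = ch.__reduce__()
        new = cls(*args)
        new.__setstate__(tuple(copy(it) for it in state))
        return new

    a = chain([1, 2, 3], [4, 5, 6])
    next(a)                     # consume 1 from the original
    b = deeper_copy_chain(a)
    print(tuple(a))             # (2, 3, 4, 5, 6)
    print(tuple(b))             # (2, 3, 4, 5, 6) -- independent of the original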
|
|
msg290995
Author: Raymond Hettinger (rhettinger)
Date: 2017-04-01 16:12
Serhiy, feel free to take this in whatever direction you think is best.
|
|
msg291027
Author: Kristján Valur Jónsson (kristjan.jonsson)
Date: 2017-04-02 09:12
It is a tricky issue. How deep do you go? What if you are chaining several of the itertools? Seems like we're entering a semantic sinkhole here. Deepcopy would be too deep... The original copy support in these objects stems from the desire to support pickling.
|
|
msg291035
Author: Serhiy Storchaka (serhiy.storchaka)
Date: 2017-04-02 12:40
Yes, this issue is tricky, and I don't have a ready answer. If implementing __copy__ for builtin compound iterators, I would implement filter.__copy__ and map.__copy__ something like:

    def __copy__(self):
        cls, args = self.__reduce__()
        return cls(*map(copy, args))

If the underlying iterators properly support copying, the copying of filter and map iterators will succeed. If they don't support copying, the copying of filter and map iterators should fail rather than silently accumulate elements in a tee() object. But there are open questions.

1. This is a behavior change. What if any code depends on the current behavior? The current behavior is silly; copy(filter) and copy(map) could just return the original iterator if that is the desirable behavior.

2. Depending on the copy module in a method of a builtin type looks doubtful. Should we implement copy.copy() in C and provide a public C API?

3. If we make a copy of limited depth, shouldn't we use a memo as for deepcopy() to prevent unwanted duplications? Otherwise the copied `map(func, it, it)` would behave differently from the original (see the sketch below). This example is not as silly as it looks.

4. Is it possible to implement copying for all compound iterators? For example, copying a chain() would have to change the state of the original object (by using __setstate__), so that it makes copies of its subiterators before using them.

Perhaps all this deserves a PEP.
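To make point 3 concrete, a small sketch using list iterators (which already support copying): without a memo, a per-argument copy of `map(func, it, it)` turns one shared iterator into two independent ones, which changes how items are paired:

    from copy import copy

    # Original: both arguments are the *same* iterator, so items are paired
    # consecutively: (1, 2), (3, 4), ...
    it = iter([1, 2, 3, 4])
    m = map(lambda a, b: (a, b), it, it)
    print(next(m))                      # (1, 2)

    # What a memo-less per-argument copy would build: two independent copies
    # of the iterator, so the pairing becomes (1, 1), (2, 2), ...
    it = iter([1, 2, 3, 4])
    m2 = map(lambda a, b: (a, b), copy(it), copy(it))
    print(next(m2))                     # (1, 1)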
|
|
msg291091
Author: Raymond Hettinger (rhettinger)
Date: 2017-04-03 18:56
> Perhaps all this deserves a PEP.

If Serhiy and Kristján agree on a course of action, that will suffice. Copying iterators is an esoteric endeavor of interest to very few users (no one has even noticed until now).
|
|