Issue 1444398: Make itertools.tee participate in GC (original) (raw)

Created on 2006-03-06 20:33 by twouters, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
itertools.diff twouters,2006-03-25 16:39 fixed version
Messages (10)
msg49671 - (view) Author: Thomas Wouters (twouters) * (Python committer) Date: 2006-03-06 20:33
A small patch to make itertools.tee objects participate in GC; solves the memoryleak in test_generators (and any other use of tee objects in cycles.)
msg49672 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2006-03-07 00:05
Logged In: YES user_id=80475 I will take a look at the patch soon. IIRC, there was a concious decision to not have tee engage in GC because no normal use cases created cycles (they seem to arise only when intentionally creating a cycle for test code).
msg49673 - (view) Author: Thomas Wouters (twouters) * (Python committer) Date: 2006-03-07 00:12
Logged In: YES user_id=34209 Well, I'm fine with removing the scope-enclosed generator versions of _m235() and fib() from test_generators, if that's what you're implying. :> However, I think it's slightly more suitable to just have tee participate in GC.
msg49674 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2006-03-07 00:52
Logged In: YES user_id=31435 If there are likely to be millions of some type of object, then avoiding gc for that type on grounds of reducing memory use may be arguable (or, as in the cases of tuples or lists, may not be arguable regardless). Else "if it points to a PyObject, it might be in a cycle, so gc-enable it" generally rules. I'll note in passing that the m235 and fib generators weren't _intended_ to stress Python in any way, but have been extremely effective at doing so since generators first went in. Think of them as a pair of canaries in the coal mine :-)
msg49675 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2006-03-19 08:02
Logged In: YES user_id=80475 Okay, go ahead and add GC to itertools.tee(). The test_generators examples make it clear that there are valid use cases for feeding a teeobject back into itself. Tim, the original rationale did not have to do with millions of tee objects; rather, it posited that a tee could contain many objects of some other type and that traversing it during GC was simply a waste of time. Thomas, the patch needs work, but I don't currently have time to go through it. Do your best to model after the other tools which have gc. That includes setting the tp_flags slot, doing an untrack before dealloc starts, replacing PyObject_Del and PyObject_New with their GC counterparts. See section 2.1.3 in Extending and Embedding for the details and examples. Offhand, I think it may also need a tp_clear entry but I'm not sure. If you can work out a good patch, go ahead an apply it for the alpha 1 release. If not, assign back to me and I'll get to it when I can (possibly for the second alpha).
msg49676 - (view) Author: Thomas Wouters (twouters) * (Python committer) Date: 2006-03-20 21:34
Logged In: YES user_id=34209 Hmm. An alternative fix would be to make itertools.tee only partially participate in GC. That is, not have it traverse over all the stored items, but just the iterator. It would prevent the (potentially rather heavy) cost of going over all cached items, but still leaves a potential for unfindable cycles.
msg49677 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2006-03-25 16:17
Logged In: YES user_id=80475 Georg, do you have time to fix-up this patch and check it in?
msg49678 - (view) Author: Thomas Wouters (twouters) * (Python committer) Date: 2006-03-25 16:39
Logged In: YES user_id=34209 I cleaned up my patch (attached), but for some reason, it isn't resolving the refleaks in test_generators. I swear my patch was resolving them before, but I guess I must've done something else (maybe in addition to this) that fixed the leaks. Or maybe I'm just going cross-eyed. I do believe I implemented GC correctly, and I do believe the leaks are uncleanable circular references. I haven't had time to debug it more; if someone else wants to, feel free.
msg49679 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2006-03-25 17:14
Logged In: YES user_id=80475 In teedataobject_new(), try adding PyObject_GC_Track(tdo) just before the return statement.
msg49680 - (view) Author: Thomas Wouters (twouters) * (Python committer) Date: 2006-03-27 20:59
Logged In: YES user_id=34209 Hmm. Well, adding PyObject_GC_Track() makes GC work right. I don't know why, since the rest of itertools doesn't use PyObject_GC_Track, but it now fixes refleaks that involve just itertools, like: def leak(): class gen: def __iter__(self): return self def next(self): return self.item g = gen() head, tail = tee(g) g.item = head This example leaked without tee doing GC (and without PyObject_GC_Track), but doesn't leak with proper GC. However! test_generators still leaks, and so does this example (which doesn't use itertools): def leak(): def gen(): while True: yield g g = gen() g.next() I suspect this leak was introduced after I submitted this patch, which explains why this patch suddenly stopped working ;) Since adding GC still fixes reference-cycles, I will check in this patch while I hunt for the other leak.
History
Date User Action Args
2022-04-11 14:56:15 admin set github: 42988
2006-03-06 20:33:38 twouters create