[Python-Dev] Garbage collecting closures (original) (raw)
Guido van Rossum guido@python.org
Tue, 15 Apr 2003 15:23:48 -0400
- Previous message: [Python-Dev] Garbage collecting closures
- Next message: [Python-Dev] Garbage collecting closures
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
[Guido] > ... > This makes me think that Python should run the garbage collector > before exiting, so that finalizers on objects that were previously > kept alive by cycles are called (even if finalizers on objects that > are part of a cycle still won't be called).
[Tim]
What about finalizers on objects that are alive at exit because they're still reachable? We seem to leave a lot of stuff alive at the end. For example, here are the pymalloc stats at the end under current CVS, after opening an interactive shell then exiting immediately; this is produced at the end of PyFinalize(), and only callllexitfuncs() is done after this (and that probably shouldn't free anything):
Small block threshold = 256, in 32 size classes. class size num pools blocks in use avail blocks ----- ---- --------- ------------- ------------ 2 24 1 1 168 5 48 1 2 82 6 56 13 170 766 7 64 13 445 374 8 72 5 25 255 9 80 1 1 49 15 128 1 2 29 20 168 5 25 95 23 192 1 1 20 25 208 1 2 17 29 240 1 2 14 31 256 1 1 14 # times object malloc called = 17,119 3 arenas * 262144 bytes/arena = 786,432 # bytes in allocated blocks = 45,800 # bytes in available blocks = 131,072 145 unused pools * 4096 bytes = 593,920 # bytes lost to pool headers = 1,408 # bytes lost to quantization = 1,944 # bytes lost to arena alignment = 12,288 Total = 786,432 "size" here is 16 bytes larger than in a release build, because of the 8-byte padding added by PYMALLOCDEBUG on each end of each block requested. So, e.g., there's one (true size) 8-byte object still living at the end, and 445 48-byte objects. Unreclaimed ints and floats aren't counted here (they've got their own free lists, and don't go thru pymalloc). I don't know what all that stuff is, but I bet there are about 25 dicts still alive at the end.
Close! I moved the debugging code that can print the list of all objects still alive at the end around so that it is now next to the code that prints the above malloc stats. (If you're following CVS email you might have noticed this. :-) The full output is way too large to post; you can see for yourself by creating a debug build and running this (on Unix; windows users use their imagination or upgrade their OS):
PYTHONDUMPREFS= ./python -S -c pass
When I run this, I see 23 dictionaries. One is the dict of interned strings that are still alive; the others are the tp_dicts of the various built-in type objects. Some interned strings appear to be kept alive by various static globals holding names for faster name lookup; there isn't much we can do about that. I also don't think we should bother un-initializing the built-in types. Apart from that, I don't think I see anything that looks suspect. Of course, running a larger program with the same setup might reveal real leaks.
> I also think that if a strongly connected component (a stronger > concept than cycle) has exactly one object with a finalizer in it, > that finalizer should be called, and then the object should > somehow be marked as having been finalized (maybe a separate GC > queue could be used for this) in case it is resurrected by its > finalizer.
With the addition of gc.getreferents() in 2.3, it's easy to compute SCCs via Python code now; it's a PITA in C. OTOH, figuring out which finalizers to call seems a PITA in Python: A<->F1 -> F2<->B F1 and F2 have finalizers; A and B don't. Python code can easily determine that there are 2 SCCs here, each with 1 finalizer (I suppose gc's hasfinalizer() would need to be exposed, to determine whether del exists correctly). A tricky bit then is that running F1.del may end up deleting F2 by magic (this is possible since F2 is reachable from F1, and F1.del may break the link to F2), but it's hard for pure-Python code to know that. So that part seems easier done in C, and creating new gc lists in C is very easy thanks to the nice doubly-linked-list C API Neil coded in gcmodule. Note a subtlety: the finalizers in SCCs should be run in a topsort ordering of the derived SCC graph (since F1.del can ask F2 to do stuff, despite that F1 and F2 are in different SCCs, F1 should be finalized before F2). Finding a topsort order is also easy in Python (and also a PITA in C). So I picture computing a topsorted list of suitable objects (those that have a finalizer, and have the only finalizer in their SCC) in Python, and passing that on to a new gcmodule entry point. The latter can link those objects into a doubly-linked C list in the same order, and then run finalizers "left to right". It's a nice property of the gc lists that, e.g., if F1.del does end up deleting F2, F2 simply vanishes from the list. Another subtlety: suppose F1.del resurrects F1, and doesn't delete F2. Should F2.del be called anyway? Probably not, since if F1 is alive, everything reachable from it is also alive, and F1 -> F2. I've read that Java can get into a state where it's only able to reclaim 1 object per full gc collection due to headaches like this, despite that everything is trash. There's really no way to tell whether F1.del resurrects F1 short of starting gc over again (in particular, looking at F1's refcount before and after running F1.del isn't reliable evidence for either conclusion, unless the "after" refcount is 0).
I'm glazing over the details now, but there seems to be a kernel of useful cleanup in here somehow; I hope that someone will be able to contribute a prototype of such code at least!
--Guido van Rossum (home page: http://www.python.org/~guido/)
- Previous message: [Python-Dev] Garbage collecting closures
- Next message: [Python-Dev] Garbage collecting closures
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]