msg191135 - (view) |
Author: Richard Oudkerk (sbt) *  |
Date: 2013-06-14 14:59 |
Currently when a module is garbage collected its dict is purged by replacing all values except __builtins__ by None. This helps clear things at shutdown.

But this can cause problems if it occurs *before* shutdown: if we use a function defined in a module which has been garbage collected, then that function must not depend on any globals, because they will have been purged.

Usually this problem only occurs with programs which manipulate sys.modules. For example when setuptools and nose run tests they like to reset sys.modules each time. See for example http://bugs.python.org/issue15881

See also http://bugs.python.org/issue16718

The trivial patch attached prevents the purging behaviour for modules gc'ed before shutdown begins. Usually garbage collection will end up clearing the module's dict anyway.

I checked the count of refs and blocks reported on exit when running a trivial program and a full regrtest (which will cause quite a bit of sys.modules manipulation). The difference caused by the patch is minimal.

Without patch:
    do nothing:    [20234 refs, 6582 blocks]
    full regrtest: [92713 refs, 32597 blocks]

With patch:
    do nothing:    [20234 refs, 6582 blocks]
    full regrtest: [92821 refs, 32649 blocks] |
|
|
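To illustrate the situation described in msg191135, here is a minimal sketch. The module name `demo`, the global `GREETING`, and the function `greet` are made up for illustration. It keeps a function alive after its module object has been collected, then calls it. On interpreters that include the change eventually committed for this issue (CPython 3.4 and later, as far as I can tell), the global survives; on older versions the lookup would have found None instead.

```python
import gc
import types

# Build a throwaway module that is never placed in sys.modules
# (hypothetical names, used only for this illustration).
mod = types.ModuleType("demo")
exec(
    "GREETING = 'hello'\n"
    "def greet():\n"
    "    return GREETING\n",
    mod.__dict__,
)

greet = mod.greet   # keep only the function, not the module
del mod             # the module object is now unreachable
gc.collect()

# The function still resolves GREETING through its __globals__ dict.
result = greet()
print(result)
```

The function holds a reference to the module's dict through `greet.__globals__`, not to the module object itself, which is why the module can die while the function lives on.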
msg191217 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2013-06-15 18:11 |
> Usually garbage collection will end up clearing the module's dict anyway.

This is not true, since global objects might have a __del__ and then hold the whole module dict alive through a reference cycle. Happily though, PEP 442 is going to make that concern obsolete.

As for the interpreter shutdown itself, I have a pending patch (post-PEP 442) to get rid of the globals cleanup as well. It may be better to merge the two approaches. |
|
|
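The cycle mentioned in msg191217 can be sketched as follows. The names `Resource`, `instance`, and `log` are hypothetical. A module-level object has a `__del__` method, and that method's globals are the module dict, so the dict, the class, and the instance all sit in one reference cycle. Assuming an interpreter with PEP 442 (CPython 3.4 or later), the collector can finalize and free the whole cycle.

```python
import gc
import types
import weakref

# Source of a hypothetical module whose global has a finalizer.
source = '''
class Resource:
    def __del__(self):
        log.append("finalized")

instance = Resource()
'''

log = []
mod = types.ModuleType("demo")
mod.log = log                      # lets the finalizer record that it ran
exec(source, mod.__dict__)

# Cycle: module dict -> Resource -> Resource.__del__ -> __globals__ (same dict)
class_ref = weakref.ref(mod.Resource)

del mod
gc.collect()

print(log)                  # finalizer output, if the cycle was collected
print(class_ref() is None)  # whether the class was actually freed
```

Before PEP 442, objects with `__del__` in a cycle were parked in `gc.garbage` instead, which is how such a module dict could stay alive indefinitely.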
msg191229 - (view) |
Author: Richard Oudkerk (sbt) *  |
Date: 2013-06-15 19:43 |
On 15/06/2013 7:11pm, Antoine Pitrou wrote:
>> Usually garbage collection will end up clearing the module's dict anyway.
>
> This is not true, since global objects might have a __del__ and then hold
> the whole module dict alive through a reference cycle. Happily though,
> PEP 442 is going to make that concern obsolete.

I did say "usually".

> As for the interpreter shutdown itself, I have a pending patch (post-PEP 442)
> to get rid of the globals cleanup as well. It may be better to merge the two approaches.

So you would just depend on garbage collection? Do you know how many refs/blocks are left at exit if one just uses garbage collection (assuming PEP 442 is in effect)? I suppose adding GC support to those modules which currently lack it would help a lot.

BTW, I had a more complicated patch which keeps track of module dicts using weakrefs and purges any which were left after garbage collection has had a chance to free stuff. But most module dicts ended up being purged anyway, so it did not seem worth the hassle when a two-line patch mostly fixes the immediate problem. |
|
|
msg191230 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2013-06-15 19:44 |
> > As for the interpreter shutdown itself, I have a pending patch (post-PEP 442)
> > to get rid of the globals cleanup as well. It may be better to merge the two approaches.
>
> So you would just depend on garbage collection?

No, I also clean up those modules that are left alive after a garbage collection pass. |
|
|
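The two-phase approach outlined in msg191230 (collect first, then clean up whatever survived) might look roughly like this in pure Python. This is only a sketch of the idea, not the actual C implementation: `registry` stands in for sys.modules, and the `immortal` list simulates a long-lived reference held elsewhere.

```python
import gc
import types
import weakref

# Stand-in for sys.modules, filled with hypothetical modules.
registry = {name: types.ModuleType(name) for name in ("alpha", "beta")}

# Simulates a reference that outlives the registry (e.g. held by C code).
immortal = [registry["beta"]]

# Phase 1: remember each module weakly, drop the strong references, collect.
weak_modules = [(name, weakref.ref(module)) for name, module in registry.items()]
registry.clear()
gc.collect()

# Phase 2: only modules that survived collection get their globals purged.
survivors = []
for name, ref in weak_modules:
    module = ref()
    if module is None:
        continue                      # died naturally, nothing to purge
    survivors.append(name)
    for key in list(module.__dict__):
        if key != "__builtins__":
            module.__dict__[key] = None
del module                            # do not keep the last survivor alive

print(survivors)
```

Modules that die in phase 1 never have their globals overwritten, so functions that outlive them keep working; only the leftovers get the old None treatment.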
msg193945 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2013-07-30 18:26 |
Now that PEP 442 is committed, here is the patch. |
|
|
msg193957 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2013-07-30 21:36 |
Slightly better patch. Also, as I pointed out in python-dev (http://mail.python.org/pipermail/python-dev/2013-July/127673.html), this is still imperfect due to various ways modules can be kept alive from long-lived C variables. |
|
|
msg193976 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2013-07-31 06:44 |
See and . |
|
|
msg193991 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2013-07-31 09:35 |
Updated patch has tests and also removes several cleanup hacks. |
|
|
msg194015 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2013-07-31 20:17 |
Updated patch with a hack in Lib/site to unpatch builtins early at shutdown. |
|
|
msg194020 - (view) |
Author: Roundup Robot (python-dev)  |
Date: 2013-07-31 21:15 |
New changeset 79e2f5bbc30c by Antoine Pitrou in branch 'default': Issue #18214: Improve finalization of Python modules to avoid setting their globals to None, in most cases. http://hg.python.org/cpython/rev/79e2f5bbc30c |
|
|
msg194021 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2013-07-31 21:16 |
Let's wait for the buildbots on this one too. |
|
|
msg194026 - (view) |
Author: Richard Oudkerk (sbt) *  |
Date: 2013-07-31 22:39 |
I played a bit with the patch and -v -Xshowrefcount. The number of references and blocks left at exit varies (and is higher than for unpatched python).

It appears that a few (1-3) module dicts are not being purged because they have been "orphaned" (i.e. the module object was garbage collected before we check the weakref, but the module dict survived). Presumably it is the hash randomization causing the randomness.

Maybe 8 out of 50+ module dicts actually die a natural death by being garbage collected before they are purged. Try

    ./python -v -Xshowrefcount check_purging.py |
|
|
msg194040 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2013-08-01 09:49 |
> It appears that a few (1-3) module dicts are not being purged because they
> have been "orphaned". (i.e. the module object was garbage collected before
> we check the weakref, but the module dict survived.)

Module globals can be kept alive by any function defined in that module. So if that function is registered eternally in a C static variable, the globals dict will never get collected.

> ./python -v -Xshowrefcount check_purging.py

I always get either:

    # remaining {'encodings', '__main__'}
    [...]
    [24834 refs, 7249 blocks]

or

    # remaining {'__main__', 'encodings'}
    [...]
    [24834 refs, 7249 blocks]

... which seems to hint that it is quite stable actually. The encodings globals are kept alive because of the codecs registration, I believe. As for the __main__ dict, perhaps we're missing a decref somewhere.

> Maybe 8 out of 50+ module dicts actually die a natural death by being
> garbage collected before they are purged.

I get different numbers from you. If I run "./python -v -c pass", most modules in the "wiping" phase are C extension modules, which is expected. Pretty much every pure Python module ends up garbage collected before that.

By the way, please also try which will bring another improvement. |
|
|
msg194042 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2013-08-01 09:59 |
> As for the __main__ dict, perhaps we're missing a decref somewhere.

Actually, it's not surprising. Blob's methods hold a reference to the __main__ globals, and there's still a Blob object alive in encodings. If you replace the end of your script with the following:

    for name, mod in sys.modules.items():
        if name != 'encodings':
            mod.__dict__["__blob__"] = Blob(name)
    del name, mod, Blob

then at the end of the shutdown phase, remaining is empty. |
|
|
msg194043 - (view) |
Author: Richard Oudkerk (sbt) *  |
Date: 2013-08-01 10:27 |
On 01/08/2013 10:59am, Antoine Pitrou wrote:
> If you replace the end of your script with the following:
>
>     for name, mod in sys.modules.items():
>         if name != 'encodings':
>             mod.__dict__["__blob__"] = Blob(name)
>     del name, mod, Blob
>
>
> then at the end of the shutdown phase, remaining is empty.

On Windows, even with this change, I get for example:

    # remaining {'encodings.mbcs', '__main__', 'encodings.cp1252'}
    ...
    [22081 refs, 6742 blocks]

or

    # remaining {'__main__', 'encodings'}
    ...
    [23538 refs, 7136 blocks] |
|
|
msg194044 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2013-08-01 10:40 |
You might want to open a prompt and look at gc.get_referrers() for encodings.mbcs.__dict__ (or another of those modules). |
|
|
msg194045 - (view) |
Author: Richard Oudkerk (sbt) *  |
Date: 2013-08-01 11:08 |
> You might want to open a prompt and look at gc.get_referrers() for
> encodings.mbcs.__dict__ (or another of those modules).

>>> gc.get_referrers(sys.modules['encodings.mbcs'].__dict__)
[<module 'encodings.mbcs' from 'C:\\Repos\\cpython-dirty\\lib\\encodings\\mbcs.py'>,
 <function decode at 0x01DEEF38>,
 <function getregentry at 0x01DFA038>,
 <function IncrementalEncoder.encode at 0x01DFA098>]

>>> gc.get_referrers(sys.modules['encodings.cp1252'].__dict__)
[<module 'encodings.cp1252' from 'C:\\Repos\\cpython-dirty\\lib\\encodings\\cp1252.py'>,
 <function getregentry at 0x02802578>,
 <function Codec.encode at 0x02802518>,
 <function Codec.decode at 0x028025D8>,
 <function IncrementalEncoder.encode at 0x02802638>,
 <function IncrementalDecoder.decode at 0x02802698>]

>>> gc.get_referrers(sys.modules['__main__'].__dict__)
[<function Blob.__init__ at 0x0057ABD8>,
 <function Blob.__del__ at 0x02AD36F8>,
 <frame object at 0x027DFA80>,
 <function at 0x02AD3DB8>,
 <frame object at 0x02A38038>,
 <module '__main__' (<_frozen_importlib.SourceFileLoader object at 0x0271EAB8>)>] |
|
|
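The same kind of inspection as in msg194045 can be tried on a small synthetic module rather than the encodings package. In this sketch the module `demo` and its function `encode` are invented for illustration; the only real API used is `gc.get_referrers()`.

```python
import gc
import types

# A hypothetical module with one function defined in it.
mod = types.ModuleType("demo")
exec("def encode(text):\n    return text.upper()\n", mod.__dict__)

# Everything the collector knows about that points at the module dict.
referrers = gc.get_referrers(mod.__dict__)

held_by_module = any(obj is mod for obj in referrers)
held_by_function = any(obj is mod.encode for obj in referrers)

for obj in referrers:
    print(type(obj).__name__)
```

As in the output above, the dict is referenced both by the module object and by every function defined in it, so dropping the module alone does not free the dict while any such function is still registered somewhere.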
msg194047 - (view) |
Author: Richard Oudkerk (sbt) *  |
Date: 2013-08-01 11:17 |
> I get different numbers from you. If I run "./python -v -c pass", most
> modules in the "wiping" phase are C extension modules, which is expected.
> Pretty much every pure Python module ends up garbage collected before
> that.

The *module* gets gc'ed, sure. But you can't tell from "./python -v -c pass" when the *module dict* gets gc'ed.

Using "./python -v check_purging.py", before the purging stage (# cleanup [3]) I only get

    # purge/gc operator 54
    # purge/gc io 53
    # purge/gc keyword 52
    # purge/gc types 51
    # purge/gc sysconfig 50

That leaves lots of pure python module dicts to be purged later on. |
|
|
msg194055 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2013-08-01 12:23 |
Here (Linux) I get the following:

    # purge/gc os.path 12
    # purge/gc operator 50
    # purge/gc io 49
    # purge/gc _sysconfigdata 48
    # purge/gc sysconfig 47
    # purge/gc keyword 46
    # purge/gc site 45
    # purge/gc types 44

Also, do note that purge/gc after wiping can still be a regular gc pass unless the module has been wiped. The gc could be triggered by another module being wiped. |
|
|
msg194066 - (view) |
Author: Richard Oudkerk (sbt) *  |
Date: 2013-08-01 13:03 |
> Also, do note that purge/gc after wiping can still be a regular
> gc pass unless the module has been wiped. The gc could be triggered
> by another module being wiped.

For me, the modules which die naturally after purging begins are

    # purge/gc encodings.aliases 34
    # purge/gc _io 14
    # purge/gc collections.abc 13
    # purge/gc sre_compile 12
    # purge/gc heapq 11
    # purge/gc sre_constants 10
    # purge/gc _weakrefset 9
    # purge/gc reprlib 8
    # purge/gc weakref 7
    # purge/gc site 6
    # purge/gc abc 5
    # purge/gc encodings.latin_1 4
    # purge/gc encodings.utf_8 3
    # purge/gc genericpath 2

Of these, all but the first appear to happen during the final cyclic garbage collection. |
|
|
msg194069 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2013-08-01 13:12 |
> Also, do note that purge/gc after wiping can still be a regular gc
> pass unless the module has been wiped. The gc could be triggered by
> another module being wiped.

That said, I welcome any suggestions to improve things. The ultimate reasons we need to purge some modules are the same two reasons I indicated on python-dev: C extension modules are almost immortal; and some C code keeps references alive too long.

Do you agree that this patch is ok and we should address those two problems in separate new issues? |
|
|
msg194076 - (view) |
Author: Richard Oudkerk (sbt) *  |
Date: 2013-08-01 14:14 |
Yes, I agree the patch is ok.

It would be much simpler to keep track of the module dicts if they were weakrefable. Alternatively, at shutdown a weakrefable object with a reference to the module dict could be inserted into each module dict. We could then use those to find orphaned module dicts. But I doubt it is worth the extra effort. |
|
|
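The alternative floated in msg194076 could be sketched like this. The `Sentinel` class and the `__sentinel__` key are hypothetical names; nothing like them exists in CPython. The sketch first confirms that a plain dict cannot be weakly referenced, then plants a weakrefable marker inside the module dict and uses it to detect a dict that outlived its module.

```python
import gc
import types
import weakref

mod = types.ModuleType("demo")
exec("def helper():\n    return 1\n", mod.__dict__)

# Plain dicts do not support weak references.
try:
    weakref.ref(mod.__dict__)
    dict_weakrefable = True
except TypeError:
    dict_weakrefable = False


class Sentinel:
    """Hypothetical weakrefable marker that points back at a module dict."""

    def __init__(self, namespace):
        self.namespace = namespace


mod.__dict__["__sentinel__"] = Sentinel(mod.__dict__)
dict_tracker = weakref.ref(mod.__dict__["__sentinel__"])
module_ref = weakref.ref(mod)

keep = mod.helper   # a surviving function keeps the globals dict alive
del mod
gc.collect()

# Module gone but sentinel alive: the dict has been orphaned.
orphaned = module_ref() is None and dict_tracker() is not None
if orphaned:
    namespace = dict_tracker().namespace   # the dict a purge step could clear
    print(sorted(namespace))
```

A weak reference to the module alone cannot find such a dict once the module is gone, which is the gap the sentinel is meant to cover.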
msg194079 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2013-08-01 15:14 |
Ok, let's attack the rest separately then :) And thanks a lot for testing! |
|
|
msg194111 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2013-08-01 20:12 |
By the way, you may be interested to learn that the patch in has made things quite a bit better now: C extension modules can be collected much earlier. |
|
|