msg299393
Author: STINNER Victor (vstinner)
Date: 2017-07-28 15:03

See bpo-30891 and bpo-30876 for recent changes in importlib.

http://buildbot.python.org/all/builders/AMD64%20Windows8.1%20Refleaks%203.x/builds/57/steps/test/logs/stdio

1:33:30 [312/406/2] test_threaded_import passed (30 sec) -- running: test_decimal (826 sec), test_set (84 sec), test_mmap (1151 sec)
beginning 6 repetitions
123456
...
Exception ignored in: <function _get_module_lock.<locals>.cb at 0x000000813CA18EB8>
Traceback (most recent call last):
  File "<frozen importlib._bootstrap>", line 176, in cb
KeyError: ('random',)
...

msg299951
Author: STINNER Victor (vstinner)
Date: 2017-08-08 22:59

Oh, it's easy to reproduce the bug on the master branch. Example on Linux:

haypo@selma$ ./python -m test test_threaded_import
Run tests sequentially
0:00:00 load avg: 0.16 [1/1] test_threaded_import
Exception ignored in: <function _get_module_lock.<locals>.cb at 0x7f53201e2f70>
Traceback (most recent call last):
  File "<frozen importlib._bootstrap>", line 176, in cb
KeyError: ('random',)
1 test OK.
Total duration: 2 sec
Tests result: SUCCESS

msg299952
Author: STINNER Victor (vstinner)
Date: 2017-08-08 23:40

The problem is the weakref callback of _module_locks. It seems like the callback is called after a second thread has already replaced _module_locks[name]:

* Thread 1: _get_module_lock('random') sets _module_locks['random'] and registers a weakref callback on the newly created lock
* Thread 1: the lock is destroyed, so the weakref callback will be called as soon as possible
* Thread 2: _get_module_lock('random') sees that the weakref is dead and so replaces _module_locks['random'] with a new lock, and registers a new weakref callback on the second lock
* Thread 1: the first weakref callback runs and removes _module_locks['random'] -- BUG: the callback is called too late, after Thread 2 has already set a new lock
* Thread 2: the second lock is destroyed, so the second weakref callback will be called as soon as possible
* Thread 2: the second callback runs and fails with KeyError, since _module_locks['random'] was already removed by Thread 1 in the meanwhile
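For illustration only, here is a minimal, hypothetical sketch of the racy pattern described above. It is a simplified reconstruction, not the actual importlib._bootstrap code: the point is that the weakref callback deletes the dictionary entry unconditionally, without checking whether the entry still refers to the lock that just died.

```python
import threading
import weakref

# Hypothetical, simplified stand-in for importlib's per-module lock registry.
_module_locks = {}   # name -> weakref to the module lock


class _ModuleLock:
    """Minimal stand-in for importlib._bootstrap._ModuleLock."""

    def __init__(self, name):
        self.name = name
        self.lock = threading.Lock()


def _get_module_lock(name):
    # Simplified pre-fix logic: look up the weakref, and create a fresh
    # lock if the previous one has already been garbage-collected.
    try:
        lock = _module_locks[name]()
    except KeyError:
        lock = None
    if lock is None:
        lock = _ModuleLock(name)

        def cb(ref, name=name):
            # RACY: the entry is deleted unconditionally.  If another
            # thread has meanwhile installed a new lock under the same
            # name, this removes the wrong entry -- and the new lock's
            # own callback later raises KeyError, as in the
            # "Exception ignored in ... cb" traceback above.
            del _module_locks[name]

        _module_locks[name] = weakref.ref(lock, cb)
    return lock
```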

msg299953
Author: STINNER Victor (vstinner)
Date: 2017-08-08 23:43

I wrote https://github.com/python/cpython/pull/3029 which seems to fix the issue. I don't know importlib well enough to understand why we need a weak reference to a lock. My PR adds a second lock per module lock (!) so that a thread which detects that _module_locks[name] exists but its lock has been destroyed can wait until _module_locks[name] is actually deleted.

msg299971
Author: Serhiy Storchaka (serhiy.storchaka)
Date: 2017-08-09 06:45

I think the solution can be simpler. PR 3033 uses the global import lock for guarding modification of the _module_locks dictionary.
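For illustration, a rough sketch of that approach (a simplified reconstruction, not the exact code from PR 3033): _get_module_lock() holds the global import lock (_imp.acquire_lock() / _imp.release_lock()) while reading and writing _module_locks, and the weakref callback re-acquires it and only removes the entry if it still refers to the dead lock, using the ref parameter passed to the callback.

```python
import _imp      # provides acquire_lock()/release_lock() for the global import lock
import weakref

_module_locks = {}  # name -> weakref to the module lock


class _ModuleLock:
    """Minimal stand-in for importlib._bootstrap._ModuleLock."""

    def __init__(self, name):
        self.name = name


def _get_module_lock(name):
    # Hold the global import lock while inspecting and updating _module_locks.
    _imp.acquire_lock()
    try:
        try:
            lock = _module_locks[name]()
        except KeyError:
            lock = None
        if lock is None:
            lock = _ModuleLock(name)

            def cb(ref, name=name):
                # Re-acquire the global import lock and only drop the entry
                # if it still refers to the lock that just died; another
                # thread may already have installed a new lock under the
                # same name.
                _imp.acquire_lock()
                try:
                    if _module_locks.get(name) is ref:
                        del _module_locks[name]
                finally:
                    _imp.release_lock()

            _module_locks[name] = weakref.ref(lock, cb)
    finally:
        _imp.release_lock()
    return lock
```

The identity check in the callback is what closes the window described in msg299952: a late callback for an old, already-replaced lock becomes a harmless no-op instead of deleting the new entry or raising KeyError.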

msg299980
Author: STINNER Victor (vstinner)
Date: 2017-08-09 09:23

Serhiy: "I think the solution can be simpler. PR 3033 uses the global import lock for guarding modification of the _module_locks dictionary."

I tried exactly that, but it wasn't enough. Your PR is correct because it also checks the current value of _module_locks[name] using the parameter passed to the weakref callback.

I disliked my own PR, so I'm happy that someone else wrote a simpler fix! (I have already abandoned my PR.)

msg299989
Author: Serhiy Storchaka (serhiy.storchaka)
Date: 2017-08-09 11:29

New changeset 9b0d1d647e3d2ec9d299e5c9f49b02fbbb810a5a by Serhiy Storchaka in branch 'master':
bpo-31070: Fix a race condition in importlib _get_module_lock(). (#3033)
https://github.com/python/cpython/commit/9b0d1d647e3d2ec9d299e5c9f49b02fbbb810a5a

msg299992
Author: Serhiy Storchaka (serhiy.storchaka)
Date: 2017-08-09 11:56

New changeset f3b891718e104b6e7018b58bbcd86421a2837fb8 by Serhiy Storchaka in branch '3.6':
[3.6] bpo-31070: Fix a race condition in importlib _get_module_lock(). (GH-3033). (#3038)
https://github.com/python/cpython/commit/f3b891718e104b6e7018b58bbcd86421a2837fb8

msg299993
Author: Serhiy Storchaka (serhiy.storchaka)
Date: 2017-08-09 11:57

Thank you for the reviews, Victor and Nick.

msg299999
Author: STINNER Victor (vstinner)
Date: 2017-08-09 13:12

I'm curious whether this issue is a regression caused by the recent changes in bpo-30891 and bpo-30876. But well, it doesn't really matter now that it's fixed :-)

msg300007
Author: Alyssa Coghlan (ncoghlan)
Date: 2017-08-09 13:46

I don't think the changes created the bug, but I do think the new tests to provoke the other race conditions made it easier for the test suite to hit this race as well.

msg300009
Author: STINNER Victor (vstinner)
Date: 2017-08-09 14:01

> I don't think the changes created the bug, but I do think the new tests to provoke the other race conditions made it easier for the test suite to hit this race as well.

Oh ok, that's good news!

msg305147
Author: Alyssa Coghlan (ncoghlan)
Date: 2017-10-28 04:46

A recent numpy threaded-import bug report describes a failure that appears in 3.6.1 but is absent in 3.6.3: https://github.com/numpy/numpy/issues/9931

So that's additional evidence that this really was a pre-existing race condition that the new tests for the SystemError cases revealed.