msg310124 - (view) |
Author: Gregory P. Smith (gregory.p.smith) *  |
Date: 2018-01-17 01:04 |
As a follow up to https://bugs.python.org/issue14976 which just introduced queue.SimpleQueue: concurrent.futures.thread currently uses a queue.Queue() from weakref callbacks which could in theory lead to a deadlock when periodic gc triggers a cleanup invalidating some weakrefed objects at an inopportune time while Python code has the queue's lock held. I don't have a test case proving this (deadlocks are hard). Switching it to use a SimpleQueue should avoid the problem? ... This and the C extension module based SimpleQueue would be good to backport to https://pypi.python.org/pypi/futures as well. |
|
|
msg310156 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2018-01-17 11:55 |
We could also switch multiprocessing.Pool. Unfortunately some code in multiprocessing.Pool relies on internal details of queue.Queue (!). |
|
|
msg310203 - (view) |
Author: Mark Dickinson (mark.dickinson) *  |
Date: 2018-01-17 18:23 |
See also https://bugs.python.org/issue21009 |
|
|
msg310210 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2018-01-17 22:10 |
Thanks for the heads up Mark. Unfortunately the reproducer script https://bugs.python.org/issue21009 needs to hack into Queue.get(), which isn't possible for the C-implemented SimpleQueue. |
|
|
msg310228 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2018-01-18 09:38 |
New changeset ab74504346a6e2569b3255b7b621c589716888c4 by Antoine Pitrou in branch 'master': bpo-32576: use queue.SimpleQueue in critical places (#5216) https://github.com/python/cpython/commit/ab74504346a6e2569b3255b7b621c589716888c4 |
|
|
msg313476 - (view) |
Author: Michał Górny (mgorny) * |
Date: 2018-03-09 08:25 |
Could you get this fixed in earlier versions of CPython? Given that 3.7 is not yet released, having this broken in 3.5 and 3.6 is highly undesirable. This apparently seems to affect our tooling [1] and telling our users to use 3.7 beta is not an option. [1]:https://bugs.gentoo.org/647964 |
|
|
msg313479 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2018-03-09 09:52 |
Michał, sorry, I doubt it. The fix is highly non-trivial as it first requires backporting a new feature (see ). I'm cc'ing the 3.6 branch release manager just in case. If apparently you're witnessing this in controlled situations (the "gemato" utility, IIUC?), one workaround would be to disable the cyclic GC (call gc.disable()). If your program doesn't create any cyclic references, or if those references don't keep too much memory alive, that would probably work. |
|
|
msg313490 - (view) |
Author: Ned Deily (ned.deily) *  |
Date: 2018-03-09 17:39 |
Yes, backporting all of the required code to earlier releases would be out of scope for a maintenance release, particularly at this late stage in the 3.6 life cycle. Let's see whether disabling the GC is a sufficient workaround until 3.7 is available. |
|
|
msg313592 - (view) |
Author: Michał Górny (mgorny) * |
Date: 2018-03-11 10:14 |
Well, according to the reporters disabling GC doesn't help at all. Maybe it's another issue. |
|
|
msg313595 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2018-03-11 10:51 |
Perhaps the gemato issue has nothing to do with multiprocessing indeed. I would suggest add some progress logging to your program to see whether/where it actually hangs. |
|
|