Issue 14666: test_sendall_interrupted hangs on FreeBSD with a zombie multiprocessing thread

Created on 2012-04-24 23:01 by vstinner, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
mp_resource_sharer_stop.patch sbt,2012-04-25 00:18 review
mp_resource_sharer_stop.patch sbt,2012-04-25 11:37 review
mp_resource_sharer_stop.patch sbt,2012-04-25 13:03 review
mp_resource_sharer_stop.patch sbt,2012-04-26 15:27 review
Messages (17)
msg159230 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2012-04-24 23:01
[233/364] test_multiprocessing
...
[265/364] test_typechecks
[266/364] test_socket
Timeout (1:00:00)!
Thread 0x0000000807235000:
  File "/usr/home/buildbot/buildarea/3.x.krah-freebsd/build/Lib/socket.py", line 135 in accept
  File "/usr/home/buildbot/buildarea/3.x.krah-freebsd/build/Lib/multiprocessing/connection.py", line 595 in accept
  File "/usr/home/buildbot/buildarea/3.x.krah-freebsd/build/Lib/multiprocessing/connection.py", line 469 in accept
  File "/usr/home/buildbot/buildarea/3.x.krah-freebsd/build/Lib/multiprocessing/reduction.py", line 256 in _serve
  File "/usr/home/buildbot/buildarea/3.x.krah-freebsd/build/Lib/threading.py", line 592 in run
  File "/usr/home/buildbot/buildarea/3.x.krah-freebsd/build/Lib/threading.py", line 635 in _bootstrap_inner
  File "/usr/home/buildbot/buildarea/3.x.krah-freebsd/build/Lib/threading.py", line 612 in _bootstrap

Thread 0x0000000801407400:
  File "/usr/home/buildbot/buildarea/3.x.krah-freebsd/build/Lib/test/test_socket.py", line 1208 in check_sendall_interrupted
  File "/usr/home/buildbot/buildarea/3.x.krah-freebsd/build/Lib/test/test_socket.py", line 1219 in test_sendall_interrupted
  File "/usr/home/buildbot/buildarea/3.x.krah-freebsd/build/Lib/unittest/case.py", line 385 in _executeTestPart
  File "/usr/home/buildbot/buildarea/3.x.krah-freebsd/build/Lib/unittest/case.py", line 440 in run
  File "/usr/home/buildbot/buildarea/3.x.krah-freebsd/build/Lib/unittest/case.py", line 492 in __call__
  File "/usr/home/buildbot/buildarea/3.x.krah-freebsd/build/Lib/unittest/suite.py", line 105 in run
  File "/usr/home/buildbot/buildarea/3.x.krah-freebsd/build/Lib/unittest/suite.py", line 67 in __call__
  File "/usr/home/buildbot/buildarea/3.x.krah-freebsd/build/Lib/unittest/suite.py", line 105 in run
  File "/usr/home/buildbot/buildarea/3.x.krah-freebsd/build/Lib/unittest/suite.py", line 67 in __call__
  File "/usr/home/buildbot/buildarea/3.x.krah-freebsd/build/Lib/unittest/runner.py", line 168 in run
  File "/usr/home/buildbot/buildarea/3.x.krah-freebsd/build/Lib/test/support.py", line 1333 in _run_suite
  File "/usr/home/buildbot/buildarea/3.x.krah-freebsd/build/Lib/test/support.py", line 1367 in run_unittest
  File "/usr/home/buildbot/buildarea/3.x.krah-freebsd/build/Lib/test/test_socket.py", line 4813 in test_main
  File "/usr/home/buildbot/buildarea/3.x.krah-freebsd/build/Lib/test/regrtest.py", line 1237 in runtest_inner
  File "/usr/home/buildbot/buildarea/3.x.krah-freebsd/build/Lib/test/regrtest.py", line 907 in runtest
  File "/usr/home/buildbot/buildarea/3.x.krah-freebsd/build/Lib/test/regrtest.py", line 710 in main
  File "/usr/home/buildbot/buildarea/3.x.krah-freebsd/build/Lib/test/__main__.py", line 13 in <module>
  File "/usr/home/buildbot/buildarea/3.x.krah-freebsd/build/Lib/runpy.py", line 73 in _run_code
  File "/usr/home/buildbot/buildarea/3.x.krah-freebsd/build/Lib/runpy.py", line 160 in _run_module_as_main
*** Error code 1

http://www.python.org/dev/buildbot/all/builders/AMD64%20FreeBSD%209.0%203.x/builds/2339/steps/test/logs/stdio
msg159231 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2012-04-24 23:26
There was a similar issue: #11753, but it was a bug in the faulthandler module. Here it looks like a bug in TestSocketSharing of test_socket which uses multiprocessing.
msg159232 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-04-24 23:34
Ah, this is because of the new daemon thread in ResourceSharer. That thread is never stopped and could receive signals while tests expect them to be delivered to the main thread. Either we add a (private?) facility to stop that thread, or we block signal delivery in that thread using the signal module's pthread_sigmask. What do you think?
msg159233 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-04-24 23:36
The pthread_sigmask() solution would allow the use of multiprocessing all the while keeping deterministic signal delivery.
msg159241 - (view) Author: Richard Oudkerk (sbt) * (Python committer) Date: 2012-04-25 00:18
This patch adds a ResourceSharer.stop() method. This is called from tearDownClass() in the unittest.
msg159267 - (view) Author: Richard Oudkerk (sbt) * (Python committer) Date: 2012-04-25 11:37
New version of patch which does signal.pthread_sigmask(signal.SIG_BLOCK, range(1, signal.NSIG)) in the thread (is that right?). It also uses a timeout when trying to join the thread.
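The signal-blocking idea in this message can be illustrated with a minimal sketch. It is not the code from the patch: the helper name `run_with_signals_blocked` is hypothetical, and it uses `signal.valid_signals()` (available on Python 3.8+ on Unix) in place of `range(1, signal.NSIG)`, since some platforms reject reserved signal numbers in that range. The mask is per-thread, so only the worker is affected and the main thread's mask is left alone.

```python
import signal
import threading


def run_with_signals_blocked(target):
    """Run target() in a daemon thread that blocks every blockable signal.

    Hypothetical helper for illustration; returns a dict holding the
    worker's signal mask and target()'s return value.
    """
    result = {}

    def worker():
        # Block signals for this thread only. SIGKILL and SIGSTOP cannot
        # be blocked; the OS silently leaves them out of the mask.
        signal.pthread_sigmask(signal.SIG_BLOCK, signal.valid_signals())
        # Passing an empty set changes nothing and returns the current mask.
        result["mask"] = signal.pthread_sigmask(signal.SIG_BLOCK, [])
        result["value"] = target()

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join()
    return result


if __name__ == "__main__":
    info = run_with_signals_blocked(lambda: "done")
    print(signal.SIGINT in info["mask"])
```

With the worker's mask set this way, process-directed signals are delivered to a thread that does not block them, which is the deterministic delivery the discussion above is after.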
msg159271 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-04-25 12:14
> in the thread (is that right?).
This looks like it.
> It also uses a timeout when trying to join the thread.
Perhaps some kind of warning can be printed if joining fails after the timeout?
msg159279 - (view) Author: Richard Oudkerk (sbt) * (Python committer) Date: 2012-04-25 13:03
Warning added to patch.
msg159284 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-04-25 13:39
Hmm, I thought either multiprocessing's logging facilities or the warnings module could be used. That way, people have control over the verbosity of stderr messages.
msg159286 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2012-04-25 13:57
mp_resource_sharer_stop.patch: this patch changes two different things, so it should be split: one patch to fix test_socket, and one patch to call pthread_sigmask(). I don't think that you should call pthread_sigmask(). It looks like a workaround for this issue, whereas resource_sharer.stop() is the correct fix.
msg159297 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-04-25 15:16
> I don't think that you should call pthread_sigmask(). It looks like a
> workaround for this issue, whereas resource_sharer.stop() is the
> correct fix.
The problem is not only with test_multiprocessing and test_socket; any test which uses multiprocessing could have side effects on any subsequent tests which use signals. Also, applicative code could be affected. So I think pthread_sigmask() *is* the solution.
msg159321 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2012-04-25 17:01
mp_resource_sharer_stop.patch: you should add a timeout argument to stop() instead of hardcoding a timeout of 5 seconds. It is maybe safer to block until the thread exits by default (so timeout=None by default).
For the new method: it may be nice to document it. Having to import resource_sharer from multiprocessing.reduction is maybe not the best possible API :-/
+ from multiprocessing.reduction import resource_sharer
+ resource_sharer.stop()
> Also, applicative code could be affected.
What is the effect of the patch? For example, on CTRL+c? I don't know the multiprocessing module nor this "resource sharer" thread.
msg159323 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-04-25 17:25
> For the new method: it may be nice to document it. Having to import
> resource_sharer from multiprocessing.reduction is maybe not the best
> possible API :-/
resource_sharer is a private API, it's not meant to be used by anyone outside of the stdlib.
> What is the effect of the patch? For example, on CTRL+c?
Why should it have an effect on CTRL+c? Please explain yourself better.
> I don't know the multiprocessing module nor this "resource sharer"
> thread.
Time to learn about them perhaps :)
msg159382 - (view) Author: Richard Oudkerk (sbt) * (Python committer) Date: 2012-04-26 15:27
New patch which adds timeout to ResourceSharer.stop() which defaults to 0. When stop() fails it now uses the logger. pthread_sigmask() only stops this background thread from receiving signals. Signals will still be delivered to other threads, so it should not have any effect on the handling of Ctrl-C.
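The stop-with-timeout behaviour discussed in this thread can be sketched as follows. This is an illustration under stated assumptions, not the code from mp_resource_sharer_stop.patch: the class `BackgroundServer`, its logger name, and the `threading.Event`-based wake-up are all hypothetical stand-ins (per the traceback above, the real thread blocks in `accept()`, so it has to be woken differently). The `timeout=None` default follows the suggestion made earlier in the thread rather than the default the final patch chose.

```python
import logging
import threading

# Hypothetical logger name, used only for this sketch.
logger = logging.getLogger("resource_sharer_sketch")


class BackgroundServer:
    """Illustrative stand-in for a helper object that owns a daemon thread."""

    def __init__(self):
        self._stop_event = threading.Event()
        self._thread = None

    def start(self):
        self._stop_event.clear()
        self._thread = threading.Thread(target=self._serve, daemon=True)
        self._thread.start()

    def _serve(self):
        # Placeholder loop: wake up periodically and exit once asked to stop.
        while not self._stop_event.wait(0.01):
            pass

    def stop(self, timeout=None):
        """Ask the thread to exit and join it; return True if it stopped.

        timeout=None blocks until the thread has exited.
        """
        if self._thread is None:
            return True
        self._stop_event.set()
        self._thread.join(timeout)
        if self._thread.is_alive():
            # Report through logging so callers control the verbosity.
            logger.warning("background thread did not stop within %r seconds",
                           timeout)
            return False
        self._thread = None
        return True
```

A test class could call `stop()` from `tearDownClass()` so that the helper thread is not left running while later tests send signals.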
msg159499 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2012-04-27 21:52
New changeset f163c4731c58 by Antoine Pitrou in branch 'default': Issue #14666: stop multiprocessing's resource-sharing thread after the tests are done. http://hg.python.org/cpython/rev/f163c4731c58
msg159526 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-04-28 14:38
This should have fixed it. If not, someone please reopen the issue :)
msg159537 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2012-04-28 20:33
> This should have fixed it. If not, someone please reopen the issue :)
Thanks!
History
Date User Action Args
2022-04-11 14:57:29 admin set github: 58871
2012-04-28 20:33:54 vstinner set messages: +
2012-04-28 14:38:48 pitrou set status: open -> closed, resolution: fixed, messages: +, stage: needs patch -> resolved
2012-04-27 21:52:36 python-dev set nosy: + python-dev, messages: +
2012-04-26 15:27:53 sbt set files: + mp_resource_sharer_stop.patch, messages: +
2012-04-25 17:25:58 pitrou set messages: +
2012-04-25 17:01:59 vstinner set messages: +
2012-04-25 15:16:54 pitrou set messages: +
2012-04-25 13:57:16 vstinner set messages: +
2012-04-25 13:39:23 pitrou set messages: +
2012-04-25 13:03:02 sbt set files: + mp_resource_sharer_stop.patch, messages: +
2012-04-25 12:14:38 pitrou set messages: +
2012-04-25 11:37:26 sbt set files: + mp_resource_sharer_stop.patch, messages: +
2012-04-25 00:19:00 sbt set files: + mp_resource_sharer_stop.patch, keywords: + patch, messages: +
2012-04-24 23:36:28 pitrou set messages: +
2012-04-24 23:34:53 pitrou set type: behavior, components: + Library (Lib), stage: needs patch
2012-04-24 23:34:18 pitrou set nosy: + sbt, messages: +
2012-04-24 23:26:09 vstinner set messages: +
2012-04-24 23:01:57 vstinner create