Issue 12364: Deadlock in test_concurrent_futures (original) (raw)

Created on 2011-06-19 17:25 by vstinner, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
issue.patch rosslagerwall,2012-01-03 06:12 patch review
itest.py rosslagerwall,2012-01-03 06:12 test program
Messages (9)
msg138648 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-06-19 17:25
[271/356/1] test_concurrent_futures Traceback (most recent call last): File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/multiprocessing/queues.py", line 268, in _feed send(obj) File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/multiprocessing/connection.py", line 229, in send self._send_bytes(memoryview(buf)) File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/multiprocessing/connection.py", line 423, in _send_bytes self._send(struct.pack("=i", len(buf))) File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/multiprocessing/connection.py", line 392, in _send n = write(self._handle, buf) OSError: [Errno 32] Broken pipe Timeout (1:00:00)! Thread 0x00000954: File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/threading.py", line 237 in wait File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/multiprocessing/queues.py", line 252 in _feed File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/threading.py", line 690 in run File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/threading.py", line 737 in _bootstrap_inner File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/threading.py", line 710 in _bootstrap Thread 0x00000953: File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/multiprocessing/forking.py", line 146 in poll File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/multiprocessing/forking.py", line 166 in wait File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/multiprocessing/process.py", line 150 in join File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/concurrent/futures/process.py", line 208 in shutdown_worker File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/concurrent/futures/process.py", line 264 in _queue_management_worker File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/threading.py", line 690 in run File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/threading.py", line 737 in _bootstrap_inner File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/threading.py", line 710 in _bootstrap Thread 0x00000001: File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/threading.py", line 237 in wait File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/threading.py", line 851 in join File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/concurrent/futures/process.py", line 395 in shutdown File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/test/test_concurrent_futures.py", line 67 in tearDown File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/unittest/case.py", line 407 in _executeTestPart File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/unittest/case.py", line 463 in run File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/unittest/case.py", line 514 in __call__ File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/unittest/suite.py", line 105 in run File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/unittest/suite.py", line 67 in __call__ File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/unittest/suite.py", line 105 in run File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/unittest/suite.py", line 67 in __call__ File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/test/support.py", line 1166 in run File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/test/support.py", line 1254 in _run_suite File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/test/support.py", line 1280 in run_unittest File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/test/test_concurrent_futures.py", line 628 in test_main File "./Lib/test/regrtest.py", line 1043 in runtest_inner File "./Lib/test/regrtest.py", line 841 in runtest File "./Lib/test/regrtest.py", line 668 in main File "./Lib/test/regrtest.py", line 1618 in *** Error code 1 make: Fatal error: Command failed for target `buildbottest' program finished with exit code 1 See commit e6e7e42efdc2 of the issue #12310.
msg138765 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-06-20 23:27
Message on a stackoverflow thread: "I have suffered from the same problem, even if connecting on localhost in python 2.7.1. After a day of debugging i found the cause and a workaround: Cause: BaseProxy class has thread local storage which caches the connection, which is reused for future connections causing "broken pipe" errors even on creating a new Manager Workaround: Delete the cached connection before reconnecting if address in BaseProxy._address_to_local: del BaseProxy._address_to_local[self.address][0].connection" http://stackoverflow.com/questions/3649458/broken-pipe-when-using-python-multiprocessing-managers-basemanager-syncmanager/5884967#5884967 --- See also maybe the (closed) issue #11663: multiprocessing doesn't detect killed processes
msg138766 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-06-20 23:37
Connection._send_bytes() has a comment about broken pipes: def _send_bytes(self, buf): # For wire compatibility with 3.2 and lower n = len(buf) self._send(struct.pack("=i", len(buf))) # The condition is necessary to avoid "broken pipe" errors # when sending a 0-length buffer if the other end closed the pipe. if n > 0: self._send(buf) But the OSError(32, "Broken pipe") occurs on sending the buffer size (a chunk of 4 bytes: self._send(struct.pack("=i", len(buf)))), not on sending the buffer content. See also maybe the (closed) issue #9205: Parent process hanging in multiprocessing if children terminate unexpectedly
msg138767 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-06-21 00:42
Ah, submit a new task after the manager shutdown fails with OSError(32, 'Broken pipe'). Example: --------------- from multiprocessing.managers import BaseManager class MathsClass(object): def foo(self): return 42 class MyManager(BaseManager): pass MyManager.register('Maths', MathsClass) if __name__ == '__main__': manager = MyManager() manager.start() maths = manager.Maths() maths.foo() manager.shutdown() try: maths.foo() finally: manager.shutdown() --------------- This example doesn't hang, but this issue is about concurrent.futures, not multiprocessing.
msg138768 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-06-21 01:05
Oh, I think that I found a deadlock (or something like that): ---------------------------- import concurrent.futures import faulthandler import os import signal import time def work(n): time.sleep(0.1) def main(): faulthandler.register(signal.SIGUSR1) print("pid: %s" % os.getpid()) with concurrent.futures.ProcessPoolExecutor() as executor: for number, prime in executor.map(work, range(100)): print("shutdown") executor.shutdown() print("shutdown--") if __name__ == '__main__': main() ---------------------------- Trace: ---------------------------- Thread 0x00007fbfc83bd700: File "/home/haypo/prog/HG/cpython/Lib/threading.py", line 237 in wait File "/home/haypo/prog/HG/cpython/Lib/multiprocessing/queues.py", line 252 in _feed File "/home/haypo/prog/HG/cpython/Lib/threading.py", line 690 in run File "/home/haypo/prog/HG/cpython/Lib/threading.py", line 737 in _bootstrap_inner File "/home/haypo/prog/HG/cpython/Lib/threading.py", line 710 in _bootstrap Thread 0x00007fbfc8bbe700: File "/home/haypo/prog/HG/cpython/Lib/multiprocessing/queues.py", line 101 in put File "/home/haypo/prog/HG/cpython/Lib/concurrent/futures/process.py", line 268 in _queue_management_worker File "/home/haypo/prog/HG/cpython/Lib/threading.py", line 690 in run File "/home/haypo/prog/HG/cpython/Lib/threading.py", line 737 in _bootstrap_inner File "/home/haypo/prog/HG/cpython/Lib/threading.py", line 710 in _bootstrap Current thread 0x00007fbfcc2e3700: File "/home/haypo/prog/HG/cpython/Lib/threading.py", line 237 in wait File "/home/haypo/prog/HG/cpython/Lib/threading.py", line 851 in join File "/home/haypo/prog/HG/cpython/Lib/concurrent/futures/process.py", line 395 in shutdown File "/home/haypo/prog/HG/cpython/Lib/concurrent/futures/_base.py", line 570 in __exit__ File "y.py", line 17 in main File "y.py", line 20 in ---------------------------- There are two child processes, but both are zombies (displayed as "" by ps). Send SIGUSR1 signal to the frozen process to display the traceback (thanks to faulthandler).
msg150497 - (view) Author: Ross Lagerwall (rosslagerwall) (Python committer) Date: 2012-01-03 06:12
Retrieving the result of a future after the executor has been shut down can cause a hang. It seems like this regression was introduced in a76257a99636. This regression exists only for ProcessPoolExecutor. The problem is that even if there are pending work items, the processes are still signaled to shut down leaving the pending work items permanently unfinished. The patch simply removes the call to shut down the processes when there are pending work items. Attached is a patch.
msg150498 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-01-03 14:33
Well I was sure I had added this code for a reason, but the tests seem to run without... Just a comment: the test isn't ProcessPoolExecutor-specific, so it should really be in the generic tests.
msg150851 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2012-01-08 06:43
New changeset 26389e9efa9c by Ross Lagerwall in branch '3.2': Issue #12364: Fix a hang in concurrent.futures.ProcessPoolExecutor. http://hg.python.org/cpython/rev/26389e9efa9c New changeset 25f879011102 by Ross Lagerwall in branch 'default': Merge with 3.2 for #12364. http://hg.python.org/cpython/rev/25f879011102
msg150853 - (view) Author: Ross Lagerwall (rosslagerwall) (Python committer) Date: 2012-01-08 09:35
Thanks!
History
Date User Action Args
2022-04-11 14:57:18 admin set github: 56573
2012-01-08 09:35:49 rosslagerwall set status: open -> closedtype: behaviormessages: + assignee: rosslagerwallresolution: fixedstage: resolved
2012-01-08 06:43:54 python-dev set nosy: + python-devmessages: +
2012-01-03 14:33:51 pitrou set messages: +
2012-01-03 06:12:54 rosslagerwall set files: + itest.py
2012-01-03 06:12:19 rosslagerwall set files: + issue.patchnosy: + rosslagerwallmessages: + keywords: + patch
2011-07-05 12:22:26 vstinner set title: Timeout (1 hour) in test_concurrent_futures.tearDown() on sparc solaris10 gcc 3.x -> Deadlock in test_concurrent_futures
2011-06-21 01:05:34 vstinner set messages: +
2011-06-21 00:42:34 vstinner set messages: +
2011-06-20 23:37:03 vstinner set messages: +
2011-06-20 23:27:39 vstinner set messages: +
2011-06-19 17:25:15 vstinner create