msg138648 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2011-06-19 17:25 |
[271/356/1] test_concurrent_futures Traceback (most recent call last): File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/multiprocessing/queues.py", line 268, in _feed send(obj) File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/multiprocessing/connection.py", line 229, in send self._send_bytes(memoryview(buf)) File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/multiprocessing/connection.py", line 423, in _send_bytes self._send(struct.pack("=i", len(buf))) File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/multiprocessing/connection.py", line 392, in _send n = write(self._handle, buf) OSError: [Errno 32] Broken pipe Timeout (1:00:00)! Thread 0x00000954: File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/threading.py", line 237 in wait File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/multiprocessing/queues.py", line 252 in _feed File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/threading.py", line 690 in run File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/threading.py", line 737 in _bootstrap_inner File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/threading.py", line 710 in _bootstrap Thread 0x00000953: File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/multiprocessing/forking.py", line 146 in poll File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/multiprocessing/forking.py", line 166 in wait File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/multiprocessing/process.py", line 150 in join File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/concurrent/futures/process.py", line 208 in shutdown_worker File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/concurrent/futures/process.py", line 264 in _queue_management_worker File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/threading.py", line 690 in run File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/threading.py", line 737 in _bootstrap_inner File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/threading.py", line 710 in _bootstrap Thread 0x00000001: File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/threading.py", line 237 in wait File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/threading.py", line 851 in join File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/concurrent/futures/process.py", line 395 in shutdown File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/test/test_concurrent_futures.py", line 67 in tearDown File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/unittest/case.py", line 407 in _executeTestPart File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/unittest/case.py", line 463 in run File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/unittest/case.py", line 514 in __call__ File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/unittest/suite.py", line 105 in run File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/unittest/suite.py", line 67 in __call__ File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/unittest/suite.py", line 105 in run File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/unittest/suite.py", line 67 in __call__ File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/test/support.py", line 1166 in run File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/test/support.py", line 1254 in _run_suite File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/test/support.py", line 1280 in run_unittest File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/test/test_concurrent_futures.py", line 628 in test_main File "./Lib/test/regrtest.py", line 1043 in runtest_inner File "./Lib/test/regrtest.py", line 841 in runtest File "./Lib/test/regrtest.py", line 668 in main File "./Lib/test/regrtest.py", line 1618 in *** Error code 1 make: Fatal error: Command failed for target `buildbottest' program finished with exit code 1 See commit e6e7e42efdc2 of the issue #12310. |
|
|
msg138765 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2011-06-20 23:27 |
Message on a stackoverflow thread: "I have suffered from the same problem, even if connecting on localhost in python 2.7.1. After a day of debugging i found the cause and a workaround: Cause: BaseProxy class has thread local storage which caches the connection, which is reused for future connections causing "broken pipe" errors even on creating a new Manager Workaround: Delete the cached connection before reconnecting if address in BaseProxy._address_to_local: del BaseProxy._address_to_local[self.address][0].connection" http://stackoverflow.com/questions/3649458/broken-pipe-when-using-python-multiprocessing-managers-basemanager-syncmanager/5884967#5884967 --- See also maybe the (closed) issue #11663: multiprocessing doesn't detect killed processes |
|
|
msg138766 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2011-06-20 23:37 |
Connection._send_bytes() has a comment about broken pipes: def _send_bytes(self, buf): # For wire compatibility with 3.2 and lower n = len(buf) self._send(struct.pack("=i", len(buf))) # The condition is necessary to avoid "broken pipe" errors # when sending a 0-length buffer if the other end closed the pipe. if n > 0: self._send(buf) But the OSError(32, "Broken pipe") occurs on sending the buffer size (a chunk of 4 bytes: self._send(struct.pack("=i", len(buf)))), not on sending the buffer content. See also maybe the (closed) issue #9205: Parent process hanging in multiprocessing if children terminate unexpectedly |
|
|
msg138767 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2011-06-21 00:42 |
Ah, submit a new task after the manager shutdown fails with OSError(32, 'Broken pipe'). Example: --------------- from multiprocessing.managers import BaseManager class MathsClass(object): def foo(self): return 42 class MyManager(BaseManager): pass MyManager.register('Maths', MathsClass) if __name__ == '__main__': manager = MyManager() manager.start() maths = manager.Maths() maths.foo() manager.shutdown() try: maths.foo() finally: manager.shutdown() --------------- This example doesn't hang, but this issue is about concurrent.futures, not multiprocessing. |
|
|
msg138768 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2011-06-21 01:05 |
Oh, I think that I found a deadlock (or something like that): ---------------------------- import concurrent.futures import faulthandler import os import signal import time def work(n): time.sleep(0.1) def main(): faulthandler.register(signal.SIGUSR1) print("pid: %s" % os.getpid()) with concurrent.futures.ProcessPoolExecutor() as executor: for number, prime in executor.map(work, range(100)): print("shutdown") executor.shutdown() print("shutdown--") if __name__ == '__main__': main() ---------------------------- Trace: ---------------------------- Thread 0x00007fbfc83bd700: File "/home/haypo/prog/HG/cpython/Lib/threading.py", line 237 in wait File "/home/haypo/prog/HG/cpython/Lib/multiprocessing/queues.py", line 252 in _feed File "/home/haypo/prog/HG/cpython/Lib/threading.py", line 690 in run File "/home/haypo/prog/HG/cpython/Lib/threading.py", line 737 in _bootstrap_inner File "/home/haypo/prog/HG/cpython/Lib/threading.py", line 710 in _bootstrap Thread 0x00007fbfc8bbe700: File "/home/haypo/prog/HG/cpython/Lib/multiprocessing/queues.py", line 101 in put File "/home/haypo/prog/HG/cpython/Lib/concurrent/futures/process.py", line 268 in _queue_management_worker File "/home/haypo/prog/HG/cpython/Lib/threading.py", line 690 in run File "/home/haypo/prog/HG/cpython/Lib/threading.py", line 737 in _bootstrap_inner File "/home/haypo/prog/HG/cpython/Lib/threading.py", line 710 in _bootstrap Current thread 0x00007fbfcc2e3700: File "/home/haypo/prog/HG/cpython/Lib/threading.py", line 237 in wait File "/home/haypo/prog/HG/cpython/Lib/threading.py", line 851 in join File "/home/haypo/prog/HG/cpython/Lib/concurrent/futures/process.py", line 395 in shutdown File "/home/haypo/prog/HG/cpython/Lib/concurrent/futures/_base.py", line 570 in __exit__ File "y.py", line 17 in main File "y.py", line 20 in ---------------------------- There are two child processes, but both are zombies (displayed as "" by ps). Send SIGUSR1 signal to the frozen process to display the traceback (thanks to faulthandler). |
|
|
msg150497 - (view) |
Author: Ross Lagerwall (rosslagerwall)  |
Date: 2012-01-03 06:12 |
Retrieving the result of a future after the executor has been shut down can cause a hang. It seems like this regression was introduced in a76257a99636. This regression exists only for ProcessPoolExecutor. The problem is that even if there are pending work items, the processes are still signaled to shut down leaving the pending work items permanently unfinished. The patch simply removes the call to shut down the processes when there are pending work items. Attached is a patch. |
|
|
msg150498 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2012-01-03 14:33 |
Well I was sure I had added this code for a reason, but the tests seem to run without... Just a comment: the test isn't ProcessPoolExecutor-specific, so it should really be in the generic tests. |
|
|
msg150851 - (view) |
Author: Roundup Robot (python-dev)  |
Date: 2012-01-08 06:43 |
New changeset 26389e9efa9c by Ross Lagerwall in branch '3.2': Issue #12364: Fix a hang in concurrent.futures.ProcessPoolExecutor. http://hg.python.org/cpython/rev/26389e9efa9c New changeset 25f879011102 by Ross Lagerwall in branch 'default': Merge with 3.2 for #12364. http://hg.python.org/cpython/rev/25f879011102 |
|
|
msg150853 - (view) |
Author: Ross Lagerwall (rosslagerwall)  |
Date: 2012-01-08 09:35 |
Thanks! |
|
|