msg245597 - (view) |
Author: Jorge Herskovic (Jorge Herskovic) |
Date: 2015-06-21 15:08 |
I'm writing a library on top of multiprocessing. As part of the test suite, I create and destroy dozens of processes repeatedly. About once in 50 runs, the tests complete successfully but the program crashes with: Error in atexit._run_exitfuncs: Traceback (most recent call last): File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/multiprocessing/util.py", line 286, in _exit_function _run_finalizers(0) File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/multiprocessing/util.py", line 246, in _run_finalizers items = [x for x in list(_finalizer_registry.items()) if f(x)] RuntimeError: dictionary changed size during iteration (this is on OS X 10.10.3, Python 3.4.3; I've been unable to replicate the behavior on Python 2.7.9, 3.5.0b2, or pypy despite hundreds of runs) The problematic code is available at https://github.com/jherskovic/mpetl and the tests should be run with nosetests |
|
|
msg245629 - (view) |
Author: Jorge Herskovic (Jorge Herskovic) |
Date: 2015-06-22 13:47 |
This happens reliably between 1-5% of the time on my home Mac (a 4.0 GHz i7). My work Mac Pro, a lot slower, doesn't exhibit this behavior on the same 3.4.3 interpreter. Could it be related to a concurrency issue in the interpreter? We're attempting to replicate on other systems. |
|
|
msg245687 - (view) |
Author: Jorge Herskovic (Jorge Herskovic) |
Date: 2015-06-23 13:01 |
More data: I've been unable to replicate this behavior on Ubuntu 15.04 on Python 3.4.3 over ~20 thousand test runs. (Same machine, running in a VM) An overnight repeated run on the original machine on OS X, 3.4.3, official distribution gave an actual frequency of 0.1% (7 crashes in 6980 runs). Approximately 3,300 runs on Python 2.7.9 on the same machine have not resulted in any crashes. I now built 3.4.3 from source using the latest clang and am trying it again, on the same machine, instead of using the official distribution. So far, 0 crashes in approximately 3,300 runs so far. At the moment, then, the problem seems localized to the official 3.4.3 binaries for OS X. |
|
|
msg245704 - (view) |
Author: Ned Deily (ned.deily) *  |
Date: 2015-06-24 02:14 |
Can you show the version info from the Python 3.4 where you see the failures? /usr/local/bin/python3.4 -c 'import sys;print(sys.version)' |
|
|
msg245705 - (view) |
Author: Ned Deily (ned.deily) *  |
Date: 2015-06-24 02:54 |
Exactly how are you building the Python 3.4 that does not fail for you? FWIW, I am able to reproduce the failure easily on a 10.10.3 system using python.org 64-bit/32-bit 3.4.3, using a current MacPorts source-built 3.4.3 (which is built with clang), and repeatedly using the current head of the 3.4 branch (post-3.4.3) built with clang, both with and without pydebug enabled. |
|
|
msg245707 - (view) |
Author: Jorge Herskovic (Jorge Herskovic) |
Date: 2015-06-24 03:27 |
The failing Python: 3.4.3 (v3.4.3:9b73f1c3e601, Feb 23 2015, 02:52:03) [GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] The non-failing Python: 3.4.3 (default, Jun 23 2015, 06:33:02) [GCC 4.2.1 Compatible Apple LLVM 6.1.0 (clang-602.0.53)] The non-failing python, I just downloaded the tarball and ran ./configure --prefix=/opt/python343, make, sudo make install, and then created a virtualenv using that binary. Given what you're telling me, I'll run some more loops with the "non-failing" python. I'm grateful you've been able to reproduce the problem, FWIW. |
|
|
msg245713 - (view) |
Author: Ned Deily (ned.deily) *  |
Date: 2015-06-24 03:52 |
FWIW, building 3.4.3 from source as you described results in failures about 30% of the time. With some of the earlier tests, it failed even more frequently. This is running on a 2.2GHz I7. |
|
|
msg245731 - (view) |
Author: Jorge Herskovic (Jorge Herskovic) |
Date: 2015-06-24 11:45 |
20,000+ iterations of the test suite with my homebuilt 3.4.3 throw no exceptions. Weirder and weirder. Any suggestions? Without any actual knowledge ("unencumbered by the thought process"), a concurrency issue in the interpreter itself is my guess. Especially since I can't reproduce it at all on other versions. I'll see if I can come up with a shorter, self-contained test that can illustrate the issue too. |
|
|
msg245738 - (view) |
Author: Jorge Herskovic (Jorge Herskovic) |
Date: 2015-06-24 13:17 |
Finally able to repro on my old Mac Pro at work (Universal 64/32bit 3.4.3 from python.org, OS X 10.10.3). For some reason I can't quite fathom, there were two exceptions this time. Error in atexit._run_exitfuncs: Traceback (most recent call last): File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/multiprocessing/util.py", line 286, in _exit_function _run_finalizers(0) File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/multiprocessing/util.py", line 246, in _run_finalizers items = [x for x in list(_finalizer_registry.items()) if f(x)] RuntimeError: dictionary changed size during iteration Exception ignored in: <Finalize object, dead> Traceback (most recent call last): File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/multiprocessing/util.py", line 185, in __call__ File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/multiprocessing/queues.py", line 207, in _finalize_close File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/threading.py", line 342, in notify TypeError: 'NoneType' object is not callable |
|
|
msg245739 - (view) |
Author: Jorge Herskovic (Jorge Herskovic) |
Date: 2015-06-24 13:28 |
Ok, sorry, one more important thing I omitted in the previous comment. ^^^ The second exception above seems to be related to a new test I added, one that includes a daemonic process which itself relies on daemonic threads. The cleanup of that seems to be FUBARed, and it makes the original exception much more likely to pop up as well. |
|
|
msg245810 - (view) |
Author: Jorge Herskovic (Jorge Herskovic) |
Date: 2015-06-25 15:06 |
Ok, I *think* I tracked it down to Connections getting cleaned up in a non-orderly fashion, sometimes while _run_finalizers is executing. The following patch to lib/python3.4/multiprocessing/connection.py cures the problem on my machine. @@ -124,10 +124,10 @@ self._handle = handle self._readable = readable self._writable = writable - + self._shutdown = util.Finalize(self, self._finalizer) # XXX should we use util.Finalize instead of a __del__? - - def __del__(self): + + def _finalizer(self): if self._handle is not None: self._close() |
|
|
msg250137 - (view) |
Author: Davin Potts (davin) *  |
Date: 2015-09-07 23:09 |
The 'crash' type is reserved for core dump type situations; 'behavior' is more appropriate for the misbehavior your describe. |
|
|
msg294546 - (view) |
Author: Marcin Słowik (Marandil) |
Date: 2017-05-26 11:09 |
I can confirm, I have managed to accidentally reproduce this issue on a custom build of 3.6.0 on RHEL. sys.version: '3.6.0 (default, Jan 18 2017, 11:23:11) [GCC 4.4.7 20120313 (Red Hat 4.4.7-16)]' The issue popped out of nowhere, on a code that was working w/out any hiccups on multiple systems; after an insignificant change was made to an unrelated part of code. I managed to "patch it up" by forcingly closing and joining all Queues before letting the main process to stop. Unfortunately, I cannot share the repro code (it's a) too large, b) closed), but I can confirm the issue was still open in 3.6.0. |
|
|
msg295876 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2017-06-13 09:25 |
The simple answer here is _run_finalizers() is not thread-safe, but finalizers can be registered or cancelled from any thread. I could not write a simple synthetic script to reproduce the issue, but it should nevertheless be fixable. |
|
|
msg295880 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2017-06-13 09:32 |
Marcin, Jorge, Ned, could one of you apply the following simple patch and see if that seems to solve the issue? |
|
|
msg295886 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2017-06-13 10:03 |
Ok, I got to write a reproducer, which fails reliably here. |
|
|
msg295927 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2017-06-13 15:11 |
New changeset 1eb6c0074d17f4fd425cacfdda893d65f5f77f0a by Antoine Pitrou in branch 'master': bpo-24484: Avoid race condition in multiprocessing cleanup (#2159) https://github.com/python/cpython/commit/1eb6c0074d17f4fd425cacfdda893d65f5f77f0a |
|
|
msg295929 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2017-06-13 15:32 |
Python 2.7 doesn't suffer from this bug, so no actual fix is needed. Still, I'll backport the additional test. |
|
|
msg295931 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2017-06-13 15:51 |
New changeset 6fd03459957ee53941183212457bba19f977679f by Antoine Pitrou in branch '3.6': [3.6] bpo-24484: Avoid race condition in multiprocessing cleanup (GH-2159) (#2166) https://github.com/python/cpython/commit/6fd03459957ee53941183212457bba19f977679f |
|
|
msg295933 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2017-06-13 15:52 |
New changeset d09f1674d70dd84c5be37b5fe11bca4913146c6b by Antoine Pitrou in branch '2.7': [2.7] bpo-24484: Avoid race condition in multiprocessing cleanup (GH-2159) (#2168) https://github.com/python/cpython/commit/d09f1674d70dd84c5be37b5fe11bca4913146c6b |
|
|
msg295934 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2017-06-13 15:54 |
New changeset a0ecaab006849ade08fbcf8413fe1e92d9c8d71c by Antoine Pitrou in branch '3.5': [3.5] bpo-24484: Avoid race condition in multiprocessing cleanup (GH-2159) (#2167) https://github.com/python/cpython/commit/a0ecaab006849ade08fbcf8413fe1e92d9c8d71c |
|
|
msg295935 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2017-06-13 15:54 |
This should be fixed now. Please reopen if not. |
|
|