msg133990 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2011-04-18 20:17 |
test_3_join_in_forked_from_thread() of test_threading failed on "x86 Ubuntu Shared 3.x" buildbot: ----------------------------------- [201/354] test_threading [41179 refs] [40407 refs] [40407 refs] [40407 refs] Timeout (1:00:00)! Thread 0x404218c0: File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/subprocess.py", line 466 in _eintr_retry_call File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/subprocess.py", line 1486 in _try_wait File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/subprocess.py", line 1528 in wait File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/test/test_threading.py", line 455 in _run_and_join File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/test/test_threading.py", line 518 in test_3_join_in_forked_from_thread File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/unittest/case.py", line 387 in _executeTestPart File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/unittest/case.py", line 442 in run File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/unittest/case.py", line 494 in __call__ File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/unittest/suite.py", line 105 in run File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/unittest/suite.py", line 67 in __call__ File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/unittest/suite.py", line 105 in run File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/unittest/suite.py", line 67 in __call__ File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/test/support.py", line 1078 in run File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/test/support.py", line 1166 in _run_suite File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/test/support.py", line 1192 in run_unittest File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/test/test_threading.py", line 728 in test_main File "./Lib/test/regrtest.py", line 1041 in runtest_inner File "./Lib/test/regrtest.py", line 835 in runtest File "./Lib/test/regrtest.py", line 659 in main File "./Lib/test/regrtest.py", line 1619 in make: *** [buildbottest] Error 1 program finished with exit code 2 elapsedTime=4426.776675 http://www.python.org/dev/buildbot/all/builders/x86%20Ubuntu%20Shared%203.x/builds/3577/steps/test/logs/stdio ----------------------------------- Code of the test: ---------------------------------- def _run_and_join(self, script): script = """if 1: import sys, os, time, threading # a thread, which waits for the main program to terminate def joiningfunc(mainthread): mainthread.join() print('end of thread') # stdout is fully buffered because not a tty, we have to flush # before exit. sys.stdout.flush() \n""" + script p = subprocess.Popen([sys.executable, "-c", script], stdout=subprocess.PIPE) rc = p.wait() <~~~ HANG HERE ~~~~ data = p.stdout.read().decode().replace('\r', '') p.stdout.close() self.assertEqual(data, "end of main\nend of thread\n") self.assertFalse(rc == 2, "interpreter was blocked") self.assertTrue(rc == 0, "Unexpected error") @unittest.skipUnless(hasattr(os, 'fork'), "needs os.fork()") def test_3_join_in_forked_from_thread(self): # Like the test above, but fork() was called from a worker thread # In the forked process, the main Thread object must be marked as stopped. # Skip platforms with known problems forking from a worker thread. # See http://bugs.python.org/issue3863. if sys.platform in ('freebsd4', 'freebsd5', 'freebsd6', 'netbsd5', 'os2emx'): raise unittest.SkipTest('due to known OS bugs on ' + sys.platform) script = """if 1: main_thread = threading.current_thread() def worker(): childpid = os.fork() if childpid != 0: os.waitpid(childpid, 0) sys.exit(0) t = threading.Thread(target=joiningfunc, args=(main_thread,)) print('end of main') t.start() t.join() # Should not block: main_thread is already stopped w = threading.Thread(target=worker) w.start() """ self._run_and_join(script) ---------------------------------- |
|
|
msg139076 - (view) |
Author: Charles-François Natali (neologix) *  |
Date: 2011-06-25 14:51 |
test_2_join_in_forked_process fails on FreeBSD 6.4 buildbot. http://www.python.org/dev/buildbot/all/builders/x86 FreeBSD 6.4 3.x/builds/1606/steps/test/logs/stdio """ ====================================================================== FAIL: test_2_join_in_forked_process (test.test_threading.ThreadJoinOnShutdown) ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/home/db3l/buildarea/3.x.bolen-freebsd/build/Lib/test/test_threading.py", line 464, in test_2_join_in_forked_process self._run_and_join(script) File "/usr/home/db3l/buildarea/3.x.bolen-freebsd/build/Lib/test/test_threading.py", line 436, in _run_and_join self.assertEqual(data, "end of main\nend of thread\n") AssertionError: '' != 'end of main\nend of thread\n' + end of main + end of thread """ I think it's the same problem as issue #12316: in the child process, even calling pthread_create can segfault/abort on FreeBSD6 (async-safe blahblah...). Tests creating a thread from a fork()ed process should be skipped on FreeBSD6. Patch attached, along with some refactoring to use the skipIf idiom. As for test_3_join_in_forked_from_thread, well, it could be more or less the same problem. We're really doing something prohibited by POSIX, so things might break in unexpected ways. For example, calling pthread_condition_destroy from the child process can deadlock (see http://bugs.python.org/issue6721#msg136047). Victor: to debug this kind of problem, it would be great if faulthandler could also dump tracebacks of children processes. Do you mind if I create a new issue? |
|
|
msg139121 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2011-06-25 21:50 |
> Victor: to debug this kind of problem, it would be great > if faulthandler could also dump tracebacks of children processes. > Do you mind if I create a new issue? Please open a new issue. |
|
|
msg139129 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2011-06-25 23:23 |
+ @unittest.skipIf(sys.platform in ('freebsd4', 'freebsd5', 'freebsd6', + 'netbsd5', 'os2emx'), "due to known OS bug") This skip gives very few information, and it is duplicated for each function. I would prefer a constant of the "broken OSes" with your following comment attached to the constant: + # Between fork() and exec(), only async-safe functions are allowed (issues + # #12316 and #11870), and fork() from a worker thread is known to trigger + # problems with some operating systems: skip problematic tests on platforms + # known to behave badly. Or split the test case into two testcases: one using fork and skipped on broken platforms, one not using fork? --- > As for test_3_join_in_forked_from_thread, well, it could be more > or less the same problem. We're really doing something prohibited > by POSIX, so things might break in unexpected ways. If the creation of a thread after a fork is reliable on some systems, we should not deny the creation of new threads after a fork. You can replace "creation of new threads" by any other non async-safe function in my previous sentence. Therefore I agree that the good answer to this issue is to skip the test on "broken systems" (or should we call them "POSIX compliant systems?" :-)). |
|
|
msg139131 - (view) |
Author: Charles-François Natali (neologix) *  |
Date: 2011-06-25 23:40 |
> This skip gives very few information, and it is duplicated for each > function. I would prefer a constant of the "broken OSes" with your > following comment attached to the constant: Ok, I'll try to write something along those lines. > If the creation of a thread after a fork is reliable on some systems, > we should not deny the creation of new threads after a fork. Well, the problem is that it is not reliable on any platform, but happens to work "most of the time" on some platforms ;-) |
|
|
msg139219 - (view) |
Author: Charles-François Natali (neologix) *  |
Date: 2011-06-26 20:55 |
Patch attached. |
|
|
msg139337 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2011-06-27 23:14 |
Your patch is linux3 compliant, go ahead! |
|
|
msg139574 - (view) |
Author: Roundup Robot (python-dev)  |
Date: 2011-07-01 12:53 |
New changeset 0ed5e6ff10f8 by Victor Stinner in branch '3.2': Issue #11870: Skip test_threading.test_2_join_in_forked_process() on platforms http://hg.python.org/cpython/rev/0ed5e6ff10f8 New changeset f43dee86fffd by Victor Stinner in branch 'default': (merge 3.2) Issue #11870: Skip test_threading.test_2_join_in_forked_process() http://hg.python.org/cpython/rev/f43dee86fffd |
|
|
msg139576 - (view) |
Author: Roundup Robot (python-dev)  |
Date: 2011-07-01 13:05 |
New changeset ff36b8cadfd6 by Victor Stinner in branch '2.7': Issue #11870: Skip test_threading.test_2_join_in_forked_process() on platforms http://hg.python.org/cpython/rev/ff36b8cadfd6 |
|
|
msg139583 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2011-07-01 14:06 |
The initial problem was test_3_join_in_forked_from_thread() and the hangs does still happen: [324/356] test_threading Timeout (1:00:00)! Thread 0x404248c0: File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/subprocess.py", line 1498 in _communicate_with_poll File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/subprocess.py", line 1423 in _communicate File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/subprocess.py", line 836 in communicate File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/test/script_helper.py", line 32 in _assert_python File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/test/script_helper.py", line 50 in assert_python_ok File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/test/test_threading.py", line 434 in _run_and_join File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/test/test_threading.py", line 493 in test_3_join_in_forked_from_thread File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/unittest/case.py", line 407 in _executeTestPart File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/unittest/case.py", line 462 in run File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/unittest/case.py", line 514 in __call__ File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/unittest/suite.py", line 105 in run File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/unittest/suite.py", line 67 in __call__ File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/unittest/suite.py", line 105 in run File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/unittest/suite.py", line 67 in __call__ File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/unittest/runner.py", line 168 in run File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/test/support.py", line 1259 in _run_suite File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/test/support.py", line 1285 in run_unittest File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/test/test_threading.py", line 774 in test_main File "./Lib/test/regrtest.py", line 1070 in runtest_inner File "./Lib/test/regrtest.py", line 861 in runtest File "./Lib/test/regrtest.py", line 669 in main File "./Lib/test/regrtest.py", line 1648 in http://www.python.org/dev/buildbot/all/builders/x86%20Ubuntu%20Shared%203.x/builds/4081/steps/test/logs/stdio (neologix's patch doesn't change anything for x86 Ubuntu Shared 3.x buildbot, which is a Linux, not a FreeBSD) I don't know why it only hangs on this Linux buildbot. It's maybe an old Linux kernel, an old GNU libc version, or something like that? |
|
|
msg139602 - (view) |
Author: Charles-François Natali (neologix) *  |
Date: 2011-07-01 20:48 |
> The initial problem was test_3_join_in_forked_from_thread() and the hangs does still happen: > Yes, the patch was there to fix test_2_join_in_forked_from_thread. > [324/356] test_threading > Timeout (1:00:00)! > Thread 0x404248c0: > File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/subprocess.py", line 1498 in _communicate_with_poll > File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/subprocess.py", line 1423 in _communicate > File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/subprocess.py", line 836 in communicate > File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/test/script_helper.py", line 32 in _assert_python > File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/test/script_helper.py", line 50 in assert_python_ok > File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/test/test_threading.py", line 434 in _run_and_join > File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/test/test_threading.py", line 493 in test_3_join_in_forked_from_thread > File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/unittest/case.py", line 407 in _executeTestPart > File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/unittest/case.py", line 462 in run > File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/unittest/case.py", line 514 in __call__ > File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/unittest/suite.py", line 105 in run > File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/unittest/suite.py", line 67 in __call__ > File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/unittest/suite.py", line 105 in run > File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/unittest/suite.py", line 67 in __call__ > File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/unittest/runner.py", line 168 in run > File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/test/support.py", line 1259 in _run_suite > File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/test/support.py", line 1285 in run_unittest > File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/test/test_threading.py", line 774 in test_main > File "./Lib/test/regrtest.py", line 1070 in runtest_inner > File "./Lib/test/regrtest.py", line 861 in runtest > File "./Lib/test/regrtest.py", line 669 in main > File "./Lib/test/regrtest.py", line 1648 in > > http://www.python.org/dev/buildbot/all/builders/x86%20Ubuntu%20Shared%203.x/builds/4081/steps/test/logs/stdio > This means that the subprocess hangs, but without a backtrace of the child process (issue #12413), it's hard to analyse it further. I've had a look at the code, and couldn't find anything obviously wrong; Gregory's patches to sanitize threading's lock should have fixed this. I also tried running this test in a loop for 48 hours but couldn't reproduce it. One possible explanation (but it's just a wild guess) is that with a certain kernel/libc configuration, the lock deallocation code can deadlock (I've seen pthread_cond_destroy() block, this could maybe happen with sem_destroy()). So I suggest to try to come up with a solution to #12413, which should help analyzing this - and similar - issues. |
|
|
msg148698 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2011-12-01 00:43 |
> Gregory's patches to sanitize threading's lock should have fixed this The subprocess hang still occurs something, it just happened: http://www.python.org/dev/buildbot/all/builders/x86%20Ubuntu%20Shared%203.x/builds/4898/steps/test/logs/stdio [110/363] test_threading Timeout (1:00:00)! Thread 0x404888c0: File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/subprocess.py", line 1513 in _communicate_with_poll File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/subprocess.py", line 1438 in _communicate File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/subprocess.py", line 850 in communicate File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/test/script_helper.py", line 32 in _assert_python File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/test/script_helper.py", line 50 in assert_python_ok File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/test/test_threading.py", line 441 in _run_and_join File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/test/test_threading.py", line 497 in test_3_join_in_forked_from_thread ... |
|
|
msg148993 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2011-12-07 21:35 |
I removed my two previous message ( and ), there were unrelated to this issue: the test hangs in debug mode in the IO module because of a deadleak in IO related to the fork... |
|
|
msg148996 - (view) |
Author: Charles-François Natali (neologix) *  |
Date: 2011-12-07 22:04 |
To debug this, we should probably make use of faulthandler (but not dump_tracebacks_later, since it creates a new thread). The way to go could be to make the parent process send a fatal signal to the child process if the latter takes too long to complete. |
|
|
msg149148 - (view) |
Author: Charles-François Natali (neologix) *  |
Date: 2011-12-10 12:02 |
Here's a patch to help nail this down. |
|
|
msg149773 - (view) |
Author: Charles-François Natali (neologix) *  |
Date: 2011-12-18 16:05 |
Victor, could you try the patch attached? |
|
|
msg149784 - (view) |
Author: Roundup Robot (python-dev)  |
Date: 2011-12-18 17:47 |
New changeset 775319cebaa3 by Charles-François Natali in branch '2.7': Issue #11870: threading: Properly reinitialize threads internal locks and http://hg.python.org/cpython/rev/775319cebaa3 New changeset de962ec0faaa by Charles-François Natali in branch '3.2': Issue #11870: threading: Properly reinitialize threads internal locks and http://hg.python.org/cpython/rev/de962ec0faaa New changeset cec0d77d01c4 by Charles-François Natali in branch 'default': Issue #11870: threading: Properly reinitialize threads internal locks and http://hg.python.org/cpython/rev/cec0d77d01c4 |
|
|
msg149788 - (view) |
Author: Charles-François Natali (neologix) *  |
Date: 2011-12-18 18:12 |
Should be fixed now. |
|
|
msg180158 - (view) |
Author: Roundup Robot (python-dev)  |
Date: 2013-01-17 22:37 |
New changeset cd54b48946ca by Stefan Krah in branch '3.3': Issue #11870: Skip test_3_join_in_forked_from_thread() on HP-UX. http://hg.python.org/cpython/rev/cd54b48946ca |
|
|