msg138647 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2011-06-19 17:21 |
====================================================================== FAIL: test_siginterrupt_on (test.test_signal.SiginterruptTest) ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/home/db3l/buildarea/3.x.bolen-freebsd/build/Lib/test/test_signal.py", line 396, in test_siginterrupt_on self.assertTrue(i) AssertionError: False is not true http://www.python.org/dev/buildbot/all/builders/x86%20FreeBSD%206.4%203.x/builds/1586/steps/test/logs/stdio http://www.python.org/dev/buildbot/all/builders/x86%20FreeBSD%206.4%203.x/builds/1587/steps/test/logs/stdio |
|
|
msg138652 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2011-06-19 17:58 |
Seen also on OpenSolaris: test test_signal failed -- Traceback (most recent call last): File "/export/home/buildbot/64bits/3.x.cea-indiana-amd64/build/Lib/test/test_signal.py", line 399, in test_siginterrupt_on self.assertTrue(i) AssertionError: False is not true http://www.python.org/dev/buildbot/all/builders/AMD64%20OpenIndiana%203.x/builds/1443/steps/test/logs/stdio |
|
|
msg138664 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2011-06-19 21:08 |
I tried to patch the test to use a semaphore, but my patch was not reliable (don't remove completly the race condition). Here is a patch using a subprocess to: - have only one thread - have a timeout on the blocking read (select cannot be used in the test, select always fail with EINTR, the kernel doesn't restart it) - not touch signal handling of the parent process It is also based on time: it uses alarm() to raise a signal in one second, and use an hardcoded timeout of 3 seconds. But it doesn't need tricky synchronization between two processes. |
|
|
msg138761 - (view) |
Author: Charles-François Natali (neologix) *  |
Date: 2011-06-20 20:51 |
The patch looks good to me. In the subprocess, why not use the standard 0 exit code in case of success? Also, points 1 and 3 could be handled simply by having the parent process send a signal to the child (but this wouldn't address the timeout issue). |
|
|
msg138762 - (view) |
Author: Roundup Robot (python-dev)  |
Date: 2011-06-20 21:28 |
New changeset 968b9ff9a059 by Victor Stinner in branch 'default': Close #12363: fix a race condition in siginterrupt() tests http://hg.python.org/cpython/rev/968b9ff9a059 |
|
|
msg138763 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2011-06-20 21:28 |
> In the subprocess, why not use the standard 0 exit code > in case of success? Something outside my code may exit Python with the code 0. Even if it unlikely, I prefer to use uncommon exit codes, to ensure that the child process executed "correctly" my code. A better check would be to write a specific pattern to stdout, and check stdout, but it would be overkill. > Also, points 1 and 3 could be handled simply by having > the parent process send a signal to the child > (but this wouldn't address the timeout issue). (Hum, points 1 and 3: "have only one thread" and "not touch signal handling of the parent process".) True, but I would like to write a more reliable test, and I don't know how to synchronize two processes for this test case. Because your first sentence was "The patch looks good to me.", let's try this new test in our buildbots. |
|
|
msg138793 - (view) |
Author: Charles-François Natali (neologix) *  |
Date: 2011-06-21 19:09 |
The test still fails on FreeBSD 7.2, Tiger and Debian parallel: """ ====================================================================== FAIL: test_siginterrupt_on (test.test_signal.SiginterruptTest) ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/home/db3l/buildarea/3.x.bolen-freebsd7/build/Lib/test/test_signal.py", line 379, in test_siginterrupt_on self.assertTrue(interrupted) AssertionError: False is not true ====================================================================== FAIL: test_without_siginterrupt (test.test_signal.SiginterruptTest) ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/home/db3l/buildarea/3.x.bolen-freebsd7/build/Lib/test/test_signal.py", line 372, in test_without_siginterrupt self.assertTrue(interrupted) AssertionError: False is not true """ Here's a sample strace output on my Linux box when it fails: 2120 21:00:18.755448 rt_sigaction(SIGALRM, {0x810b332, [], 0}, NULL, 8) = 0 <0.000019> 2120 21:00:18.755690 alarm(1) = 0 <0.000024> 2120 21:00:18.755803 read(3, <unfinished ...> [...] 2120 21:00:19.755811 <... read resumed> 0xb72cad90, 1) = ? ERESTARTSYS (To be restarted) <0.999973> 2120 21:00:19.755864 --- SIGALRM (Alarm clock) @ 0 (0) --- 2120 21:00:19.756094 sigreturn() = ? (mask now []) <0.000019> 2120 21:00:19.756434 alarm(1) = 0 <0.000023> 2120 21:00:19.756547 read(3, <unfinished ...> sigaction is called without SA_RESTART, but the syscall is restarted anyway... Apart from removing those tests, I don't see what we can do here. |
|
|
msg138803 - (view) |
Author: Charles-François Natali (neologix) *  |
Date: 2011-06-21 21:05 |
Duh, don't know what I was thinking: the syscall is not restarted (even though ERESTARTSYS is displayed by strace): the real problem is that the 3s timeout to communicate is not enough, because spawning a new interpreter can take a long time (Antoine created an issue some time ago about the high number of syscalls per import). If it takes more than 1s, the test will fail. Also, I guess it's worst on FreeBSD because subprocess.Popen uses close_fds=True by default, and FreeBSD has a huge default _SC_OPEN_MAX. I've attached a patch passing close_fds=False to spawn_python, and running the test only once, because otherwise this would require a timeout quite large. |
|
|
msg138832 - (view) |
Author: Roundup Robot (python-dev)  |
Date: 2011-06-22 20:16 |
New changeset aff0a7b0cb12 by Victor Stinner in branch 'default': Issue #12363: improve siginterrupt() tests http://hg.python.org/cpython/rev/aff0a7b0cb12 |
|
|
msg138833 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2011-06-22 20:20 |
> Apart from removing those tests, I don't see what we can do here. The previous version of the test rarely failed (only sometimes on the FreeBSD 6.4 buildbox). We may revert my commits to restore the previous test if the new tests fail more often. > the real problem is that the 3s timeout to communicate is not > enough, because spawning a new interpreter can take a long time Right, I added a basic synchronization code to wait until the beginning of the test. Let see if it is better or worse ;-) |
|
|
msg138861 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2011-06-23 09:49 |
All Python 3.x buildbots are green (except FreeBSD 7.2, but it's not related to this issue). Let close this issue. |
|
|
msg139579 - (view) |
Author: Roundup Robot (python-dev)  |
Date: 2011-07-01 13:26 |
New changeset 8250f04d5a41 by Victor Stinner in branch '3.2': Issue #12363: improve siginterrupt() tests http://hg.python.org/cpython/rev/8250f04d5a41 |
|
|
msg139582 - (view) |
Author: Roundup Robot (python-dev)  |
Date: 2011-07-01 14:00 |
New changeset 3f30cfe51315 by Victor Stinner in branch '3.2': Issue #12363: increase the timeout of siginterrupt() tests http://hg.python.org/cpython/rev/3f30cfe51315 New changeset 423268537083 by Victor Stinner in branch 'default': (merge 3.2) Issue #12363: increase the timeout of siginterrupt() tests http://hg.python.org/cpython/rev/423268537083 |
|
|