Issue 12363: test_signal.test_without_siginterrupt() sporadic failures on FreeBSD 6.4 (original) (raw)

Created on 2011-06-19 17:21 by vstinner, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
test_siginterrupt.patch vstinner,2011-06-19 21:08 review
test_siginterrupt.diff neologix,2011-06-21 21:08 review
Messages (13)
msg138647 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-06-19 17:21
====================================================================== FAIL: test_siginterrupt_on (test.test_signal.SiginterruptTest) ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/home/db3l/buildarea/3.x.bolen-freebsd/build/Lib/test/test_signal.py", line 396, in test_siginterrupt_on self.assertTrue(i) AssertionError: False is not true http://www.python.org/dev/buildbot/all/builders/x86%20FreeBSD%206.4%203.x/builds/1586/steps/test/logs/stdio http://www.python.org/dev/buildbot/all/builders/x86%20FreeBSD%206.4%203.x/builds/1587/steps/test/logs/stdio
msg138652 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-06-19 17:58
Seen also on OpenSolaris: test test_signal failed -- Traceback (most recent call last): File "/export/home/buildbot/64bits/3.x.cea-indiana-amd64/build/Lib/test/test_signal.py", line 399, in test_siginterrupt_on self.assertTrue(i) AssertionError: False is not true http://www.python.org/dev/buildbot/all/builders/AMD64%20OpenIndiana%203.x/builds/1443/steps/test/logs/stdio
msg138664 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-06-19 21:08
I tried to patch the test to use a semaphore, but my patch was not reliable (don't remove completly the race condition). Here is a patch using a subprocess to: - have only one thread - have a timeout on the blocking read (select cannot be used in the test, select always fail with EINTR, the kernel doesn't restart it) - not touch signal handling of the parent process It is also based on time: it uses alarm() to raise a signal in one second, and use an hardcoded timeout of 3 seconds. But it doesn't need tricky synchronization between two processes.
msg138761 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2011-06-20 20:51
The patch looks good to me. In the subprocess, why not use the standard 0 exit code in case of success? Also, points 1 and 3 could be handled simply by having the parent process send a signal to the child (but this wouldn't address the timeout issue).
msg138762 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2011-06-20 21:28
New changeset 968b9ff9a059 by Victor Stinner in branch 'default': Close #12363: fix a race condition in siginterrupt() tests http://hg.python.org/cpython/rev/968b9ff9a059
msg138763 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-06-20 21:28
> In the subprocess, why not use the standard 0 exit code > in case of success? Something outside my code may exit Python with the code 0. Even if it unlikely, I prefer to use uncommon exit codes, to ensure that the child process executed "correctly" my code. A better check would be to write a specific pattern to stdout, and check stdout, but it would be overkill. > Also, points 1 and 3 could be handled simply by having > the parent process send a signal to the child > (but this wouldn't address the timeout issue). (Hum, points 1 and 3: "have only one thread" and "not touch signal handling of the parent process".) True, but I would like to write a more reliable test, and I don't know how to synchronize two processes for this test case. Because your first sentence was "The patch looks good to me.", let's try this new test in our buildbots.
msg138793 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2011-06-21 19:09
The test still fails on FreeBSD 7.2, Tiger and Debian parallel: """ ====================================================================== FAIL: test_siginterrupt_on (test.test_signal.SiginterruptTest) ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/home/db3l/buildarea/3.x.bolen-freebsd7/build/Lib/test/test_signal.py", line 379, in test_siginterrupt_on self.assertTrue(interrupted) AssertionError: False is not true ====================================================================== FAIL: test_without_siginterrupt (test.test_signal.SiginterruptTest) ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/home/db3l/buildarea/3.x.bolen-freebsd7/build/Lib/test/test_signal.py", line 372, in test_without_siginterrupt self.assertTrue(interrupted) AssertionError: False is not true """ Here's a sample strace output on my Linux box when it fails: 2120 21:00:18.755448 rt_sigaction(SIGALRM, {0x810b332, [], 0}, NULL, 8) = 0 <0.000019> 2120 21:00:18.755690 alarm(1) = 0 <0.000024> 2120 21:00:18.755803 read(3, <unfinished ...> [...] 2120 21:00:19.755811 <... read resumed> 0xb72cad90, 1) = ? ERESTARTSYS (To be restarted) <0.999973> 2120 21:00:19.755864 --- SIGALRM (Alarm clock) @ 0 (0) --- 2120 21:00:19.756094 sigreturn() = ? (mask now []) <0.000019> 2120 21:00:19.756434 alarm(1) = 0 <0.000023> 2120 21:00:19.756547 read(3, <unfinished ...> sigaction is called without SA_RESTART, but the syscall is restarted anyway... Apart from removing those tests, I don't see what we can do here.
msg138803 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2011-06-21 21:05
Duh, don't know what I was thinking: the syscall is not restarted (even though ERESTARTSYS is displayed by strace): the real problem is that the 3s timeout to communicate is not enough, because spawning a new interpreter can take a long time (Antoine created an issue some time ago about the high number of syscalls per import). If it takes more than 1s, the test will fail. Also, I guess it's worst on FreeBSD because subprocess.Popen uses close_fds=True by default, and FreeBSD has a huge default _SC_OPEN_MAX. I've attached a patch passing close_fds=False to spawn_python, and running the test only once, because otherwise this would require a timeout quite large.
msg138832 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2011-06-22 20:16
New changeset aff0a7b0cb12 by Victor Stinner in branch 'default': Issue #12363: improve siginterrupt() tests http://hg.python.org/cpython/rev/aff0a7b0cb12
msg138833 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-06-22 20:20
> Apart from removing those tests, I don't see what we can do here. The previous version of the test rarely failed (only sometimes on the FreeBSD 6.4 buildbox). We may revert my commits to restore the previous test if the new tests fail more often. > the real problem is that the 3s timeout to communicate is not > enough, because spawning a new interpreter can take a long time Right, I added a basic synchronization code to wait until the beginning of the test. Let see if it is better or worse ;-)
msg138861 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-06-23 09:49
All Python 3.x buildbots are green (except FreeBSD 7.2, but it's not related to this issue). Let close this issue.
msg139579 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2011-07-01 13:26
New changeset 8250f04d5a41 by Victor Stinner in branch '3.2': Issue #12363: improve siginterrupt() tests http://hg.python.org/cpython/rev/8250f04d5a41
msg139582 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2011-07-01 14:00
New changeset 3f30cfe51315 by Victor Stinner in branch '3.2': Issue #12363: increase the timeout of siginterrupt() tests http://hg.python.org/cpython/rev/3f30cfe51315 New changeset 423268537083 by Victor Stinner in branch 'default': (merge 3.2) Issue #12363: increase the timeout of siginterrupt() tests http://hg.python.org/cpython/rev/423268537083
History
Date User Action Args
2022-04-11 14:57:18 admin set github: 56572
2011-07-01 14:00:02 python-dev set messages: +
2011-07-01 13:26:03 python-dev set messages: +
2011-06-23 09:49:14 vstinner set status: open -> closedmessages: +
2011-06-22 20:20:43 vstinner set messages: +
2011-06-22 20:16:36 python-dev set messages: +
2011-06-21 21:09:23 neologix set status: closed -> open
2011-06-21 21:09:06 neologix set files: - test_siginterrupt.diff
2011-06-21 21:08:52 neologix set files: + test_siginterrupt.diff
2011-06-21 21:05:56 neologix set files: + test_siginterrupt.diffmessages: +
2011-06-21 19:09:54 neologix set messages: +
2011-06-20 21:28:17 vstinner set messages: +
2011-06-20 21:28:05 python-dev set status: open -> closednosy: + python-devmessages: + resolution: fixedstage: resolved
2011-06-20 20:51:44 neologix set messages: +
2011-06-19 21:08:51 vstinner set files: + test_siginterrupt.patchnosy: + pitroumessages: + keywords: + patch
2011-06-19 17:58:00 vstinner set messages: +
2011-06-19 17:21:45 vstinner create