msg201225 - (view) |
Author: Christian Heimes (christian.heimes) *  |
Date: 2013-10-25 08:06 |
The test seems to be unstable on some build bots: http://buildbot.python.org/all/builders/x86%20Gentoo%20Non-Debug%203.x/builds/5271/steps/test/logs/stdio ====================================================================== FAIL: test_interrupted_retry (test.test_selectors.EpollSelectorTestCase) ---------------------------------------------------------------------- Traceback (most recent call last): File "/var/lib/buildslave/3.x.murray-gentoo-wide/build/Lib/test/test_selectors.py", line 299, in test_interrupted_retry self.assertLess(time() - t, 2.5) AssertionError: 105.86884237639606 not less than 2.5 ---------------------------------------------------------------------- Ran 58 tests in 127.113s |
|
|
msg201237 - (view) |
Author: Charles-François Natali (neologix) *  |
Date: 2013-10-25 13:39 |
This failure doesn't make sense: whether SIGALRM is delivered or not, epoll() should time out after 2 seconds, not 105s. What's more, here's another test_subprocess failure: """ ====================================================================== ERROR: test_wait_timeout (test.test_subprocess.ProcessTestCase) ---------------------------------------------------------------------- Traceback (most recent call last): File "/var/lib/buildslave/3.x.murray-gentoo-wide/build/Lib/test/test_subprocess.py", line 993, in test_wait_timeout self.assertEqual(p.wait(timeout=3), 0) File "/var/lib/buildslave/3.x.murray-gentoo-wide/build/Lib/subprocess.py", line 1514, in wait raise TimeoutExpired(self.args, timeout) subprocess.TimeoutExpired: Command '['/var/lib/buildslave/3.x.murray-gentoo-wide/build/python', '-c', 'import time; time.sleep(0.3)']' timed out after 3 seconds """ Even more strange, a test_asyncio failure: """ ====================================================================== FAIL: test_interrupted_retry (test.test_selectors.EpollSelectorTestCase) ---------------------------------------------------------------------- Traceback (most recent call last): File "/var/lib/buildslave/3.x.murray-gentoo-wide/build/Lib/test/test_selectors.py", line 299, in test_interrupted_retry self.assertLess(time() - t, 2.5) AssertionError: 105.86884237639606 not less than 2.5 """ asyncio uses selectors as backend, so let's admit the test fails, but what's really surprising is that the timeout is *exactly* the same, up to the last digit. Something fishy is going on: David, did something change recently on that buildbot? |
|
|
msg201238 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2013-10-25 14:05 |
Flaky VM clock perhaps? |
|
|
msg201240 - (view) |
Author: Charles-François Natali (neologix) *  |
Date: 2013-10-25 14:44 |
> Flaky VM clock perhaps? Yeah, thought about that, but here we see *exactly* the same value for two time intervals (using time.monotonic()), computed over different test suite runs. I'd expected something more erratic... |
|
|
msg201241 - (view) |
Author: R. David Murray (r.david.murray) *  |
Date: 2013-10-25 14:49 |
The only change on the buildbot for some months now was increasing the size of the in-memory /tmp filesystem, which I did yesterday. That also involved a reboot, though, and it is conceivable that there were software changes made since the last reboot and before this one. I don't *think* there were, but I can't promise it. No kernel changes, though, for sure. |
|
|
msg201242 - (view) |
Author: Charles-François Natali (neologix) *  |
Date: 2013-10-25 15:22 |
Could you try to run the test in a loop under strace ("strace -ttTf") ? |
|
|
msg201678 - (view) |
Author: Charles-François Natali (neologix) *  |
Date: 2013-10-29 21:35 |
Here's another failure: """ ====================================================================== FAIL: test_interpreter_shutdown (test.test_concurrent_futures.ProcessPoolShutdownTest) ---------------------------------------------------------------------- Traceback (most recent call last): File "/var/lib/buildslave/3.x.murray-gentoo-wide/build/Lib/test/test_concurrent_futures.py", line 77, in tearDown self.assertLess(dt, 60, "synchronization issue: test lasted too long") AssertionError: 108.88635897636414 not less than 60 : synchronization issue: test lasted too long """ Many timeout-related tests are failing randomly on this buildbot, so I assume that's really a buildbot issue (is it a VM?). |
|
|
msg201679 - (view) |
Author: R. David Murray (r.david.murray) *  |
Date: 2013-10-29 21:45 |
Yes, it is a linux vserver VM. I would think we would want tests to not fail in VMs, though, in this day and age. |
|
|
msg201681 - (view) |
Author: R. David Murray (r.david.murray) *  |
Date: 2013-10-29 21:47 |
Also, unless people haven't been reporting other errors, these buildbots have been stable up until recently, with the exception of the multiprocessing hangs that plague all the buildbots as far as I know. (My impression could be mistaken, though.) |
|
|
msg201682 - (view) |
Author: Charles-François Natali (neologix) *  |
Date: 2013-10-29 21:55 |
> Also, unless people haven't been reporting other errors, these buildbots have been stable up until recently Well, the fact that several tests - which were working just fine - started failing randomly recently (like test_concurrent_futures, test_subprocess) is a strong hint that the culprit is the buildbot (I mean, the test_selectors failure just calls epoll_wait(): there is a known overflow for epoll_wait() timeout, but for large timeout values, not 2s). This wouldn't be the first time I see clock issues on a virtual machine... |
|
|
msg201685 - (view) |
Author: R. David Murray (r.david.murray) *  |
Date: 2013-10-29 22:13 |
What could cause a VM clock problem on a machine that hasn't been modified, but the VMs have been rebooted recently (ie: it isn't stale state inside the VM itself). Anything you can suggest I look at? |
|
|
msg201687 - (view) |
Author: R. David Murray (r.david.murray) *  |
Date: 2013-10-29 22:20 |
I'm going to reboot the host just to see if that makes any difference. It's been up for 105 days. |
|
|
msg203652 - (view) |
Author: R. David Murray (r.david.murray) *  |
Date: 2013-11-21 16:21 |
Looking at the current buildbot history I don't see these errors any more. I'm not sure how frequent they were, but I think I'll mark this as resolved and we can reopen it if we see it again. |
|
|