msg309244 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2017-12-30 19:20 |
It seems test_asyncio fails sporadically on AppVeyor: https://ci.appveyor.com/project/python/cpython/build/3.7.0a0.10081#L2650 ====================================================================== ERROR: test_start_tls_server_1 (test.test_asyncio.test_sslproto.ProactorStartTLS) ---------------------------------------------------------------------- Traceback (most recent call last): File "C:\projects\cpython\lib\test\test_asyncio\test_sslproto.py", line 284, in test_start_tls_server_1 asyncio.wait_for(main(), loop=self.loop, timeout=10)) File "C:\projects\cpython\lib\asyncio\base_events.py", line 440, in run_until_complete return future.result() File "C:\projects\cpython\lib\asyncio\tasks.py", line 398, in wait_for raise futures.TimeoutError() concurrent.futures._base.TimeoutError ====================================================================== FAIL: test_start_tls_server_1 (test.test_asyncio.test_sslproto.ProactorStartTLS) ---------------------------------------------------------------------- Traceback (most recent call last): File "C:\projects\cpython\lib\test\test_asyncio\functional.py", line 43, in tearDown self.fail('unexpected calls to loop.call_exception_handler()') AssertionError: unexpected calls to loop.call_exception_handler() |
|
|
msg309254 - (view) |
Author: Yury Selivanov (yselivanov) *  |
Date: 2017-12-30 20:21 |
I'm leaving on a two-weeks vacation today. To avoid risking breaking the workflow, I'll mask this tests on AppVeyor. I'll investigate this when I get back. |
|
|
msg309255 - (view) |
Author: Yury Selivanov (yselivanov) *  |
Date: 2017-12-30 20:40 |
New changeset 0c36bed1c46d07ef91d3e02e69e974e4f3ecd31a by Yury Selivanov in branch 'master': bpo-32458: Temporarily mask start-tls proactor test on Windows (#5054) https://github.com/python/cpython/commit/0c36bed1c46d07ef91d3e02e69e974e4f3ecd31a |
|
|
msg309256 - (view) |
Author: Yury Selivanov (yselivanov) *  |
Date: 2017-12-30 20:42 |
Please keep this issue open. |
|
|
msg309521 - (view) |
Author: Terry J. Reedy (terry.reedy) *  |
Date: 2018-01-05 20:41 |
After fresh update and debug rebuild on master(3.7), with Win10 patched last night, I got ====================================================================== FAIL: test_call_later (test.test_asyncio.test_events.ProactorEventLoopTests) ---------------------------------------------------------------------- Traceback (most recent call last): File "F:\dev\3x\lib\test\test_asyncio\test_events.py", line 290, in test_call_later self.assertTrue(0.08 <= t1-t0 <= 0.8, t1-t0) AssertionError: False is not true : 0.07800000000861473 I have not seen this before. Four subsequent runs passed. |
|
|
msg310405 - (view) |
Author: Andrew Svetlov (asvetlov) *  |
Date: 2018-01-22 09:08 |
Hmm, the scheduled call was executed sooner than expected. Actually we had 0.1 sec delay, checked for at least 0.08 but actual call time was 0.078 BTW I recall an issue with Windows when call_later() was executed sooner already. |
|
|
msg313507 - (view) |
Author: Ned Deily (ned.deily) *  |
Date: 2018-03-09 21:38 |
Note, I think this failure is still happening occasionally on the windows7 buildbot; for example: http://buildbot.python.org/all/#/builders/111/builds/68/steps/3/logs/stdio http://buildbot.python.org/all/#/builders/111/builds/85/steps/3/logs/stdio |
|
|
msg317468 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2018-05-23 22:36 |
test_start_tls_server_1() just failed on my Linux. It likely depends on the system load. |
|
|
msg317638 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2018-05-24 23:09 |
It would be nice to fix this bug before Python 3.7.0 final: either skip the test, or fix it. Flaky tests can be very annoying. For example, the full test suite is run to build a Red Hat package. If a single test fails, the package build fails and should be retried whereas it's slow. I understood that fixing the root cause might require to rewrite the test, so I don't expect a quick fix on this test. The issue is open since last December... Maybe we can skip the test but fix it in Python 3.7.1? |
|
|
msg317639 - (view) |
Author: Yury Selivanov (yselivanov) *  |
Date: 2018-05-24 23:10 |
I'm OK to skip it for now. Writing functional tests is super hard because some buildbots are super slow and unpredictable. |
|
|
msg317640 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2018-05-24 23:12 |
> I'm OK to skip it for now. Writing functional tests is super hard because some buildbots are super slow and unpredictable. Would you mind to write a PR to skip the PR? So you will feel guilty and will try to remind to fix it! |
|
|
msg317643 - (view) |
Author: Yury Selivanov (yselivanov) *  |
Date: 2018-05-24 23:23 |
Not at my computer right now, can do it tomorrow. |
|
|
msg317751 - (view) |
Author: Ned Deily (ned.deily) *  |
Date: 2018-05-26 18:16 |
Yury, are you still planning to address this? |
|
|
msg317755 - (view) |
Author: Yury Selivanov (yselivanov) *  |
Date: 2018-05-26 18:45 |
Last time I looked into this I couldn't reproduce the failures on my Windows 10 VM, so it seems like an AppVeyor-specific problem. I'll take another look on Monday. Andrew, if you have a Windows environment, could you please try to run this test? |
|
|
msg317758 - (view) |
Author: Ned Deily (ned.deily) *  |
Date: 2018-05-26 18:49 |
> Last time I looked into this I couldn't reproduce the failures on my Windows 10 VM, so it seems like an AppVeyor-specific problem. I'll take another look on Monday. But in Victor asserts that it sometimes fails on Linux, too? |
|
|
msg317759 - (view) |
Author: Yury Selivanov (yselivanov) *  |
Date: 2018-05-26 18:54 |
> But in Victor asserts that it sometimes fails on Linux, too? Hm, I missed that. I'll definitely take a look. |
|
|
msg317760 - (view) |
Author: Christian Heimes (christian.heimes) *  |
Date: 2018-05-26 19:22 |
It's failing reproducible with OpenSSL 1.1.1 and TLS 1.3 enabled. I haven't seen it failing with TLS 1.2 yet. |
|
|
msg317761 - (view) |
Author: Terry J. Reedy (terry.reedy) *  |
Date: 2018-05-26 19:40 |
I have not seen test_asyncio fail on AppVeyor since about Sunday. https://ci.appveyor.com/project/python/cpython/history shows only a couple of failures, which could very well be real. I have seen test_asyncio fail on Travis since Sunday. With my local repository 32-bit debug build of master branch: If I run python -m test.test_asyncio -v, test_sock_sendfile_not_regular_file (test.test_asyncio.test_proactor_events.ProactorEventLoopUnixSockSendfileTests) ... (and the test process) aborts with a failed debug assertion: minkernel\crts\ucrt\src\appcrt\lowio\osfinfo.cpp, line 257 fh >= 0 && (unsigned)fh < (unsigned)_nhandle If I run python -m test test_asyncio, I get F:\dev\3x\lib\asyncio\sslproto.py:320: ResourceWarning: unclosed transport <asyncio.sslproto._SSLProtocolTransport object at 0x052DA4D0> source=self) F:\dev\3x\lib\asyncio\sslproto.py:320: ResourceWarning: unclosed transport <asyncio.sslproto._SSLProtocolTransport object at 0x0476A118> source=self) Exception in callback _ProactorReadPipeTransport._loop_reading__data_received(<_OverlappedFuture cancelled>) handle: <Handle _ProactorReadPipeTransport._loop_reading__data_received(<_OverlappedFuture cancelled>)> Traceback (most recent call last): File "F:\dev\3x\lib\asyncio\events.py", line 88, in _run self._context.run(self._callback, *self._args) File "F:\dev\3x\lib\asyncio\proactor_events.py", line 222, in _loop_reading__data_received self._closing) AssertionError test_asyncio passed in 35 sec 1 test OK. With -v, above is scattered in verbose output, but test_start_tls_server_1 passed both times. Specifically, I ran the following 10 times with no failures. f:\dev\3x>python -m test -v -m test_start_tls_server_1 test_asyncio ... test_start_tls_server_1 (test.test_asyncio.test_sslproto.ProactorStartTLSTests) ... ok test_start_tls_server_1 (test.test_asyncio.test_sslproto.SelectorStartTLSTests) ... ok |
|
|
msg317762 - (view) |
Author: Terry J. Reedy (terry.reedy) *  |
Date: 2018-05-26 19:52 |
I presume the above is using the month-old openssl-bin-1.1.0h. |
|
|
msg317807 - (view) |
Author: Yury Selivanov (yselivanov) *  |
Date: 2018-05-27 19:40 |
Terry, you somehow deleted Christian from the nosy list. Christian, > It's failing reproducible with OpenSSL 1.1.1 and TLS 1.3 enabled. I haven't seen it failing with TLS 1.2 yet. On Linux or Windows? |
|
|
msg317808 - (view) |
Author: Christian Heimes (christian.heimes) *  |
Date: 2018-05-27 19:41 |
Linux |
|
|
msg317809 - (view) |
Author: Terry J. Reedy (terry.reedy) *  |
Date: 2018-05-27 20:14 |
'deleted somehow': Christian added himself while I had this page open, so when I submitted without refreshing, the nosy list did not include him. When I saw the red warning, I did not notice the nosy list change; I should have. |
|
|
msg317817 - (view) |
Author: Yury Selivanov (yselivanov) *  |
Date: 2018-05-27 23:47 |
Christian, I believe https://github.com/python/cpython/pull/7130/commits/bd17a5593f03e3f2d39a64f5afd3ffb77c97e914 fixes the issue with OpenSSL 1.1.1 (socket.shutdown was missing and asyncio's protocol didn't receive EOF, instead the connection was terminated (as expected though)) |
|
|
msg317833 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2018-05-28 09:14 |
I just reproduced the issue on Linux. Open 3 terminals and run the commands in parallel: (1) ./python -m test test_asyncio -m test_start_tls_server_1 -F (2) ./python -m test -j16 -r (3) ./python -m test -j16 -r It's a race condition which doesn't depend on the OS, but on the system load. |
|
|
msg317862 - (view) |
Author: Yury Selivanov (yselivanov) *  |
Date: 2018-05-28 15:08 |
> It's a race condition which doesn't depend on the OS, but on the system load. Yeah, I agree. The current timeout for the test is 5 seconds and it tries to transfer 2mb of data, which isn't that much actually. I'll bump the timeout and reduce the amount of data transferred in https://github.com/python/cpython/pull/7130 Thing is, these are the very first *functional* tests for asyncio. Before it was mostly tested with mocks. So we'll need some adjustment period to learn how slow/loaded the CI servers are and what kind of timeouts are safe to use. |
|
|
msg317871 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2018-05-28 15:39 |
> Yeah, I agree. The current timeout for the test is 5 seconds and it tries to transfer 2mb of data, which isn't that much actually. I'll bump the timeout and reduce the amount of data transferred in https://github.com/python/cpython/pull/7130 What is the purpose of a timeout of 5 seconds? Some buildbots are really slow. If the purpose is to detect when the test hangs: use a timeout of 5 minutes, or at least 1 minute (60 seconds). |
|
|
msg317872 - (view) |
Author: Yury Selivanov (yselivanov) *  |
Date: 2018-05-28 15:40 |
> If the purpose is to detect when the test hangs: use a timeout of 5 minutes, or at least 1 minute (60 seconds). OK, sounds good, will do that. |
|
|
msg317873 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2018-05-28 15:50 |
> OK, sounds good, will do that. Race condition! I created https://github.com/python/cpython/pull/7157 to increase the timeout from 5 seconds to 60 seconds. |
|
|
msg317898 - (view) |
Author: Yury Selivanov (yselivanov) *  |
Date: 2018-05-28 18:33 |
https://github.com/python/cpython/pull/7130 is in. It includes many tests for TLS tests as well as bugfixes to asyncio code. Hopefully this all will make tests more stable. |
|
|
msg317907 - (view) |
Author: Yury Selivanov (yselivanov) *  |
Date: 2018-05-28 19:49 |
New changeset 8267ea2e84d355f00654dec3ad782fc7b1f680f1 by Yury Selivanov in branch 'master': bpo-32458: Further tune start_tls tests (#7166) https://github.com/python/cpython/commit/8267ea2e84d355f00654dec3ad782fc7b1f680f1 |
|
|
msg317918 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2018-05-28 20:41 |
While Yury seems unable to reproduce the bug, it's easy for me to reproduce it on Linux. After 2 hours of debugging, I found the root issue: a race condition not in the test, but in asyncio itself! => bpo-33674 "asyncio: race condition in SSLProtocol". |
|
|
msg317920 - (view) |
Author: miss-islington (miss-islington) |
Date: 2018-05-28 20:44 |
New changeset 4b828467a3fcec0c1947e8326f67b8db12a4f303 by Miss Islington (bot) in branch '3.7': bpo-32458: Further tune start_tls tests (GH-7166) https://github.com/python/cpython/commit/4b828467a3fcec0c1947e8326f67b8db12a4f303 |
|
|
msg318055 - (view) |
Author: Yury Selivanov (yselivanov) *  |
Date: 2018-05-29 16:10 |
Should we close this issue now? |
|
|
msg318080 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2018-05-29 20:02 |
I'm not 100% sure that all issues are fixed, but the tests seem much more reliable yet. I close the issue. If the bug reoccurs, I will reopen the issue or open a new one. |
|
|