Issue 32458: test_asyncio: test_start_tls_server_1() fails randomly (original) (raw)

Issue32458

Created on 2017-12-30 19:20 by pitrou, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Pull Requests
URL Status Linked Edit
PR 5054 merged yselivanov,2017-12-30 20:19
PR 7157 closed vstinner,2018-05-28 15:47
PR 7166 merged yselivanov,2018-05-28 19:22
PR 7170 merged miss-islington,2018-05-28 19:50
Messages (34)
msg309244 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2017-12-30 19:20
It seems test_asyncio fails sporadically on AppVeyor: https://ci.appveyor.com/project/python/cpython/build/3.7.0a0.10081#L2650 ====================================================================== ERROR: test_start_tls_server_1 (test.test_asyncio.test_sslproto.ProactorStartTLS) ---------------------------------------------------------------------- Traceback (most recent call last): File "C:\projects\cpython\lib\test\test_asyncio\test_sslproto.py", line 284, in test_start_tls_server_1 asyncio.wait_for(main(), loop=self.loop, timeout=10)) File "C:\projects\cpython\lib\asyncio\base_events.py", line 440, in run_until_complete return future.result() File "C:\projects\cpython\lib\asyncio\tasks.py", line 398, in wait_for raise futures.TimeoutError() concurrent.futures._base.TimeoutError ====================================================================== FAIL: test_start_tls_server_1 (test.test_asyncio.test_sslproto.ProactorStartTLS) ---------------------------------------------------------------------- Traceback (most recent call last): File "C:\projects\cpython\lib\test\test_asyncio\functional.py", line 43, in tearDown self.fail('unexpected calls to loop.call_exception_handler()') AssertionError: unexpected calls to loop.call_exception_handler()
msg309254 - (view) Author: Yury Selivanov (yselivanov) * (Python committer) Date: 2017-12-30 20:21
I'm leaving on a two-weeks vacation today. To avoid risking breaking the workflow, I'll mask this tests on AppVeyor. I'll investigate this when I get back.
msg309255 - (view) Author: Yury Selivanov (yselivanov) * (Python committer) Date: 2017-12-30 20:40
New changeset 0c36bed1c46d07ef91d3e02e69e974e4f3ecd31a by Yury Selivanov in branch 'master': bpo-32458: Temporarily mask start-tls proactor test on Windows (#5054) https://github.com/python/cpython/commit/0c36bed1c46d07ef91d3e02e69e974e4f3ecd31a
msg309256 - (view) Author: Yury Selivanov (yselivanov) * (Python committer) Date: 2017-12-30 20:42
Please keep this issue open.
msg309521 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2018-01-05 20:41
After fresh update and debug rebuild on master(3.7), with Win10 patched last night, I got ====================================================================== FAIL: test_call_later (test.test_asyncio.test_events.ProactorEventLoopTests) ---------------------------------------------------------------------- Traceback (most recent call last): File "F:\dev\3x\lib\test\test_asyncio\test_events.py", line 290, in test_call_later self.assertTrue(0.08 <= t1-t0 <= 0.8, t1-t0) AssertionError: False is not true : 0.07800000000861473 I have not seen this before. Four subsequent runs passed.
msg310405 - (view) Author: Andrew Svetlov (asvetlov) * (Python committer) Date: 2018-01-22 09:08
Hmm, the scheduled call was executed sooner than expected. Actually we had 0.1 sec delay, checked for at least 0.08 but actual call time was 0.078 BTW I recall an issue with Windows when call_later() was executed sooner already.
msg313507 - (view) Author: Ned Deily (ned.deily) * (Python committer) Date: 2018-03-09 21:38
Note, I think this failure is still happening occasionally on the windows7 buildbot; for example: http://buildbot.python.org/all/#/builders/111/builds/68/steps/3/logs/stdio http://buildbot.python.org/all/#/builders/111/builds/85/steps/3/logs/stdio
msg317468 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-05-23 22:36
test_start_tls_server_1() just failed on my Linux. It likely depends on the system load.
msg317638 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-05-24 23:09
It would be nice to fix this bug before Python 3.7.0 final: either skip the test, or fix it. Flaky tests can be very annoying. For example, the full test suite is run to build a Red Hat package. If a single test fails, the package build fails and should be retried whereas it's slow. I understood that fixing the root cause might require to rewrite the test, so I don't expect a quick fix on this test. The issue is open since last December... Maybe we can skip the test but fix it in Python 3.7.1?
msg317639 - (view) Author: Yury Selivanov (yselivanov) * (Python committer) Date: 2018-05-24 23:10
I'm OK to skip it for now. Writing functional tests is super hard because some buildbots are super slow and unpredictable.
msg317640 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-05-24 23:12
> I'm OK to skip it for now. Writing functional tests is super hard because some buildbots are super slow and unpredictable. Would you mind to write a PR to skip the PR? So you will feel guilty and will try to remind to fix it!
msg317643 - (view) Author: Yury Selivanov (yselivanov) * (Python committer) Date: 2018-05-24 23:23
Not at my computer right now, can do it tomorrow.
msg317751 - (view) Author: Ned Deily (ned.deily) * (Python committer) Date: 2018-05-26 18:16
Yury, are you still planning to address this?
msg317755 - (view) Author: Yury Selivanov (yselivanov) * (Python committer) Date: 2018-05-26 18:45
Last time I looked into this I couldn't reproduce the failures on my Windows 10 VM, so it seems like an AppVeyor-specific problem. I'll take another look on Monday. Andrew, if you have a Windows environment, could you please try to run this test?
msg317758 - (view) Author: Ned Deily (ned.deily) * (Python committer) Date: 2018-05-26 18:49
> Last time I looked into this I couldn't reproduce the failures on my Windows 10 VM, so it seems like an AppVeyor-specific problem. I'll take another look on Monday. But in Victor asserts that it sometimes fails on Linux, too?
msg317759 - (view) Author: Yury Selivanov (yselivanov) * (Python committer) Date: 2018-05-26 18:54
> But in Victor asserts that it sometimes fails on Linux, too? Hm, I missed that. I'll definitely take a look.
msg317760 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2018-05-26 19:22
It's failing reproducible with OpenSSL 1.1.1 and TLS 1.3 enabled. I haven't seen it failing with TLS 1.2 yet.
msg317761 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2018-05-26 19:40
I have not seen test_asyncio fail on AppVeyor since about Sunday. https://ci.appveyor.com/project/python/cpython/history shows only a couple of failures, which could very well be real. I have seen test_asyncio fail on Travis since Sunday. With my local repository 32-bit debug build of master branch: If I run python -m test.test_asyncio -v, test_sock_sendfile_not_regular_file (test.test_asyncio.test_proactor_events.ProactorEventLoopUnixSockSendfileTests) ... (and the test process) aborts with a failed debug assertion: minkernel\crts\ucrt\src\appcrt\lowio\osfinfo.cpp, line 257 fh >= 0 && (unsigned)fh < (unsigned)_nhandle If I run python -m test test_asyncio, I get F:\dev\3x\lib\asyncio\sslproto.py:320: ResourceWarning: unclosed transport <asyncio.sslproto._SSLProtocolTransport object at 0x052DA4D0> source=self) F:\dev\3x\lib\asyncio\sslproto.py:320: ResourceWarning: unclosed transport <asyncio.sslproto._SSLProtocolTransport object at 0x0476A118> source=self) Exception in callback _ProactorReadPipeTransport._loop_reading__data_received(<_OverlappedFuture cancelled>) handle: <Handle _ProactorReadPipeTransport._loop_reading__data_received(<_OverlappedFuture cancelled>)> Traceback (most recent call last): File "F:\dev\3x\lib\asyncio\events.py", line 88, in _run self._context.run(self._callback, *self._args) File "F:\dev\3x\lib\asyncio\proactor_events.py", line 222, in _loop_reading__data_received self._closing) AssertionError test_asyncio passed in 35 sec 1 test OK. With -v, above is scattered in verbose output, but test_start_tls_server_1 passed both times. Specifically, I ran the following 10 times with no failures. f:\dev\3x>python -m test -v -m test_start_tls_server_1 test_asyncio ... test_start_tls_server_1 (test.test_asyncio.test_sslproto.ProactorStartTLSTests) ... ok test_start_tls_server_1 (test.test_asyncio.test_sslproto.SelectorStartTLSTests) ... ok
msg317762 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2018-05-26 19:52
I presume the above is using the month-old openssl-bin-1.1.0h.
msg317807 - (view) Author: Yury Selivanov (yselivanov) * (Python committer) Date: 2018-05-27 19:40
Terry, you somehow deleted Christian from the nosy list. Christian, > It's failing reproducible with OpenSSL 1.1.1 and TLS 1.3 enabled. I haven't seen it failing with TLS 1.2 yet. On Linux or Windows?
msg317808 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2018-05-27 19:41
Linux
msg317809 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2018-05-27 20:14
'deleted somehow': Christian added himself while I had this page open, so when I submitted without refreshing, the nosy list did not include him. When I saw the red warning, I did not notice the nosy list change; I should have.
msg317817 - (view) Author: Yury Selivanov (yselivanov) * (Python committer) Date: 2018-05-27 23:47
Christian, I believe https://github.com/python/cpython/pull/7130/commits/bd17a5593f03e3f2d39a64f5afd3ffb77c97e914 fixes the issue with OpenSSL 1.1.1 (socket.shutdown was missing and asyncio's protocol didn't receive EOF, instead the connection was terminated (as expected though))
msg317833 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-05-28 09:14
I just reproduced the issue on Linux. Open 3 terminals and run the commands in parallel: (1) ./python -m test test_asyncio -m test_start_tls_server_1 -F (2) ./python -m test -j16 -r (3) ./python -m test -j16 -r It's a race condition which doesn't depend on the OS, but on the system load.
msg317862 - (view) Author: Yury Selivanov (yselivanov) * (Python committer) Date: 2018-05-28 15:08
> It's a race condition which doesn't depend on the OS, but on the system load. Yeah, I agree. The current timeout for the test is 5 seconds and it tries to transfer 2mb of data, which isn't that much actually. I'll bump the timeout and reduce the amount of data transferred in https://github.com/python/cpython/pull/7130 Thing is, these are the very first *functional* tests for asyncio. Before it was mostly tested with mocks. So we'll need some adjustment period to learn how slow/loaded the CI servers are and what kind of timeouts are safe to use.
msg317871 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-05-28 15:39
> Yeah, I agree. The current timeout for the test is 5 seconds and it tries to transfer 2mb of data, which isn't that much actually. I'll bump the timeout and reduce the amount of data transferred in https://github.com/python/cpython/pull/7130 What is the purpose of a timeout of 5 seconds? Some buildbots are really slow. If the purpose is to detect when the test hangs: use a timeout of 5 minutes, or at least 1 minute (60 seconds).
msg317872 - (view) Author: Yury Selivanov (yselivanov) * (Python committer) Date: 2018-05-28 15:40
> If the purpose is to detect when the test hangs: use a timeout of 5 minutes, or at least 1 minute (60 seconds). OK, sounds good, will do that.
msg317873 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-05-28 15:50
> OK, sounds good, will do that. Race condition! I created https://github.com/python/cpython/pull/7157 to increase the timeout from 5 seconds to 60 seconds.
msg317898 - (view) Author: Yury Selivanov (yselivanov) * (Python committer) Date: 2018-05-28 18:33
https://github.com/python/cpython/pull/7130 is in. It includes many tests for TLS tests as well as bugfixes to asyncio code. Hopefully this all will make tests more stable.
msg317907 - (view) Author: Yury Selivanov (yselivanov) * (Python committer) Date: 2018-05-28 19:49
New changeset 8267ea2e84d355f00654dec3ad782fc7b1f680f1 by Yury Selivanov in branch 'master': bpo-32458: Further tune start_tls tests (#7166) https://github.com/python/cpython/commit/8267ea2e84d355f00654dec3ad782fc7b1f680f1
msg317918 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-05-28 20:41
While Yury seems unable to reproduce the bug, it's easy for me to reproduce it on Linux. After 2 hours of debugging, I found the root issue: a race condition not in the test, but in asyncio itself! => bpo-33674 "asyncio: race condition in SSLProtocol".
msg317920 - (view) Author: miss-islington (miss-islington) Date: 2018-05-28 20:44
New changeset 4b828467a3fcec0c1947e8326f67b8db12a4f303 by Miss Islington (bot) in branch '3.7': bpo-32458: Further tune start_tls tests (GH-7166) https://github.com/python/cpython/commit/4b828467a3fcec0c1947e8326f67b8db12a4f303
msg318055 - (view) Author: Yury Selivanov (yselivanov) * (Python committer) Date: 2018-05-29 16:10
Should we close this issue now?
msg318080 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-05-29 20:02
I'm not 100% sure that all issues are fixed, but the tests seem much more reliable yet. I close the issue. If the bug reoccurs, I will reopen the issue or open a new one.
History
Date User Action Args
2022-04-11 14:58:56 admin set github: 76639
2018-05-29 20:02:20 vstinner set status: open -> closedpriority: deferred blocker -> versions: + Python 3.8messages: + resolution: fixedstage: patch review -> resolved
2018-05-29 16:10:17 yselivanov set messages: +
2018-05-28 20:44:16 miss-islington set nosy: + miss-islingtonmessages: +
2018-05-28 20:41:58 vstinner set messages: +
2018-05-28 19:50:13 miss-islington set pull_requests: + <pull%5Frequest6805>
2018-05-28 19:49:01 yselivanov set messages: +
2018-05-28 19:22:06 yselivanov set pull_requests: + <pull%5Frequest6801>
2018-05-28 18:33:54 yselivanov set messages: +
2018-05-28 15:50:31 giampaolo.rodola set nosy: - giampaolo.rodola
2018-05-28 15:50:06 vstinner set messages: +
2018-05-28 15:47:48 vstinner set pull_requests: + <pull%5Frequest6793>
2018-05-28 15:40:43 yselivanov set messages: +
2018-05-28 15:39:57 vstinner set messages: +
2018-05-28 15:08:48 yselivanov set messages: +
2018-05-28 09:14:56 vstinner set messages: +
2018-05-27 23:47:46 yselivanov set messages: +
2018-05-27 20:14:25 terry.reedy set messages: +
2018-05-27 19:41:12 christian.heimes set messages: +
2018-05-27 19:40:50 yselivanov set nosy: + christian.heimesmessages: +
2018-05-26 19:52:13 terry.reedy set messages: +
2018-05-26 19:40:41 terry.reedy set nosy: - christian.heimesmessages: +
2018-05-26 19:22:06 christian.heimes set nosy: + christian.heimesmessages: +
2018-05-26 18:54:49 yselivanov set messages: +
2018-05-26 18:49:07 ned.deily set messages: +
2018-05-26 18:45:30 yselivanov set messages: +
2018-05-26 18:16:12 ned.deily set messages: +
2018-05-24 23:23:28 yselivanov set messages: +
2018-05-24 23:12:09 vstinner set messages: +
2018-05-24 23:10:31 yselivanov set messages: +
2018-05-24 23:09:33 vstinner set priority: high -> deferred blockermessages: +
2018-05-23 22:39:21 pitrou set nosy: - pitrou
2018-05-23 22:36:38 vstinner set nosy: + vstinnermessages: + title: test_asyncio failures on Windows -> test_asyncio: test_start_tls_server_1() fails randomly
2018-03-09 21:38:33 ned.deily set nosy: + ned.deilymessages: +
2018-01-22 09:08:46 asvetlov set messages: +
2018-01-05 20:41:54 terry.reedy set nosy: + terry.reedymessages: +
2017-12-30 20:42:28 yselivanov set messages: +
2017-12-30 20:40:22 yselivanov set messages: +
2017-12-30 20:21:38 yselivanov set priority: normal -> highassignee: yselivanovmessages: + components: + asyncio, - Library (Lib), Tests
2017-12-30 20:19:43 yselivanov set keywords: + patchstage: patch reviewpull_requests: + <pull%5Frequest4935>
2017-12-30 19:20:50 pitrou create