Issue 38564: test_asyncio: test_run_coroutine_threadsafe_with_timeout() has a race condition (original) (raw)

Created on 2019-10-23 14:44 by vstinner, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Messages (6)
msg355223 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-10-23 14:44
AMD64 FreeBSD Shared 3.x: https://buildbot.python.org/all/#/builders/371/builds/7 This buildbot worker is known to be very slow. test_run_coroutine_threadsafe_with_timeout (test.test_asyncio.test_tasks.RunCoroutineThreadsafeTests) Test coroutine submission from a thread to an event loop ... FAIL FAIL: test_run_coroutine_threadsafe_with_timeout (test.test_asyncio.test_tasks.RunCoroutineThreadsafeTests) Test coroutine submission from a thread to an event loop ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/home/buildbot/python/3.x.koobs-freebsd-564d/build/Lib/test/test_asyncio/test_tasks.py", line 3210, in test_run_coroutine_threadsafe_with_timeout self.loop.run_until_complete(future) AssertionError: TimeoutError not raised
msg355227 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-10-23 15:01
test_run_coroutine_threadsafe_with_timeout() has a race condition. It relies on "await asyncio.sleep(0.05)" synchronization primitive. That's not reliable. async def add(self, a, b, fail=False, cancel=False): """Wait 0.05 second and return a + b.""" await asyncio.sleep(0.05) if fail: raise RuntimeError("Fail!") if cancel: asyncio.current_task(self.loop).cancel() await asyncio.sleep(0) return a + b
msg355228 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-10-23 15:03
I can reproduce the issue with this patch: diff --git a/Lib/test/test_asyncio/test_tasks.py b/Lib/test/test_asyncio/test_tasks.py index dde84b84b1..c94113712a 100644 --- a/Lib/test/test_asyncio/test_tasks.py +++ b/Lib/test/test_asyncio/test_tasks.py @@ -3160,7 +3160,7 @@ class RunCoroutineThreadsafeTests(test_utils.TestCase): async def add(self, a, b, fail=False, cancel=False): """Wait 0.05 second and return a + b.""" - await asyncio.sleep(0.05) + await asyncio.sleep(1e-9) if fail: raise RuntimeError("Fail!") if cancel: and the command: ./python -m test test_asyncio -m test_run_coroutine_threadsafe_with_timeout -v -F
msg355360 - (view) Author: Kyle Stanley (aeros) * (Python committer) Date: 2019-10-25 11:53
I can confirm Victor's method of reproducing the failure consistently, by using asyncio.sleep(1e-9) within `RunCoroutineThreadsafeTests.add()` instead of the current asyncio.sleep(0.05). I also experimented with adjusting the sleep time, to figure out the "breaking point" where I could no longer run `./python -m test test_asyncio -m test_run_coroutine_threadsafe_with_timeout -v -F` without failures (within ~10,000 tests). From my results, the lowest reliable value was 0.001. At 1e-4, I was able to consistently reproduce the failure reported above: FAIL: test_run_coroutine_threadsafe_with_timeout (test.test_asyncio.test_tasks.RunCoroutineThreadsafeTests) Test coroutine submission from a thread to an event loop ---------------------------------------------------------------------- Traceback (most recent call last): File "/home/aeros/repos/cpython/Lib/test/test_asyncio/test_tasks.py", line 3210, in test_run_coroutine_threadsafe_with_timeout self.loop.run_until_complete(future) AssertionError: TimeoutError not raised The failure becomes increasingly consistent with lowered time as expected, but at 1e-5 I was able to repeatedly reproduce the failure within 10 iterations of the test. At 1e-4 it took around 3000 iterations before failing (across multiple runs).
msg355361 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-10-25 12:15
IHMO a test must not depend on time.
msg355365 - (view) Author: Kyle Stanley (aeros) * (Python committer) Date: 2019-10-25 12:53
> IHMO a test must not depend on time. I would agree, the tests should optimally not depend on time. My above comment was primarily to confirm that the failure was able to be consistently reproduced, along with the minimum conditions to do so. Yury had been the one to change it to "asyncio.sleep(0.05)" from the previous "asyncio.sleep(1)" in the following commit: https://github.com/python/cpython/commit/abe9625eeb71e40f042ccfccfe6a4489a6dcdf35 (Nov 13, 2015). Perhaps he might have some insight or ideas as to how we could improve the test to use a more reliable means of synchronization that has been implemented since that change was made. I'll add him to the nosy list.
History
Date User Action Args
2022-04-11 14:59:22 admin set github: 82745
2021-09-21 22:28:10 vstinner set status: open -> closedresolution: out of datestage: resolved
2019-10-25 12:53:20 aeros set nosy: + yselivanovmessages: +
2019-10-25 12:15:57 vstinner set messages: +
2019-10-25 11:53:45 aeros set nosy: + aerosmessages: +
2019-10-23 15:12:43 vstinner set title: test_asyncio: test_run_coroutine_threadsafe_with_timeout() -> test_asyncio: test_run_coroutine_threadsafe_with_timeout() has a race condition
2019-10-23 15:03:54 vstinner set messages: +
2019-10-23 15:01:22 vstinner set messages: +
2019-10-23 14:44:20 vstinner create