Unstable tests — Unofficial Python Development (Victor's notes) documentation

The multiprocessing tests leaked a lot of resources. Victor Stinner and others fixed dozens of bugs in these tests.

How to write reliable tests¶

Don’t use sleep as synchronization¶

Don’t use a sleep as a synchronization primitive between two threads or two processes. It will fail, sooner or later.

Threads: use threading.Event
Processes: use a pipe (os.pipe()), write a byte when ready, read to wait
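
A minimal sketch of both recommendations, assuming a POSIX system for the process half (the pass_fds parameter of subprocess.Popen is not supported on Windows). The function names and the inline child script are hypothetical, chosen only for illustration.

```python
import os
import subprocess
import sys
import threading


def wait_for_thread_ready():
    """Wait for a worker thread with threading.Event instead of time.sleep()."""
    ready = threading.Event()
    steps = []

    def worker():
        steps.append("worker initialized")
        ready.set()           # signal the main thread explicitly

    thread = threading.Thread(target=worker)
    thread.start()
    ready.wait()              # blocks until the worker calls ready.set()
    steps.append("main continues")
    thread.join()
    return steps


def wait_for_process_ready():
    """Wait for a child process with a pipe: the child writes one byte when ready."""
    read_fd, write_fd = os.pipe()
    # Hypothetical child script: write a single byte to the inherited descriptor.
    child_code = (
        "import os, sys\n"
        "fd = int(sys.argv[1])\n"
        "os.write(fd, b'x')\n"
        "os.close(fd)\n"
    )
    proc = subprocess.Popen(
        [sys.executable, "-c", child_code, str(write_fd)],
        pass_fds=[write_fd],  # POSIX only: keep this descriptor open in the child
    )
    os.close(write_fd)        # the parent keeps only the read end
    try:
        # Blocks until the child writes a byte; returns b"" if the child
        # exits without writing anything.
        return os.read(read_fd, 1)
    finally:
        os.close(read_fd)
        proc.wait()
```

Neither function depends on how fast the machine is: each side blocks until the other has actually reached the expected point.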

Don’t limit the maximum duration¶

Don’t make a test fail if it takes longer than a specified number of seconds. Example:

t1 = time.monotonic()
func()
t2 = time.monotonic()
self.assertLess(t2 - t1, 60.0)  # cannot happen

Some Python buildbot workers are so slow that “cannot happen” does happen. In most cases, exceeding the maximum duration is not a bug in Python, so the test must not fail.

For example, test_time had a test to ensure that time.sleep(0.5) takes less than 0.7 seconds. The test started to fail on slow buildbots where it took 0.8 seconds, so the maximum was extended to 1 second. The test was later modified to no longer check the maximum duration.

https://bugs.python.org/issue19999#msg206344

Another example: a sleep of 100 ms took 2 seconds on the “AMD64 OpenIndiana 3.x” buildbot: https://bugs.python.org/issue20336
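
One way to keep a timing test meaningful without an upper bound is to check only the minimum duration. The sketch below is an assumption about how such a check could look, not the actual test_time code. The helper names and the tolerance value are hypothetical.

```python
import time


def measure(func, *args):
    """Return how long func(*args) takes, using a monotonic clock."""
    t1 = time.monotonic()
    func(*args)
    return time.monotonic() - t1


def check_sleep(duration=0.05, tolerance=0.01):
    """Check only the lower bound, which a slow machine cannot break."""
    elapsed = measure(time.sleep, duration)
    # The tolerance allows for a coarse clock resolution.
    # There is deliberately no upper bound: a loaded worker may take far longer.
    assert elapsed >= duration - tolerance, elapsed
    return elapsed
```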

Debug race conditions¶

Debug test relying on time.sleep() or asyncio.sleep()¶

For example, the issue “test_asyncio: test_run_coroutine_threadsafe_with_timeout() has a race condition” is caused by await asyncio.sleep(0.05) used in a test.

To reproduce the race condition, just use the smallest possible sleep of 1 nanosecond:

```
diff --git a/Lib/test/test_asyncio/test_tasks.py b/Lib/test/test_asyncio/test_tasks.py
index dde84b84b1..c94113712a 100644
--- a/Lib/test/test_asyncio/test_tasks.py
+++ b/Lib/test/test_asyncio/test_tasks.py
@@ -3160,7 +3160,7 @@ class RunCoroutineThreadsafeTests(test_utils.TestCase):
     async def add(self, a, b, fail=False, cancel=False):
         """Wait 0.05 second and return a + b."""
-        await asyncio.sleep(0.05)
+        await asyncio.sleep(1e-9)
         if fail:
             raise RuntimeError("Fail!")
         if cancel:
```

And run the test in a loop until it fails:

./python -m test test_asyncio -m test_run_coroutine_threadsafe_with_timeout -v -F
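
Why shrinking the sleep exposes the race can be illustrated with a small standalone sketch. It is not the test_asyncio code: the function names and the delay and timeout values are hypothetical, and asyncio.wait_for() stands in for the timeout logic of the real test. The result depends only on which of the two durations is shorter.

```python
import asyncio


async def add(a, b, delay):
    """Wait `delay` seconds and return a + b."""
    await asyncio.sleep(delay)
    return a + b


async def add_with_timeout(delay, timeout):
    """Return the sum, or "timeout" if add() does not finish in time."""
    try:
        return await asyncio.wait_for(add(1, 2, delay), timeout)
    except asyncio.TimeoutError:
        return "timeout"


def outcome(delay, timeout):
    """Run the coroutine in a fresh event loop and return its result."""
    return asyncio.run(add_with_timeout(delay, timeout))
```

With a long sleep and a short timeout, such as outcome(0.5, 0.01), the timeout wins. With a tiny sleep, such as outcome(1e-9, 1.0), the coroutine completes first. A test that silently assumes one ordering breaks as soon as the timing changes.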

Debug Dangling process¶

For example, debug test_multiprocessing_spawn which logs:

Warning -- Dangling processes: {<SpawnProcess(QueueManager-1576, stopped)>}

https://bugs.python.org/issue38447

Get cases:

./python -m test test_multiprocessing_spawn --list-cases > cases

Bisect:

./python -m test.bisect_cmd -i cases -o bisect1 -n 5 -N 500 test_multiprocessing_spawn -R 3:3 --fail-env-changed

Debug reap_children() warning¶

For example, test_concurrent_futures logs such a warning:

0:27:13 load avg: 4.88 [416/419/1] test_concurrent_futures failed (env changed) (17 min 11 sec) -- running: test_capi (7 min 28 sec), test_gdb (8 min 49 sec), test_asyncio (23 min 23 sec)
beginning 6 repetitions
123456
.Warning -- reap_children() reaped child process 26487
.....
Warning -- multiprocessing.process._dangling was modified by test_concurrent_futures
  Before: set()
  After: {<weakref at 0x7fdc08f44e30; to 'SpawnProcess' at 0x7fdc0a467c30>}

https://bugs.python.org/issue38448

Run the test in a loop until it fails:

./python -m test test_concurrent_futures --fail-env-changed -F

If that is not enough, spawn more jobs in parallel. Example with 10 processes:

./python -m test test_concurrent_futures --fail-env-changed -F -j10

If that is still not enough, use the previous commands, but also inject some workload. For example, run in a different terminal:

./python -m test -u all -r -F -j4

Hack reap_children() to detect more issues, sleep 100 ms before calling waitpid(WNOHANG):

diff --git a/Lib/test/support/__init__.py b/Lib/test/support/__init__.py
index 0f294c5b0f..d938ae6b16 100644
--- a/Lib/test/support/__init__.py
+++ b/Lib/test/support/__init__.py
@@ -2320,6 +2320,8 @@ def reap_children():
     if not (hasattr(os, 'waitpid') and hasattr(os, 'WNOHANG')):
         return
 
+    time.sleep(0.1)
+
     # Reap all our dead child processes so we don't leave zombies around.
     # These hog resources and might be causing some of the buildbots to die.
     while True:
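
For context, the reaping loop being patched can be approximated as follows. This is a simplified sketch, not the actual test.support code: the function name and the optional delay parameter are hypothetical. The delay gives children that are about to exit a chance to finish, so that waitpid() sees them.

```python
import os
import time


def reap_children_sketch(delay=0.0):
    """Reap exited child processes without blocking and return their pids."""
    reaped = []
    if not (hasattr(os, "waitpid") and hasattr(os, "WNOHANG")):
        return reaped         # platform without waitpid(WNOHANG)
    time.sleep(delay)         # the debugging hack: let late children exit first
    while True:
        try:
            pid, _status = os.waitpid(-1, os.WNOHANG)
        except OSError:
            break             # typically ChildProcessError: no child process
        if pid == 0:
            break             # children exist, but none has exited yet
        reaped.append(pid)
    return reaped
```

A non-empty result after a test means that the test left a child process behind without waiting for it.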

Untested function which might help by counting the child processes of a process on Linux: Add support.get_child_processes().
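
A rough sketch of what such a helper could look like, assuming a Linux kernel that exposes /proc/<pid>/task/<tid>/children (not every kernel configuration does). This is not the proposed support.get_child_processes() implementation, and the function name below is hypothetical.

```python
import os


def list_child_pids(pid=None):
    """Return the child pids of `pid` on Linux, or None if unsupported."""
    if pid is None:
        pid = os.getpid()
    task_dir = f"/proc/{pid}/task"
    try:
        tids = os.listdir(task_dir)
    except OSError:
        return None           # no /proc filesystem, or no such process
    children = []
    readable = False
    for tid in tids:
        try:
            with open(f"{task_dir}/{tid}/children") as fp:
                data = fp.read()
        except OSError:
            continue          # thread exited, or the kernel lacks this file
        readable = True
        # The file contains a space-separated list of child pids.
        children.extend(int(child) for child in data.split())
    return children if readable else None
```

The list is only a snapshot: children can be created or exit while the files are being read, so it is better suited to debugging than to strict assertions.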

Coredump in multiprocessing¶

FreeBSD buildbot workers were useful to detect crashes at Python exit, bugs related to dangling threads. It helps to add a random sleep at Python exit, in Modules/main.c.

Multiprocessing issues¶

Open¶

2018-07-20: multiprocessing.Pool and ThreadPool leak resources after being deleted
2017-07-19: Missing multiprocessing.queues.SimpleQueue.close() method (OPEN).

Fixed, rejected, out of date¶

2018-12-05, multiprocessing: test_multiprocessing_fork: test_del_pool() leaks dangling threads and processes on AMD64 FreeBSD CURRENT Shared 3.x
2018-07-18: test_multiprocessing_spawn: Dangling processes leaked on AMD64 FreeBSD 10.x Shared 3.x
2018-07-03: asyncio: BaseEventLoop.close() shutdowns the executor without waiting causing leak of dangling threads (FIXED in Python 3.9).
2018-05-28, test_multiprocessing: test_multiprocessing_fork: dangling threads warning (commit: call Pool.join)
2017-07-28: test_multiprocessing_spawn and test_multiprocessing_forkserver leak dangling processes (commit: remove Process.daemon=True, call Process.join)
2017-07-24, multiprocessing: multiprocessing.Pool should join “dead” processes (commit)
2017-07-09, multiprocessing: multiprocessing.Queue.join_thread() does nothing if created and use in the same process (commit)
2017-06-08, multiprocessing: Add close() to multiprocessing.Process
2017-05-03: Emit a ResourceWarning in concurrent.futures executor destructors (OUT OF DATE).
2017-04-26: Emit ResourceWarning in multiprocessing Queue destructor (REJECTED).
2016-04-15, multiprocessing: test_multiprocessing_spawn leaves processes running in background. Add more checks to _test_multiprocessing to detect dangling processes and threads.
2015-11-18, multiprocessing: test_multiprocessing_spawn ResourceWarning with -Werror (commit: use closefd=False)
2011-08-18: Warning -- multiprocessing.process._dangling was modified by test_multiprocessing (commit: test_multiprocessing.py calls the terminate() method of all classes).

Python issues¶

Open issues¶

Search for test_asyncio, multiprocessing tests.

2019-06-11: test__xxsubinterpreters fails randomly

Fixed issues¶

2018-05-16, socketserver: socketserver: Add an opt-in option to get Python 3.6 behavior on server_close()
2017-08-18, support: Make support.threading_cleanup() stricter (big issue with many fixes)
2017-08-18, test_logging: test_logging: ResourceWarning: unclosed socket
2017-08-18, socketserver: socketserver.ThreadingMixIn leaks running threads after server_close()
2017-08-09, socketserver: socketserver.ForkingMixIn.server_close() leaks zombie processes

Rejected, Not a Bug, Out of Date¶

2016-03-25: Replace stdout and stderr with simple standard printers at Python exit

Windows handles¶

Abandoned attempt to hunt for leaks of Windows handles:

Unlimited recursion¶

Some specific unit tests rely on the exact C stack size and how Python detects stack overflow. These tests are fragile because each platform uses a different stack size and behaves differently on stack overflow. For example, the stack size can depend on whether Python is compiled using PGO or not (it depends on function inlining).

The support.infinite_recursion() context manager reduces the risk of stack overflow. Example of tests using it:

test_ast
test_exceptions
test_isinstance
test_json
test_pickle
test_traceback
test_tomllib: issue gh-108851
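
The idea behind such a helper can be sketched with a context manager that temporarily lowers the recursion limit, so that runaway recursion raises RecursionError long before the C stack is exhausted. This is a simplified stand-in under that assumption, not the test.support implementation. The names and the default depth are hypothetical.

```python
import contextlib
import sys


@contextlib.contextmanager
def low_recursion_limit(max_depth=200):
    """Temporarily lower the Python recursion limit."""
    original = sys.getrecursionlimit()
    sys.setrecursionlimit(max_depth)
    try:
        yield
    finally:
        sys.setrecursionlimit(original)   # always restore the previous limit


def recurse_forever():
    return recurse_forever()


def hits_recursion_error():
    """Return True if unbounded recursion is stopped by RecursionError."""
    with low_recursion_limit():
        try:
            recurse_forever()
        except RecursionError:
            return True
    return False
```

The failure then arrives as a regular Python exception after a couple of hundred frames, regardless of how large the platform's C stack is.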

_Py_CheckRecursiveCall() is a portable but unreliable check: a basic counter compared against sys.getrecursionlimit().

MSVC allows implementing PyOS_CheckStack() (the USE_STACKCHECK macro is defined) using alloca() and catching the STATUS_STACK_OVERFLOW error. It uses _resetstkoflw() to reset the stack overflow flag.

See also Py_C_RECURSION_LIMIT constant.

WASI explicitly sets the stack memory in configure.ac:

dnl gh-117645: Set the memory size to 20 MiB, the stack size to 8 MiB,
dnl and move the stack first.
dnl https://github.com/WebAssembly/wasi-libc/issues/233
AS_VAR_APPEND([LDFLAGS_NODIST], [" -z stack-size=8388608 -Wl,--stack-first -Wl,--initial-memory=20971520"])
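
The two byte counts in the linker flags are the MiB values from the comment, as a quick check confirms (the variable names are only illustrative):

```python
MIB = 1024 * 1024

stack_size = 8 * MIB         # value passed to -z stack-size
initial_memory = 20 * MIB    # value passed to --initial-memory

assert stack_size == 8388608
assert initial_memory == 20971520
```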

Tests¶

test_pickle: test_bad_getattr()
test_marshal: test_recursion_limit()

History¶

2019-04-29: macOS no longer specifies the stack size. Previously, it was set to 8 MiB (-Wl,-stack_size,1000000).
- https://github.com/python/cpython/commit/883dfc668f9730b00928730035b5dbd24b9da2a0
- https://bugs.python.org/issue34602
2018-07-05: test_marshal: “Improve tests for the stack overflow in marshal.loads()”
- https://bugs.python.org/issue33720
- https://github.com/python/cpython/commit/fc05e68d8fac70349b7ea17ec14e7e0cfa956121
2018-06-04: test_marshal: “Reduces maximum marshal recursion depth on release builds” on Windows
- https://github.com/python/cpython/commit/2a4a62ba4ae770bbc7b7fdec0760031c83fe1f7b
- https://bugs.python.org/issue33720
2014-11-01: MAX_MARSHAL_STACK_DEPTH set to 1000 instead of 1500 on Windows
- https://github.com/python/cpython/commit/f6c69e6cc9aac35564a2a2a7ecc43fa8db6da975
- https://bugs.python.org/issue22734
2013-07-07: Visual Studio project (PCbuild) now uses 4.2 MiB stack, instead of 2 MiB
- https://github.com/python/cpython/commit/24e33acf8c422f6b8f84387242ff7874012f7291
- https://bugs.python.org/issue17206
2013-05-30: macOS sets the stack size to 8 MiB
- https://github.com/python/cpython/commit/335ab5b66f432ae3713840ed2403a11c368f5406
- https://bugs.python.org/issue18075
2007-08-29: test_marshal: MAX_MARSHAL_STACK_DEPTH set to 1500 instead of 2000 on Windows for debug build
- https://github.com/python/cpython/commit/991bf5d8c8fdd94c3b9238d7111c0dfb41973804
- https://bugs.python.org/issue1050

Notes¶

On FreeBSD, the sudo sysctl -w 'kern.corefile=%N.%P.core' command can be used to include the pid in coredump filenames, since two processes can crash at the same time.