The terminate() method of multiprocessing.Pool hangs sporadically. I tracked the issue down to _handle_results(), which hangs in the outqueue cleanup: poll() returned True, but the subsequent get() blocks forever and never returns any data.
To help illustrate the race condition, I created a trace with several additional debug outputs in pool.py and connection.py. The line "======= Creating new pool =======" marks the start of a new pool that is used through the Pool.imap() function. The iteration is terminated at some point, and a new pool is created with a new call to imap(). The last log entry shows the location of the last call, which currently hangs endlessly. I am not a multithreading/WinAPI expert, so I could not fix the actual issue with _winapi.WaitForMultipleObjects; the fix in the PR instead ignores preceding issues when polling more data than is actually needed for the two sentinels. Remark: I have also seen this issue on Linux with the same application, but I have not been able to debug it there so far.
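For readers who do not have pool.py open, the poll-then-read cleanup described above can be sketched on a plain multiprocessing.Pipe. This is only an illustration of the pattern, not the code from pool.py or from the PR; the helper name drain() and its max_items parameter are made up for this example. The hang reported here corresponds to the case where poll() returns True and the following read never returns.

```python
from multiprocessing import Pipe


def drain(reader, max_items=10):
    """Read at most max_items pending objects from a Connection.

    Hypothetical helper for illustration only. It stops as soon as
    poll() reports that nothing is waiting.
    """
    items = []
    try:
        for _ in range(max_items):
            # poll() with the default timeout of 0 returns immediately.
            if not reader.poll():
                break
            # recv() blocks until a whole object has arrived, so it can
            # only hang here if poll() reported data that is not there.
            items.append(reader.recv())
    except (OSError, EOFError):
        # The other end was closed or the handle is no longer valid.
        pass
    return items


if __name__ == "__main__":
    reader, writer = Pipe(duplex=False)
    # Two sentinels, mirroring the "two sentinels" mentioned above.
    writer.send(None)
    writer.send(None)
    print(drain(reader))
```

In a single process with small messages, as here, poll() and recv() agree, so the sketch does not reproduce the hang; it only shows where the blocking call sits.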
Hi, I got bitten by this bug last week. I wrote an example that reproduces the basic idea of our program's main loop, and it hangs:
- around 20% of the time with a release build of Python 3.7.4
- around 6% of the time with a debug build of Python 3.7, 3.8 and 3.9
With some of our inputs, it hangs nearly all the time, but I cannot post them here. I tested PR 8009 and it solves the issue. It seems to me that it is an appropriate fix for this.
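In case it helps others collect similar percentages, here is a minimal sketch of one way to count hangs without blocking the test script itself. It is not the reproducer mentioned above; finishes_within() and hang_rate() are hypothetical helpers, and the workload in the demo is a stand-in for a real "create pool, iterate imap(), terminate()" cycle.

```python
import threading
import time


def finishes_within(func, timeout):
    """Run func in a daemon thread; True if it ended within timeout seconds.

    A func that raises also counts as finished, because its thread ends.
    """
    worker = threading.Thread(target=func, daemon=True)
    worker.start()
    worker.join(timeout)
    return not worker.is_alive()


def hang_rate(func, runs, timeout):
    """Fraction of runs in which func was still running after timeout."""
    hung = sum(1 for _ in range(runs) if not finishes_within(func, timeout))
    return hung / runs


if __name__ == "__main__":
    # Stand-in workload; replace it with the pool cycle under test.
    print(hang_rate(lambda: time.sleep(0.01), runs=5, timeout=1.0))
```

A hung call stays behind in its daemon thread, so this only works for measurement scripts that exit afterwards; it does not clean up the stuck pool.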
I'm also experiencing hanging on terminate. I haven't made a debug build or anything but it's happening to me consistently on 3.8, although I haven't managed to create a small example to reproduce. Replacing pool.py with https://raw.githubusercontent.com/python/cpython/5f6a05bf5b3f7e3c1d805b3bbd8c5ad18f26d933/Lib/multiprocessing/pool.py (from the PR) did not help. So maybe what I'm experiencing is unrelated. It gets stuck on `inqueue._rlock.acquire()` in `Pool._help_stuff_finish`. I've attached debugging info from snoop, maybe that will help.