bpo-39349: Add cancel_futures to Executor.shutdown() by aeros · Pull Request #18057 · python/cpython
Could you test some more scenarios? For example, when wait is also true, or when it's called and the interpreter exits?
The current version of the documentation and tests does cover what happens when cancel_futures and wait are both True, since wait is set to True by default. I figured that the vast majority of users would end up calling it as executor.shutdown(cancel_futures=True).
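For illustration, here's a minimal sketch of that common case (it assumes this patch / a Python version where cancel_futures exists; the exact counts depend on timing and worker count):

```python
import concurrent.futures
import time

def work(n):
    time.sleep(0.5)
    return n

executor = concurrent.futures.ThreadPoolExecutor(max_workers=2)
futures = [executor.submit(work, i) for i in range(10)]

# wait=True (the default): running futures are allowed to finish,
# queued-but-unstarted futures are cancelled.
executor.shutdown(cancel_futures=True)

print(sum(f.cancelled() for f in futures), "futures were cancelled")
print(sum(f.done() and not f.cancelled() for f in futures), "futures completed")
```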
That being said, I think it might be worth elaborating on the behavior when wait=False and cancel_futures=True. The exact behavior varies a bit between executor implementations:
ThreadPoolExecutor - All work items still in the queue (meaning they haven't been assigned to a thread yet) are removed from the queue, and their associated futures are cancelled. This occurs regardless of the value of wait. As for the work items that have already been assigned to a thread and are currently running, wait=True allows them to finish.
When wait=False (regardless of cancel_futures), whether or not running futures finish depends on the delay between the executor.shutdown() call and the shutdown of the interpreter. If the interpreter is shut down before the futures finish (terminating the threads without joining them), those futures remain stuck in a RUNNING state indefinitely.
The difference between wait=False with cancel_futures=False and wait=False with cancel_futures=True is that the pending futures are cancelled instead of remaining pending until interpreter shutdown (which I believe is blocked until the futures are completed).
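To make that concrete, here's a rough sketch of the TPE case (again assuming this patch; the exact split between running and cancelled futures depends on timing):

```python
import concurrent.futures
import time

executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
futures = [executor.submit(time.sleep, 0.2) for _ in range(5)]

# Returns immediately, but work items still in the queue are removed
# and their futures cancelled regardless of wait.
executor.shutdown(wait=False, cancel_futures=True)

print([f.cancelled() for f in futures])
# Typically: [False, True, True, True, True] -- the item already picked up
# by the worker thread keeps running, the rest are cancelled.
```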
ProcessPoolExecutor - Most of the above applies, but the underlying details differ a bit. In PPE, the pending work items are not directly accessible in shutdown(); instead of a queue, they live in a dictionary whose items are transferred into a call queue and then removed from the dict. So for the purposes of cancel_futures, any work item that's still in the dict when shutdown() is called gets cancelled (specifically when executor._cancel_pending_work is set to True and after shutdown is detected in _queue_management_worker()).
This results in a bit of delay between when the flag is set and when the pending futures are cancelled, but I think this is the best way to implement cancel_futures for PPE without causing substantial performance losses. IMO, it's preferable to delay cancelling a few pending futures (allowing them to run) and have better overall performance.
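A similar sketch for PPE (again assuming this patch; since the cancellation happens in the queue management worker, a few items may already have been moved into the call queue and will run rather than be cancelled):

```python
import concurrent.futures
import time

def work(n):
    time.sleep(0.5)
    return n

if __name__ == "__main__":
    executor = concurrent.futures.ProcessPoolExecutor(max_workers=2)
    futures = [executor.submit(work, i) for i in range(10)]

    # The flag is set here; the actual cancellation of still-pending work
    # items happens shortly afterwards in the management thread.
    executor.shutdown(cancel_futures=True)

    print(sum(f.cancelled() for f in futures), "pending futures were cancelled")
```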
Admittedly, I don't have a strong understanding of the specifics of what occurs when wait=False with PPE, mostly because it's quite difficult to examine in real time. Even just calling executor.shutdown(wait=False) with PPE leads to deadlocks (not even accounting for cancel_futures), at least as of Python 3.7+.
I suspect it has to do with the way non-joined processes are finalized during interpreter shutdown, but that's not an area I'm particularly knowledgeable about. Maybe @pitrou could clarify?
I just found this out while writing an example to demonstrate the above. In my own personal usage of executor.shutdown(), I've never explicitly set wait to False or had a good reason to consider doing so. It's a separate issue, but I think it's worth addressing. The deadlock that occurs for PPE will probably have to be addressed before I can add tests for executor.shutdown(wait=False), unless those tests are added as separate TPE-only tests; but I tried to make the tests fully generic and applicable to both executors.
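The scenario I'm describing is roughly the following (the linked gist below is the actual demo; whether this hangs depends on the Python version and timing):

```python
import concurrent.futures
import time

def work(n):
    time.sleep(1)
    return n

if __name__ == "__main__":
    executor = concurrent.futures.ProcessPoolExecutor()
    futures = [executor.submit(work, i) for i in range(4)]

    # Returns immediately; the worker processes are never joined here.
    executor.shutdown(wait=False)
    # Interpreter exit follows, which is where the hang shows up on 3.7+.
```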
Examples:
Interaction between wait and cancel_futures: https://gist.github.com/aeros/2e73c8d6dccc94fd863967715826c78d
PPE deadlock demo: https://gist.github.com/aeros/d1ff62b730426584413bca0c8f2ed99d
Edit: I made the current documentation more explicit about what happens when wait=True and cancel_futures=True.