gh-87135: Hang non-main threads that attempt to acquire the GIL during finalization by jbms · Pull Request #28525 · python/cpython (original) (raw)

Before I dig in to responding: one key point to consider is that Python users (especially extension authors, via the C-API) only interact directly with the global runtime via a few API functions. In nearly every case they are instead interacting with the Python thread (state) or interpreter associated with the current OS thread.

Another key point is that, as of 3.12, each interpreter has its own GIL.

Finally, it is certainly possible that I've misunderstood either the problem you're trying to solve or the way you're trying to solve it or both. I'm completely willing to learn and adjust. Then again, I might be completely right too!

(Sorry for the length of this post. I genuinely want to understand and to be sure we're taking the right approach. I appreciate the work you've done and your willingness to converse.)

Now, on to responses:

It isn't clear to me how PyFinalize_Ex interacts with multiple interpreters. It only seems to finalize the current interpreter.

PyFinalize_Ex() finalizes the main interpreter and the global runtime. At the point it is called, no other interpreters should exist.

Py_EndInterpreter() finalizes any other interpreter, almost entirely in the same way we finalize the main interpreter.

The documentation for Py_EndInterpreter states that the interpreter must "have no other threads".

Please clarify. The only thing I see is: "All thread states associated with this interpreter are destroyed."

The behavior should be the same for all interpreters, whether via Py_EndInterpreter() or the main interpreter via Py_FinalizeEx():

check if current thread and holds GIL and not still running (ensures no reentrancy from the eval loop)
internally mark the interpreter as finalizing
wait for all non-daemon threads to finish
call all of the interpreter's atexit callbacks, if any
externally mark the interpreter as finalizing (this causes daemon threads to die, or, now, pause)
finalize imports
clear the interpreter state (including all its remaining thread states)
delete the interpreter state

Caveats:

Py_FinalizeEx() does a few extra things at various places, but that should not relate to interpreter lifecycle.

I do see that it calls _PyThreadState_DeleteExcept() right after step (5), which Py_EndInterpreter() does not do. However, that's unexpected and should probably be resolved.

There are a few things we don't do in either that we probably should, probably before or right after step (5), e.g. disable the import system, disallow new threads (thread states).

Also, step (3) only applies to threads created by the threading module. We might want to extend that to all other threads states (i.e. created via PyThreadState_New() or PyGILState_Ensure()).

In fact it only does this check after calling the AtExit functions for the interpreter,

Looking at Py_EndInterpreter():

calls wait_for_thread_shutdown() right before _PyAtExit_Call()
publicly marks itself as finalizing right after _PyAtExit_Call()
destroys its remaining thread states at the end during finalize_interp_clear()

So I'm not sure what you mean specifically.

FWIW, Py_FinalizeEx() is exactly the same, except currently it does that last part a little earlier with _PyThreadState_DeleteExcept().

so it seems it would be sufficient to ensure that all other thread states are destroyed before the AtExit functions finish.

Only daemon threads (and, for now, threads (states) created via the C-API) would still be running at that point, and only until next step (5) above.

So are we talking about both the following?

move step (5) before step (4)
apply steps (3) and (5) to thread states that were not created by the threading module

Just to be clear, here are the ways thread states get created:

Py_Initialize*()
Py_NewInterpreter*()
PyThreadState_New()
PyGILState_Ensure()
_thread.start_new_thread() (via threading.Thread.start())

At the moment, it's mostly only with that last one that we are careful during runtime/interp finalization.

It occurs to me that this PR is mostly about addressing that: dealing with other thread states in the same way we currently do threads created via the threading module. Does that sound right?

But there is also the question of what happens if we try to create a thread state while Py_EndInterpreter is still in progress. Py_EndInterpreter doesn't seem to check for other thread states while holding the HEAD_LOCK, but that is not an issue as long as the check does not fail.

Yeah, we should probably be more deliberate about disallowing that sort of thing during finalization.

In general, given the "no other threads" constraint for Py_EndInterpreter it seems that if other non-Python-created or daermon threads hold references to the PyInterpreterState, then some external synchronization mechanism will be needed to ensure that they don't attempt to access the PyInterpreterState once the "no other threads" check completes.

That's what the proposed change in the PR is, AFAICS. The API you're adding must be specific to each interpreter, not to the global runtime. The resources that the proposed change protects are per-interpreter resources, not global ones. So I would not expect there to be any additional API or synchronization mechanism other than what you've already proposed (except applied to each interpreter instead of the main interpreter. Otherwise users of multiple interpreters will still be subject to the problem you're trying to solve.

As an example, suppose we have a C extension that provides a Python API that allows Python callbacks to be passed in, and then later calls those Python functions on its own non-Python-created thread pool. If this extension is to support sub-interpreters, then either during multi-phase module initialization, or when it receives the Python callback, it must record the PyInterpreterState associated with the callback. Then, in order to invoke the callback on a thread from its thread pool, it must obtain a PyThreadState for the (thread, interpreter) combination, creating one if one does not already exist.

That's literally what PyGILState_Ensure() is for and does. 😄

To ensure the PyInterpreterState pointers that it holds remain valid, it would need to register an AtExit function for the interpreter that ensures the PyInterpreterState won't be used. This AtExit function would likely need to essentially implement its own version of the "finalize block" mechanism introduced here.

Why wouldn't we just exclusively use the mechanism you're proposed here? Why would each interpreter have to have an additional duplicate? Again, the resources we're trying to protect here are specific to each interpreter, not to the global runtime, no?

Given the need for external synchronization of threads when calling Py_EndInterpreter, it seems to me that the finalize block mechanism defined by this PR is only useful for the main interpreter.

Hmm, I didn't catch what external synchonization of threads you are talking about. Sorry if I missed it or misunderstood. Please restate what you mean specifically. Thanks!