[Python-Dev] Safe to change a thread's interpreter?

Phillip J. Eby pje at telecommunity.com
Mon Aug 2 05:52:52 CEST 2004


Recently I've been researching implementation strategies for adding Java classloader-like capabilities to Python. I was pleasantly surprised to find out that CPython already supports multiple interpreters via the C API, where each "interpreter" includes fresh versions of 'sys', '__builtin__', etc.

The C API doc for Py_NewInterpreter(), however, says:

"""It is possible to insert objects created in one sub-interpreter into a namespace of another sub-interpreter; this should be done with great care to avoid sharing user-defined functions, methods, instances or classes between sub-interpreters, since import operations executed by such objects may affect the wrong (sub-)interpreter's dictionary of loaded modules. (XXX This is a hard-to-fix bug that will be addressed in a future release.)"""

It seems to me that the bug described could be fixed (or at least worked around) by having import temporarily change the 'interp' field of the current thread state to point to the interpreter that the import function lives in. Then, at the end of the import, reset the 'interp' field back to its original value. (Of course, it would also have to fix up the linked lists of the interpreters' thread states during each swap, but that shouldn't be too difficult.)
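
To make the proposed swap concrete, here is a minimal pure-Python sketch of the idea. It is only a toy model: the InterpreterState and ThreadState classes, swapped_interp() and import_for() are hypothetical stand-ins invented for illustration, not CPython's actual structures or API, and a plain list stands in for the linked list of thread states.

```python
from contextlib import contextmanager


class InterpreterState:
    """Toy stand-in for an interpreter: a module table and a thread list."""

    def __init__(self, name):
        self.name = name
        self.modules = {}   # plays the role of this interpreter's sys.modules
        self.tstates = []   # plays the role of the linked list of thread states


class ThreadState:
    """Toy stand-in for a thread state; 'interp' is the field being swapped."""

    def __init__(self, interp):
        self.interp = interp
        interp.tstates.append(self)


@contextmanager
def swapped_interp(tstate, target):
    """Temporarily point tstate at target, fixing up both thread lists."""
    original = tstate.interp
    original.tstates.remove(tstate)
    target.tstates.append(tstate)
    tstate.interp = target
    try:
        yield tstate
    finally:
        # Restore even if the import raises, so the exception reaches the
        # caller with the thread back in its original interpreter.
        target.tstates.remove(tstate)
        original.tstates.append(tstate)
        tstate.interp = original


def import_for(tstate, owner, module_name):
    """Record an import in the interpreter that owns the importing code."""
    with swapped_interp(tstate, owner):
        tstate.interp.modules.setdefault(module_name, object())


parent = InterpreterState("parent")
child = InterpreterState("child")
ts = ThreadState(parent)

# Code owned by 'child' triggers an import while running on a 'parent' thread.
import_for(ts, child, "somemodule")
```

In this model the import lands in the child's module table rather than the parent's, and the thread state is back in the parent's list afterwards. Whether the real thread state tolerates the same treatment is exactly the open question.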

My question is: does this make sense, or am I completely out in left field here? The only thing I can think of that this would affect is the 'threading' module, in that trying to get the current thread from there (during such an import) might see a foreign interpreter's thread as its own. But, I'm hard-pressed to think of any damage that could possibly cause. Indeed, it seems to me that Python itself doesn't really care how many interpreters or thread states there are running around, and that it only has the linked lists to support "advanced debuggers".

Even if it's undesirable to fix the problem this way in the Python core, would it be acceptable to do so in an extension module?

What I have in mind is to create an extension module that wraps PyInterpreterState/PyThreadState objects up in a subclassable extension type, designed to ensure the integrity of Python as a whole, while still allowing various import-related methods to be overridden in order to implement Java-style classloader hierarchies. So, you might do something like:

 from interpreter import Interpreter

 # Run 'somescript.py' in its own interpreter.
 it = Interpreter()
 exit_code = it.run_main("somescript.py")

 # Release resources without waiting for GC
 it.close()

My thought here is that operations such as running code in a given Interpreter would also work by swapping the thread state's 'interp' field. Thus, exceptions raised in the child interpreter would propagate seamlessly to the parent interpreter.

In order to implement the full Java classloader model, it would also be necessary to be able to force imports not to use the Interpreter that the code doing the import came from (i.e. the equivalent of using 'java.lang.Thread.setContextClassLoader()'). This could probably also be implemented via a thread-local variable in the 'interpreter' module.

So... must a thread state always reference the same interpreter object? If not, then I think I see a way to safely implement access to multiple interpreters from within Python itself.


