Isolate PyModuleDef to Each Interpreter for Extension/Builtin Modules · Issue #101758 · python/cpython
Typically each PyModuleDef for a builtin/extension module is a static global variable. Currently it's shared between all interpreters, whereas we are working toward interpreter isolation (for a variety of reasons). Isolating each PyModuleDef is worth doing, especially if you consider we've already run into problems [1] because of m_copy.
The main focus here is on PyModuleDef.m_base.m_copy [2] specifically. It's the state that facilitates importing legacy (single-phase init) extension/builtin modules that do not support repeated initialization [3] (likely the vast majority).
PyModuleDef for an extension/builtin module is usually stored in a static variable and (with immortal objects, see gh-101755) is mostly immutable. The exception is m_copy, which is problematic in some cases for modules imported in multiple interpreters.
Note that m_copy is only relevant for legacy (single-phase init) modules, whether builtin or extension, and only if the module does not support repeated initialization [3]. It is never relevant for multi-phase init (PEP 489) modules.
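As a rough illustration of the rule above, the condition can be written as a small predicate. This is a sketch, not CPython's actual check: the function name and the is_multi_phase flag are hypothetical, and only the m_size == -1 test comes from the footnotes of this issue.

```python
def relies_on_m_copy(is_multi_phase: bool, m_size: int) -> bool:
    """Hypothetical helper: would a module of this kind depend on m_copy?"""
    if is_multi_phase:
        # Multi-phase init (PEP 489) modules never use m_copy.
        return False
    # Single-phase init modules rely on m_copy only when they do not
    # support repeated initialization, i.e. when m_size == -1.
    return m_size == -1
```

For example, a single-phase init module with m_size == 0 would not rely on m_copy under this rule, while one with m_size == -1 would.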
- initialization
  - m_copy is only set by _PyImport_FixupExtensionObject() (and thus indirectly _PyImport_FixupBuiltin() and _imp.create_builtin())
  - _PyImport_FixupExtensionObject() is called by _PyImport_LoadDynamicModuleWithSpec() when a legacy (single-phase init) extension module is loaded
- usage
  - m_copy is only used in import_find_extension(), which is only called by _imp.create_builtin() and _imp.create_dynamic() (via the respective importers)
When such a legacy module is imported for the first time, m_copy is set to a new copy of the just-imported module's __dict__, which is "owned" by the current interpreter (the one importing the module). Whenever the module is loaded again (e.g. reloaded or deleted from sys.modules and then imported), a new empty module is created and m_copy is [shallow] copied into that object's __dict__.
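The caching behavior described above can be modeled in plain Python. This is only a toy sketch of the idea, not the C implementation: ToyModuleDef, first_import(), import_again(), and init_spam() are all hypothetical names, and the real logic lives in CPython's import machinery.

```python
import types


class ToyModuleDef:
    """Hypothetical stand-in for a static PyModuleDef; only m_copy is modeled."""

    def __init__(self, name):
        self.name = name
        self.m_copy = None


def first_import(moddef, init_func):
    """Run the init function once and cache a copy of the module's __dict__."""
    module = types.ModuleType(moddef.name)
    init_func(module)
    moddef.m_copy = dict(module.__dict__)
    return module


def import_again(moddef):
    """Create a new empty module and shallow-copy m_copy into its __dict__."""
    module = types.ModuleType(moddef.name)
    module.__dict__.update(moddef.m_copy)
    return module


def init_spam(module):
    # Stand-in for a legacy module's init function.
    module.cache = []
    module.answer = 42


spam_def = ToyModuleDef("spam")
first = first_import(spam_def, init_spam)
second = import_again(spam_def)
```

In this model, second is a distinct module object, but second.cache is the very same list as first.cache, because only the dict is copied and not the objects it refers to.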
When m_copy is originally initialized, normally that will be the first time the module is imported. However, that code can be triggered multiple times for that module if it is imported under a different name (an unlikely case but apparently a real one). In that case the m_copy from the previous import is replaced with the new one right after it is released (decref'ed). This isn't the ideal approach but it's also been the behavior for quite a while.
The tricky problem here is that the same code is triggered for each interpreter that imports the legacy module. Things are fine when a module is imported for the first time in any interpreter. However, currently, any subsequent import of that module in another interpreter will trigger that replacing code. The second interpreter decref's the old m_copy, but that object is "owned" by the first interpreter. This is a problem [1].
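To make the ownership mismatch concrete, here is a toy Python model of the "replace m_copy" step. Everything in it is an assumption for illustration: ToyDef, fixup(), and the m_copy_owner field are hypothetical (CPython records no such owner), and interpreters are represented by plain strings.

```python
class ToyDef:
    """Hypothetical stand-in for a PyModuleDef shared by all interpreters."""

    def __init__(self):
        self.m_copy = None
        self.m_copy_owner = None  # toy bookkeeping only, not a real field


def fixup(moddef, interp, module_dict, release_log):
    """Toy version of the replacing code, run by interpreter `interp`."""
    if moddef.m_copy is not None:
        # The old dict is released by whichever interpreter runs this code,
        # regardless of which interpreter created it.
        release_log.append((interp, moddef.m_copy_owner))
    moddef.m_copy = dict(module_dict)
    moddef.m_copy_owner = interp


release_log = []
shared_def = ToyDef()
fixup(shared_def, "main", {"x": 1}, release_log)  # first import anywhere
fixup(shared_def, "sub", {"x": 1}, release_log)   # import in a second interpreter

# Releases performed by an interpreter other than the owner.
mismatches = [(who, owner) for who, owner in release_log if who != owner]
```

After the two calls, mismatches holds a single entry recording that "sub" released a dict owned by "main", which is the situation described above.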
Furthermore, even if the decref-in-the-wrong-interpreter problem were gone, a second issue would remain. When m_copy is copied into the new module's __dict__ on subsequent imports, it's only a shallow copy. Thus such a legacy module, imported in interpreters other than the first one, would end up with its __dict__ filled with objects not owned by the correct interpreter.
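The shallow-copy issue can be shown with another toy sketch. The Owned class and load_from_m_copy() are hypothetical and exist only to tag each object with the interpreter that created it; real objects carry no such tag.

```python
class Owned:
    """Toy object that remembers which interpreter created it."""

    def __init__(self, owner):
        self.owner = owner


def load_from_m_copy(m_copy):
    """Fill a new module dict from m_copy using a shallow copy."""
    new_dict = {}
    new_dict.update(m_copy)  # copies references, not the objects themselves
    return new_dict


# m_copy as created by the first ("main") interpreter.
m_copy = {"state": Owned("main"), "registry": Owned("main")}

# A second interpreter ("sub") loads the module from m_copy.
sub_dict = load_from_m_copy(m_copy)
foreign = sorted(name for name, obj in sub_dict.items() if obj.owner != "sub")
```

Here every entry in sub_dict still refers to an object created by "main", so foreign lists both names.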
Here are some possible approaches to isolating each module's PyModuleDef to the interpreter that imports it:
- keep a copy of PyModuleDef for each interpreter (would _PyRuntimeState.imports.extensions need to move to the interpreter?)
- keep just m_copy for/on each interpreter
- fix _PyImport_FixupExtensionObject() some other way...
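As a sketch of the second option (keeping just m_copy per interpreter), the cached dict could be keyed by interpreter as well as by module name. This is a Python model under stated assumptions, not a proposal for the actual C data structure: PerInterpreterCopies and its methods are hypothetical, and integer interpreter IDs are used purely for illustration.

```python
class PerInterpreterCopies:
    """Hypothetical storage for m_copy, keyed by (interpreter id, module name)."""

    def __init__(self):
        self._copies = {}

    def store(self, interp_id, name, module_dict):
        # Each interpreter caches its own copy of its own module's __dict__.
        self._copies[(interp_id, name)] = dict(module_dict)

    def lookup(self, interp_id, name):
        # Returns None if this interpreter has not imported the module yet.
        return self._copies.get((interp_id, name))


copies = PerInterpreterCopies()
copies.store(0, "spam", {"answer": 42})  # first interpreter
copies.store(1, "spam", {"answer": 42})  # second interpreter

main_copy = copies.lookup(0, "spam")
sub_copy = copies.lookup(1, "spam")
missing = copies.lookup(2, "spam")
```

With this layout, an import in one interpreter never replaces or releases a dict cached by another, and an interpreter that has not yet imported the module simply finds no cached copy.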
Linked PRs
- gh-101758: Add a Test For Single-Phase Init Module Variants #101891
- gh-101758: Clean Up Uses of Import State #101919
- gh-101758: Add a Test For Single-Phase Init Modules in Multiple Interpreters #101920
- gh-101758: Fix the wasm Builtbots #101943
- gh-101758: Add _PyState_AddModule() Back for the Stable ABI #101956
- gh-101758: Fix Refleak Testing With test_singlephase_variants #101969
[1] see https://github.com/python/cpython/pull/101660#issuecomment-1424507393
[2] We should probably consider isolating PyModuleDef.m_base.m_index, but for now we simply sync the modules_by_index list of each interpreter. (Also, modules_by_index and m_index are only used for single-phase init modules.)
[3] specifically def->m_size == -1; multi-phase init modules always have def->m_size >= 0; single-phase init modules can also have a non-negative m_size