Typically each PyModuleDef for a builtin/extension module is a static global variable. Currently it's shared between all interpreters, whereas we are working toward interpreter isolation (for a variety of reasons). Isolating each PyModuleDef is worth doing, especially considering that we've already run into problems [1] because of m_copy.

The main focus here is on PyModuleDef.m_base.m_copy [2] specifically. It's the state that facilitates importing legacy (single-phase init) extension/builtin modules that do not support repeated initialization [3] (likely the vast majority).

PyModuleDef for an extension/builtin module is usually stored in a static variable and (with immortal objects, see gh-101755) is mostly immutable. The exception is m_copy, which is problematic in some cases for modules imported in multiple interpreters.

Note that m_copy is only relevant for legacy (single-phase init) modules, whether builtin or extension, and only if the module does not support repeated initialization [3]. It is never relevant for multi-phase init (PEP 489) modules.

initialization:
- m_copy is only set by _PyImport_FixupExtensionObject() (and thus indirectly _PyImport_FixupBuiltin() and _imp.create_builtin())
- _PyImport_FixupExtensionObject() is called by _PyImport_LoadDynamicModuleWithSpec() when a legacy (single-phase init) extension module is loaded

usage:
- m_copy is only used in import_find_extension(), which is only called by _imp.create_builtin() and _imp.create_dynamic() (via the respective importers)

When such a legacy module is imported for the first time, m_copy is set to a new copy of the just-imported module's __dict__, which is "owned" by the current interpreter (the one importing the module). Whenever the module is loaded again (e.g. reloaded or deleted from sys.modules and then imported), a new empty module is created and m_copy is [shallow] copied into that object's __dict__.
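
The snapshot-and-restore behavior described above can be modeled in a few lines of pure Python. This is only a toy sketch, not CPython's implementation: `ToyModuleDef`, `first_import()`, `load_again()`, and `init_spam()` are hypothetical names invented for illustration, and the real logic lives in C.

```python
import types


class ToyModuleDef:
    """Hypothetical stand-in for PyModuleDef; only the field discussed here."""

    def __init__(self, name):
        self.name = name
        self.m_copy = None  # models PyModuleDef.m_base.m_copy


def first_import(defn, init_func):
    """Model the first import: run the init, then snapshot the module's __dict__."""
    mod = types.ModuleType(defn.name)
    init_func(mod)
    defn.m_copy = dict(mod.__dict__)  # a new dict, but only a shallow copy
    return mod


def load_again(defn):
    """Model a later load: create an empty module and copy m_copy into it."""
    mod = types.ModuleType(defn.name)
    mod.__dict__.update(defn.m_copy)  # the init function is NOT run again
    return mod


def init_spam(mod):
    """Hypothetical module init that creates some module-level state."""
    mod.counter = [0]
    mod.VERSION = "1.0"


spam_def = ToyModuleDef("spam")
first = first_import(spam_def, init_spam)
second = load_again(spam_def)

print(first is second)                  # False: a new module object
print(second.VERSION)                   # 1.0: restored from the snapshot
print(first.counter is second.counter)  # True: the values themselves are shared
```

The last line is the detail that matters later: the reloaded module gets a fresh `__dict__`, but every value in it is the same object that the first import created.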

When m_copy is originally initialized, normally that will be the first time the module is imported. However, that code can be triggered multiple times for that module if it is imported under a different name (an unlikely case but apparently a real one). In that case the m_copy from the previous import is replaced with the new one right after it is released (decref'ed). This isn't the ideal approach but it's also been the behavior for quite a while.

The tricky problem here is that the same code is triggered for each interpreter that imports the legacy module. Things are fine when a module is imported for the first time in any interpreter. However, currently, any subsequent import of that module in another interpreter will trigger that replacing code. The second interpreter decref's the old m_copy, but that object is "owned" by the first interpreter. This is a problem [1].
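
As a rough illustration of the ownership mismatch, the toy model below tags each m_copy with the interpreter that created it and records which interpreter releases it. All names (`OwnedDict`, `ToyModuleDef`, `fixup()`, the integer interpreter IDs) are hypothetical, and setting `released_by` merely stands in for the real decref.

```python
class OwnedDict:
    """Toy object that records which interpreter created and released it."""

    def __init__(self, owner):
        self.owner = owner
        self.released_by = None


class ToyModuleDef:
    """Hypothetical stand-in for a static PyModuleDef shared by all interpreters."""

    def __init__(self):
        self.m_copy = None


def fixup(defn, interp_id):
    """Model the replace step: release the old m_copy, then store a new one."""
    old = defn.m_copy
    if old is not None:
        old.released_by = interp_id  # stands in for the decref
    defn.m_copy = OwnedDict(owner=interp_id)
    return old


shared_def = ToyModuleDef()
fixup(shared_def, interp_id=0)        # first import, in interpreter 0
old = fixup(shared_def, interp_id=1)  # later import, in interpreter 1

print(old.owner, old.released_by)  # 0 1: created by one interpreter, released by another
```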

Furthermore, even if the decref-in-the-wrong-interpreter problem were gone, a second problem would remain. When m_copy is copied into the new module's __dict__ on subsequent imports, it's only a shallow copy. Thus such a legacy module, imported in any interpreter other than the first one, would end up with its __dict__ filled with objects not owned by the correct interpreter.
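
The shallow-copy consequence can be sketched the same way. In this toy model (again, not CPython code), each value carries a hypothetical `owner` tag for the interpreter that allocated it, and `foreign_objects()` is an invented helper that lists the names whose values belong to some other interpreter.

```python
class Owned:
    """Toy value tagged with the interpreter that allocated it."""

    def __init__(self, owner, value):
        self.owner = owner
        self.value = value


def foreign_objects(namespace, interp_id):
    """Return the names whose values were allocated by a different interpreter."""
    return sorted(k for k, v in namespace.items() if v.owner != interp_id)


# Interpreter 0 imports the module and snapshots its namespace.
ns0 = {"cache": Owned(0, {}), "Error": Owned(0, "exception type")}
m_copy = dict(ns0)  # shallow: a new dict holding the same objects

# Interpreter 1 imports later and fills its new module's namespace from m_copy.
ns1 = {}
ns1.update(m_copy)

print(foreign_objects(ns0, 0))  # []
print(foreign_objects(ns1, 1))  # ['Error', 'cache']
```

Every entry in the second interpreter's namespace is foreign, because copying the dict never creates new values.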

Here are some possible approaches to isolating each module's PyModuleDef to the interpreter that imports it:

- keep a copy of PyModuleDef for each interpreter (would _PyRuntimeState.imports.extensions need to move to the interpreter?)
- keep just m_copy for/on each interpreter
- fix _PyImport_FixupExtensionObject() some other way...
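
To make the second option a little more concrete, here is one way "m_copy per interpreter" might look, as a minimal Python sketch. It assumes a store keyed by (interpreter ID, module name); the class `PerInterpreterCopies` and its methods are hypothetical and do not correspond to any existing CPython structure.

```python
class PerInterpreterCopies:
    """Hypothetical store: one m_copy-style snapshot per (interpreter, module)."""

    def __init__(self):
        self._copies = {}

    def set(self, interp_id, name, namespace):
        # Replacing an entry only ever drops a snapshot that the same
        # interpreter made earlier, so nothing is released across interpreters.
        self._copies[(interp_id, name)] = dict(namespace)

    def get(self, interp_id, name):
        return self._copies.get((interp_id, name))


copies = PerInterpreterCopies()
copies.set(0, "spam", {"VERSION": "1.0"})

print(copies.get(0, "spam"))  # {'VERSION': '1.0'}
print(copies.get(1, "spam"))  # None: interpreter 1 has no snapshot of its own yet
```

The sketch deliberately leaves open what an interpreter should do on a miss for a module that does not support repeated initialization; that question is part of what the approaches above would have to settle.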

[1] see https://github.com/python/cpython/pull/101660#issuecomment-1424507393
[2] We should probably consider isolating PyModuleDef.m_base.m_index, but for now we simply sync the modules_by_index list of each interpreter. (Also, modules_by_index and m_index are only used for single-phase init modules.)
[3] specifically def->m_size == -1; multi-phase init modules always have def->m_size >= 0; single-phase init modules can also have a non-negative m_size