[Python-Dev] Strange artifacts with PEP 3121 and monkey-patching sys.modules (in csv, ElementTree and others) (original) (raw)

Antoine Pitrou solipsis at pitrou.net
Sun Aug 11 12:33:16 CEST 2013


Hi Eli,

On Sat, 10 Aug 2013 17:12:53 -0700 Eli Bendersky <eliben at gmail.com> wrote:

Note how doing some sys.modules acrobatics and re-importing suddenly changes the internal state of a previously imported module. This happens because: 1. The first import of 'csv' (which then imports `csv) creates module-specific state on the heap and associates it with the current sub-interpreter. The list of dialects, amongst other things, is in that state. 2. The 'del's wipe 'csv' and 'csv' from the cache. 3. The second import of 'csv' also creates/initializes a new 'csv' module because it's not in sys.modules. This replaces the per-sub-interpreter cached version of the module's state with the clean state of a new module

I would say this is pretty much expected. The converse would be a bug IMO (but perhaps Martin disagrees). PEP 3121's stated goal is not only subinterpreter support:

"Extension module initialization currently has a few deficiencies. There is no cleanup for modules, the entry point name might give naming conflicts, the entry functions don't follow the usual calling convention, and multiple interpreters are not supported well."

Re-initializing state when importing a module anew makes extension modules more like pure Python modules, which is a good thing.

I think the piece of interpretation you offered yesterday on IRC may be the right explanation for the ET shenanigans:

"Maybe the bug is that ParseError is kept in per-module state, and also exported from the module?"

PEP 3121 doesn't offer any guidelines for using its API, and its example shows PyObject* fields in a module state.

I'm starting to think that it might be a bad use of PEP 3121. PyObjects can, and therefore should be stored in the extension module dict where they will participate in normal resource management (i.e. garbage collection). If they are in the module dict, then they shouldn't be held alive by the module state too, otherwise the (currently tricky) lifetime management of extension modules can produce oddities.

So, the PEP 3121 "module state" pointer (the optional opaque void* thing) should only be used to hold non-PyObjects. PyObjects should go to the module dict, like they do in normal Python modules. Now, the reason our PEP 3121 extension modules abuse the module state pointer to keep PyObjects is two-fold:

  1. it's surprisingly easier (it's actually a one-liner if you don't handle errors - a rather bad thing, but all PEP 3121 extension modules currently don't handle a NULL return from PyState_FindModule...)

  2. it protects the module from any module dict monkeypatching. It's not important if you are using a generic API on the PyObject, but it is if the PyObject is really a custom C type with well-defined fields.

Those two issues can be addressed if we offer an API for it. How about:

PyObject *PyState_GetModuleAttr(struct PyModuleDef *def, const char *name, PyObject *restrict_type)

def is a pointer to the module definition. name is the attribute to look up on the module dict. restrict_type, if non-NULL, is a type object the looked up attribute must be an instance of.

Lookup an attribute in the current interpreter's extension module instance for the module definition def. Returns a new reference (!), or NULL if an error occurred. An error can be:

So code can be written like:

PyObject *dialects = PyState_GetModuleAttr( &_csvmodule, "dialects", &PyDict_Type); if (dialects == NULL) return NULL;

Regards

Antoine.



More information about the Python-Dev mailing list