[Python-Dev] Strange artifacts with PEP 3121 and monkey-patching sys.modules (in csv, ElementTree and others)
Eli Bendersky eliben at gmail.com
Sun Aug 11 02:12:53 CEST 2013
- Previous message: [Python-Dev] xml.etree.ElementTree.IncrementalParser
- Next message: [Python-Dev] Strange artifacts with PEP 3121 and monkey-patching sys.modules (in csv, ElementTree and others)
Hello,
Recently, as part of the effort to untangle the ElementTree tests and make general code improvements (e.g. http://bugs.python.org/issue15651), I ran into something strange about PEP 3121-compliant modules. I'll demonstrate with csv, just as an example.
PEP 3121 relies on this function to look up the module-specific state in the current sub-interpreter:
PyObject* PyState_FindModule(struct PyModuleDef*);
This appears to make the following assumption: a given sub-interpreter imports any C extension only once. If it happens more than once, the assumption breaks in troubling ways. In normal code it should never happen more than once, because of the caching in sys.modules; however, many of our tests monkey-patch sys.modules (mainly by calling test.support.import_fresh_module), and all hell breaks loose. Here's a simple example:
    import sys

    csv = __import__('csv')
    csv.register_dialect('unixpwd', delimiter=':', quoting=csv.QUOTE_NONE)
    print(csv.list_dialects())
    # ==> ['unixpwd', 'excel-tab', 'excel', 'unix']

    del sys.modules['csv']    # FUN
    del sys.modules['_csv']
    some_other_csv = __import__('csv')

    print(csv.list_dialects())
    # ==> ['excel-tab', 'excel', 'unix']
Note how doing some sys.modules acrobatics and re-importing suddenly changes the internal state of a previously imported module. This happens because:
- The first import of 'csv' (which in turn imports '_csv') creates module-specific state on the heap and associates it with the current sub-interpreter. The list of dialects, among other things, lives in that state.
- The 'del's wipe 'csv' and '_csv' from the cache.
- The second import of 'csv' also creates and initializes a new '_csv' module, because it's no longer in sys.modules. This replaces the per-sub-interpreter cached version of the module's state with the clean state of the new module.
So essentially, while PEP 3121 moves state from C-file globals to per-module state, the state is still effectively global (one copy per extension per sub-interpreter), and this fact can be exposed from pure Python code.
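To make the mechanism concrete, here is a small pure-Python model of the lookup described above. It is only an analogy under stated assumptions: ModuleDef, Interpreter and ExtModule are hypothetical stand-ins, not CPython APIs, and the model assumes that each initialization of the extension registers itself in a per-interpreter table keyed by its definition, which is what PyState_FindModule then consults.

```python
# A pure-Python model of PEP 3121-style state lookup. ModuleDef, Interpreter
# and ExtModule are hypothetical stand-ins, not CPython APIs.


class ModuleDef:
    """Plays the role of a static 'struct PyModuleDef': one per extension."""

    def __init__(self, name):
        self.name = name


class Interpreter:
    """Plays the role of a sub-interpreter's module table, keyed by definition."""

    def __init__(self):
        self.modules_by_def = {}


class ExtModule:
    def __init__(self, interp, mdef):
        self.interp = interp
        self.mdef = mdef
        # Fresh, clean state for this module object (e.g. the dialect list).
        self.state = {"dialects": set()}
        # Registration at init time: one slot per definition, so a second
        # initialization of the same extension overwrites the first entry.
        interp.modules_by_def[mdef] = self

    def _find_state(self):
        # Mirrors PyState_FindModule(&def): the lookup goes through the
        # interpreter by definition and never consults 'self.state'.
        return self.interp.modules_by_def[self.mdef].state

    def register_dialect(self, name):
        self._find_state()["dialects"].add(name)

    def list_dialects(self):
        return sorted(self._find_state()["dialects"])


interp = Interpreter()
csv_def = ModuleDef("_csv")

old_csv = ExtModule(interp, csv_def)  # first import of '_csv'
old_csv.register_dialect("unixpwd")
print(old_csv.list_dialects())  # ['unixpwd']

new_csv = ExtModule(interp, csv_def)  # re-import after the 'del sys.modules[...]'
print(old_csv.list_dialects())  # [] : the old module now sees the new state
print("unixpwd" in old_csv.state["dialects"])  # True, but no longer reachable
```

The old module's own state still holds 'unixpwd'; it is simply no longer what the definition-keyed lookup returns.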
The above is a toy example. Here's a more serious case I ran into with ET, which, once again, I'll demonstrate with 'csv' for simplicity:
    import io
    from test.support import import_fresh_module

    import csv

    csv_other = import_fresh_module('csv', fresh=['_csv', 'csv'])

    f = io.StringIO('foo\x00,bar\nbaz,42')
    reader = csv.reader(f)

    try:
        for row in reader:
            print(row)
    except csv.Error as e:
        print('Caught csv.error', e)
    except Exception as e:
        print('Caught Exception', e)
In the above, the reader throws 'csv.Error' (because of the NULL byte), but the first exception clause does not catch it as expected. The raised exception is a different class that is also called csv.Error (the one belonging to the freshly imported '_csv'), due to the same problem demonstrated above. If the seemingly innocent import_fresh_module call is removed, all is good.
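The exception mismatch can be reproduced in pure Python with a rough analogy. In this sketch, STATE_BY_DEF, SOURCE, load_fresh and the 'fake_csv' module are all hypothetical; the assumption being modelled is that the C code finds its Error class through the per-interpreter state lookup rather than through the module it was defined in.

```python
import types

# Hypothetical stand-in for the per-interpreter table that PyState_FindModule
# consults; a re-import overwrites the entry for the same key.
STATE_BY_DEF = {}

# Hypothetical module source modelling '_csv': an Error class plus a function
# that, like the C code, finds Error through the shared table.
SOURCE = '''
class Error(Exception):
    pass

STATE_BY_DEF["fake_csv"] = {"error": Error}

def parse(text):
    if "\\x00" in text:
        raise STATE_BY_DEF["fake_csv"]["error"]("line contains NUL byte")
    return text.split(",")
'''


def load_fresh(name):
    """Build a brand-new module object from SOURCE, bypassing sys.modules."""
    mod = types.ModuleType(name)
    mod.STATE_BY_DEF = STATE_BY_DEF  # share the "interpreter" table
    exec(SOURCE, mod.__dict__)
    return mod


old = load_fresh("fake_csv")
new = load_fresh("fake_csv")  # plays the role of import_fresh_module


def run():
    try:
        old.parse("foo\x00,bar")
    except old.Error:
        return "caught old.Error"
    except Exception as e:
        return "caught a different class also named " + type(e).__name__


print(old.Error is new.Error)  # False
print(run())  # caught a different class also named Error
```

Both classes have the same name, so the traceback looks right, yet 'except old.Error' does not match because exception matching is by class identity (and subclassing), not by name.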
Any ideas or suggestions regarding this are welcome. This is quite an esoteric problem, but I believe it's a serious one. PEP 3121 is not used much (yet), but recently there was talk again about committing some of the patches, created during a GSoC project, that convert Modules/*.c extensions to it. I believe we should understand the implications first. There are a number of possible solutions, including modifying the PEP 3121 implementation machinery to really create and keep state "per module" and not just "per kind of module in a single sub-interpreter".
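As a rough illustration of what "per module" state could mean, here is a hypothetical pure-Python sketch (ModuleDef and ExtModule are illustrative names, not CPython APIs). It assumes every piece of code can reach the module object it belongs to, so two copies of the same extension never share or overwrite each other's state.

```python
# Hypothetical sketch of "per module" state: every module object owns its
# state, and its functions reach that state through the module they were
# defined in instead of a per-interpreter lookup by definition.


class ModuleDef:
    """Stand-in for a static 'struct PyModuleDef'."""

    def __init__(self, name):
        self.name = name


class ExtModule:
    def __init__(self, mdef):
        self.mdef = mdef
        # Each module object gets its own dialect registry and Error class.
        self.state = {
            "dialects": set(),
            "error": type("Error", (Exception,), {}),
        }

    @property
    def Error(self):
        return self.state["error"]

    def register_dialect(self, name):
        self.state["dialects"].add(name)

    def list_dialects(self):
        return sorted(self.state["dialects"])

    def parse(self, text):
        if "\x00" in text:
            # Raised from this module's own Error class.
            raise self.Error("line contains NUL byte")
        return text.split(",")


csv_def = ModuleDef("_csv")
old = ExtModule(csv_def)
old.register_dialect("unixpwd")

new = ExtModule(csv_def)  # a fresh import no longer disturbs 'old'
print(old.list_dialects())  # ['unixpwd']
print(new.list_dialects())  # []

try:
    old.parse("foo\x00,bar")
except old.Error:
    print("caught old.Error as expected")
```

The catch, in C terms, is that code such as a reader object's iteration method would need some way to reach the module object that created its type; a lookup keyed only by the module definition cannot tell two copies of the same module apart.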
Eli