[Python-Dev] Strange artifacts with PEP 3121 and monkey-patching sys.modules (in csv, ElementTree and others)

Eli Bendersky eliben at gmail.com
Sun Aug 11 02:12:53 CEST 2013


Hello,

Recently, as part of the effort to untangle the tests of ElementTree and make general code improvements (e.g. http://bugs.python.org/issue15651), I ran into something strange about PEP 3121-compliant modules. I'll demonstrate with csv, just as an example.

PEP 3121 mandates this function to look up the module-specific state in the current sub-interpreter:

PyObject* PyState_FindModule(struct PyModuleDef*);

This appears to make the following assumption: a given sub-interpreter imports any C extension only once. If it happens more than once, the assumption breaks in troubling ways. In normal code, it should never happen more than once because of the caching in sys.modules; however, many of our tests monkey-patch sys.modules (mainly by calling test.support.import_fresh_module) and all hell breaks loose. Here's a simple example:


import sys

csv = __import__('csv')
csv.register_dialect('unixpwd', delimiter=':', quoting=csv.QUOTE_NONE)

print(csv.list_dialects())
# ==> ['unixpwd', 'excel-tab', 'excel', 'unix']

del sys.modules['csv']  # FUN
del sys.modules['_csv']
some_other_csv = __import__('csv')

print(csv.list_dialects())
# ==> ['excel-tab', 'excel', 'unix']

Note how doing some sys.modules acrobatics and re-importing suddenly changes the internal state of a previously imported module. This happens because:

  1. The first import of 'csv' (which then imports '_csv') creates module-specific state on the heap and associates it with the current sub-interpreter. The list of dialects, amongst other things, is in that state.
  2. The 'del's wipe 'csv' and '_csv' from the cache.
  3. The second import of 'csv' also creates/initializes a new '_csv' module because it's not in sys.modules. This replaces the per-sub-interpreter cached version of the module's state with the clean state of a new module.
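The three steps above can be modeled in pure Python. This is only a rough sketch of the behaviour, not CPython's actual machinery; all names in it (ModuleDef, Module, Interpreter, init_module, find_module) are hypothetical stand-ins, with find_module playing the role of PyState_FindModule.

```python
# Pure-Python model of the behaviour described above; NOT CPython's real code.
# Every class and function name here is a hypothetical stand-in.

class ModuleDef:
    """Stands in for a static `struct PyModuleDef`."""
    def __init__(self, name):
        self.name = name


class Module:
    """Stands in for a module object that owns its state."""
    def __init__(self, definition):
        self.definition = definition
        self.state = {"dialects": []}


class Interpreter:
    """Stands in for one sub-interpreter: one slot per ModuleDef."""
    def __init__(self):
        self.modules_by_def = {}

    def init_module(self, definition):
        module = Module(definition)
        # Only one slot per definition, so a second init overwrites the first.
        self.modules_by_def[definition] = module
        return module

    def find_module(self, definition):
        # Analogue of PyState_FindModule: the lookup key is the definition only.
        return self.modules_by_def[definition]


def register_dialect(interp, definition, name):
    # Module-level functions locate their state through the interpreter.
    interp.find_module(definition).state["dialects"].append(name)


def list_dialects(interp, definition):
    return list(interp.find_module(definition).state["dialects"])


csv_def = ModuleDef("_csv")
interp = Interpreter()

first = interp.init_module(csv_def)          # step 1: first import
register_dialect(interp, csv_def, "unixpwd")
before = list_dialects(interp, csv_def)

second = interp.init_module(csv_def)         # steps 2-3: "fresh" re-import
after = list_dialects(interp, csv_def)

print(before)  # ['unixpwd']
print(after)   # []
```

In this model the first module object still holds its own dialect list, but nothing can reach it any more, because every lookup goes through the single per-definition slot.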

So essentially, while PEP 3121 moves state from C-file globals to per-module state, the state is still global, and this fact can be exposed from pure Python code.

The above is a toy example. Here's a more serious case I ran into with ET, once again demonstrated with 'csv' for simplicity:


import io
from test.support import import_fresh_module

import csv

csv_other = import_fresh_module('csv', fresh=['_csv', 'csv'])

f = io.StringIO('foo\x00,bar\nbaz,42')
reader = csv.reader(f)

try:
    for row in reader:
        print(row)
except csv.Error as e:
    print('Caught csv.error', e)
except Exception as e:
    print('Caught Exception', e)

In the above, the reader raises 'csv.Error' (because of the NUL byte), but the 'except csv.Error' clause does not catch it as expected, because the raised exception is an instance of a different class that is also called csv.Error, due to the same problem demonstrated above (if the seemingly innocent import_fresh_module call is removed, all is good).
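The reason the except clause misses can be shown without csv at all: two classes created separately are unrelated types even if they share a name. A minimal sketch, assuming each module initialization creates its own Error class (make_error_class and classify are hypothetical helpers, not part of csv):

```python
def make_error_class():
    # Hypothetical stand-in for one initialization of the C module,
    # which creates a brand-new exception type each time it runs.
    return type("Error", (Exception,), {})


OldError = make_error_class()  # what the first import bound as csv.Error
NewError = make_error_class()  # what the re-initialized module state holds


def classify(exc_class):
    """Raise exc_class and report which except clause handled it."""
    try:
        raise exc_class("line contains NUL")
    except OldError:
        return "caught as csv.Error"
    except Exception:
        return "caught as Exception"


print(classify(OldError))  # caught as csv.Error
print(classify(NewError))  # caught as Exception
```

Both classes print with the same name in a traceback, which is what makes the failure so confusing to debug.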

Any ideas/suggestions regarding this are welcome. This is quite an esoteric problem, but I believe it's serious. PEP 3121 is not used much (yet), but recently there was talk again about committing some of the patches created for converting Modules/*.c extensions to it during a GSoC project. I believe that we should understand the implications first. There can be a number of solutions, including modifying the PEP 3121 implementation machinery to really create/keep state "per module" and not just "per kind of module in a single sub-interpreter".
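To make the "per module" alternative concrete, here is a toy model in which each module instance carries its own state and its functions reach that state through the instance rather than through an interpreter-wide lookup. This is only an illustration of the idea, not a proposal for the actual C API; PerModuleCsv is a hypothetical name.

```python
class PerModuleCsv:
    """Hypothetical module object whose state lives on the instance itself."""

    def __init__(self):
        self.dialects = []

    def register_dialect(self, name):
        # State is reached via the module instance, not via a shared slot.
        self.dialects.append(name)

    def list_dialects(self):
        return list(self.dialects)


first = PerModuleCsv()            # first import
first.register_dialect("unixpwd")

second = PerModuleCsv()           # fresh re-import

print(first.list_dialects())   # ['unixpwd']
print(second.list_dialects())  # []
```

Here creating the second instance leaves the first one untouched, which is the behaviour the sys.modules example at the top would need.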

Eli