[Python-Dev] Strange artifacts with PEP 3121 and monkey-patching sys.modules (in csv, ElementTree and others)

Eli Bendersky eliben at gmail.com
Sun Aug 11 02:12:53 CEST 2013


Hello,

Recently as part of the effort of untangling the tests of ElementTree and
general code improvements (e.g. http://bugs.python.org/issue15651), I ran
into something strange about PEP 3121-compliant modules. I'll demonstrate
with csv, just as an example.

PEP 3121 mandates this function to look up the module-specific state in the
current sub-interpreter:

  PyObject* PyState_FindModule(struct PyModuleDef*);

This appears to make the following assumption: a given sub-interpreter only
imports any C extension *once*. If it happens more than once, the
assumption breaks in troubling ways. In normal code, it should never happen
more than once because of the caching in sys.modules; however, many of our
tests monkey-patch sys.modules (mainly by calling
test.support.import_fresh_module) and all hell breaks loose. Here's a simple
example:

----
import sys

csv = __import__('csv')
csv.register_dialect('unixpwd', delimiter=':', quoting=csv.QUOTE_NONE)

print(csv.list_dialects())
# ==> ['unixpwd', 'excel-tab', 'excel', 'unix']

del sys.modules['csv']              # FUN
del sys.modules['_csv']
some_other_csv = __import__('csv')

print(csv.list_dialects())
# ==> ['excel-tab', 'excel', 'unix']
----

Note how doing some sys.modules acrobatics and re-importing suddenly
changes the internal state of a previously imported module. This happens
because:

1. The first import of 'csv' (which then imports '_csv') creates
module-specific state on the heap and associates it with the current
sub-interpreter. The list of dialects, amongst other things, is in that
state.
2. The 'del's wipe 'csv' and '_csv' from the cache.
3. The second import of 'csv' also creates/initializes a new '_csv' module
because it's not in sys.modules. This *replaces* the per-sub-interpreter
cached version of the module's state with the clean state of a new module.
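
The three steps above can be modelled in pure Python. This is only a sketch
of the mechanism, not CPython's actual code: ModuleDef, Interpreter,
FakeModule, init_csv and the other names are hypothetical stand-ins, and the
model assumes that module-level functions find their state through a
per-interpreter table keyed by the module definition (roughly what
PyState_FindModule does), not through the module object they belong to.

```python
# Hypothetical pure-Python model; none of these names exist in CPython.

class ModuleDef:
    """Stands in for a static `struct PyModuleDef` (one per kind of module)."""
    def __init__(self, name):
        self.name = name


class FakeModule:
    """Stands in for a module object carrying its own heap-allocated state."""
    def __init__(self):
        self.dialects = {}


class Interpreter:
    """Stands in for one sub-interpreter's definition -> module table."""
    def __init__(self):
        self.modules_by_def = {}

    def find_module(self, moddef):
        # Rough analogue of PyState_FindModule(): keyed by definition only.
        return self.modules_by_def.get(moddef)


CSV_DEF = ModuleDef('_csv')


def init_csv(interp):
    # Each initialization creates fresh state and overwrites the table entry.
    module = FakeModule()
    interp.modules_by_def[CSV_DEF] = module
    return module


def register_dialect(interp, name):
    # State is looked up via the interpreter, not via a specific module.
    interp.find_module(CSV_DEF).dialects[name] = object()


def list_dialects(interp):
    return sorted(interp.find_module(CSV_DEF).dialects)


interp = Interpreter()
first = init_csv(interp)             # step 1: first import
register_dialect(interp, 'unixpwd')
before = list_dialects(interp)

second = init_csv(interp)            # steps 2 and 3: cache wiped, re-import
after = list_dialects(interp)        # 'unixpwd' is no longer visible
```

In this model the first module object still holds the 'unixpwd' entry, but
nothing can reach it any more, because every lookup goes through the table
entry that the second initialization replaced.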

So essentially, while PEP 3121 moves state from C-file globals to
per-module state, the state is still global, and this fact can be exposed
from pure Python code.

The above is a toy example. Here's a more serious case I ran into with ET,
but once again it is demonstrated with 'csv' for simplicity:

----

import io
from test.support import import_fresh_module

import csv

csv_other = import_fresh_module('csv', fresh=['_csv', 'csv'])

f = io.StringIO('foo\x00,bar\nbaz,42')
reader = csv.reader(f)

try:
    for row in reader:
        print(row)
except csv.Error as e:
    print('Caught csv.Error', e)
except Exception as e:
    print('Caught Exception', e)
----

In the above, the reader raises 'csv.Error' (because of the NUL byte), but
the 'except csv.Error' clause does not catch it as expected, because the
raised exception is a different class that is also called `csv.Error`, due
to the same problem demonstrated above (if the seemingly innocent
import_fresh_module is removed, all is good).
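
The class-identity mismatch can be reproduced without any C extension. The
following is a minimal sketch under the assumption that each module
initialization creates a brand-new exception class and that the raising code
picks up whichever class is currently stored in the shared state; make_error,
current_state and parse are hypothetical names used only for illustration.

```python
def make_error():
    # Each "module initialization" creates a distinct class named Error.
    return type('Error', (Exception,), {})


first_error = make_error()               # what the caller sees as csv.Error
current_state = {'Error': first_error}   # shared state used at raise time


def parse(line):
    # Raises whatever Error class the shared state holds right now.
    if '\x00' in line:
        raise current_state['Error']('line contains NUL')
    return line.split(',')


# A "fresh import" replaces the shared state with a new Error class.
current_state['Error'] = make_error()

try:
    parse('foo\x00,bar')
    caught = 'nothing'
except first_error:
    caught = 'first_error'
except Exception:
    caught = 'Exception'
```

Both classes have the same name, so the traceback looks as if the right
exception was raised, yet the first except clause is skipped because the two
classes are different objects.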

Any ideas/suggestions regarding this are welcome. This is quite an esoteric
problem, but I believe it's serious. PEP 3121 is not used much (yet), but
recently there was talk again about committing some of the patches created
for converting Modules/*.c extensions to it during a GSoC project. I
believe that we should understand the implications first. There can be a
number of solutions, including modifying the PEP 3121 implementation
machinery to really create/keep state "per module" and not just "per kind
of module in a single sub-interpreter".
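
To make the "per module" idea concrete, here is a rough pure-Python sketch
of what that behaviour could look like. It is not a proposal for the actual
C API: FakeModule and its methods are hypothetical, and the only point is
that each function reaches state through the module object it belongs to, so
a second initialization cannot disturb the first.

```python
class FakeModule:
    """Hypothetical module object whose functions use only its own state."""
    def __init__(self):
        self.state = {'dialects': {}}

    def register_dialect(self, name):
        self.state['dialects'][name] = object()

    def list_dialects(self):
        return sorted(self.state['dialects'])


old = FakeModule()                # first import
old.register_dialect('unixpwd')

new = FakeModule()                # fresh import after clearing the cache

old_after = old.list_dialects()   # still sees its own registration
new_after = new.list_dialects()   # starts from a clean state
```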

Eli