[New-bugs-announce] [issue17716] IMPORTANT - Process corruption on partly failed imports

Sat Apr 13 18:04:57 CEST 2013

New submission from Pascal Chambon:

Hello,

We have repeatedly hit a very nasty bug in our framework. Several times, tests or even production code (served by mod_wsgi) ended up in a broken state in which imports like "from . import processing_exceptions", which were NOT part of circular imports and referred to submodules that definitely existed, raised exceptions like "ImportError: cannot import name processing_exceptions". Restarting the test run or the server fixed it, and we never found out what had happened.

I've come across several forum threads on similar issues, but only recently did I find one that gives a way to reproduce the bug:
http://stackoverflow.com/questions/12830901/why-does-import-error-change-to-cannot-import-name-on-the-second-import

Attached is a Python 2 sample (Python 3 has the same problem) that shows the bug: just run test_import.py.

What happens here is that a package "mypkg" fails to import because of an exception (e.g. a temporary failure of the DB), but only AFTER successfully importing a submodule, mypkg.module_a.
Thus "mypkg.module_a" IS loaded and stays in sys.modules, but "mypkg" is removed from sys.modules (as the documentation on Python imports describes).

The next time we try to import "mypkg" from within the same application and reach "from mypkg import module_a" in mypkg's __init__.py, it SEEMS that the import system checks sys.modules, sees "mypkg.module_a" there, and ASSUMES that mypkg is therefore already initialized and has a name "module_a" in its global namespace. Hence the "cannot import name module_a" error in the sample, and the "cannot import name processing_exceptions" error in our framework.
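
The sequence described above can be sketched as follows. This is only an illustration, not the attached sample: the package name "mypkg_demo", the temporary directory and the simulated RuntimeError are all assumptions made for the example. Which exception the second attempt raises depends on the interpreter version, so the sketch only records it instead of expecting a particular one.

```python
import importlib
import os
import sys
import tempfile

# Hypothetical package name, used only for this illustration.
PKG = "mypkg_demo"

# Build a throwaway package whose __init__.py imports a submodule
# successfully and then fails (standing in for e.g. a DB outage).
root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, PKG)
os.mkdir(pkg_dir)
with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
    f.write("from . import module_a\n")
    f.write("raise RuntimeError('simulated temporary failure')\n")
with open(os.path.join(pkg_dir, "module_a.py"), "w") as f:
    f.write("VALUE = 1\n")

sys.path.insert(0, root)
importlib.invalidate_caches()

errors = []              # exception type names, one per attempt
state_after_first = None

for attempt in (1, 2):
    try:
        importlib.import_module(PKG)
    except Exception as exc:
        errors.append(type(exc).__name__)
    if attempt == 1:
        # (is the package still registered?, is the submodule still registered?)
        state_after_first = (PKG in sys.modules,
                             PKG + ".module_a" in sys.modules)

print("after first attempt (package, submodule):", state_after_first)
print("exceptions raised:", errors)
```

After the first attempt the package is gone from sys.modules while the submodule is still there, which is the inconsistent state the second attempt then runs into.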

Writing the import of "module_a" as an absolute or a relative import changes nothing; however, using "import mypkg.module_a" solves the problem (I don't know why).

Another workaround is to clean up sys.modules at the top of mypkg/__init__.py, so that a previously failed attempt at importing the package's modules doesn't get in our way.

    # at the top of "mypkg/__init__.py"
    import sys

    # drop submodules left over from a previously failed import of this package
    exceeding_modules = [k for k in sys.modules.keys() if k.startswith("mypkg.")]
    for k in exceeding_modules:
        del sys.modules[k]
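
The same cleanup could also be done on the caller's side, for code that cannot edit the package's __init__.py. The helper below is a hypothetical sketch (the name import_or_purge is not part of any library); it assumes that dropping every "package." entry from sys.modules after a failure is acceptable, which may not hold if other code already keeps references to those submodules.

```python
import importlib
import sys


def import_or_purge(package_name):
    """Import package_name; if that fails, remove its leftover
    submodules from sys.modules and re-raise the original error."""
    try:
        return importlib.import_module(package_name)
    except Exception:
        prefix = package_name + "."
        # Copy the matching keys first: sys.modules must not be
        # modified while it is being iterated over.
        leftovers = [name for name in sys.modules if name.startswith(prefix)]
        for name in leftovers:
            del sys.modules[name]
        raise
```

With this, a later retry starts from a clean sys.modules, the same way it would after the in-package cleanup.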

Anyway, I don't know enough about Python's import internals to understand why exactly, on the second import attempt, the system tries a kind of faulty getattr(mypkg, "module_a") instead of simply returning sys.modules["mypkg.module_a"], which exists.
Could anyone help with that?
That's a very damaging issue, imo, since web server workers can reach a completely broken state because of it.

PS: more generally, I guess Python users lack insight into the behaviour of "from xxx import yyy", especially when yyy is both a real submodule of xxx and a variable initialized in xxx/__init__.py (it seems the real module overrides the variable), or when the __all__ list of xxx could prevent the import of a submodule of xxx by not including it.
Provided I get a better understanding of how all of this works (I hear it has changed quite a bit recently), I'd be willing to summarize it for the Python docs.
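
As an illustration of the first case in the PS, here is a small sketch in which a name is both a variable set in a package's __init__.py and a real submodule. The package name "shadow_demo" and its contents are made up for the example, and the behaviour shown is only what this sketch observes on a Python 3 interpreter, not a statement about every version.

```python
import importlib
import os
import sys
import tempfile
import types

# Hypothetical package: "yyy" is both a variable in __init__.py
# and a submodule file yyy.py.
root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "shadow_demo")
os.mkdir(pkg_dir)
with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
    f.write("yyy = 'variable from __init__'\n")
with open(os.path.join(pkg_dir, "yyy.py"), "w") as f:
    f.write("X = 1\n")

sys.path.insert(0, root)
importlib.invalidate_caches()

# The package already has an attribute "yyy", so the submodule
# is not loaded here and the variable is what gets imported.
from shadow_demo import yyy as before

# Loading the submodule explicitly makes the import machinery
# rebind the attribute "yyy" on the package to the module object.
import shadow_demo.yyy

from shadow_demo import yyy as after

print(type(before).__name__, type(after).__name__)
print(isinstance(after, types.ModuleType))
```

So in this sketch the module only overrides the variable once something has actually imported the submodule, which makes the result depend on import order.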

----------
components: Interpreter Core
files: ImportFailPy2.zip
messages: 186738
nosy: Pascal.Chambon
priority: normal
severity: normal
status: open
title: IMPORTANT - Process corruption on partly failed imports
type: behavior
versions: Python 2.7, Python 3.1, Python 3.2, Python 3.3, Python 3.4, Python 3.5
Added file: http://bugs.python.org/file29798/ImportFailPy2.zip

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue17716>
_______________________________________