[Python-Dev] Write All New Import Hooks (PEP 302) in Python, Not C

Guido van Rossum guido@python.org
Fri, 27 Dec 2002 17:15:23 -0500


> Let's focus on PEP 302 as a new import hook mechanism, not
> as a way to implement zip imports.  Just's PEP 302 code can
> be used to implement zip imports without making his import
> hooks part of the sys module and thus making them a public
> feature of Python forever.  THIS IS NOT ABOUT ZIP IMPORTS!

Agreed.  It's about making it easier to do things *like* zip import.

> Python already has an import hook, namely __import__.  PEP
> 302 adds three more hooks: sys.path_hooks, sys.meta_path and
> sys.path_importer_hooks.  The idea of four import hooks is
> already fishy.

There's no path_importer_hooks; there's path_importer_cache, but
that's not a new hook, it's a cache for the path_hooks hook.  So I
think the PEP proposes only two new hooks.
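
The relationship between the hook and its cache can be sketched as follows. This is only an illustration, written against the modern importlib machinery (finders with find_spec), which postdates this message; ENTRY, NullFinder and counting_hook are hypothetical names invented for the example.

```python
import importlib
import sys

ENTRY = "<cache-demo>"  # hypothetical sys.path entry, invented for this sketch
calls = []              # records every time the path hook is consulted


class NullFinder:
    """A finder that never finds anything; it only exists to be cached."""

    def find_spec(self, fullname, target=None):
        return None


def counting_hook(path_entry):
    """Path hook: claim only ENTRY and count how often that happens."""
    if path_entry != ENTRY:
        raise ImportError("not ours")
    calls.append(path_entry)
    return NullFinder()


sys.path_hooks.append(counting_hook)
sys.path.append(ENTRY)
sys.path_importer_cache.pop(ENTRY, None)  # drop any stale cache entry

# Two failing imports both walk all of sys.path, including ENTRY.
for name in ("no_such_module_a", "no_such_module_b"):
    try:
        importlib.import_module(name)
    except ImportError:
        pass

print(len(calls))                         # how often the hook itself ran
print(ENTRY in sys.path_importer_cache)   # whether the finder was cached
```

In this sketch the hook runs only for the first lookup; the second import reuses the finder stored in sys.path_importer_cache, which is why the cache is not a hook in its own right.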

Two new hooks are arguably still too many, but the existing __import__
hook is inadequate because it requires you to reimplement too much
functionality (the PEP argues this point convincingly IMO).  So we
need at least one new hook.  Personally, I think sys.meta_path is the
less important of the two hooks proposed by the PEP.  It would be
needed if you have to override the standard builtin or frozen import
behavior, but you can already do that with the heavier gun of
overriding __import__.

Other than that, you can hook up arbitrary weird importers by placing
magic-cookie strings in sys.path (e.g. of the form "<blah>") and
registering a path hook that looks for the magic cookie.  (Long ago I
had an idea to do this for builtin and frozen imports too, so you
could control the priority of builtin/frozen modules relative to some
directories.  But I never found a use case.  If someone wants this,
though, it's easily added to the path_hooks feature.)
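
A minimal sketch of the magic-cookie idea, assuming the modern importlib spelling of the protocol (find_spec, create_module, exec_module) rather than the find_module/load_module pair discussed at the time. COOKIE, SOURCES, CookieFinder and cookie_hook are hypothetical names, and the in-memory dictionary merely stands in for whatever medium a real importer would read from.

```python
import importlib.util
import sys

COOKIE = "<blah>"  # hypothetical magic-cookie entry for sys.path

# Hypothetical in-memory "medium": module name -> source text.
SOURCES = {"cookie_demo": "GREETING = 'hello from the cookie importer'\n"}


class CookieFinder:
    """Finder and loader for modules that live behind the cookie entry."""

    def find_spec(self, fullname, target=None):
        if fullname not in SOURCES:
            return None  # let the remaining sys.path entries try
        return importlib.util.spec_from_loader(fullname, self, origin=COOKIE)

    def create_module(self, spec):
        return None  # ask the import system for a default module object

    def exec_module(self, module):
        exec(SOURCES[module.__name__], module.__dict__)


def cookie_hook(path_entry):
    """Path hook: claim only the magic cookie, decline everything else."""
    if path_entry != COOKIE:
        raise ImportError("not a cookie entry")
    return CookieFinder()


sys.path_hooks.append(cookie_hook)
sys.path.append(COOKIE)
sys.path_importer_cache.pop(COOKIE, None)  # drop any stale cache entry

import cookie_demo

print(cookie_demo.GREETING)
```

The standard directory and zip hooks decline "<blah>" because it is neither a directory nor an archive, so the entry falls through to cookie_hook. Where the cookie sits in sys.path decides its priority relative to real directories.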

Magic cookie strings are backwards compatible: from inspecting code
that does something with sys.path, it looks like there are many places
that assume sys.path contains only strings, but almost none that
assume that all those strings are valid directory names.  Typical code
uses a sys.path item as input to os.path.isdir() or os.path.join() --
these require strings but don't make other assumptions.  In the end
there's usually something that passes the result to open() or
os.stat() to see if it exists.  Magic cookies will cause this test to
fail, but the code is prepared for such failure -- however it's not
prepared for TypeError coming out of a string concatenation.
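
The difference between the two failure modes can be seen in a small sketch. The probe function is a hypothetical stand-in for the kind of code that walks sys.path; it is not taken from any particular module.

```python
import os


def probe(entry, name="spam.py"):
    """Mimic typical sys.path-walking code: join, then test for existence."""
    candidate = os.path.join(entry, name)
    return os.path.exists(candidate)


# A cookie string survives the join and simply fails the existence test.
print(probe("<blah>"))

# A non-string entry never gets that far: the join itself raises TypeError.
try:
    probe(object())
except TypeError as exc:
    print("TypeError:", exc)
```

Code written like probe already copes with a False result, since that is what a stale or mistyped directory produces, but it has no reason to expect the exception.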

> PEP 302 enables non-strings on sys.path, and adds two new
> Python objects "importer" and "loader".  It changes the meaning
> of imp.find_module() and imp.load_module(), and adds a new
> imp.find_module2().  It changes the meaning of __import__.
> It proposes to deprecate __path__ manipulations.

Yes, this is definitely too much.  I'd like to limit this to
implementing sys.path_hooks -- there should be only one way to do it.
We might still want to add one new API to imp, to access the new
module-finding functionality (find_module2 is a poor choice of name
though).

> That is a lot of external changes.  That is a lot of code
> written in C.

Um, last I looked, most of the code written in C was specific to the
zip importer; only a relatively small amount of code was added to
import.c (about 10%).  If we get rid of the meta_path hook, it will be
less; if we drop non-string objects on sys.path, less again.

> I think the proper import hook design is to write Python's
> import mechanism in Python along the lines of Greg's imputil.py
> and Gordon's iu.py.  Import.c would be responsible for flat
> non-package imports from directories and zip files, including
> the import of the real importer iu.py.  The imp module would be
> extended with simple C import utilities that can be used to
> speed up iu.  Once iu.py is imported (probably from site),
> all imports are satisfied using the iu module.

That's a design that I had in mind long ago, but I don't see it
happening soon, because it would be a much larger overhaul of
import.c.  Also, there are more risks: if import.c somehow can't find
iu.py, it's hosed; and I fear that it could be a significant
performance risk in its first implementation (I vaguely remember that
Greg did some timing tests with imputil.py that confirmed this).

> To provide custom import hooks, the user overrides the iu
> module by writing Python code.  For example, site.py can
> attempt an import of custom_iu before importing iu, and the
> user provides custom_iu.py, probably by copying iu.py.
> I am not sure of the best way to do the override, but I am
> sure it is done in Python code. That enables the user to
> create custom hooks in Python, while relying on the source
> iu.py and the utilities in the imp module for basic
> functionality.

More machinery that's not yet designed and implemented.

I'd like to get something implemented by Monday that supports zip
import and *some* hookability.  A scaled-down version of Just's code
seems the only realistic possibility.

> If sys.path_hooks and the other hooks described in PEP 302
> are approved, they can be implemented in iu.py in Python.
> 
> This design still requires C support for zip imports, but
> there are two implementations for that available (from JimA
> and Just).  Other problems such as bootstrapping are easily
> solved.
> 
> I am tired of these endless import hook discussions, which
> always seem to start from nifty ideas instead of from a
> formal solicitation of hook requirements.  I don't object to
> useful features, but I don't think anything other than easy
> replacement of the Python import mechanism will bring this
> perennial topic to an end.

I'm not so sure.  In practice, hooks are used for two things: importing
from media other than directories (e.g. zip files), and supporting
additional filename extensions that trigger special transformations.
But the latter need is much less common than the former (the only real
example I know of is Quixote) and it's pretty much orthogonal to it.
Neither should require you to replace __import__, as they currently
do.

(I've got to run now -- more over the weekend as my family gives me
some time off. :-)

--Guido van Rossum (home page: http://www.python.org/~guido/)