[Python-Dev] New and Improved Import Hooks

Moore, Paul Paul.Moore@atosorigin.com
Thu, 5 Dec 2002 09:07:41 -0000


From: Just van Rossum [mailto:just@letterror.com]
> Gordon McMillan wrote:

> > Code like this:
> >  for p in sys.path:
> >    x = os.path.join(p, ...)
> >    ....
> > is very common (I patched linecache.py for this after imputil
> > went into the std lib). Since PYTHONPATH can consist only of
> > strings, it seems wise to tackle the issue (dealing with strings
> > that describe non-directory collections of modules) instead of
> > postponing it. Also seems sensible to make it so that if X works
> > on PYTHONPATH, sys.path.append(X) should work, too.
>
> Erm, I'm not sure what you're saying... Are you saying that we
> should fix all cases where non-strings on sys.path cause problems,
> or are you saying that there's so much code out there assuming
> sys.path contains strings, and that we therefore should stick with
> strings?

I read Gordon's comments as the latter, which implies your approach
(A) below.
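
The string assumption Gordon describes can be illustrated with a small
sketch. The names FakeImporter and candidate_files are hypothetical,
invented here for illustration; the loop mirrors the quoted pattern, and
the search path is an ordinary list rather than the real sys.path.

```python
import os.path


class FakeImporter:
    """Hypothetical stand-in for a non-string object placed on a search path."""


def candidate_files(search_path, filename):
    """Join every entry with filename, the way much existing code does."""
    return [os.path.join(entry, filename) for entry in search_path]


string_only = ["/usr/lib/python", "/home/me/lib"]
mixed = ["/usr/lib/python", FakeImporter()]

# Works as long as every entry is a string.
joined = candidate_files(string_only, "spam.py")

# Fails as soon as one entry is an arbitrary object.
try:
    candidate_files(mixed, "spam.py")
    broke = False
except TypeError:
    broke = True
```

Code of this shape is what would need fixing, in the std lib and in
third-party modules, if arbitrary objects were allowed on sys.path.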

> Both positions can be defended, and both have their problems.
>
> A) Stick with strings. Hooks can be implemented by subclassing str.
> This is great for hooks written in Python, but subclassing str
> in C is not straightforward. Things can still break, though: eg.
> os.path.basename(strsubinst) will return a regular string, not an
> instance of the subclass; might be an issue.
>
> B) Allow arbitrary objects on sys.path. Hooks are then easier to
> write (in C), but some code breakage will occur. The std library we
> can fix (if needed), but third-party code might break.
>
> I would very much prefer B, but if it turns out that we can't break
> the string assumption, I'd still be happy with A (rather that than
> nothing!).
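
The caveat under (A) can be seen in a few lines. This is only a sketch:
ArchiveEntry and its find_module method are hypothetical names, not part
of any proposal in this thread.

```python
import os.path


class ArchiveEntry(str):
    """Hypothetical str subclass that carries extra hook behaviour."""

    def find_module(self, name):
        # Placeholder: a real hook would look inside the archive here.
        return None


entry = ArchiveEntry("/path/to/my/archive.zip")

# The instance still works wherever a plain string is expected.
joined = os.path.join(entry, "spam.py")

# But string operations hand back a plain str, so the hook behaviour
# does not survive a trip through os.path.
base = os.path.basename(entry)
result_is_plain_str = type(base) is str
```

Any code that derives a new string from a path entry therefore loses the
subclass, which is the breakage Just mentions.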

As I say above, I prefer (A) of these two. But in practice, I don't
see the problem with Gordon's metapath approach. Equivalently, every
element of sys.path must be a string, and there is a dictionary mapping
sys.path elements to Owner instances (if it helps, you can say that if
the dictionary doesn't contain a particular element as a key, it can be
treated as a normal directory). The advantage is that we stick to pure
strings, not string subclasses...
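
A minimal sketch of that mapping, assuming hypothetical DirectoryOwner
and ZipOwner classes and a hypothetical owner_for helper (none of these
names come from Gordon's code):

```python
class DirectoryOwner:
    """Hypothetical default owner: treats the entry as a normal directory."""

    def __init__(self, entry):
        self.entry = entry


class ZipOwner:
    """Hypothetical owner for an entry that names a zip archive."""

    def __init__(self, entry):
        self.entry = entry


# Every path entry stays a plain string; special handling lives in the dict.
search_path = ["/usr/lib/python", "/path/to/my/archive.zip"]
owners = {"/path/to/my/archive.zip": ZipOwner("/path/to/my/archive.zip")}


def owner_for(entry, owners):
    """Return the registered owner, or fall back to a directory owner."""
    try:
        return owners[entry]
    except KeyError:
        return DirectoryOwner(entry)


resolved = [type(owner_for(entry, owners)).__name__ for entry in search_path]
```

Code that does os.path.join(p, ...) over the path keeps working, because
the entries themselves never stop being strings.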

[BTW, at present, sys.path does not seem to support Unicode strings.
This seems like a minor wart. If you allow anything more than strings
on sys.path, I'd suggest that this be tidied up, too...]

> Regarding PYTHONPATH and sys.path.append("/path/to/my/archive.zip"):
> for now I'd suggest that the sys.path traversing code checks
> for a .zip extension, and replaces the item with a zipimporter
> instance. This check can be very cheap. Later we could add a general
> extension-checking feature, where one could register an import hook
> for a specific extension. This might be a case of YAGNI, though...

Is this check going to happen whenever sys.path gets changed? If so,
how do you trap that?
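
One possible reading, and it is only an assumption, is that nothing
needs trapping: the check runs each time the path is traversed, so a
later append is picked up on the next traversal. A sketch of that
reading, using a hypothetical FakeZipImporter placeholder (not the real
zipimporter) and a hypothetical normalise_entries function:

```python
class FakeZipImporter:
    """Hypothetical placeholder; not the real zipimporter type."""

    def __init__(self, archive):
        self.archive = archive


def normalise_entries(search_path):
    """Replace '.zip' string entries in place; return how many were replaced."""
    replaced = 0
    for index, entry in enumerate(search_path):
        # Cheap test: only plain strings ending in .zip are touched.
        if isinstance(entry, str) and entry.endswith(".zip"):
            search_path[index] = FakeZipImporter(entry)
            replaced += 1
    return replaced


search_path = ["/usr/lib/python"]
search_path.append("/path/to/my/archive.zip")  # later change, not trapped

first = normalise_entries(search_path)   # picks up the appended archive
second = normalise_entries(search_path)  # nothing left to replace
```

Note that under this reading the list no longer contains only strings
after the first traversal, which runs straight back into the (A) versus
(B) question above.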

And yes, I very definitely need a way of registering a user-defined
hook for path entries (and not always based on extension!!!). Imagining
a hook to handle something like "http://my.repository/python/" on
sys.path is not hard, or unreasonable... (Security considerations
aside).
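
A rough sketch of the kind of registration meant here, assuming a
hypothetical registry of (predicate, factory) pairs. The names
register_hook, hook_for, UrlOwner and ZipOwner are invented for
illustration, and nothing in the sketch touches the network:

```python
path_hooks = []  # hypothetical registry of (predicate, factory) pairs


def register_hook(predicate, factory):
    """Register a factory for path entries that satisfy predicate."""
    path_hooks.append((predicate, factory))


def hook_for(entry):
    """Return a hook for entry, or None to treat it as a normal directory."""
    for predicate, factory in path_hooks:
        if predicate(entry):
            return factory(entry)
    return None


class UrlOwner:
    """Hypothetical hook for entries that name a remote repository."""

    def __init__(self, entry):
        self.entry = entry


class ZipOwner:
    """Hypothetical hook for entries that name a zip archive."""

    def __init__(self, entry):
        self.entry = entry


# A prefix test and an extension test can sit side by side.
register_hook(lambda entry: entry.startswith("http://"), UrlOwner)
register_hook(lambda entry: entry.endswith(".zip"), ZipOwner)

url_hook = hook_for("http://my.repository/python/")
zip_hook = hook_for("/path/to/my/archive.zip")
plain = hook_for("/usr/lib/python")
```

Using a predicate rather than an extension table is the point: the
caller decides what makes an entry "theirs".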

Paul.