[Python-Dev] PEP 302 and __path__

Guido van Rossum guido@python.org
Fri, 27 Dec 2002 12:59:52 -0500


I've thought a lot about __path__, and my conclusion is that I want to
keep it.  It was put in as a feature of packages long ago (surviving a
major redesign, when other features were dropped) and IMO serves an
important purpose.  For example, some 3rd party packages (e.g. PMW)
use __path__ to select the most recently installed version when the
package is imported.  I don't like the idea of widening the package
search path by default -- I can see all sorts of confusion when two
versions of a package are on the path, and I prefer the (default)
situation where the first one found hides the second one completely.
That said, there are cases where a widened search path is desirable,
and this can be achieved by explicit __path__ manipulation; that's why
I recently added pkgutil.py (there's a potential use of this in
Zope3).
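
As a rough illustration of the explicit widening described above, here
is a minimal sketch using pkgutil.extend_path.  The package name
widened_demo and the two temporary directories are made up for the
example; each directory holds one portion of the package, and each
__init__.py opts in to the wider search.

```python
import importlib
import os
import sys
import tempfile

# Each portion's __init__.py opts in to the widened search explicitly.
INIT = (
    "from pkgutil import extend_path\n"
    "__path__ = extend_path(__path__, __name__)\n"
)


def make_portion(root, package, module, body):
    """Create root/package/ with an opting-in __init__.py and one module."""
    pkg_dir = os.path.join(root, package)
    os.makedirs(pkg_dir)
    with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
        f.write(INIT)
    with open(os.path.join(pkg_dir, module + ".py"), "w") as f:
        f.write(body)


# Two hypothetical install locations, both placed on sys.path.
first = tempfile.mkdtemp()
second = tempfile.mkdtemp()
make_portion(first, "widened_demo", "a", "NAME = 'a'\n")
make_portion(second, "widened_demo", "b", "NAME = 'b'\n")
sys.path[:0] = [first, second]
importlib.invalidate_caches()

import widened_demo
from widened_demo import a, b  # b is found only because __path__ was widened

print(widened_demo.__path__)
```

Without the extend_path() call in __init__.py, the copy in the first
directory would hide the second one and the import of b would fail,
which is the default behavior argued for above.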

I also like the idea of allowing references to directories *inside* a
zip archive (e.g. /path/to/foo.zip/dir/subdir/) on sys.path and on
__path__.  This seems pretty natural, and Jython already uses this.
Using a smidgeon of caching, it should be pretty efficient to find the
zip file in the path (even if it doesn't have a .zip extension; I
don't think we should require that).  Of course, there is an issue
when the OS's path separator isn't the same as the zipfile's path
separator; I think we should use the OS's path separator throughout
and translate that to the zipfile's path separator (always "/" I
believe?) internally.  We should also silently strip a leading "/" on
the paths used in the zipfile (i.e., it shouldn't matter if the
zipfile index uses /foo/bar.py or foo/bar.py -- in both cases you
could refer to this as /path/to/foo.zip/foo/bar.py).  For directories,
a trailing slash should also be optional (for files it should be
illegal).
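
The path rules above could be sketched roughly as follows.  This is
only an illustration under my own assumptions: the helper names
is_archive, split_zip_path and classify are hypothetical, the cache is
a plain lru_cache that is never invalidated, and directories are
inferred from member names rather than from explicit index entries.

```python
import os
import tempfile
import zipfile
from functools import lru_cache


@lru_cache(maxsize=None)
def is_archive(path):
    """Cached check: is `path` an existing zip file, whatever its extension?"""
    return os.path.isfile(path) and zipfile.is_zipfile(path)


def split_zip_path(path):
    """Split an OS-style path into (archive, member), or return None.

    `member` uses "/" separators and has no leading or trailing slash.
    """
    candidate = path.rstrip(os.sep)
    parts = []
    while candidate:
        if is_archive(candidate):
            return candidate, "/".join(reversed(parts))
        parent, tail = os.path.split(candidate)
        if not tail:  # reached the filesystem root without finding a zip
            break
        parts.append(tail)
        candidate = parent
    return None


def classify(path):
    """Return (archive, member, kind) with kind "file" or "dir", else None."""
    found = split_zip_path(path)
    if found is None:
        return None
    archive, member = found
    with zipfile.ZipFile(archive) as zf:
        # Ignore a leading "/" in the index so /foo/bar.py == foo/bar.py.
        names = [name.lstrip("/") for name in zf.namelist()]
    if member in names:
        # A trailing separator is only legal on directories.
        return None if path.endswith(os.sep) else (archive, member, "file")
    if member == "" or any(n.startswith(member + "/") for n in names):
        return archive, member, "dir"
    return None


# Demo archive, deliberately created without a .zip extension.
archive = os.path.join(tempfile.mkdtemp(), "bundle")
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("foo/bar.py", "X = 1\n")

print(classify(os.path.join(archive, "foo", "bar.py")))
print(classify(os.path.join(archive, "foo") + os.sep))
```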

Then __file__ for a package or module loaded from a zipfile should be
set to a path as above (e.g. /path/to/foo.zip/foo/bar.py) and
__path__ for a package loaded from a zipfile should be initialized to
e.g.  ['/path/to/foo.zip/foo'].
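
For comparison, the zipimport module in a current CPython follows this
convention, and a small experiment shows the resulting values.  The
archive location and the package name zfoo_demo are made up for the
example.

```python
import importlib
import os
import sys
import tempfile
import zipfile

# Build a throwaway archive containing a package with one submodule.
archive = os.path.join(tempfile.mkdtemp(), "foo.zip")
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("zfoo_demo/__init__.py", "")
    zf.writestr("zfoo_demo/bar.py", "X = 1\n")

# Put the zip file itself on sys.path and import from it.
sys.path.insert(0, archive)
importlib.invalidate_caches()

import zfoo_demo.bar

print(zfoo_demo.bar.__file__)  # a path that continues inside the archive
print(zfoo_demo.__path__)      # a one-element list naming a directory in it
```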

I'm not requiring *all* importers to follow this convention.  It's
fine if e.g. the "freeze" importer does something else (although I
wouldn't be surprised if "freeze" ends up being deprecated once we
have zip import as a standard feature).  But for importers that map to
something reasonably close to a (read-only) hierarchical file system,
it seems useful to use OS-filename-like syntax in __file__ and
__path__.

Now, for such importers, it would be nice if we could use such paths
to extract other data from the importer as well.  I think that the
right API for this would be some function living in the imp module:
you pass it a path and it returns the data as a string, or raises an
IOError (or subclass thereof) instance if it can't find the data.
Let's call this API imp.get_data(filename).  We'll see how it
interfaces to importer/loader objects in a minute.
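
To make the intent concrete, here is a hedged sketch of what such a
function might look like.  It is not the proposed imp.get_data()
itself: the module-level get_data below is a hypothetical stand-in
built on today's zipimport.zipimporter, it returns bytes rather than a
str, and it raises OSError (of which IOError is an alias in Python 3).

```python
import os
import tempfile
import zipfile
import zipimport


def get_data(filename):
    """Return the bytes stored at `filename`, on disk or inside a zip file.

    Hypothetical stand-in for the proposed API; raises OSError if the
    data cannot be found.
    """
    if os.path.isfile(filename):
        with open(filename, "rb") as f:
            return f.read()
    try:
        # zipimporter accepts a path *inside* an archive and locates the
        # enclosing zip file itself.
        importer = zipimport.zipimporter(os.path.dirname(filename))
    except zipimport.ZipImportError as exc:
        raise OSError("no data at %r" % (filename,)) from exc
    return importer.get_data(filename)


# Demo: one data file inside a throwaway archive.
archive = os.path.join(tempfile.mkdtemp(), "foo.zip")
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("foo/data.txt", "hello")

print(get_data(os.path.join(archive, "foo", "data.txt")))
```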

I also would like to propose a new API to find modules:
imp.get_loader(name[, path]).  This should return a module loader
object, or None if the module isn't found; it should raise an
exception only if something unexpected went wrong.  Once we have a
module loader object, loader.load_module(name[, path]) should load
(and return) the requested module.  The name argument in both cases
should be the fully dotted module name; the path argument should be
omitted or None to load a toplevel module or package (no dot in the
name), and it should be the package path to load a submodule or
subpackage.

Note that the package path can contain multiple entries.
imp.get_loader() will have to probe each in turn until it finds one
that has the requested module, and then return the corresponding
loader.  That loader, when its load_module() method is called, may use
or ignore the path passed in.
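
A rough sketch of the get_loader()/load_module() pair, written against
today's importlib rather than the imp module, might look like this.
The Loader class and the module-level get_loader function are
hypothetical; PathFinder does the probing of each path entry in turn,
and this particular loader ignores the path passed to load_module().

```python
import importlib.machinery
import importlib.util
import sys


class Loader:
    """Hypothetical loader object wrapping an importlib module spec."""

    def __init__(self, spec):
        self.spec = spec

    def load_module(self, name, path=None):
        # This sketch ignores `path`; the spec already records the location.
        module = importlib.util.module_from_spec(self.spec)
        sys.modules[name] = module
        try:
            self.spec.loader.exec_module(module)
        except BaseException:
            # Do not leave a half-initialized module behind.
            del sys.modules[name]
            raise
        return module


def get_loader(name, path=None):
    """Return a Loader for the fully dotted `name`, or None if not found.

    `path` is None for a top-level module (sys.path is searched), or the
    parent package's __path__ for a submodule or subpackage.
    """
    spec = importlib.machinery.PathFinder.find_spec(name, path)
    if spec is None or spec.loader is None:
        return None
    return Loader(spec)
```

With a package pkg already loaded, a submodule would then be loaded as
get_loader("pkg.mod", pkg.__path__).load_module("pkg.mod").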

I don't want to add a separate meta-path to a package; it seems
overkill, especially since we don't even have a good use case for
meta-path in the normal case.

I'm trying to write up pseudo-code to describe the whole setup more
precisely, but it's taking more time than expected, so I'll send this
mail with my intentions out first.

--Guido van Rossum (home page: http://www.python.org/~guido/)