[Python-Dev] Import redesign [LONG]

Greg Stein gstein@lyra.org
Mon, 6 Dec 1999 16:43:21 -0800 (PST)


On Mon, 6 Dec 1999, James C. Ahlstrom wrote:
> Greg Stein wrote:
>...
> > I am not following this. What/where is the "single dictionary of module
> > names" ? Are you referring to a cache? Or is this about building an
> > archive?
> > 
> > An archive would look just like we have now: map a name to a module. It
> > would not need multiple dictionaries.
> 
> The "single dictionary of names" is in the single archive importer
> instance and has nothing to do with creating the archive.  It
> is currently programmed this way.

Ah. There is the problem. In Guido's suggestion for the "next path of
inquiry" :-), there is no "single dictionary of names". Instead, you have
Importer instances as items in sys.path. Each instance maintains its own
dictionary, and they are not (necessarily) combined.

If we were to combine them, then we would need to maintain the ordering
requirements implied by sys.path. However, this would be problematic if
sys.path changed -- we would have to detect the situation and rebuild a
merged dict.
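
To make the trade-off concrete, here is a minimal sketch in modern Python. It
is not imputil code: DictImporter, search() and merge() are hypothetical names
invented for illustration, and a plain list stands in for sys.path. It shows
that a per-importer search follows a reordered path immediately, while a
merged dict goes stale until it is rebuilt.

```python
class DictImporter:
    """Hypothetical importer: owns one private name -> source mapping."""

    def __init__(self, label, modules):
        self.label = label
        self.modules = dict(modules)

    def get_code(self, name):
        # Return (label, source) if this importer knows the name, else None.
        if name in self.modules:
            return (self.label, self.modules[name])
        return None


def search(path, name):
    """Ask each importer in path order; the first hit wins."""
    for importer in path:
        found = importer.get_code(name)
        if found is not None:
            return found
    return None


def merge(path):
    """Build one combined dict; earlier path entries take precedence."""
    merged = {}
    for importer in reversed(path):
        for name, source in importer.modules.items():
            merged[name] = (importer.label, source)
    return merged


a = DictImporter("a", {"httplib": "new httplib", "foo": "foo from a"})
b = DictImporter("b", {"httplib": "old httplib", "bar": "bar from b"})

path = [a, b]                    # stand-in for sys.path
merged = merge(path)             # snapshot taken with the current ordering
first = search(path, "httplib")

path.reverse()                   # the path changes after the merge
after_reorder = search(path, "httplib")   # follows the new ordering
stale = merged["httplib"]                 # still reflects the old ordering
```

After the reversal, search() answers from "b" while the merged dict still
answers from "a"; that mismatch is the rebuild problem described above.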

> Suppose the user specifies by name 12 archive files to be searched.
> That is, the user hacks site.py to add archive names to the importer.
> The "single dictionary" means that the archive importer takes the 12
> dictionaries in the 12 files and merges them together into one dictionary
> in order to speed up the search for a name.  The good news is you can
> always just call the archive importer to get a module.  The bad news is
> you can't do that for each entry on sys.path because there is no
> necessary identity between archive files and sys.path.  The user
> specified the archive files by name, and they may or may not be on
> sys.path, and the user may or may not have specified them in the
> same order as sys.path even if they are.

The importer must be inserted into sys.path to establish a precedence. If
the user wants to add 12 libraries... fine. But *all* of those modules
will fall under a precedence defined by the Importer's position on
sys.path.
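
A rough sketch of what that precedence means, under stated assumptions:
ArchiveImporter, DirImporter and resolve() are hypothetical names, a list
stands in for sys.path, and the rule that the first library added wins
*inside* the archive importer is my own choice for the example. The point is
only that everything the archive importer serves ranks at its one slot.

```python
class ArchiveImporter:
    """Hypothetical importer that serves several library files at once."""

    def __init__(self):
        self.table = {}

    def add_library(self, library, modules):
        for name, source in modules.items():
            # Assumption: inside the importer, the first library added wins.
            self.table.setdefault(name, (library, source))

    def get_code(self, name):
        return self.table.get(name)


class DirImporter:
    """Hypothetical stand-in for a directory entry on the path."""

    def __init__(self, directory, modules):
        self.directory = directory
        self.modules = dict(modules)

    def get_code(self, name):
        if name in self.modules:
            return (self.directory, self.modules[name])
        return None


def resolve(path, name):
    """Walk the path in order and return the first importer's answer."""
    for importer in path:
        found = importer.get_code(name)
        if found is not None:
            return found
    return None


archives = ArchiveImporter()
archives.add_library("lib.pyl", {"string": "archived string", "re": "archived re"})
archives.add_library("extras.pyl", {"re": "other re", "spam": "archived spam"})

local = DirImporter("/home/me/lib", {"string": "my string"})
site = DirImporter("/usr/lib/python", {"spam": "site spam"})

# The archive importer occupies exactly one position; every module it
# serves ranks below `local` and above `site`.
path = [local, archives, site]
```

Here resolve(path, "string") is answered by the local directory, while "spam"
is answered from the archives because they sit ahead of the site directory.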

> Suppose archive files must lie on sys.path and are processed in order.
> Then to find them you must know their name.  But IMHO you want to
> avoid doing a readdir() on each element of sys.path and looking for
> files *.pyl.

I do not believe that we will arbitrarily locate and open library files.
They must be specified explicitly.

> Suppose archive file names in general are the known name "lib.pyl"
> for the Python library, plus the names "package.pyl" where "package"
> can be the name of a Python package as a single archive file.  Then
> if the user tries to import foo, imputil will search along sys.path
> looking for foo.pyc, foo.pyl, etc.  If it finds foo.pyl, the archive
> importer will add it to its list of known archive files.  But it must
> not add it to its single dictionary, because that would destroy the
> information about its position along sys.path.  Instead, it must keep
> a separate dictionary for each element of sys.path and search the
> separate dictionaries under control of imputil.  That is, get_code()
> needs a new argument for the element of sys.path being searched.
> Alternatively, you could create a new importer instance for each
> archive file found, but then you still have multiple dictionaries.
> They are in the multiple instances.

If the user installs ".pyl" as a recognized extension (i.e. installs a hook
for it into the PathImporter), then the above scenario is possible. In my
in-head design, I had not imagined any state being retained for
extension-recognizer hooks. Of course, state can be retained simply by
using a bound method for the hook function.
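
As an illustration of the bound-method point, a small sketch. The
PylRecognizer class and the extension_hooks registry are hypothetical; they
are not part of imputil, and the paths are made up. The hook is an ordinary
callable from the caller's point of view, yet it carries its instance along.

```python
class PylRecognizer:
    """Hypothetical extension recognizer that remembers what it has seen."""

    def __init__(self):
        self.archives = []          # state retained between hook calls

    def recognize(self, pathname):
        # Record the archive and report how many are known so far.
        self.archives.append(pathname)
        return len(self.archives)


recognizer = PylRecognizer()

# Hypothetical registry mapping an extension to its hook function.
# Storing the bound method keeps `recognizer` attached to the hook.
extension_hooks = {".pyl": recognizer.recognize}

hook = extension_hooks[".pyl"]
count_one = hook("/usr/lib/python/foo.pyl")
count_two = hook("/home/me/lib/bar.pyl")
```

Whoever calls the hook never sees the instance, but the list of archives
accumulates on it across calls.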

get_code() would not need to change. The foo.pyl would be consulted at the
appropriate time based on where it is found in sys.path. Note that
file-extension hooks would definitely have a complete path to the target
file. Those hooks are not Importers, however (although they will closely
follow the get_code() interface, since the hook is called from get_code()).

From a purely theoretical standpoint, you can also see that get_code()
should not have a pathname passed -- that would introduce filesystem
semantics into what is otherwise an independent semantic (map name to
module).

More detail: the extension recognizer could certainly retain a cache of
information about each of the archives that are located. However, the
recognizer would be consulted (by the PathImporter) once for each archive
found, in an ordering defined by sys.path.
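
A sketch of how that could fit together, with everything in it assumed for
illustration: PathImporterSketch, add_extension_hook() and the in-memory set
of "existing files" are hypothetical, and letting a hook decline by returning
None is my own convention. get_code() receives only a module name; only the
extension hook ever sees a full pathname.

```python
import posixpath


class PathImporterSketch:
    """Hypothetical path importer; a set of names fakes the filesystem."""

    def __init__(self, path, existing_files):
        self.path = list(path)
        self.existing_files = set(existing_files)
        self.hooks = []             # (extension, hook) pairs

    def add_extension_hook(self, extension, hook):
        self.hooks.append((extension, hook))

    def get_code(self, name):
        # Only a module name comes in; pathnames are built internally.
        for directory in self.path:
            for extension, hook in self.hooks:
                pathname = posixpath.join(directory, name + extension)
                if pathname in self.existing_files:
                    result = hook(pathname)     # hook gets the full path
                    if result is not None:
                        return result
        return None


consulted = []


def pyl_hook(pathname):
    # Assumption: a hook may decline an archive by returning None.
    consulted.append(pathname)
    if pathname.startswith("/first/"):
        return None
    return "code from " + pathname


files = {"/first/foo.pyl", "/second/foo.pyl"}
importer = PathImporterSketch(["/first", "/second"], files)
importer.add_extension_hook(".pyl", pyl_hook)

code = importer.get_code("foo")
```

The hook is consulted once per archive found, in path order, and the caller
of get_code() never has to supply or learn a pathname.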

> All this is needed only to support import of identically named
> modules.  If there are none, there is no problem because sys.path
> is being used only to find modules, not to disambiguate them.

But the current (and future) semantics of Python state that you may have
identically named modules, and that sys.path *does* disambiguate them.

In fact, I use this feature all the time -- I use my new httplib.py rather
than the standard library version. I do this by placing the specific
directory "first" in my sys.path.
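
For anyone who wants to see the shadowing in action, a small demonstration
written for a modern Python. The module name "shadowdemo", the ORIGIN
attribute and the two temporary directories are made up for the example; it
does not touch httplib or the real standard library.

```python
import importlib
import os
import sys
import tempfile


def write_module(directory, name, body):
    """Create <directory>/<name>.py containing the given source text."""
    with open(os.path.join(directory, name + ".py"), "w") as handle:
        handle.write(body)


# Two directories, each holding a module with the same name.
standard_dir = tempfile.mkdtemp()
override_dir = tempfile.mkdtemp()
write_module(standard_dir, "shadowdemo", "ORIGIN = 'standard'\n")
write_module(override_dir, "shadowdemo", "ORIGIN = 'override'\n")

# Put the override directory ahead of the other one on sys.path.
sys.path.insert(0, standard_dir)
sys.path.insert(0, override_dir)
importlib.invalidate_caches()   # make sure the new files are noticed

shadowdemo = importlib.import_module("shadowdemo")
origin = shadowdemo.ORIGIN      # comes from the directory listed first
```

Swapping the two insert() calls would make the other copy win, which is the
disambiguation by position described above.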

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/