[Distutils] Freeze and new import architecture

Greg Stein gstein@lyra.org
Thu, 17 Dec 1998 18:15:37 -0800


I've messed around quite a bit with funky import mechanisms also and
have found the current setup a bit tough to work within. I'll insert
some comments below, but never fear... I've got more, too :-)

Mark Hammond wrote:
> ...
> sys.path not be restricted to path names.  sys.path has "strings", and
> an associated map of "module finders".  Thus, a sys.path entry could
> have a directory name (like now) or .zip file, URL, etc.

I would much prefer to see the module finder instances in sys.path
itself. Sometimes, it is *very* difficult to map strings to module
finders. For example, if you have a .dll with frozen code in it, and the
code has been frozen in one of N formats, then how can you determine
which module finder to use for that format? IMO, it is better to insert
the finder itself:

sys.path.append(GZippedDLLResource("modulename", "mycode.dll"))
sys.path.append(DLLResourceGroup("mymodules.dll"))
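To make the point concrete, here is a minimal sketch of an importer object that carries knowledge of its own storage format, so nobody has to guess a finder from a string. It is not the GZippedDLLResource above (it does not read DLL resources); the class GzipSourceImporter, its in-memory dict of gzip-compressed sources, and the do_import() method name (borrowed from the loop later in this message) are assumptions for illustration. The sketch does not touch the real sys.path.

```python
import gzip
import types


class GzipSourceImporter:
    """Hypothetical importer that knows its own format: gzip-compressed source.

    Because the object itself knows how its modules are stored, no
    string-to-finder mapping is needed.
    """

    def __init__(self, compressed_sources):
        # Maps module name -> gzip-compressed Python source bytes.
        self.compressed_sources = compressed_sources

    def do_import(self, modname):
        blob = self.compressed_sources.get(modname)
        if blob is None:
            return None  # Not ours: the next path entry gets a chance.
        source = gzip.decompress(blob).decode("utf-8")
        module = types.ModuleType(modname)
        exec(compile(source, "<gzip:%s>" % modname, "exec"), module.__dict__)
        return module


importer = GzipSourceImporter({"greet": gzip.compress(b"MESSAGE = 'hello'\n")})
print(importer.do_import("greet").MESSAGE)  # hello
print(importer.do_import("missing"))        # None
```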

> ...
Jack Jansen wrote:
> ...
>         WHAT WE HAVE
>         ============
> 
> We noticed that there are really two issues involved in importing
> modules:
> 1. Finding the module in a specific namespace.
> 2. Importing a module of a specific type, once it has been found.

I think the separation is bogus. Trying to fit into the Finder/Loader
paradigm of the ihooks has always been a total, non-intuitive
pain-in-the-ass for me (to be blunt :-). Instead, I just go straight for
the import hook and ignore the whole ihooks thing.

I would put forward that we ignore the find/load paradigm and simply go
to:

  1. Import the given module if you can

If an element on sys.path can't do it (returning None), then you move to
the next one.
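A minimal sketch of that protocol, assuming a hypothetical do_import() method that returns a module or None. DictImporter and import_from_path are made-up names, and plain strings stand in for module objects to keep it short.

```python
class DictImporter:
    """Hypothetical importer serving 'modules' from an in-memory dict."""

    def __init__(self, modules):
        self.modules = modules

    def do_import(self, modname):
        # Import the module if we can; otherwise return None.
        return self.modules.get(modname)


def import_from_path(path, modname):
    """Ask each importer in turn; the first non-None answer wins."""
    for importer in path:
        module = importer.do_import(modname)
        if module is not None:
            return module
    raise ImportError(modname + " not found.")


path = [DictImporter({"a": "module a from first"}),
        DictImporter({"a": "shadowed", "b": "module b from second"})]
print(import_from_path(path, "a"))  # module a from first
print(import_from_path(path, "b"))  # module b from second
```

Earlier entries shadow later ones, exactly as directories on sys.path do today.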

> ...
> The builtin and frozen namespaces are currently special, in that they
> don't occur in sys.path and are always virtually at the very front of
> sys.path. On the Mac, sys.path entries can be either filenames (in which
> case the latter two modulefinders are invoked) or directories (in which
> case the filesystem finder is invoked); on other platforms there are
> only directories in sys.path right now.

Simple to do:
sys.path.insert(0, BuiltinImporter())

The BuiltinImporter handles compiled-in and frozen modules.
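One way such an importer might look in current Python 3, as a hedged sketch: it answers only for names that are compiled in (sys.builtin_module_names) or frozen (importlib.machinery.FrozenImporter.find_spec), returns None otherwise, and hands the actual loading to importlib.import_module. The do_import() method name is an assumption, and the class is unrelated to importlib.machinery.BuiltinImporter beyond sharing a name.

```python
import importlib
import importlib.machinery
import sys


class BuiltinImporter:
    """Sketch of the proposed importer for compiled-in and frozen modules."""

    def do_import(self, modname):
        is_builtin = modname in sys.builtin_module_names
        is_frozen = (
            importlib.machinery.FrozenImporter.find_spec(modname) is not None
        )
        if not (is_builtin or is_frozen):
            return None  # Neither compiled in nor frozen: defer to later entries.
        # Let the interpreter's own machinery do the actual loading.
        return importlib.import_module(modname)


importer = BuiltinImporter()
print(importer.do_import("sys") is sys)          # True
print(importer.do_import("no_such_module_xyz"))  # None
```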

> Regarding 2: the finder currently returns a structure that enables the
> correct importer to be called later on. Importers that we have are for
> builtin, frozen, .py/.pyc/.pyo modules, various dll-importers all hiding
> behind the same interface, PYC-resource importers (mac-only) and
> PYD-resource importers (mac-only).

Punt this. Just import the dumb thing in one shot.

Take the example of an HTTP-based import. Separating that into *two*
transactions would be painful. It should be imported in one fell swoop.
And no, you can't just keep the socket open and pass that to the loader
-- that implies that you can defer the passing for a while, but the web
server will time out your connection and close it. Conversely, if the
intent is *not* to hold the "structure" for a while, then why the heck
have two pieces?
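A sketch of what "one fell swoop" could mean for an HTTP importer: fetch the whole source in a single request, close the connection, then execute it into a fresh module. HTTPImporter, the <base_url><modname>.py URL layout, and the injectable fetch parameter are assumptions for illustration; the usage substitutes a fake fetch function so it runs without a network.

```python
import types
import urllib.request


def fetch_url(url):
    # One HTTP transaction: open, read the whole body, close.
    with urllib.request.urlopen(url) as response:
        return response.read()


class HTTPImporter:
    """Hypothetical importer that fetches and executes a module in one step."""

    def __init__(self, base_url, fetch=fetch_url):
        self.base_url = base_url
        self.fetch = fetch  # Injectable so the sketch can run offline.

    def do_import(self, modname):
        url = self.base_url + modname + ".py"
        try:
            source = self.fetch(url)
        except OSError:  # urllib's URLError/HTTPError are OSError subclasses.
            return None  # Missing or unreachable: try the next entry.
        # The connection is already closed; nothing is held open between
        # a "find" phase and a "load" phase.
        module = types.ModuleType(modname)
        module.__file__ = url
        exec(compile(source, url, "exec"), module.__dict__)
        return module


def fake_fetch(url):
    if url == "http://host.domain.name/pymodules/hello.py":
        return b"GREETING = 'hi'\n"
    raise OSError("404 Not Found")


importer = HTTPImporter("http://host.domain.name/pymodules/", fetch=fake_fetch)
print(importer.do_import("hello").GREETING)  # hi
print(importer.do_import("absent"))          # None
```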

> 
>         WHAT WE WANT
>         ============
> 
> What we'd like I'll try to describe top-down (hopefully better to
> understand than bottom-up).
> 
> importing a module becomes something like
> 
>    for pathentry in sys.path:
>         finder = getfinder(pathentry)
>         loader = finder.find(module)
>         module = loader()

I'll amend this to:

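  import sys

  def import_module(modname):
    # old_import() stands for the existing string-based path search.
    for pathentry in sys.path:
      if isinstance(pathentry, str):
        module = old_import(pathentry, modname)
      else:
        # An importer object returns the module, or None if it can't.
        module = pathentry.do_import(modname)
      if module is not None:
        return module
    raise ImportError(modname + " not found.")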

> getfinder() is something like
>    if pathentry not in path_to_finder:
>        for f in all_finder_types:
>            finder = f.create_finder(pathentry)
>            if finder:
>                path_to_finder[pathentry] = finder
>                break
>    return path_to_finder[pathentry]

The above code is basically keeping a mirror of the sys.path list, but
with importer instances in it. Just put those into sys.path itself.

> And there would be a call whereby a finder type registers itself (adds
> itself to all_finder_types).

In my proposal, this wouldn't be necessary. You insert finders right
into sys.path.

> ...
> A loader can register itself with multiple finders, assuming their
> interfaces are similar. So, the .py loader could register itself not
> only with the filesystem finder but also with a url-based finder or
> something, as long as that url-based finder uses the same calling
> convention for creating the loader.

I'd rephrase this as you have multiple importer instances, each
configured for a different "path" to its module namespace.
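A minimal sketch of that idea: one hypothetical DirectoryImporter class, instantiated once per directory, where each instance knows only its own root. The loading itself uses importlib.util.spec_from_file_location and module_from_spec from the standard library; the class name and do_import() method are assumptions.

```python
import importlib.util
import os
import tempfile


class DirectoryImporter:
    """Hypothetical importer bound to one directory; make one per location."""

    def __init__(self, root):
        self.root = root

    def do_import(self, modname):
        filename = os.path.join(self.root, modname + ".py")
        if not os.path.isfile(filename):
            return None  # Not in this instance's namespace.
        spec = importlib.util.spec_from_file_location(modname, filename)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
        return module


with tempfile.TemporaryDirectory() as lib, tempfile.TemporaryDirectory() as site:
    with open(os.path.join(site, "extra.py"), "w") as f:
        f.write("VALUE = 42\n")
    path = [DirectoryImporter(lib), DirectoryImporter(site)]
    missing = path[0].do_import("extra")
    found = path[1].do_import("extra")

print(missing)      # None
print(found.VALUE)  # 42
```

The same class covers any number of locations; only the configuration differs between instances.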

>         WHAT DOES IT BUY US
>         ===================
> 
> A greatly simplified import.c, importdl.c split out over the various
> platforms (and with the possibility to pass machine-specific info from
> the find phase to the load phase, something that can't be done now and
> leads to double work on various platforms) and easy extensibility.

Agreed. Quite necessary. I would think that we could have different
little code bits for each platform, much like we have thread_*.h in the
Python/ subdirectory.

> There is the issue of performance. The description above is all
> Pythonish, but going through the Python calling sequence for all these
> things is probably not a good idea from a performance standpoint. This
> is however fixable.

I disagree.

In almost all cases, you are talking about bringing a module into the
interpreter. Your performance is going to be characterized by I/O,
memory allocations for all the structures that get built when the module
executes, and the actual execution time of that module.

The time spent using Python to perform the import sequence is a
non-issue.

> ...
> 
>         ODDS AND ENDS
>         =============
> 
> A side issue: this stuff would also allow us to put the builtin and
> frozen namespaces into sys.path explicitly, for instance as
> "__builtins__" and "__frozen__", something I would like. The
> disadvantage would be that you can't be sure that everything on
> sys.path is a pathname, but the advantage would be that you could, for
> instance, create a frozen program that you could patch:

Neither of our proposals can guarantee that everything in sys.path is a
pathname. If I insert a "foo.zip" or a
"http://host.domain.name/pymodules/", then you certainly don't have
pathnames.

I believe the biggest issue with my proposal is the fact that the values
are no longer strings. However, the out-of-the-box version of Python can
easily contain *just* strings. If people bring in custom importers into
their application, AND they have code that depends on the "string-ness"
of sys.path, then it is their problem. By definition, they've altered
the behavior of their app and they need to compensate; my proposal is
backwards compatible for existing apps.
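If an application does mix importer objects into sys.path and still has code that expects strings, one way to compensate is to filter the list before treating entries as directories. string_entries() and SomeImporter are hypothetical names used only for this sketch.

```python
import sys


def string_entries(path=None):
    """Return only the plain-string entries of a possibly mixed path list."""
    if path is None:
        path = sys.path
    return [entry for entry in path if isinstance(entry, str)]


class SomeImporter:
    """Hypothetical stand-in for a custom importer object."""


mixed = ["/usr/lib/python", SomeImporter(), "/home/me/lib"]
print(string_entries(mixed))  # ['/usr/lib/python', '/home/me/lib']
```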

[ the tweak would be to avoid inserting BuiltinImporter() -- the
instance/type could still exist, but merely be *implied* rather than
explicitly within sys.path ]

> set sys.path to ["/usr/local/FooPatches", "__frozen__", "__builtins__"],
> and whenever you have a patch to a single module in a frozen executable
> you just send your clients the single .pyc file and tell them to put it
> in the FooPatches directory. There'd probably have to be a bit of code
> that explicitly prepends "__frozen__" and "__builtins__" to sys.path if
> they aren't there already or something.

This is quite humorous... Small world: I've used this approach before.
In the Microsoft Merchant Server 1.0, we had a bunch of frozen Python
code. However, we also looked in a specific directory for patches. We
never patched it :-), but it was possible.
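The same patch-directory arrangement, expressed with importer objects instead of the "__frozen__" marker strings, might look like this. FrozenSourceImporter (a dict of sources standing in for frozen code), PatchDirImporter, import_first, and the billing/report module names are all hypothetical, and the patches are plain .py files rather than the .pyc files mentioned above.

```python
import os
import tempfile
import types


class FrozenSourceImporter:
    """Hypothetical stand-in for frozen code: module sources held in a dict."""

    def __init__(self, sources):
        self.sources = sources

    def do_import(self, modname):
        if modname not in self.sources:
            return None
        module = types.ModuleType(modname)
        exec(self.sources[modname], module.__dict__)
        return module


class PatchDirImporter:
    """Hypothetical importer for loose .py patch files in one directory."""

    def __init__(self, root):
        self.root = root

    def do_import(self, modname):
        filename = os.path.join(self.root, modname + ".py")
        if not os.path.isfile(filename):
            return None
        with open(filename) as f:
            source = f.read()
        module = types.ModuleType(modname)
        exec(compile(source, filename, "exec"), module.__dict__)
        return module


def import_first(path, modname):
    # Earlier entries win, so patches shadow the frozen originals.
    for importer in path:
        module = importer.do_import(modname)
        if module is not None:
            return module
    raise ImportError(modname + " not found.")


frozen = FrozenSourceImporter({"billing": "RATE = 1\n", "report": "TITLE = 'v1'\n"})
with tempfile.TemporaryDirectory() as patches:
    with open(os.path.join(patches, "billing.py"), "w") as f:
        f.write("RATE = 2\n")  # The single patched module.
    path = [PatchDirImporter(patches), frozen]
    billing = import_first(path, "billing")
    report = import_first(path, "report")

print(billing.RATE, report.TITLE)  # 2 v1
```

Only the patched module comes from the directory; everything else still resolves to the frozen copy.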

Cheers,
-g

--
Greg Stein, http://www.lyra.org/