[Distutils] Freeze and new import architecture

Mark Hammond MHammond@skippinet.com.au
Fri, 18 Dec 1998 11:45:53 +1100


[Apols in advance for the length of this - but the background is
probably necessary]

I will make no attempt to outline what the freeze tool should
ultimately be!  I'm not even going to outline what freeze is now!  I'm
just going to focus on one single requirement.

We want freeze to be capable of working with any number of "code
containers" - i.e., to be able to locate code in "frozen C modules" (as
it works now), or potentially in a .zip file, etc.

The justification is fairly easy to explain:  Consider that you may
want to use "freeze" to distribute an application that consists of a
number of Python programs.  Using freeze as it stands now, the entire
Python library is frozen into each of those programs.  The result is
that each program becomes an executable that may be megabytes in size,
even though the bulk of each one is likely to be the identical,
standard Python library.

Attempting to cut a long story short, Guido, Jack Jansen, Just and I
came up with another idea for an extensible import mechanism:

sys.path should not be restricted to path names.  sys.path holds
"strings", with an associated map of "module finders".  Thus, a
sys.path entry could be a directory name (like now), a .zip file, a
URL, etc.

I have included below a mail from Jack Jansen on this topic.  To
paraphrase Guido's response, it was: "looks good - but let's define a
Python interface so it may work with JPython"

Accordingly, I have mocked up a few .py files which start to implement
the ideas in Jack's post.  However, I'm now floundering a little as to
the best way to take this any further.  I'm really looking for further
support or criticism of the idea, and some interest from people in
helping take this further...  If there is general interest, I will
post my mock-up, which allows directories and URLs to exist on
sys.path...

Thanks,

Mark.


-----Original Message-----
From: Jack Jansen [mailto:Jack.Jansen@cwi.nl]
Sent: Wednesday, 12 August 1998 1:11 AM
To: guido@python.org; just@letterror.com; mhammond@skippinet.com.au
Subject: An import architecture


I thought I'd just mail you the results of a discussion Mark and I had
about the import architecture. I'm just sending it out to us four, but
I guess we should get more people into the discussion if Guido thinks
that this is something worth "solving".

The problem we're looking at is that import.c and importdl.c have lots
of special-case code, and every time someone comes up with a neat new
way to import modules, more special-case code has to be added. This
happened with Mac PYC resource modules and PYD modules, and there's
also importdl.c, which knows about DLL modules for umpteen different
systems. It would be nice if there was an architecture in import.c
that the machine-specific code could hook into (and, ultimately,
Python code as well).

	WHAT WE HAVE
	============

We noticed that there are really two issues involved in importing
modules:
1. Finding the module in a specific namespace.
2. Importing a module of a specific type, once it has been found.

For 1. we currently have 5 namespaces: builtin modules, frozen
modules, the filesystem namespace, the PYC resource namespace in a
file (mac-only) and the PYD resource namespace in a file (again
mac-only). Other namespaces can be envisioned, for instance the
namespace inside a squeeze .pyz archive, a web-based namespace,
whatever.

The builtin and frozen namespaces are currently special, in that they
don't occur in sys.path and are always virtually at the very front of
sys.path. On the mac, sys.path entries can be either filenames (in
which case the latter two module finders are invoked) or directories
(in which case the filesystem finder is invoked); on other platforms
there are only directories in sys.path right now.

Regarding 2: the finder currently returns a structure that enables the
correct importer to be called later on. The importers we have are for
builtin, frozen, .py/.pyc/.pyo modules, various dll-importers all
hiding behind the same interface, PYC-resource importers (mac-only)
and PYD-resource importers (mac-only).

	WHAT WE WANT
	============

What we'd like I'll try to describe top-down (hopefully easier to
understand than bottom-up).

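Importing a module becomes something like

   for pathentry in sys.path:
       finder = getfinder(pathentry)
       if finder is None:
           continue                # nothing handles this entry
       loader = finder.find(modulename)
       if loader is not None:
           module = loader()       # first hit wins
           break
   else:
       raise ImportError(modulename)

getfinder() is something like

   def getfinder(pathentry):
       if pathentry not in path_to_finder:
           path_to_finder[pathentry] = None
           for f in all_finder_types:
               finder = f.create_finder(pathentry)
               if finder:
                   path_to_finder[pathentry] = finder
                   break
       return path_to_finder[pathentry]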

And there would be a call whereby a finder type registers itself (adds
itself to all_finder_types).

finder.create_finder() would examine the current path component, see
if it could handle it, and, if so, return a finder object that will
search this path component every time it is invoked. The usual case is
that getfinder doesn't do much: only the first time you see a new
entry in sys.path do you have to ask all the finder types in turn
whether they support it, and you just remember the answer.
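
A minimal, runnable sketch of the registration call and the cached
lookup described above, written for present-day Python. Everything
here is an assumption made for illustration: register_finder_type,
DirectoryFinder and UrlFinder are hypothetical names, and only
all_finder_types, path_to_finder, create_finder and getfinder come
from the pseudocode in this message.

```python
import os

# Hypothetical names throughout; they follow the pseudocode in the message,
# not any real Python API.
all_finder_types = []      # finder types, in registration order
path_to_finder = {}        # cache: sys.path entry -> finder instance (or None)


def register_finder_type(finder_type):
    """Add a finder type to the list consulted for new path entries."""
    all_finder_types.append(finder_type)


class DirectoryFinder:
    """Handles path entries that name an existing directory."""

    def __init__(self, directory):
        self.directory = directory

    @classmethod
    def create_finder(cls, pathentry):
        # Return a finder if this type can handle the entry, else None.
        if os.path.isdir(pathentry):
            return cls(pathentry)
        return None


class UrlFinder:
    """Handles path entries that look like http URLs (no network access here)."""

    def __init__(self, url):
        self.url = url

    @classmethod
    def create_finder(cls, pathentry):
        if pathentry.startswith("http://"):
            return cls(pathentry)
        return None


def getfinder(pathentry):
    """Return the finder for pathentry, asking the finder types only once."""
    if pathentry not in path_to_finder:
        path_to_finder[pathentry] = None   # remember failures too
        for finder_type in all_finder_types:
            finder = finder_type.create_finder(pathentry)
            if finder is not None:
                path_to_finder[pathentry] = finder
                break
    return path_to_finder[pathentry]


register_finder_type(DirectoryFinder)
register_finder_type(UrlFinder)

# The second call is answered from the cache without consulting any type.
first = getfinder("http://example.invalid/lib")
second = getfinder("http://example.invalid/lib")
```

Caching None for entries that nobody claims is a choice made in this
sketch; it means a finder type registered later will not be asked about
entries that were already looked up.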

The loaders register themselves with the finders, passing
finder-specific arguments. For instance, the .py loader registers
itself with the filesystem finder, telling it that the ".py" extension
should result in a .py loader being created. The unix-specific
dll-loader does the same for .so extensions, the windows-dll-loader
for .dll, the mac-dll-loader for .slb etc. The PYC-resource loader
tells the mac-resource-finder to look for 'PYC ' resources, the
PYD-resource loader tells it to look for 'PYD ' resources, etc.
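
As an illustration of loaders registering with a finder, here is a
small sketch in present-day Python. The class and method names
(FilesystemFinder, register_loader, PySourceLoader) are hypothetical,
and passing a module name plus a filename to the loader is an
assumption of this sketch, not something the proposal fixes.

```python
import os
import tempfile
import types


class FilesystemFinder:
    """Searches one directory; loaders are registered per file extension."""

    loader_types = {}   # extension -> loader class, shared by all instances

    def __init__(self, directory):
        self.directory = directory

    @classmethod
    def register_loader(cls, extension, loader_type):
        cls.loader_types[extension] = loader_type

    def find(self, modulename):
        # Try each registered extension; return a loader or None.
        for extension, loader_type in self.loader_types.items():
            filename = os.path.join(self.directory, modulename + extension)
            if os.path.isfile(filename):
                return loader_type(modulename, filename)
        return None


class PySourceLoader:
    """Loads a module from a .py source file."""

    def __init__(self, modulename, filename):
        self.modulename = modulename
        self.filename = filename

    def __call__(self):
        module = types.ModuleType(self.modulename)
        module.__file__ = self.filename
        with open(self.filename) as source_file:
            code = compile(source_file.read(), self.filename, "exec")
        exec(code, module.__dict__)
        return module


# The .py loader tells the filesystem finder which extension it handles.
FilesystemFinder.register_loader(".py", PySourceLoader)

# Usage: write a throwaway module to a temporary directory and load it.
with tempfile.TemporaryDirectory() as directory:
    with open(os.path.join(directory, "hello.py"), "w") as f:
        f.write("GREETING = 'hi'\n")
    finder = FilesystemFinder(directory)
    loader = finder.find("hello")
    hello = loader()
    missing = finder.find("nothere")
```

A .so or .dll loader would register in the same way under its own
extension; the finder itself never needs to know what the loader does
with the file.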

The information that is passed from the finder to the loader when it
has found a module is again finder-specific: the filesystem finder
will probably pass an open file and a filename, etc.

A loader can register itself with multiple finders, assuming their
interfaces are similar. So, the .py loader could register itself not
only with the filesystem finder but also with a url-based finder or
something, as long as that url-based finder uses the same calling
convention for creating the loader.
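
To make the shared calling convention concrete, the sketch below
registers one loader with two finders. It is only an illustration in
present-day Python: all class names are hypothetical, the convention
(module name, open file, filename) is an assumption, and InMemoryFinder
is a stand-in for a url-based finder so that the example needs no
network access.

```python
import io
import os
import tempfile
import types


class PySourceLoader:
    """Loads source from any open text file; it does not care who opened it."""

    def __init__(self, modulename, open_file, filename):
        self.modulename = modulename
        self.open_file = open_file
        self.filename = filename

    def __call__(self):
        module = types.ModuleType(self.modulename)
        with self.open_file:
            code = compile(self.open_file.read(), self.filename, "exec")
        exec(code, module.__dict__)
        return module


class FinderBase:
    """Shared registration logic; each subclass keeps its own loader table."""

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        cls.loader_types = {}

    @classmethod
    def register_loader(cls, extension, loader_type):
        cls.loader_types[extension] = loader_type


class FilesystemFinder(FinderBase):
    def __init__(self, directory):
        self.directory = directory

    def find(self, modulename):
        for extension, loader_type in self.loader_types.items():
            filename = os.path.join(self.directory, modulename + extension)
            if os.path.isfile(filename):
                return loader_type(modulename, open(filename), filename)
        return None


class InMemoryFinder(FinderBase):
    """Stand-in for a url-based finder: 'documents' live in a dict."""

    def __init__(self, documents):
        self.documents = documents   # maps "name.ext" -> source text

    def find(self, modulename):
        for extension, loader_type in self.loader_types.items():
            name = modulename + extension
            if name in self.documents:
                source = io.StringIO(self.documents[name])
                return loader_type(modulename, source, name)
        return None


# One loader, two finders with the same calling convention.
FilesystemFinder.register_loader(".py", PySourceLoader)
InMemoryFinder.register_loader(".py", PySourceLoader)

remote = InMemoryFinder({"spam.py": "VALUE = 1\n"}).find("spam")()

with tempfile.TemporaryDirectory() as directory:
    with open(os.path.join(directory, "eggs.py"), "w") as f:
        f.write("VALUE = 2\n")
    local = FilesystemFinder(directory).find("eggs")()
```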

	WHAT DOES IT BUY US
	===================

A greatly simplified import.c, importdl.c split out over the various
platforms (and with the possibility to pass machine-specific info from
the find phase to the load phase, something that can't be done now and
leads to double work on various platforms) and easy extensibility.

There is the issue of performance. The description above is all
Pythonish, but going through the Python calling sequence for all these
things is probably not a good idea from a performance standpoint. This
is however fixable. The objects involved are
- The finder "class", the thing you use to check whether a certain
  sys.path component can be handled by this code
- The finder instance returned by this class
- The importer class (called by the finder objects)
- The importer instance (returned by the finder instance through
  invoking the importer class).

These 4 things could well be 4 specific PyObject types, with the
needed C-routines in the object struct. Calling these for the normal
case (i.e. when they're implemented in C) would be at most a single
(C-) indirection more expensive than the current scheme.

Moreover, it would be easy to create a module that would implement
these 4 types as wrappers around Python code. The C-routines would
then do the usual PyEval_CallObject stuff to call the Python
implementation. So, you'd have the generality but you'd only pay for
it when you put Python-handled entries in sys.path and actually hit
those entries.

	ODDS AND ENDS
	=============

A side issue: this stuff would also allow us to put the builtin and
frozen namespaces into sys.path explicitly, for instance as
"__builtins__" and "__frozen__", something I would like. The
disadvantage would be that you can't be sure that everything on
sys.path is a pathname, but the advantage would be that you could, for
instance, create a frozen program that you could patch: set sys.path
to ["/usr/local/FooPatches", "__frozen__", "__builtins__"], and
whenever you have a patch to a single module in a frozen executable
you just send your clients the single .pyc file and tell them to put
it in the FooPatches directory. There'd probably have to be a bit of
code that explicitly prepends "__frozen__" and "__builtins__" to
sys.path if they aren't there already or something.
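
The patching scenario can be sketched as follows, again in present-day
Python and under stated assumptions: FROZEN_MODULES is a hypothetical
dict standing in for the modules compiled into the executable,
find_source and ensure_special_entries are made-up helper names, and
the patch is a .py source file rather than a .pyc to keep the example
short.

```python
import os
import tempfile

# Hypothetical stand-in for modules compiled into the executable.
FROZEN_MODULES = {"foo": "VERSION = '1.0'", "bar": "NAME = 'bar'"}


def find_source(modulename, path):
    """Return (origin, source) from the first path entry providing the module."""
    for entry in path:
        if entry == "__frozen__":
            if modulename in FROZEN_MODULES:
                return entry, FROZEN_MODULES[modulename]
        elif os.path.isdir(entry):
            filename = os.path.join(entry, modulename + ".py")
            if os.path.isfile(filename):
                with open(filename) as f:
                    return entry, f.read()
    raise ImportError(modulename)


def ensure_special_entries(path, names=("__frozen__", "__builtins__")):
    """Return a copy of path with any missing special entries prepended."""
    missing = [name for name in names if name not in path]
    return missing + list(path)


with tempfile.TemporaryDirectory() as patches:
    path = [patches, "__frozen__", "__builtins__"]
    before = find_source("foo", path)         # comes from the frozen table
    with open(os.path.join(patches, "foo.py"), "w") as f:
        f.write("VERSION = '1.1'")
    after = find_source("foo", path)          # the patch directory now wins
    unpatched = find_source("bar", path)      # still served from "__frozen__"
```

Because the patch directory comes first in the list, a file dropped
there shadows the frozen copy of the same module while every other
module is still served from the executable.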
--
Jack Jansen             | ++++ stop the execution of Mumia Abu-Jamal ++++
Jack.Jansen@cwi.nl      | ++++ if you agree copy these lines to your sig ++++
http://www.cwi.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm