[Import-sig] Long-awaited imputil comments

Guido van Rossum guido@python.org
Sun, 13 Feb 2000 12:05:52 -0500


Hi Greg,

I've finally got to a full close-reading of imputil.py. There sure is
a lot of good stuff there.  At the same time, I think it's far from
ready for prime time in the distribution.  Here are some detailed
comments.  (I'd also like to have a high level discussion about its
fate, but I'll let to respond to this list first.)


class ImportManager:

    General comment:

        I would like to use/subclass the ImportManager in rexec (in
        order to emulate its behavior), but for that to work, I would
        need to change all references to sys.path and sys.modules (and
        sys.whatever) to self.whatever, but that would currently
        require rewriting large pieces of code. It would be nice if
        the sys module were somehow passed in.  I have a feeling the
        same is true for all references to the os module (_os_stat
        etc.) because the rexec module also wants to have control over
        what pieces of the filesystem you have access to.  This
        explains some of the complexity of ihooks (which is currently
	used by rexec).

    def install():

        The __chain_* code seems in transition (e.g. some functions
        end with both raise and return)

        The hook mechanism for 1.6 hasn't been designed yet; what
        should it be?

    def add_suffix():

        It seems the suffixes variable is only used by the
        _FilesystemImporter. Since it is shared, calls to add_suffix()
        will have an effect on the _FilesystemImporter instance. I
        think it would make more sense if the suffixes table was
        initialized and managed by the _FilesystemImporter; the
        add_suffix method on the ImportManager could then simply pass
        its arguments on to the _FilesystemImporter.

    def _import_hook():

        I think we need a hook here so that Marc-Andre can implement
        walk-me-up-Scotty; or Tim Peters could implement a policy that
        requires all imports to use the full module name, even from
        within the same package.

        top_module = sys.modules[parts[0]]:
            There's an undocumented convention that sys.modules[x] may
            be None, to indicate that module x doesn't exist and that
            we shouldn't try to look for it any further. This is used
            by package import to avoid excessive filesystem access in
            the case where modules in a package import top-level
            modules. E.g. we're in package P, module M.  Now we see
            "import string". The "local imports override global
            imports" rule requres that we first look for P.string,
            which of course doesn't exist, before we look for string
            in sys.path.  The first time we look for P.string, we
            actually do a bunch of stats: for P/string.py,
            P/string.pyc, P/string.pyd, P/string.dll.  When other
            submodules of package P also import string, they would
            each incur all these stat() calls, unless we somehow
            remebered that there's no P.string. This is taken care of
            by setting sys.modules['P.string'] = None.

            Anyway, I think that your logic here doesn't expect that
            to happen.  A fix could be to put "if not top_module:
            raise KeyError" inside the try/except.

    def _determine_import_context():

        Possible feature: the package could set a flag here to force
        all imports to go to the top-level (i.e., the flag would mean
        "no relative imports").

    def _import_top_module():

        Instead of type(item)..., use isinstance().

        Looking forward to _FilesystemImporter: I want to be able to
        have the *name* of a zip file (or other archive, whatever is
        supported) in sys.path; that seems more convenient for the
	user and simplifies $PYTHONPATH processing.

    def _reload_hook():

        Note that reload() does NOT blast the module's dict; for
	better or for worse.  (Occasionally modules know this and save
	important global data.)

class Importer:

    def install():

        This should be a method on the manager.  (That makes it easier
	to change references to sys.path etc.; see my rexec notes above.)

    def import_top():

        This appears a hook that a base class can override; but why?

    "PRIVATE METHODS":

        These aren't really private to the class; some are used from
	the ImportManager class.

        I note that none of these use any state of the class (it
        doesn''t *have* any state) and they are essentially an
        implementation of the (cumbersome) package import policy.
        I wonder if the whole package import policy shouldn't be
        implemented in the ImportManager class -- or even in a
	separate policy class.

    def get_code():

        On the 2/3-tuple return value: a proposal for code to be
        included in 1.6 shouldn't be encumbered by backwards
        compatibility issues to previous versions of the proposal.

        I'm still worried about get_code() returning the code object
        (or even a module, in the case of extensions).  This means
        that something like freeze might have to re-implement the
        whole module searching -- first, it might want the source code
        instead of the code object, and second, it might not want the
        extension to be loaded at all!

        I've noticed that all implementations of get_code() start with
        a test whether parent is None or not, and branch to completely
        different code.  This suggests that we might have two separate
        methods???

"Some handy stuff for the Importers":

    This seems to be a remnant of an older imputil.py version; it
    appears to be unused by the current code or at least there is some
    code duplication; e.g. _c_suffixes is also calculated in
    ImportManager.

    def _compile():

        You will have to strip CRLF from the code string read from the
        file; this is for Unix where opening the file in text mode
        doesn't do that, but users have come to expect this because
        the current implementation explicitly strips them.

        I've recently been notified of a race condition in the code
        here, when two processes are writing a .pyc file and a third
        is reading it.  On Unix we will have to remove the .pyc file
        and then use os.open() with O_EXCL in order to avoid the race
        condition.  I don't know how to avoid it on Windows.

    def _os_bootstrap():

        This is ugly and contains a dangerous repetition of code from
        os.py.  I know why you want it but for the "real" version we
        need a different approach here.

    def _fs_import():

        I think this is unused now?  Anyway, it hardcodes the suffix
        sequence.

        The test t_pyc >= t_py does not match the current
        implementation and should not affect the outcome.  (But it
        does save a stat() call in a common case...)

        The call to _compile() may raise SyntaxError (and also
        OverflowError and maybe a few others). I don't know what to do
        about this, but the traceback in that case will look really
        ugly!

class _FilesystemImporter:

    See comments above about suffix list management.

class SuffixImporter:

    Why is this a class at all?  It would seem to be sufficient to
    have a table of functions instead of a table of instances.  These
    importers have no state and only one method that is always
    overridden.


--Guido van Rossum (home page: http://www.python.org/~guido/)