[Import-sig] Re: Long-awaited imputil comments

Wed, 16 Feb 2000 14:49:14 -0800 (PST)

On Wed, 16 Feb 2000, Guido van Rossum wrote:
>...
> > Understood. I think some of your concerns are based on its historical
> > cruftiness, rather than where it "should" be. I'll prep a new version with
> > the backwards compat pulled out. People that relied on the old version can
> > continue to use their older version (which is Public Domain, so they're
> > quite free to do so :-)
> 
> OK, I'm awaiting a new imputil announcement.  (You should really have
> a webpage for it pointing both to the old and the new version.)

I do. http://www.lyra.org/greg/python/. The stable module is provided, and
the latest is available via ViewCVS (linked from the imputil section on
the page).

>...
> Agreed.  It sounds like we're stuck with overriding
> __builtin__.__import__, like before.

Yes. I'll rebuild around this. If somebody comes up with a new/better
design at some point, then we switch.

> > BUT: we may be able to design an RExecImportManager that remembers
> > restricted modules and uses different behavior when it determines the
> > import context is one of those modules. I haven't thought on this to
> > determine is true viability, tho...
> 
> Sounds tricky -- *all* modules imported in restricted mode must be
> treated differently.

Yah. We'll just use the __import__ thing. It will allow multiple
ImportManager objects to exist.

>...
> If we let __domain__ be initialized by the importer but overridden by
> the package, we can do everything we need.

All right. I'll look at adding that.

>...
> > >         Looking forward to _FilesystemImporter: I want to be able to
> > >         have the *name* of a zip file (or other archive, whatever is
> > >         supported) in sys.path; that seems more convenient for the
> > > 	user and simplifies $PYTHONPATH processing.
> > 
> > Hmm. This will complicate things quite a bit, but is doable. It will also
> > increase the processing time for sys.path elements.
> 
> We could cache this in a dictionary: the ImportManager can have a

Sounds like a good plan. I'll add this.

>...
> > >     def import_top():
> > > 
> > >         This appears a hook that a base class can override; but why?
> > 
> > It is for use by clients of the Importer. In particular, by the
> > ImportManager. It is not intended to be overridden.
> 
> Are there any other clients of the Importer class?  Since import_top()
> simply calls _import_one(), do we even need it?

Manual invocation. This code works:

-------------------------
fetch = my_imp_tools.HTTPImporter("...")
qp_xml = fetch.import_top("qp_xml")
-------------------------

In other words: it is entirely possible to use an Importer without it
being installed into an ImportManager.

>...
> Then use a different word -- "private" has a well-defined meaning for
> C++ and Java programmers.  To me, "internal" sounds better.

I'll comment/doc appropriately.

>...
> But the importer is the wrong place to change the policy globally.
> The example is walk-me-up-Scotty: to implement that (or, more
> generally, to implement the __domain__ hook) without editing
> imputil.py, Marc-Andre would have to have to subclass all the
> importers that are used.  If the policy was embodied in the
> ImportManager class, he could subclass the ImportManager, install it
> instead of the default one, but continue to use the existing importers
> (e.g. to import from zip files).

You're falling right back into the classic import hook problem. If MAL
alters ImportManager and installs it, then he blows away whatever TP has
done.

The only time that anybody should ever consider modifying or subclassing
ImportManager:

1) An rexec-like environment. This is allowed because you are installing
   the new ImportManager into a specific namespace, rather than
   __builtin__.__import__

2) A shipping application. This is allowed because it's your app :-). The
   app is fully self-contained and all imports are known ahead of time.

   [ if your app is going to import unknown third-party code, then we're
     still okay, as that third-party stuff will be in an rexec
     environment, or it won't fall into the "I'm an app" category ]

Given that modules/packages should not be altering ImportManager at any
point, then any policy changes must go elsewhere. I claim that is the
Importer that is managing that package. Since a package is normally
imported and managed only *one* Importer, then the problem falls down to
altering the Importer used for the package while it is loading.

This is doable:

1) import foo.bar.baz is executed

2) ImportManager locates the package via an Importer. That Importer
   calls get_code() (or import_from_dir() for the _FilesystemImporter)

3) The Importer loads the package code object from wherever.
   (_FilesystemImporter loads the code object for __init__ and returns it)

4) The Importer creates a module object and stores __importer__ into it,
   pointing at <self>.

5) The code object is executed, overwriting __importer__.

   (it is also possible to do this by returning a new value in result[2]
    of the get_code() call, but this scenario doesn't have a custom
    Importer installed yet that would do this)

6) The Importer finishes loading the package and returns to the
   ImportManager.

7) The ImportManager calls _finish_import on the Importer found in the
   package module's __importer__ attribute (the custom Importer)

8) The custom Importer Does Its Thing

In essence, people should be highly discouraged from ever touching
ImportManager.

The particular __domain__ thing can be defined as a package-private
modification to the import process (for that package only!), thus the
package should fix up the Importer used for itself. Specifically, the MAL
and TP "import style" is implemented by overriding the algorithm in
Importer._do_import().

>...
> I know you can't do it fully automatically.  But I still want to be
> able to reuse as much of existing importer and importmanager classes
> as possible.  Currently, Tools/freeze/modulefinder.py contains a lot
> of code that reimplements the entire package resolution mechanism.
> It should really be able to reuse all that -- so that e.g. the
> addition of a __domain__ feature doesn't require changes both to
> imputil.py and to modulefinder.py.

All right. I'll see if I can come up with something for this.

> > That said: I wouldn't be opposed to adding a get_source() method to an
> > Importer. If the source is available, then it can return it (it may not be
> > available in an archive!).
> 
> How would you invoke it?  I would need to have a different call into
> the ImportManager that would invoke get_source() rather than
> get_code().

Yes: a different call into the ImportManager.

I'll do the get_source thing as a second step (possibly as a subclass, as
you mentioned). First is to fold in the rest of the feedback.

>...
> > >         I've noticed that all implementations of get_code() start with
> > >         a test whether parent is None or not, and branch to completely
> > >         different code.  This suggests that we might have two separate
> > >         methods???
> > 
> > Interesting point. I hadn't noticed that. They don't always branch to
> > different points, however: consider DirectoryImporter and FuncImporter.
> 
> But those are the most trivial examples.

So? That doesn't negate their use as an example.

class HTTPImporter:
  def __init__(self, url):
    self.url = url
  def get_code(self, parent, modname, fqname):
    if parent:
      url = parent.__url__
    else:
      url = self.url
    # look for <modname> at <url>

Granted, we could also rewrite the "look for" part as a method which is
called by get_subcode() and get_topcode().

> > Essentially, we have two forms of calls:
> > 
> >     get_code(None, modname, modname)     # look for a top-level module
> >     get_code(parent, modname, fqname)    # look in a package for a module
> > 
> > imputil has been quite nice because you only had to worry about one hook,
> > but separating these might be a good thing to do. My preference is to
> > leave it as one hook, but let's see what others have to say...
> 
> You might provide a default implementation for get_code() that calls
> either get_subcode() or get_topcode() depending on whether parent is
> None; then subclasses can separately override those, or override
> get_code() when it's more convenient.

Sure.

>...
> > I'll also move the Importer subclasses to a new file for placement under
> > Demo/. That should trim imputil.py down quite a lot.
> 
> Except for the _FilesystemImporter class, I presume, which is needed
> in normal use.

Yes.

>...
> I mean that you have to do
> 
>   codestring = re.sub(r"\r\n", r"\n", codestring)
> 
> on the code string after reading it.  This has nothing to do with the

Ah! Okay... not a problem.

It would be nice to invoke the compiler on an open file object. That would
obviate this problem entirely.

I think that I'll look into doing a patch for this, rather than using
re.sub().

>...
> Unix tools shouldn't insist on it being absent.  Fixing compile() is
> hard, unfortunately, hence this request for a workaround.

If I can't figure out a way to do it, then I'll fall back to re.sub() :-)

>...
> > It would be nice if there was a "shared"
> > mode for reading. How are two writers and a reader different from a single
> > writer and a single reader?
> 
> OK, I'll explain the problem.  (Cut from a mail explaining it to
> Jeremy:)
>...
> | I have devised the following solution (which may even work on
> | Windows) but not yet implemented it:
> | 
> | when writing the .pyc file, use unlink() to remove the .pyc file and
> | then use low-level open() with the proper flags to require that the
> | file doesn't yet exist (O_EXC:?); then use fdopen().  If the open()
> | fails, don't write (treat it the same as a failing fopen() now.)
> | 
> | (You'd think that you could use a temporary file, but it's hard to
> | come up with a temp filename that's unique -- and if it's not unique,
> | the same race condition could still happen.)

Consider it fixed.

>...
> OK.  Let's table this one until we feel we know how to refactor os.py.
> (Maybe a platform-specific os.py could be frozen into the
> interpreter.)

I'll leave appropriate comments in the source as a reminder.

> > > class SuffixImporter:
> > > 
> > >     Why is this a class at all?  It would seem to be sufficient to
> > >     have a table of functions instead of a table of instances.  These
> > >     importers have no state and only one method that is always
> > >     overridden.
>...
> > I used instances because I wasn't sure what all we might want in there. If
> > we don't add any other methods or attributes to the public interface, then
> > yah: we could switch to a function-based approach.
> 
> See http://c2.com/cgi-bin/wiki?YouArentGonnaNeedIt (and the
> rest of this wikiweb on refactoring, patterns etc.) for why you
> shouldn't plan ahead this far.

I wasn't planning far ahead at all. Just banging out some code :-) Now
that that piece is (ahem) done, I'll revise the use of ".import_file".

And yes... YouArentGonnaNeedIt is a very familiar mantra to me. You should
have been at MSFT with me to see how many times I wielded that bat against
the developers :-)
[ the mantra applies whole-heartedly to Python; it gets a little less
  rigid when you're talking about hard-to-maintain languages like C++ ]

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/