pickle and module package

Michael P. Reilly arcege at shore.net
Wed May 19 12:16:52 EDT 1999


Fred L. Drake <fdrake at cnri.reston.va.us> wrote:

: M.-A. Lemburg writes:
:  > I'd say, there's no way for the import logic to tell whether
:  > you are about to import the same module a second time... unless
:  > maybe, if it scans the sys.modules dict for filenames of the modules
:  > and then checks for identical files. But that would reduce import
:  > performance dramatically and not be worth it.

:   A dictionary could be used that maps filenames to modules; this can
: simply be checked and updated during the slow path through import.
: This shouldn't be much slower than it already is.  ;-)
:   The filenames would have to be absolute for it to work; there's
: currently nothing that does this for pathnames in the core
: interpreter.  I'd be quite happy if __file__ could be relied on to be
: absolute as well!

I pretty much agree but, at the risk of being pedantic, I'll bring up a
potential problem.

Attempting to figure out absolute pathnames can be difficult.  Here is
just one commonly known example from the UNIX world dealing with
Automounter V1 (I try not to know much about M$ platforms, but I can
think of some examples there too):

  Automounter (version 1) is a very useful tool that mounts NFS drives
  on demand without the need for root access.  As part of the
  implementation, drives are mounted under one directory structure
  (usually /tmp_mnt) and accessed through another (the map) via
  symbolic links (but the mechanism only works through this map).  For
  example, a user's home directory is /home/luser which is mounted on
  /tmp_mnt/home/luser.  Performing a chdir("../..") will not bring the
  user to "/", but will bring him to "/tmp_mnt".

  On top of this, after a period of time, the system will attempt to
  unmount the drive.  If a process accesses "/home/luser", the drive
  will be remounted; if a process instead accesses
  "/tmp_mnt/home/luser", the drive is not remounted and the system
  reports a failure.

I bring this up because the os.getcwd() call (on UNIX) performs the
basic 'traverse ".." to figure out where I am' algorithm, so it reports
the "/tmp_mnt" spelling of a directory rather than the mapped one.
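
A minimal sketch of that behaviour, assuming a POSIX system where
symlinks can be created. A plain symlink stands in for the automounter
map, and the directory names are hypothetical:

```python
import os
import tempfile


def cwd_via_symlink():
    """Return (path we chdir'ed to, path os.getcwd() reports afterwards)."""
    saved = os.getcwd()
    with tempfile.TemporaryDirectory() as tmp:
        # Resolve the temp dir itself so only our own symlink is in play.
        base = os.path.realpath(tmp)
        real_dir = os.path.join(base, "tmp_mnt_home_luser")  # stand-in mount point
        link_dir = os.path.join(base, "home_luser")          # stand-in map entry
        os.mkdir(real_dir)
        os.symlink(real_dir, link_dir)
        try:
            os.chdir(link_dir)
            reported = os.getcwd()
        finally:
            # Leave the directory before it is removed.
            os.chdir(saved)
    return link_dir, reported


logical, physical = cwd_via_symlink()
print("entered: ", logical)
print("getcwd():", physical)
```

The two printed paths name the same directory but are spelled
differently, which is the mismatch described above.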

This could lead to problems:
   sys.path          contains "/home/luser"
   sys.module_cache  contains "/tmp_mnt/home/luser"
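
To make the mismatch concrete, here is a small sketch, again assuming
POSIX symlinks. The dictionary is a hypothetical stand-in for the
proposed sys.module_cache, which does not exist in the interpreter:

```python
import os
import tempfile


def lookup_by_two_spellings():
    """Cache a file under one spelling, then look it up under another."""
    with tempfile.TemporaryDirectory() as tmp:
        base = os.path.realpath(tmp)
        real_dir = os.path.join(base, "tmp_mnt")  # stand-in for /tmp_mnt/home/luser
        link_dir = os.path.join(base, "home")     # stand-in for /home/luser
        os.mkdir(real_dir)
        os.symlink(real_dir, link_dir)

        real_file = os.path.join(real_dir, "spam.py")
        with open(real_file, "w") as f:
            f.write("x = 1\n")
        link_file = os.path.join(link_dir, "spam.py")

        # Hypothetical cache keyed by absolute path strings.
        module_cache = {os.path.abspath(real_file): "<module spam>"}

        # abspath() does not resolve symlinks, so the keys differ.
        hit = os.path.abspath(link_file) in module_cache
        same = os.path.samefile(real_file, link_file)
    return hit, same


hit, same = lookup_by_two_spellings()
print("cache hit:", hit)
print("same file:", same)
```

The lookup misses even though both paths refer to one file, so the
module would be imported a second time.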

I'm not saying that this is not a good idea, just that it must be handled
carefully.  Different systems will have different problems with this
("aliases" on Macs, mounted drives on Windows, odd remote filesystem
drivers on UNIX).

Overall, I don't see that a module filename cache would be a
performance hit.  I think it would be better to keep the cache at
the C level, inside the import.c module (for thread locking
purposes at the very least), perhaps with access through the imp module.
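
Purely as an illustration in Python rather than C, a locked cache
might look like the sketch below. The class name ModuleFileCache and
the choice to key on (st_dev, st_ino) instead of the path string are
assumptions made for this example, not anything import.c does, and
they presume a POSIX filesystem with meaningful inode numbers:

```python
import os
import tempfile
import threading


class ModuleFileCache:
    """Hypothetical file-to-module cache guarded by a lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._by_file = {}

    @staticmethod
    def _key(path):
        # Identify the file itself, not the spelling of its path.
        st = os.stat(path)
        return (st.st_dev, st.st_ino)

    def get(self, path):
        key = self._key(path)
        with self._lock:
            return self._by_file.get(key)

    def setdefault(self, path, module):
        """Return the cached module for this file, storing `module` if new."""
        key = self._key(path)
        with self._lock:
            return self._by_file.setdefault(key, module)


def demo():
    cache = ModuleFileCache()
    with tempfile.TemporaryDirectory() as tmp:
        real_file = os.path.join(tmp, "spam.py")
        link_file = os.path.join(tmp, "spam_link.py")
        with open(real_file, "w") as f:
            f.write("x = 1\n")
        os.symlink(real_file, link_file)
        first = cache.setdefault(real_file, "module loaded via real path")
        second = cache.setdefault(link_file, "module loaded via symlink")
    return first, second


first, second = demo()
print(first == second)
```

Both lookups return the first entry because the key does not depend on
how the path was written. This does not help with the unmount problem
described above: a stat through the "/tmp_mnt" spelling would still
fail once the drive is gone.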

  -Arcege




