[Python-Dev] Import redesign [LONG]

Fri, 19 Nov 1999 13:06:14 -0800 (PST)

[ taking the liberty to CC: this back to python-dev ]

On Fri, 19 Nov 1999, David Ascher wrote:
> > >   (2) a file in a directory that's on sys.path can be a zip/jar file;
> > >       its contents will be considered as a package (note that this is
> > >       different from (1)!)
> > 
> > No problem. This will slow things down, as a stat() for *.zip and/or *.jar
> > must be done, in addition to *.py, *.pyc, and *.pyo.
> 
> Aside: it strikes me that for Python programs which import lots of files,
> 'front-loading' the stat calls could make sense.  When you first look at a
> directory in sys.path, you read the entire directory in memory, and
> successive imports do a stat on the directory to see if it's changed, and
> if not use the in-memory data.  Or am I completely off my rocker here?

Not at all. I thought of this last night after my email. Since the
Importer can easily retain state, it can hold a cache of the directory
listings. If it doesn't find the file in its cached state, then it can
reload the information from disk. If it finds it in the cache, but not on
disk, then it can remove the item from its cache.
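
A minimal sketch of that idea, assuming a hypothetical CachingFinder class
(not imputil's actual Importer API) that maps each directory on its path to
the set of names seen at the last read:

```python
import os


class CachingFinder:
    """Hypothetical finder that caches directory listings for a search path."""

    def __init__(self, path):
        self.path = list(path)
        self._listings = {}  # directory -> set of names seen at last read

    def _load(self, directory):
        # One opendir/readdir/closedir pass over the directory.
        try:
            names = set(os.listdir(directory))
        except OSError:
            names = set()  # missing or unreadable directory
        self._listings[directory] = names
        return names

    def _search(self, filename):
        # Consult cached listings only, reading a directory the first time
        # it is seen.
        for directory in self.path:
            names = self._listings.get(directory)
            if names is None:
                names = self._load(directory)
            if filename in names:
                return directory
        return None

    def find(self, filename):
        directory = self._search(filename)
        if directory is None:
            # Not in the cached state: reload the listings from disk, retry.
            for d in self.path:
                self._load(d)
            directory = self._search(filename)
            if directory is None:
                return None
        full = os.path.join(directory, filename)
        if os.path.exists(full):
            return full
        # In the cache but no longer on disk: drop the stale entry and report
        # a miss. A fuller version might go on to search later directories.
        self._listings[directory].discard(filename)
        return None
```

Usage would be along the lines of CachingFinder(sys.path).find("spam.py").
The one os.path.exists() call on a hit stands in for the stat() the importer
needs anyway to open the file.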

The problem occurs when your path is [A, B], the file is in B, and you add
something to A on-the-fly. The cache might direct the importer at B,
missing your file.
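
The failure can be reproduced with a small, self-contained sketch. The
helper names (find_cached, find_on_disk, demo) and the file name spam.py are
made up for illustration; A and B are temporary directories:

```python
import os
import tempfile


def find_cached(path, listings, filename):
    """Search only the cached listings (directory -> set of names)."""
    for directory in path:
        if filename in listings.get(directory, ()):
            return os.path.join(directory, filename)
    return None


def find_on_disk(path, filename):
    """Search the real directories, in path order."""
    for directory in path:
        full = os.path.join(directory, filename)
        if os.path.exists(full):
            return full
    return None


def demo():
    with tempfile.TemporaryDirectory() as a, tempfile.TemporaryDirectory() as b:
        open(os.path.join(b, "spam.py"), "w").close()
        path = [a, b]
        # Snapshot the listings, as a caching importer would on first use.
        listings = {d: set(os.listdir(d)) for d in path}
        # Now add a file of the same name to A on-the-fly.
        open(os.path.join(a, "spam.py"), "w").close()
        cached = find_cached(path, listings, "spam.py")
        fresh = find_on_disk(path, "spam.py")
        # The stale cache still points at B; an uncached search finds A.
        return (os.path.dirname(cached) == b, os.path.dirname(fresh) == a)
```

Because the cached lookup succeeds (in B), nothing ever triggers a reload of
A's listing. Checking each directory's modification time before trusting its
listing, as David suggests above, is one possible way to catch this, at the
cost of one stat() per directory.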

Of course, with the appropriate caveats/warnings, the system would work
quite well. It really only breaks during development (which is one reason
why I didn't accept some caching changes to imputil from MAL; that was for
the Importer in imputil, though, and Python's new Importer could have a cache).

I'm also not quite sure what the cost of reading a directory is, compared
to issuing a bunch of stat() calls. Each directory read is an
opendir/readdir(s)/closedir. Note that the DBM approach is kind of
similar, but will amortize this cost over many processes.
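
One way to get a rough feel for the trade-off is to time both strategies on
a real directory. This is only a sketch: the function names are invented
here, the suffix list is taken from the *.py/*.pyc/*.pyo set quoted above,
and the numbers will depend heavily on the filesystem, the directory size,
and OS-level caching:

```python
import os
import timeit

SUFFIXES = (".py", ".pyc", ".pyo")


def find_by_stat(directory, modname):
    """Probe each candidate file name with its own stat() call."""
    for suffix in SUFFIXES:
        full = os.path.join(directory, modname + suffix)
        try:
            os.stat(full)
        except OSError:
            continue
        return full
    return None


def find_by_listing(directory, modname):
    """Read the whole directory once, then test names in memory."""
    names = set(os.listdir(directory))
    for suffix in SUFFIXES:
        if modname + suffix in names:
            return os.path.join(directory, modname + suffix)
    return None


def compare(directory, modname, number=1000):
    """Return (seconds for stat probing, seconds for directory reads)."""
    t_stat = timeit.timeit(lambda: find_by_stat(directory, modname),
                           number=number)
    t_list = timeit.timeit(lambda: find_by_listing(directory, modname),
                           number=number)
    return t_stat, t_list
```

Note that find_by_listing() rereads the directory on every call, so this
measures the uncached cost of a directory read against a handful of stat()
calls; a cached listing would pay that cost once per directory.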

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/