[Distutils] Cache PYTHONPATH? (Re: make unzipped eggs be the default)

P.J. Eby pje at telecommunity.com
Sun Aug 2 19:54:18 CEST 2009


At 06:52 PM 8/2/2009 +0200, Tarek Ziadé wrote:
>On Wed, Jul 29, 2009 at 6:44 AM, P.J. Eby<pje at telecommunity.com> wrote:
> > At 10:35 PM 7/28/2009 -0500, Ian Bicking wrote:
> >>
> >> On Tue, Jul 28, 2009 at 9:40 PM, P.J. Eby<pje at telecommunity.com> wrote:
> >> > At 09:22 PM 7/28/2009 -0500, Ian Bicking wrote:
> >> >>
> >> >> I can see how this could go quite wrong, but maybe if installers touch
> >> >> some file in the library directory anytime a package is
> >> >> installed/reinstalled/removed/etc,
> >> >
> >> > You mean, like, the mtime of the directory itself?  ;-)
> >>
> >> Do directory mtimes get recursively updated?  I don't think they do.
> >
> > That's not necessary; if imports use a cached listdir, then the children
> > will get handled recursively.
> >
> >> So if you have a layout:
> >>
> >> site-packages/
> >>  zope/
> >>    interface/
> >>      __init__.py
> >>
> >> And you update the package and update __init__.py, the mtime of
> >> site-packages doesn't change, does it?
> >
> > Nope, but at the top level, the fact that 'zope' is present is unchanged, as
> > is the presence of an 'interface' subdirectory.
> >
> >
> >> I'm saying if there was a file in site-packages/last_updated that gets
> >> touched every time an installer does anything in site-packages, then
> >> you could cache (between processes) the lookups.
> >
> > Since each invocation of the interpreter can have a different PYTHONPATH,
> > the cache has to be per-directory, not global.  If it's per-directory, then
> > there's no real benefit over runtime caching, since you now have to open and
> > read a file (instead of just reading the directory).  And as I said, it's
> > not realistic to think that opening and reading a file is going to beat
> > opening and reading a directory for speed.
>
>But opening and reading one file should beat opening hundreds of directories.
>In the PEP 376 prototype, after thinking about a per-directory cache like
>the one you are describing, I was thinking about having a global index file
>to replace the global dictionary that keeps track of the distributions per
>directory (currently the directory path is the key in the dictionary and the
>value is the distribution objects).
>
>That can even be a simple shelve of the dictionary, which becomes a global
>index of directories that [are/were once] in the path. This works as long as
>the index file is per-user. Or even better: per-application. I don't know how
>this could be managed/done, but a simple cache file created alongside the
>script the application is launched with could speed up the lookups at the
>second launch.

You'd still have to stat the directories to know if they changed - in 
which case the logic I've already laid out still applies.
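
As an illustration only (not code from the PEP 376 prototype), here is a
minimal sketch of a saved index that is revalidated on startup. The file
format (JSON) and the names load_index, refresh_index and save_index are
assumptions made for this example. Even with the index on disk, each
directory still costs one stat call before its entry can be trusted.

```python
import json
import os


def load_index(index_file):
    """Read the saved index, or start empty if it is missing or unreadable."""
    try:
        with open(index_file) as f:
            return json.load(f)
    except (OSError, ValueError):
        return {}


def refresh_index(index, directories):
    """Revalidate every directory; return how many entries were rebuilt."""
    rebuilt = 0
    for d in directories:
        try:
            mtime = os.stat(d).st_mtime_ns  # still one stat per directory
        except OSError:
            index.pop(d, None)  # directory vanished: drop its entry
            continue
        entry = index.get(d)
        if entry is None or entry["mtime_ns"] != mtime:
            # Stale or unknown: fall back to reading the directory itself.
            index[d] = {"mtime_ns": mtime, "names": sorted(os.listdir(d))}
            rebuilt += 1
    return rebuilt


def save_index(index, index_file):
    """Write the index back out for the next process to load."""
    with open(index_file, "w") as f:
        json.dump(index, f)
```

Note that a directory's mtime only reflects entries being added, removed or
renamed directly inside it, so the sketch records top-level names only.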

I think, however, we are discussing different nominal scenarios.  I'm 
assuming a post-PEP 376 world where the only use for .egg files or 
directories is for *non-default* versions of packages, that only get 
added to sys.path for apps or libraries that need them, rather than 
being in a default .pth file.

However, if you're discussing speeding up an environment where we use 
.egg directories and they're on sys.path, then a per-user global 
cache might speed things up.  For security reasons, however, that 
cache would need to be ignored by Python when running secure 
scripts (e.g. with the -s and -E options, and definitely anything setuid).
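
A rough sketch of such a check, for illustration: the function name
may_use_user_cache is hypothetical, and treating a mismatch between real and
effective IDs as "setuid" is an assumption of this example. The
sys.flags fields used here are the ones set by -E (ignore_environment) and
-s (no_user_site).

```python
import os
import sys


def may_use_user_cache(flags=sys.flags):
    """Return False whenever a per-user cache should not be trusted."""
    # -E: ignore PYTHON* environment variables; -s: skip the user site dir.
    if flags.ignore_environment or flags.no_user_site:
        return False
    # Effective IDs differing from real IDs suggests a setuid/setgid run.
    # These functions are missing on some platforms, hence the guards.
    if hasattr(os, "geteuid") and os.geteuid() != os.getuid():
        return False
    if hasattr(os, "getegid") and os.getegid() != os.getgid():
        return False
    return True
```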

In contrast, directory stat caching with a modest number of (non-egg) 
PYTHONPATH entries would speed things up nicely in the 
hopefully-future-default case.
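
One possible reading of that idea, sketched under stated assumptions: the
class DirectoryCache and its methods are hypothetical, and a real importer
would also have to handle extension modules, bytecode files and
case-insensitive filesystems. Each path entry is listed once per process and
re-listed only when its mtime changes, so a lookup becomes a set membership
test rather than a series of probes for candidate filenames.

```python
import os


class DirectoryCache:
    """Hypothetical in-process cache of directory listings, keyed by path."""

    def __init__(self):
        self._cache = {}  # path -> (mtime_ns, frozenset of entry names)

    def names(self, path):
        """Return the cached listing for `path`, re-reading it if it changed."""
        mtime = os.stat(path).st_mtime_ns  # one stat to validate the entry
        entry = self._cache.get(path)
        if entry is None or entry[0] != mtime:
            entry = (mtime, frozenset(os.listdir(path)))
            self._cache[path] = entry
        return entry[1]

    def find(self, name, search_path):
        """Return the first directory holding `name` or `name.py`, else None."""
        for directory in search_path:
            try:
                names = self.names(directory)
            except OSError:
                continue  # missing or unreadable path entry
            if name in names or name + ".py" in names:
                return directory
        return None
```

With a handful of path entries, a call such as
DirectoryCache().find("zope", sys.path) touches each directory at most once
before the name is resolved.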


