[Distutils] Cache PYTHONPATH? (Re: make unzipped eggs be the default)

Tarek Ziadé ziade.tarek at gmail.com
Sun Aug 2 18:52:32 CEST 2009


On Wed, Jul 29, 2009 at 6:44 AM, P.J. Eby<pje at telecommunity.com> wrote:
> At 10:35 PM 7/28/2009 -0500, Ian Bicking wrote:
>>
>> On Tue, Jul 28, 2009 at 9:40 PM, P.J. Eby<pje at telecommunity.com> wrote:
>> > At 09:22 PM 7/28/2009 -0500, Ian Bicking wrote:
>> >>
>> >> I can see how this could go quite wrong, but maybe if installers touch
>> >> some file in the library directory anytime a package is
>> >> installed/reinstalled/removed/etc,
>> >
>> > You mean, like, the mtime of the directory itself?  ;-)
>>
>> Do directory mtimes get recursively updated?  I don't think they do.
>
> That's not necessary; if imports use a cached listdir, then the children
> will get handled recursively.
>
>> So if you have a layout:
>>
>> site-packages/
>>  zope/
>>    interface/
>>      __init__.py
>>
>> And you update the package and update __init__.py, the mtime of
>> site-packages doesn't change, does it?
>
> Nope, but at the top level, the fact that 'zope' is present is unchanged, as
> is the presence of an 'interface' subdirectory.
>
>
>> I'm saying if there was a file in site-packages/last_updated that gets
>> touched every time an installer does anything in site-packages, then
>> you could cache (between processes) the lookups.
>
> Since each invocation of the interpreter can have a different PYTHONPATH,
> the cache has to be per-directory, not global.  If it's per-directory, then
> there's no real benefit over runtime caching, since you now have to open and
> read a file (instead of just reading the directory).  And as I said, it's
> not realistic to think that opening and reading a file is going to beat
> opening and reading a directory for speed.

But opening and reading one file should beat opening hundreds of directories:

For instance, a Plone 3 application will have 100+ sys.path entries, because
zc.buildout (the Plone standard) adds one sys.path entry per egg.

So being able to cache them should speed things up.
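
To make the "one file instead of hundreds of directories" idea concrete, here
is a minimal sketch, not an actual implementation. It assumes a JSON file as
the storage format, and the names build_index, save_index, load_index,
is_fresh and find_entry are all made up for illustration. Each directory's
mtime is recorded so that a stale entry can be detected later, at the cost of
one stat() per directory.

```python
import json
import os


def build_index(path_entries):
    """Map each existing directory to its mtime and the names it contains."""
    index = {}
    for entry in path_entries:
        if os.path.isdir(entry):
            index[entry] = {
                "mtime": os.stat(entry).st_mtime,
                "names": sorted(os.listdir(entry)),
            }
    return index


def save_index(index, cache_file):
    """Write the whole index to a single file (JSON is an assumption here)."""
    with open(cache_file, "w") as f:
        json.dump(index, f)


def load_index(cache_file):
    """Read the whole index back with a single open() and read()."""
    with open(cache_file) as f:
        return json.load(f)


def is_fresh(entry, info):
    """True if the directory's mtime still matches the cached one."""
    return os.stat(entry).st_mtime == info["mtime"]


def find_entry(index, name):
    """Return the first cached directory that contains `name`, or None."""
    for entry, info in index.items():
        if name in info["names"]:
            return entry
    return None
```

With such an index, looking up a top-level name only touches the cache file.
Whether that really beats listing the directories depends on how often the
entries have to be re-checked with is_fresh().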

In the PEP 376 prototype, after thinking about a per-directory cache like the
one you are describing, I was thinking about having a global index file to
replace the global dictionary that keeps track of the distributions per
directory (currently the directory path is the key in the dictionary and the
distribution objects are the value).

That could even be a simple shelve of the dictionary, which becomes a global
index of the directories that [are/were once] in the path. This works as long
as the index file is per-user.

Or even better: per-application. I don't know how this could be managed, but a
simple cache file created alongside the script the application is launched
with could speed up the lookups from the second launch on.
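
A rough sketch of what such a per-application shelve could look like. This is
not the PEP 376 prototype code: the ".pathcache" suffix and the names
cache_path_for, find_distributions and load_or_build are hypothetical, and
find_distributions is only a stand-in that lists .egg-info entries instead of
building real distribution objects.

```python
import os
import shelve


def cache_path_for(script):
    """Hypothetical convention: keep the index next to the launch script."""
    return os.path.abspath(script) + ".pathcache"


def find_distributions(directory):
    """Stand-in for the real lookup: list the .egg-info entries found."""
    return sorted(n for n in os.listdir(directory) if n.endswith(".egg-info"))


def load_or_build(script, path_entries, scan=find_distributions):
    """Return {directory: scan(directory)}, reusing shelved values if any."""
    result = {}
    with shelve.open(cache_path_for(script)) as db:
        for entry in path_entries:
            if entry not in db:
                db[entry] = scan(entry)  # first launch: scan the directory
            result[entry] = db[entry]    # later launches: read the shelve
    return result
```

Nothing in this sketch invalidates the shelve, so an installer would have to
update or delete it whenever it changes one of the directories.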

Cheers
Tarek


-- 
Tarek Ziadé | http://ziade.org

