python24.zip

Dieter Maurer dieter at handshake.de
Tue May 24 16:43:20 EDT 2005


"Martin v. Löwis" <martin at v.loewis.de> writes on Sun, 22 May 2005 21:24:41 +0200:
> ...
> What do you mean, "unable to"? It just doesn't.

The original question was: "why does Python put non-existing
entries on 'sys.path'".

Your answer seems to be: "it just does not do it -- but it might
be changed if someone does the work".

This is fine with me.

> ...
> In the past, there was a silent guarantee that you could add
> items to sys.path, and only later create the directories behind
> these items. I don't know whether people rely on this guarantee.

I do not argue that Python should prevent adding non-existing
items to "path". That would not work, as Python may not
know what "existing" means (due to "path_hooks").

I argue only that Python should not *itself* (automatically) put items
on the path when it knows the responsible importers and knows (or can
easily determine) that the items do not exist for them.
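For illustration, an application can do this kind of pruning for itself at startup. The following is a minimal sketch (not part of the original discussion); it deliberately keeps entries it cannot judge, such as the empty string, mirroring the point that only the responsible importer can decide what "existing" means:

```python
import os
import sys

def prune_sys_path(path):
    """Return a copy of the given path list with entries dropped that
    do not exist on the file system. Entries an application cannot
    judge (here: the empty string, meaning the current directory)
    are kept, since a custom path_hook might still handle them."""
    return [p for p in path if p == "" or os.path.exists(p)]

# Non-existing entries disappear, existing ones survive:
cleaned = prune_sys_path(["/no/such/dir", os.getcwd(), ""])
```

An application would call this once at startup, e.g. `sys.path = prune_sys_path(sys.path)`, before the bulk of its imports.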

> ...
> > The application was Zope importing about 2.500 modules
> > from 2 zip files "zope.zip" and "python24.zip".
> > This resulted in about 12.500 opens -- about 4 times more
> > than would be expected -- about 10.000 of them failing opens.
> 
> I see. Out of curiosity: how much startup time was saved
> when sys.path was explicitly stripped to only contain these
> two zip files?

I cannot tell you precisely, because analysing cold-start timing
behaviour is very time-consuming (it requires a reboot for each
measurement).

We essentially have only the following numbers:

                   warm start            cold start
                (filled OS caches)    (empty OS caches)

from file system        5s                 13s
from ZIP archives       4s                  8s
frozen                  3s                  5s

The ZIP archive time was measured after a patch to "import.c"
that prevents Python from treating a ZIP archive member as a directory
when it cannot find the module currently looked for (of course, that
lookup also fails when the archive member is treated as a directory).
Furthermore, all C extensions were loaded via a "meta_path" hook (rather
than via "sys.path"), and "sys.path" contained just the two ZIP archives.
These optimizations brought the number of opens down to about 3,000
(from originally 12,500).
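The "meta_path" hook described above predates today's import machinery, but the idea can be sketched with the modern "importlib" API. The mapping and file names below are hypothetical examples, not Zope's actual layout; the point is that a finder consulted before the "sys.path" scan answers from a precomputed table and so avoids the per-entry open() fan-out entirely:

```python
import importlib.abc
import importlib.machinery
import importlib.util

class ExtensionMapFinder(importlib.abc.MetaPathFinder):
    """Resolve C extensions from a precomputed name -> filename map,
    so the regular sys.path scan (and its many failing opens) is
    skipped for these modules."""

    def __init__(self, locations):
        # e.g. {"_hypothetical": "/opt/app/lib/_hypothetical.so"}
        self.locations = locations

    def find_spec(self, fullname, path=None, target=None):
        filename = self.locations.get(fullname)
        if filename is None:
            return None  # not ours -- let later finders try
        loader = importlib.machinery.ExtensionFileLoader(fullname, filename)
        return importlib.util.spec_from_file_location(
            fullname, filename, loader=loader)

# Activating it would be: sys.meta_path.insert(0, ExtensionMapFinder(table))
```

Because the finder returns `None` for unknown names, it composes cleanly with the rest of `sys.meta_path`.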

> I would expect that importing 2500 modules takes *way*
> more time than doing 10.000 failed opens.

You may be wrong: searching for non-existent files can cause
disk I/O, which is several orders of magnitude slower than
CPU activity.

The comparison between warm start (little disk I/O) and cold start
(much disk I/O) tells you that the import process is highly
I/O dominated (for cold starts).
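A rough back-of-the-envelope model shows why the number of path entries matters: every module lookup probes each "sys.path" entry with several candidate file names, and each failing probe is a failing open. The figure of 4 candidate names per entry below is an assumption for illustration, not a measured value:

```python
def expected_opens(n_modules, n_path_entries, n_candidates_per_entry):
    """Crude upper bound on open() attempts during an import scan:
    each module is looked up on every path entry, trying several
    candidate file names (module.py, module.pyc, package/, ...)."""
    return n_modules * n_path_entries * n_candidates_per_entry

# With 2,500 modules, shrinking sys.path from 10 entries to 2
# cuts the attempted opens by a factor of 5 (assumed 4 candidate
# names per entry):
lean = expected_opens(2500, 2, 4)
bloated = expected_opens(2500, 10, 4)
```

Almost all of the extra attempts in the bloated case are failing opens, which is exactly where cold-start disk I/O hurts.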

I know that this does not prove that the failing opens contribute
significantly. However, a colleague reported that the
"import.c" patch (essential for the reduction in the number of opens)
resulted in significant (but unquantified) improvements.


Dieter
