[Python-Dev] Store startup modules as C structures for 20%+ startup speed improvement?

Steve Dower steve.dower at python.org
Tue Sep 18 14:38:10 EDT 2018


On 18Sep2018 1057, Carl Shapiro wrote:
> On Tue, Sep 18, 2018 at 5:55 AM, Fabio Zadrozny <fabiofz at gmail.com> wrote:
> 
>     During the import process, Python can already deal with folders and
>     .zip files in sys.path... now, instead of having special handling
>     for a new concept with a custom command line, etc, why not just say
>     that this is a special file (e.g.: files with a .pyfrozen extension)
>     and make importlib be able to deal with it when it's on sys.path
>     (that way there could be multiple of those and there should be no
>     need to turn it on/off, custom command line, etc)?
> 
> 
> That is an interesting idea but it might not be easy to work into this 
> design.  The improvement in start-up time comes from eliminating the 
> overheads of filesystem I/O, memory allocation, and un-marshaling 
> bytecode.  Having this data on the filesystem would reintroduce the cost 
> of filesystem I/O and it would add a load-time relocation to the 
> equation so the overall performance benefits would be greatly lessened.
> 
>     Another question: doesn't importlib already provide hooks for
>     external contributors which could address that use case? (so, this
>     could initially be available as a third party library for maturing
>     outside of CPython and then when it's deemed to be mature it could
>     be integrated into CPython -- not that this can't happen on Python
>     3.8 timeframe, but it'd be useful checking its use against the
>     current Python version and measuring benefits with real world code).
> 
> 
> This may be possible but, for the same reasons I outline above, it would 
> certainly come at the expense of performance.
> 
> I think many people are interested in a better .pyc format but our goals 
> are much more modest.  We are actually trying to not introduce a whole 
> new way to externalize .py data in CPython.  Rather, we think of this as 
> just making the existing frozen module capability much faster so its use 
> can be broadened to making start-up performance better.  The user 
> visible part, the command line interface to bypass the frozen module, 
> would be a nice-to-have for developers but is something we could live 
> without.

The primary benefit of the importlib hook approach is that it would not 
require rebuilding CPython each time you make a change. Since we need to 
consider a wide range of users across a wide range of platforms, having 
the ability to load a single native module that contains many 
"pre-loaded" modules allows many more people to access the benefits.

It would not prevent some specific modules from being compiled into the 
main binary, but for those who do not build their own Python it would 
let specific applications use the feature as well.
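
To make the hook approach concrete, here is a rough sketch of a meta path 
finder that serves modules from an in-memory table rather than from the 
filesystem. The names PRELOADED, PreloadedFinder and hello_preloaded are 
made up for illustration, and a plain dict of marshaled code objects 
stands in for whatever a real native module would export. It still 
un-marshals on import, so it only shows the plumbing and not the savings 
Carl describes.

```python
import importlib.abc
import importlib.util
import marshal
import sys

# Hypothetical stand-in for the table a native module would export:
# module name -> marshaled code object. Built at runtime here purely
# for illustration.
_SOURCE = (
    "GREETING = 'hi'\n"
    "def greet(name):\n"
    "    return GREETING + ', ' + name\n"
)
PRELOADED = {
    "hello_preloaded": marshal.dumps(
        compile(_SOURCE, "<preloaded hello_preloaded>", "exec")
    ),
}


class PreloadedFinder(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Serve modules from an in-memory table instead of the filesystem."""

    def __init__(self, table):
        self._table = table

    def find_spec(self, fullname, path=None, target=None):
        if fullname not in self._table:
            return None  # let the next finder on sys.meta_path try
        return importlib.util.spec_from_loader(fullname, self)

    def create_module(self, spec):
        return None  # fall back to the default module object

    def exec_module(self, module):
        # Assumption: entries are marshaled code objects, as in a .pyc body.
        code = marshal.loads(self._table[module.__spec__.name])
        exec(code, module.__dict__)


# Put the finder first so it is consulted before the path-based finders.
sys.meta_path.insert(0, PreloadedFinder(PRELOADED))

import hello_preloaded

print(hello_preloaded.greet("python-dev"))  # prints "hi, python-dev"
```

Nothing here requires rebuilding CPython: an application only has to 
install the finder before the imports it wants to accelerate.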

FWIW, I don't read this as work being pushed back onto Carl to implement 
before the idea is accepted. I think we're taking the (now proven) core 
idea and shaping it into a suitable form for the main CPython 
distribution, which has to take more use cases into account.

Cheers,
Steve

