[Python-Dev] Store startup modules as C structures for 20%+ startup speed improvement?

Tue Sep 18 14:46:11 EDT 2018

On Tue, Sep 18, 2018 at 2:57 PM, Carl Shapiro <carl.shapiro at gmail.com>
wrote:

> On Tue, Sep 18, 2018 at 5:55 AM, Fabio Zadrozny <fabiofz at gmail.com> wrote:
>
>> During the import process, Python can already deal with folders and .zip
>> files in sys.path... now, instead of having special handling for a new
>> concept with a custom command line, etc, why not just say that this is a
>> special file (e.g.: files with a .pyfrozen extension) and make importlib be
>> able to deal with it when it's on sys.path (that way there could be
>> multiple of those and there should be no need to turn it on/off, custom
>> command line, etc)?
>>
>
> That is an interesting idea but it might not be easy to work into this
> design.  The improvement in start-up time comes from eliminating the
> overheads of filesystem I/O, memory allocation, and un-marshaling
> bytecode.  Having this data on the filesystem would reintroduce the cost of
> filesystem I/O and it would add a load-time relocation to the equation so
> the overall performance benefits would be greatly lessened.
>
>
>> Another question: doesn't importlib already provide hooks for external
>> contributors which could address that use case? (so, this could initially
>> be available as a third party library for maturing outside of CPython and
>> then when it's deemed to be mature it could be integrated into CPython --
>> not that this can't happen on Python 3.8 timeframe, but it'd be useful
>> checking its use against the current Python version and measuring benefits
>> with real world code).
>>
>
> This may be possible but, for the same reasons I outline above, it would
> certainly come at the expense of performance.
>
> I think many people are interested in a better .pyc format but our goals
> are much more modest.  We are actually trying to not introduce a whole new
> way to externalize .py data in CPython.  Rather, we think of this as just
> making the existing frozen module capability much faster so its use can be
> broadened to making start-up performance better.  The user visible part,
> the command line interface to bypass the frozen module, would be a
> nice-to-have for developers but is something we could live without.
>

Just to make sure we're in the same page, the approach I'm talking about
would still be having a dll, not a better .pyc format, so, during the
import a custom importer would open that dll once and provide modules from
it -- do you think this would be much more overhead than what's proposed
now?

I guess it may be a bit slower because it'd have to obey the existing
import capabilities, but that shouldn't mean more time is spent on IO,
memory allocation nor un-marshaling bytecode (although it may be that I
misunderstood the approach or the current import capabilities don't provide
the proper api for that).
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-dev/attachments/20180918/09a83c08/attachment.html>