[Python-Dev] Store startup modules as C structures for 20%+ startup speed improvement?

Fabio Zadrozny fabiofz at gmail.com
Tue Sep 18 08:55:15 EDT 2018


On Mon, Sep 17, 2018 at 9:23 PM, Carl Shapiro <carl.shapiro at gmail.com>
wrote:

> On Sun, Sep 16, 2018 at 1:24 PM, Antoine Pitrou <solipsis at pitrou.net>
> wrote:
>
>> I think it's of limited interest if it only helps with modules used
>> during the startup sequence, not arbitrary stdlib or third-party
>> modules.
>>
>
> This should help any use-case that already uses the freeze module
> bundled with CPython.  Third-party code, like py2exe, py2app,
> pyinstaller, and XAR could build upon this to create applications that
> start faster.
>

I think this seems like a great idea.

Some questions though:

During the import process, Python can already deal with folders and .zip
files in sys.path. So, instead of adding special handling for a new
concept with a custom command line option, why not treat this as a
special file (e.g. a file with a .pyfrozen extension) and make importlib
able to handle it when it's on sys.path? That way there could be several
such files, with no need for an on/off switch or a custom command line.
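
A minimal sketch of that idea, using the existing sys.path_hooks mechanism.
The .pyfrozen format here is purely an assumption for illustration (a
marshalled dict mapping module names to code objects), and PyFrozenFinder
and write_pyfrozen are hypothetical names, not anything CPython provides:

```python
import importlib
import importlib.abc
import importlib.util
import marshal
import os
import sys
import tempfile


class PyFrozenFinder(importlib.abc.Loader):
    """Path entry finder and loader for one hypothetical .pyfrozen file."""

    def __init__(self, path):
        # A path hook must raise ImportError for entries it cannot handle.
        if not (path.endswith(".pyfrozen") and os.path.isfile(path)):
            raise ImportError("not a .pyfrozen file", path=path)
        self.path = path
        with open(path, "rb") as f:
            # Assumed format: marshalled dict of {module name: code object}.
            self.code = marshal.load(f)

    def find_spec(self, fullname, target=None):
        if fullname not in self.code:
            return None
        return importlib.util.spec_from_loader(fullname, self, origin=self.path)

    def exec_module(self, module):
        exec(self.code[module.__name__], module.__dict__)


def write_pyfrozen(path, sources):
    """Compile {name: source} and store the code objects in a single file."""
    code = {name: compile(src, name, "exec") for name, src in sources.items()}
    with open(path, "wb") as f:
        marshal.dump(code, f)


# Register the hook once; any .pyfrozen entry on sys.path is then handled.
sys.path_hooks.append(PyFrozenFinder)

archive = os.path.join(tempfile.mkdtemp(), "demo.pyfrozen")
write_pyfrozen(archive, {"frozen_demo": "ANSWER = 6 * 7\n"})
sys.path.append(archive)
importlib.invalidate_caches()

import frozen_demo

print(frozen_demo.ANSWER)
```

This only shows that such a file can be picked up without a command line
switch. It still unmarshals code objects at import time, so it says nothing
about the speedup from storing modules as C structures.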

Another question: doesn't importlib already provide hooks that would let
external contributors address this use case? If so, this could initially
be available as a third-party library and mature outside of CPython, then
be integrated into CPython once it's deemed mature. That's not to say it
can't happen in the Python 3.8 timeframe, but it'd be useful to check it
against the current Python version and measure the benefits with
real-world code.
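
For the measuring part, a rough sketch of how startup cost could be compared
across interpreters or import sets. The helper name startup_seconds and the
choice of `import json` as the workload are assumptions for illustration:

```python
import subprocess
import sys
import time


def startup_seconds(code="pass", runs=5):
    """Return the best wall-clock time of `python -c code` over several runs."""
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run([sys.executable, "-c", code], check=True)
        best = min(best, time.perf_counter() - start)
    return best


baseline = startup_seconds("pass")
with_json = startup_seconds("import json")
print(f"baseline:    {baseline * 1000:.1f} ms")
print(f"import json: {with_json * 1000:.1f} ms")
```

The numbers include process creation, so they are only meaningful when
compared against each other on the same machine.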

>> To give an idea, on my machine the baseline Python startup is about 20ms
>> (`time python -c pass`), but if I import Numpy it grows to 100ms, and
>> with Pandas it's more than 200ms.  Saving 4ms on the baseline startup
>> would make no practical difference for concrete usage.
>>
>
> Do you have a feeling for how many of those milliseconds are spent loading
> bytecode from disk?  If so, standalone executables that contain numpy and
> pandas (and mercurial) would start faster.
>
>
>> I'm ready to think there are other use cases where it matters, though.
>>
>
> I think so.  I hope you will, too :-)
>