[Python-Dev] Python startup time

Nick Coghlan ncoghlan at gmail.com
Sun Jul 23 23:59:30 EDT 2017


On 23 July 2017 at 09:35, Steve Dower <steve.dower at python.org> wrote:
> Yes, I’m aware of that, which is why I don’t have any specific suggestions
> off-hand. But given the differences in file systems between Windows and
> other OSs, it wouldn’t surprise me if there were a more optimal approach for
> NTFS to amortize calls better. Perhaps not, but it is still the most
> expensive part of startup that we have any ability to change, so it’s worth
> investigating.

That does remind me of a capability we haven't played with a lot recently:

$ python3 -m site
sys.path = [
    '/home/ncoghlan',
    '/usr/lib64/python36.zip',
    '/usr/lib64/python3.6',
    '/usr/lib64/python3.6/lib-dynload',
    '/home/ncoghlan/.local/lib/python3.6/site-packages',
    '/usr/lib64/python3.6/site-packages',
    '/usr/lib/python3.6/site-packages',
]
USER_BASE: '/home/ncoghlan/.local' (exists)
USER_SITE: '/home/ncoghlan/.local/lib/python3.6/site-packages' (exists)
ENABLE_USER_SITE: True

The interpreter puts a zip file ahead of the regular unpacked standard
library on sys.path because at one point in time that was a useful
optimisation technique for reducing import costs on application
startup. It was a potentially big win with the old "multiple stat
calls" import implementation, but I'm not aware of any more recent
benchmarks relative to the current listdir-caching based import
implementation.
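
As a reminder of how that zip entry works, here is a minimal sketch of
importing a module from a zip archive placed on sys.path. The archive name
"bundle.zip", the module name "hello_zip" and the helper import_from_zip are
made up for illustration; it says nothing about whether this is faster than
the directory-based import.

```python
import importlib
import os
import sys
import tempfile
import zipfile


def import_from_zip(module_name, source):
    """Write `source` into a fresh zip archive as <module_name>.py,
    put the archive at the front of sys.path, and import the module."""
    with tempfile.TemporaryDirectory() as tmpdir:
        # "bundle.zip" is an arbitrary, hypothetical archive name.
        archive = os.path.join(tmpdir, "bundle.zip")
        with zipfile.ZipFile(archive, "w") as zf:
            zf.writestr(module_name + ".py", source)
        sys.path.insert(0, archive)
        try:
            # Make sure the path finders notice the new sys.path entry.
            importlib.invalidate_caches()
            return importlib.import_module(module_name)
        finally:
            sys.path.remove(archive)


mod = import_from_zip("hello_zip", "GREETING = 'hi from the zip'\n")
# The loader recorded on the module spec shows the import came from the zip.
print(type(mod.__spec__.loader).__name__)
print(mod.GREETING)
```

The module stays usable after the temporary archive is deleted because it
has already been loaded into sys.modules.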

So I think some interesting experiments to measure might be:

- pushing the "always imported" modules into a dedicated zip archive
- having the interpreter pre-seed sys.modules with the contents of
  that dedicated archive
- freezing those modules and building them into the interpreter that way
- compiling the standalone top-level modules with Cython, and loading
  them as extension modules
- compiling in the Cython generated modules as builtins (not currently
  an option for packages & submodules due to [1])

The nice thing about those kinds of approaches is that they're all
fairly general purpose, and relate primarily to how the Python
interpreter is put together, rather than how the individual modules
are written in the first place.

(I'm not volunteering to run those experiments, though - just pointing
out some of the technical options we have available to us that don't
involve adding more handcrafted C extension modules to CPython)
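
For the first experiment in the list, a rough sketch of building such a
dedicated archive could look like the following. The helper names
(archive_name, bundle_modules), the file name "startup.zip" and the sample
module list are assumptions for illustration only. The sketch just copies
source files into a zip; putting the archive on sys.path and measuring
startup are left out.

```python
import importlib.util
import os
import tempfile
import zipfile


def archive_name(spec):
    """Return the path a module needs inside the zip to be importable."""
    base = spec.name.replace(".", "/")
    if spec.submodule_search_locations is not None:
        # Packages are stored as <name>/__init__.py
        return base + "/__init__.py"
    return base + ".py"


def bundle_modules(module_names, archive_path):
    """Copy the source file of each named module into a zip archive.

    Modules without a .py source file (builtin, frozen or extension
    modules, or names that cannot be found) are skipped and returned.
    """
    skipped = []
    with zipfile.ZipFile(archive_path, "w") as zf:
        for name in module_names:
            spec = importlib.util.find_spec(name)
            origin = getattr(spec, "origin", None)
            if spec is None or not spec.has_location or not str(origin).endswith(".py"):
                skipped.append(name)
                continue
            zf.write(origin, archive_name(spec))
    return skipped


# Hypothetical module list; a real experiment would use the modules
# that are actually imported at startup.
workdir = tempfile.mkdtemp()
archive = os.path.join(workdir, "startup.zip")
skipped = bundle_modules(["colorsys", "json", "json.decoder", "sys"], archive)
with zipfile.ZipFile(archive) as zf:
    bundled = sorted(zf.namelist())
print("bundled:", bundled)
print("skipped:", skipped)
```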

[1] https://bugs.python.org/issue1644818

Cheers,
Nick.

P.S. Checking the current list of source modules implicitly loaded at
startup, I get:

>>> import sys
>>> sorted(k for k, m in sys.modules.items() if m.__spec__ is not None and type(m.__spec__.loader).__name__ == "SourceFileLoader")
['_collections_abc', '_sitebuiltins', '_weakrefset', 'abc', 'codecs',
'encodings', 'encodings.aliases', 'encodings.latin_1',
'encodings.utf_8', 'genericpath', 'io', 'os', 'os.path', 'posixpath',
'rlcompleter', 'site', 'stat']
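
A variation on the same check, as a sketch: group everything in sys.modules
by the kind of loader that produced it, so source, builtin, frozen and
extension modules can be compared side by side. The helper names
(loader_name, modules_by_loader) are made up here. Builtin and frozen
modules record the importer class itself as their loader rather than an
instance, so the sketch handles both cases. The result depends on the
interpreter version and on how it was started.

```python
import sys
from collections import defaultdict


def loader_name(module):
    """Return the name of the loader class recorded on a module's spec."""
    spec = getattr(module, "__spec__", None)
    loader = getattr(spec, "loader", None)
    if loader is None:
        return "<no loader>"
    # Some importers are used as classes, others as instances.
    return loader.__name__ if isinstance(loader, type) else type(loader).__name__


def modules_by_loader(modules=None):
    """Map loader class name -> sorted list of module names."""
    # Copy sys.modules so the mapping cannot change while we iterate.
    modules = dict(sys.modules) if modules is None else modules
    groups = defaultdict(list)
    for name, module in modules.items():
        groups[loader_name(module)].append(name)
    return {kind: sorted(names) for kind, names in groups.items()}


for kind, names in sorted(modules_by_loader().items()):
    print(f"{kind}: {len(names)} module(s)")
```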


-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

