[Python-Dev] Python startup time

Gregory P. Smith greg at krypto.org
Thu May 3 20:02:34 EDT 2018


On Wed, May 2, 2018 at 2:13 PM, Barry Warsaw <barry at python.org> wrote:

> Thanks for bringing this topic up again.  At $day_job, this is a highly
> visible and important topic, since the majority of our command line tools
> are written in Python (of varying versions from 2.7 to 3.6).  Some of those
> tools can take upwards of 5 seconds or more just to respond to --help, which
> causes lots of pain for developers, who complain (rightly so) up the
> management chain. ;)
>
> We’ve done a fair bit of work to bring those numbers down without super
> radical workarounds.  Often there are problems not strictly related to the
> Python interpreter that contribute to this.  Python gets blamed, but it’s
> not always the interpreter’s fault.  Common issues include:
>
> * Modules that have import-time side effects, such as network access or
> expensive creation of data structures.  Python 3.7’s `-X importtime` switch
> is a really wonderful way to identify the worst offenders.  Once 3.7 is
> released, I do plan to spend some time using this to collect data
> internally so we can attack our own libraries, and perhaps put automated
> performance testing into our build stack, to identify start up time
> regressions.
>
> * pkg_resources.  When you have tons of entries on sys.path, pkg_resources
> does a lot of work at import time, and because of common patterns which
> tend to use pkg_resources namespace package support in __init__.py files,
> this just kills start up times.  Of course, pkg_resources has other uses
> too, so even in a purely Python 3 world (where your namespace packages can
> omit the __init__.py), you’ll often get clobbered as soon as you want to
> use the Basic Resource Access API.  This is also pretty common, and it’s
> the main reason why Brett and I created importlib.resources for 3.7 (with a
> standalone API-compatible library for older Pythons).  That’s one less
> reason to use pkg_resources, but it doesn’t address the __init__.py use.
> Brett and I have been talking about addressing that for 3.8.
>
> * pex - which we use as our single file zipapp tool.  Especially the
> interaction between pex and pkg_resources introduces pretty significant
> overhead.  My colleague Loren Carvalho created a tool called shiv which
> requires at least Python 3.6, avoids the use of pkg_resources, and
> implements other tricks to be much more performant than pex.   Shiv is now
> open source and you can find it on RTD and GitHub.
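
For anyone wanting to try the `-X importtime` switch mentioned in the list
above, here is a minimal sketch that runs a child interpreter with it and
ranks modules by cumulative import time.  It assumes Python 3.7 or later.
The helper name `import_times` and the choice of `json` as the module to
profile are illustrative only, and the parsing relies on the
`import time: self | cumulative | name` lines the switch writes to stderr.

```python
import subprocess
import sys


def import_times(statement):
    """Run `statement` in a fresh interpreter under -X importtime.

    Returns a list of (cumulative_us, self_us, module_name) tuples,
    slowest cumulative import first.  (Hypothetical helper, not part of
    any library.)
    """
    proc = subprocess.run(
        [sys.executable, "-X", "importtime", "-c", statement],
        stderr=subprocess.PIPE,
        stdout=subprocess.DEVNULL,
        universal_newlines=True,
        check=True,
    )
    rows = []
    for line in proc.stderr.splitlines():
        if not line.startswith("import time:"):
            continue
        fields = line[len("import time:"):].split("|")
        if len(fields) != 3:
            continue
        self_us, cumulative_us, name = (f.strip() for f in fields)
        if not self_us.isdigit():
            continue  # skip the column header row
        rows.append((int(cumulative_us), int(self_us), name))
    return sorted(rows, reverse=True)


if __name__ == "__main__":
    # Show the ten most expensive imports triggered by `import json`.
    for cumulative_us, self_us, name in import_times("import json")[:10]:
        print(f"{cumulative_us:>8} us  {name}")
```

Running the same helper against a CLI's top-level package, rather than
`json`, is one way to collect the kind of data described above.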
>
> The switch to shiv and importlib.resources can shave 25-50% off of warm
> cache start up times for zipapp style executables.
>
> Another thing we’ve done, although I’m much less sanguine about it as a
> general approach, is to move imports into functions, but we’re trying to
> only use that trick on the most critical cases.
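
A minimal sketch of what moving an import into a function looks like.  The
`main` and `render_report` names and the `--report` flag are made up for
illustration, and `json` simply stands in for a genuinely expensive
dependency.

```python
import sys


def render_report(data):
    # Deferred import: json is only loaded if this code path actually runs,
    # so a plain `--help` invocation never pays for it.
    import json

    return json.dumps(data, indent=2, sort_keys=True)


def main(argv):
    if "--help" in argv:
        return "usage: tool [--help] [--report]"
    if "--report" in argv:
        return render_report({"status": "ok"})
    return ""


if __name__ == "__main__":
    print(main(sys.argv[1:]))
```

The cost is that the import statement is now hidden inside the function
body, which is part of why this is best kept to the most critical cases.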
>
> Some import time effects can’t be changed.  Decorators come to mind, and
> click is a popular library for CLIs that provides some great features, but
> decorators do prevent a lazy loading approach.
>
> > On May 1, 2018, at 20:26, Gregory Szorc <gregory.szorc at gmail.com> wrote:
>
> >> You might think "what's a few milliseconds matter".  But if you run
> >> hundreds of commands in a shell script it adds up.  git's speed is one
> >> of the few bright spots in its UX, and hg's comparative slowness here is
> >> a palpable disadvantage.
>
> Oh, for command line tools, milliseconds absolutely matter.
>
> > As a concrete example, I recently landed a Mercurial patch [2] that
> > stubs out zope.interface to prevent the import of 9 modules on every
> > `hg` invocation.
>
> I have a similar dastardly plan to provide a pkg_resources stub :).
>
> > Mercurial provides a `chg` program that essentially spins up a daemon
> > `hg` process running a "command server" so the `chg` program [written in
> > C - no startup overhead] can dispatch commands to an already-running
> > Python/`hg` process and avoid paying the startup overhead cost. When you
> > run Mercurial's test suite using `chg`, it completes *minutes* faster.
> > `chg` exists mainly as a workaround for slow startup overhead.
>
> A couple of our developers demoed a similar approach for one of our CLIs
> that almost everyone uses.  It’s a big application with lots of
> dependencies, so particularly vulnerable to pex and pkg_resources
> overhead.  While it was just a prototype, it was darn impressive to see
> subsequent invocations produce output almost immediately.  It’s unfortunate
> that we have to utilize all these tricks to get even moderately performant
> Python CLIs.
>
>
Note that this kind of "trick" is not unique to Python.  I see it used by
large Java tools at work.  In effect emacs has done similar things for many
decades with its saved core-dump at build time. It saves a snapshot of
initialized elisp interpreter state and loads that into memory instead of
rerunning initialization to reproduce the state.

I don't know if anyone has looked at making a similar concept of saved
post-startup interpreter state for rapid loading as a memory image possible
in Python.  I don't believe we're even at the point where all state can
actually accurately be captured from CPython (extension modules can do
anything).  When you do that kind of trick things like hash randomization
tend to complicate matters and might need to be disabled. That feature may
not matter for all CLI tools.
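
To illustrate just the hash randomization point (not snapshotting itself,
which this does not attempt), here is a small sketch showing that string
hashes are reproducible across processes only when the seed is pinned via
the `PYTHONHASHSEED` environment variable.  The helper name
`string_hash_in_child` is hypothetical.

```python
import os
import subprocess
import sys


def string_hash_in_child(seed):
    """Return hash("startup") as computed by a fresh interpreter.

    `seed` is passed via PYTHONHASHSEED: "random" or a decimal integer
    ("0" disables randomization).
    """
    env = dict(os.environ, PYTHONHASHSEED=seed)
    out = subprocess.check_output(
        [sys.executable, "-c", 'print(hash("startup"))'],
        env=env,
        universal_newlines=True,
    )
    return int(out)


if __name__ == "__main__":
    # With a pinned seed, two separate processes agree on the hash.
    print(string_hash_in_child("0"), string_hash_in_child("0"))
    # With randomization on, the two values will almost always differ.
    print(string_hash_in_child("random"), string_hash_in_child("random"))
```

Any saved image containing dicts or sets keyed by strings would bake in
one particular seed, which is why the feature gets in the way.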

-gps

> A few of us spent some time at last year’s core Python dev talking about
> other things we could do to improve Python’s start up time, not just with
> the interpreter itself, but within the larger context of the Python
> ecosystem.  Many ideas seem promising until you dive into the details, so
> it’s definitely hard to imagine maintaining all of Python’s dynamic
> semantics and still making it an order of magnitude faster to start up.
> But that’s not an excuse to give up, and I’m hoping we can continue to
> attack the problem, both in the micro and the macro, for 3.8 and beyond,
> because the alternative is that Python becomes less popular as an
> implementation language for CLIs.  That would be sad, and definitely has a
> long term impact on Python’s popularity.
>
> Cheers,
> -Barry
>
>