[issue34296] Speed up python startup by pre-warming the vm

Tue Aug 7 23:35:40 EDT 2018

INADA Naoki <songofacandy at gmail.com> added the comment:

On Wed, Aug 8, 2018 at 6:40 AM Cyker Way <report at bugs.python.org> wrote:
>
> Cyker Way <cykerway at gmail.com> added the comment:
>
> >   While this issue is "pre warming VM", VM startup is not significant part of your 500ms.
>
> 10-20ms should be OK for shell scripts. But a fork is still faster.
>
> >   You're talking about application specific strategy now. It's different of this issue.
>
> Actually, this issue is created to look for a generic approach that can optimize the running time for most, or even all, python scripts. Different scripts may import different modules, but this doesn't mean there isn't a method that works for all of them.
>

"Fork before loading any modules" and "fork after loading application"
are totally different.
The latter is not generic.  The former can be done by Python core, but
I'm not sure it's really helpful.
The latter can be done in a framework, and it can be battle-tested in a
3rd-party CLI framework before it is added to the Python stdlib.

> >   And many ideas like yours are already discussed on ML, again and again.
>
> I browsed about 6-7 threads on python-dev. I think 2-3 of them provide great information. But I don't think any of them gives concrete solutions. So we are still facing this problem today.
>

I didn't mean it's solved.  I meant that many people have proposed the
same idea again and again, and I'm tired of reading such random ideas.
Python already provides os.fork.  People can use it.  CLI frameworks can
use it.
So what should Python provide in addition?
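
As a rough illustration of "People can use it": the sketch below forks
after the expensive imports are done, so the child starts with those
modules already loaded.  The helper name run_in_forked_child is
hypothetical, it assumes a POSIX platform where os.fork exists, and it
is only a sketch, not a tested startup server.

```python
import os
import sys


def run_in_forked_child(func):
    """Fork, run func() in the child, return the child's exit code.

    Hypothetical helper: modules imported before this call are already
    loaded in the child, so the child skips their import cost.
    """
    # Flush first so buffered output is not written twice after fork.
    sys.stdout.flush()
    sys.stderr.flush()
    pid = os.fork()
    if pid == 0:
        code = 1  # exit code reported if func() raises
        try:
            code = int(func() or 0)
            sys.stdout.flush()
            sys.stderr.flush()
        finally:
            # Leave without running the parent's cleanup handlers.
            # Note: an exception from func() is swallowed here.
            os._exit(code & 0xFF)
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return -os.WTERMSIG(status)  # child was killed by a signal
```

A CLI framework could call run_in_forked_child(main) from a long-lived
process once per request; everything around that (sockets, argument
passing, environment) is left out here.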

> >   I want to close this issue. Please give us more concrete idea or patch with target sample application you want to optimize.
>
> As said I'm looking for a generic approach. So optimizing specific applications isn't really the goal of this issue (though work on specific modules still helps). I did implement a proof of concept (link: <https://github.com/cykerway/pyforkexec>) for the fork-exec startup approach. It's still very rough and elementary, but proves this approach has its value. As Nick said:
>

I doubt there is a generic and safe approach that fits in the stdlib.
For example, your PoC raises several points that need consideration:

* A different Python, venv, or application version may be listening on
  the Unix socket.
* Where should the socket be located?  What about socket permissions?
* Environment variables may differ.
* When the server has not been used for a long time, it should exit
  automatically.  The client should start a server if no server is
  listening.
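
To make the first two points concrete, here is one possible way a
3rd-party tool might handle them.  It is only a sketch under
assumptions: the function names and the "pyprefork" file prefix are
hypothetical and are not taken from the PoC.  The socket path is derived
from the interpreter identity, and the socket lives in a per-user
directory with mode 0700.

```python
import hashlib
import os
import sys


def server_socket_path(runtime_dir, executable=sys.executable,
                       prefix=sys.prefix, version=sys.version):
    """Return a socket path unique to one interpreter/venv combination.

    Hypothetical naming scheme: a different Python binary, venv prefix,
    or version string yields a different path, so a client never talks
    to a server started by another interpreter.
    """
    identity = "\0".join([os.path.realpath(executable), prefix, version])
    digest = hashlib.sha256(identity.encode("utf-8")).hexdigest()[:16]
    return os.path.join(runtime_dir, "pyprefork-%s.sock" % digest)


def make_private_runtime_dir(base):
    """Create a per-user directory accessible only by its owner."""
    path = os.path.join(base, "pyprefork-%d" % os.getuid())
    os.makedirs(path, mode=0o700, exist_ok=True)
    os.chmod(path, 0o700)  # enforce the mode even if it already existed
    return path
```

This does not address differing environment variables or idle shutdown;
an application version would also have to be mixed into the identity
string by the tool that knows it.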

I prefer battle-testing the idea as a 3rd-party tool first.  Then we can
discuss whether we should add it to the stdlib or not.

The "add it to PyPI first" approach has several benefits:

* It can be used with older Python versions.
* It can evolve more quickly than Python's 1.5-year release cycle.

> >   ...the CPython runtime is highly configurable, so it's far from clear what, if anything, could be shared from run to run...
>
> What I hope is we can inspect these configurations and figure out the invariants. This would help us make a clean environment as the forking base. If this is impossible, I think we can still fork from a known interpreter state chosen by the user script author. You may close this issue if nobody has more to say, but I hope the fork-exec startup can be standardized one day as I believe, for quick scripts, however much you optimize the cold start it can't be faster than a fork.
>

That relates only to the "former" (fork before loading the application)
approach.
I don't think it's really worthwhile.  The benefits will be very small
compared to its danger and complexity.

----------

_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue34296>
_______________________________________