[pypy-dev] Saving and reloading JIT optimizations

Fri Aug 19 17:38:40 CEST 2011

On Fri, Aug 19, 2011 at 1:56 PM, David Fraser <davidf at sjsoft.com> wrote:
> The pypy JIT takes a while to work out which parts of python code need optimization etc, and only after that phase do the speedups become relevant. Have there been any efforts (indeed, is it a feasible idea at all) that look at saving these optimizations for future runs of the same codebase?

If it were possible, it would be a tremendous improvement, not only
for reducing the warm-up time before JITting, as you mentioned, but
also for deploying compiled code to a different machine. Small
machines are increasingly important: they can hardly afford to run
the JIT itself (due to e.g. RAM limitations), but they would benefit
greatly from the results of JITting.

Such a feature, if it turned out to be possible, would in some sense
bring the advantages of a traditional *ahead-of-time* compiler, and
more besides.

On Fri, Aug 19, 2011 at 7:32 AM, Armin Rigo <arigo at tunes.org> wrote:
>
> No, this is not really doable.

Maciej and you have both said this in my hearing, and you are both
brilliant experts who know a lot more than I do, but...

Let's not give up so easily!

> The JIT writes explicitly in the assembler the address of a ton of constants.  We have no clue what these constants become when we are in a different process.  Think even just about Python classes: there is no way at all to know that a class at address 0x1234567 is "the same" as a previous class in a previous process at address 0x7654321, let alone defining what exactly "the same" means.

As for remembering what those constants mean when you are in a
different process, can't PyPy write down a table mapping from some
symbol to a constant, and then, when loading an ahead-of-time-JITted
program, swap in the values of those constants in this process?

(I know this is a very naïve question. I don't even know what these
constants are that you are talking about.)
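
To make the table idea a little more concrete, here is a minimal
sketch in plain Python. It is only an illustration under my own
assumptions: save_trace, load_trace, the (opcode, argument) trace
format and the symbolic names like "mod.A" are all made up for this
example and are not PyPy APIs, and id() merely stands in for a
machine address.

```python
# Hypothetical sketch only: none of these names are PyPy APIs.
# id(obj) stands in for "the address baked into the assembler".

def save_trace(ops, live_objects):
    """Replace every embedded address with a symbolic name.

    ops: list of (opcode, address-or-None) pairs.
    live_objects: dict mapping a symbolic name to an object in this process.
    """
    name_of = {id(obj): name for name, obj in live_objects.items()}
    saved = []
    for opcode, address in ops:
        if address is None:
            saved.append((opcode, None))
        else:
            # A KeyError here means we have no symbol for this constant,
            # so the trace cannot be saved safely.
            saved.append((opcode, name_of[address]))
    return saved


def load_trace(saved_ops, live_objects):
    """Swap this process's addresses back in for the symbols."""
    loaded = []
    for opcode, symbol in saved_ops:
        if symbol is None:
            loaded.append((opcode, None))
        else:
            loaded.append((opcode, id(live_objects[symbol])))
    return loaded
```

The saved form contains no addresses at all, so it could in principle
be written to disk; whether the real constants can be named this way
is exactly the open question.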

As for defining what "the same" means, we require only two things.
For correctness, it must never consider two things the same when they
should not behave identically. For performance, it should *usually*
consider two things the same when they should. So we can use a
conservative method of determining whether two classes are the same.
For example: they are the same if they have identical Python source
code and their superclasses are the same (in this sense).
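
The conservative test just described could be sketched as a recursive
fingerprint. This is an assumption-laden illustration, not anything
PyPy does: class_fingerprint is a hypothetical helper, the class
source is passed in as a string, and SHA-256 is simply one convenient
choice of hash.

```python
import hashlib


def class_fingerprint(source, base_fingerprints):
    """Fingerprint a class from its source text and its bases' fingerprints.

    Hypothetical helper: two classes get equal fingerprints only if their
    source text is identical and their superclasses have equal
    fingerprints, in order (ignoring hash collisions).
    """
    h = hashlib.sha256()
    # Hash the source first so every chunk fed to h has a fixed length.
    h.update(hashlib.sha256(source.encode("utf-8")).digest())
    for fp in base_fingerprints:
        h.update(bytes.fromhex(fp))
    return h.hexdigest()
```

A mismatch only costs performance (the saved code is not reused),
while a match is meant to imply identical behaviour. Note that this
sketch ignores anything that changes a class after its definition,
such as monkey-patching or metaclasses, so a real check would have to
be stricter.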

> Instead, we can work on lowering the warm-up time of the JIT, notably by lowering the (so far very large) overhead it takes for the JIT to trace a loop.

Well, that's certainly an improvement I would be happy to benefit
from. I'm not volunteering to work on the "ahead-of-time-JIT" project
right now (I would hardly know where to begin), but I wanted to speak
up and encourage any ambitious JIT experts to think that it could
actually be doable.

PyPy is already a state-of-the-art JITted interpreter, and it has
other very cool features. But if this "ahead-of-time-JIT" hack could
work, then it would become a cool new kind of language implementation
that, as far as I know, has never existed before.

Regards,

Zooko