[Python-ideas] Bytecode JIT

Soni L. fakedme+py at gmail.com
Fri Jun 30 18:30:59 EDT 2017



On 2017-06-30 07:17 PM, Victor Stinner wrote:
> 2017-06-30 17:09 GMT+02:00 Soni L. <fakedme+py at gmail.com>:
>> CPython should get a tracing JIT that turns slow bytecode into fast
>> bytecode.
>>
>> A JIT doesn't have to produce machine code. bytecode-to-bytecode compilation
>> is still compilation. bytecode-to-bytecode compilation works on iOS, and
>> doesn't require deviating from C.
> Optimizations require making assumptions about the code, and
> deoptimizing if an assumption turns out to be wrong. I call these
> checks "guards". If I understood correctly, PyPy is able to
> deoptimize a function in the middle of the function, while executing
> it. In my FAT Python project, I tried something simpler: add guards
> at the function entry point, and decide at entry which version of
> the code should run (FAT Python allows more than two versions of the
> code for the same function).
>
> I described my implementation in the PEP 510:
> https://www.python.org/dev/peps/pep-0510/
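The entry-point-guard idea could be sketched in pure Python roughly like this (a toy illustration, not PEP 510's actual C API; all names are hypothetical):

```python
# Sketch of entry-point guards in the spirit of PEP 510 (names hypothetical).
# A specialized version of a function is used only while its guards hold;
# when a guard fails at entry, we fall back to the original version.

def _generic_pow2(x):
    # Original, always-correct version: works for any type supporting **.
    return x ** 2

def _specialized_pow2(x):
    # Faster specialized version, valid only under the guard below.
    return x * x

def _guard_is_int(x):
    # Guard: the specialization assumes an exact int argument.
    return type(x) is int

def pow2(x):
    # Entry-point dispatch: check guards once, then run the chosen version.
    # A real implementation would attach several (guards, code) pairs to the
    # function object and pick the first whose guards all pass.
    if _guard_is_int(x):
        return _specialized_pow2(x)
    return _generic_pow2(x)
```

The key design point is that guards run only once per call, at entry, so deoptimization never has to happen mid-function as it does in PyPy.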
>
> I agree that you *can* emit more efficient bytecode using
> assumptions. But I'm not sure that the best speedup will be worth
> it. For example, if your maximum speedup is 20% but the JIT compiler
> increases startup time and uses more memory, I'm not sure that users
> will use it. The design indirectly restricts the maximum speedup.
>
> At the bytecode level, you cannot specialize bytecode for 1+2 (x+y
> with x=1 and y=2), for example. The BINARY_ADD instruction calls
> PyNumber_Add(), but a previous experiment showed that the dispatch
> inside PyNumber_Add() to reach long_add() is expensive.

If you can assert that the sums never overflow a machine int, you can 
avoid hitting long_add() entirely, along with all the checks around it. 
The instruction would be IADD rather than NADD: it would add two ints 
specifically, not two arbitrary numbers, and it would skip overflow 
checks because the JIT has already proven that no overflow can happen.
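The IADD/NADD split could be modeled in Python like this (a toy sketch with hypothetical names; in a real tracing JIT the guard would be proven at trace time, so the runtime check on the fast path would disappear entirely):

```python
# Sketch of a specialized "IADD" with a deoptimization path (hypothetical
# names). The specialized handler assumes both operands are machine-sized
# ints, skipping the generic PyNumber_Add-style dispatch; when the
# assumption does not hold, it deoptimizes to the generic handler.

import operator

_INT_MAX = 2 ** 63 - 1  # stand-in for a C machine-word bound

def nadd(a, b):
    # Generic "NADD": full numeric dispatch, handles any operand types
    # (analogous to BINARY_ADD calling PyNumber_Add()).
    return operator.add(a, b)

def iadd(a, b):
    # Specialized "IADD": valid only when both operands are ints whose sum
    # fits in a machine word. Here the guard is checked at runtime for
    # illustration; a JIT that proved it during tracing would emit only
    # the fast path, with no dispatch and no overflow handling.
    if type(a) is int and type(b) is int and -_INT_MAX <= a + b <= _INT_MAX:
        return a + b          # fast path
    return nadd(a, b)         # guard failed: deoptimize to generic add
```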

>
> I'm trying to find a solution to make CPython not 20% faster, but 2x
> faster. See my talk at the recent Python Language Summit (at Pycon
> US):
> https://github.com/haypo/conf/raw/master/2017-PyconUS/summit.pdf
> https://lwn.net/Articles/723949/
>
> My mid-term/long-term plan for FAT Python is to support multiple
> optimizers, and allow developers to choose between bytecode ("Python"
> code) and machine code ("C" code). For example, one optimizer could
> reuse Cython rather than being a new compiler written from scratch.
> My current optimizer works at the AST level and emits more efficient
> bytecode by rewriting the AST.
>
> But another major design choice in FAT Python is to run the optimizer
> ahead-of-time (AoT), rather than just-in-time (JIT). Maybe it will not
> work. We will see :-)
>
> I suggest you take a look at my notes on making CPython faster:
> http://faster-cpython.readthedocs.io/
>
> FAT Python homepage:
> http://faster-cpython.readthedocs.io/fat_python.html
>
> --
>
> You may also be interested in my Pycon US talk about CPython
> optimization in 3.5, 3.6 and 3.7:
> https://lwn.net/Articles/725114/
>
> Victor
