[pypy-dev] PGO Optimized Binary

Armin Rigo armin.rigo at gmail.com
Wed Nov 2 05:18:40 EDT 2016


Hi,

On 31 October 2016 at 22:28, Singh, Yashwardhan
<yashwardhan.singh at intel.com> wrote:
> We applied a compiler-assisted optimization technique called PGO, or Profile Guided Optimization, while building PyPy, and found that performance improved by up to 22.4% on the Grand Unified Python Benchmark (GUPB) from "hg clone https://hg.python.org/benchmarks".  The result table below shows that the majority of the 51 micro-benchmarks got a performance boost, while 8 showed a regression.

The kind of performance improvement you are measuring involves only
short- or very short-running programs.  A few years ago we'd have
shrugged it off as irrelevant---"please modify the benchmarks so that
they run for at least 10 seconds, more if they are larger"---because
the JIT compiler doesn't have a chance to warm up.  But we'd also have
shrugged off your whole attempt---"PGO optimization cannot change
anything about the speed of JIT-produced machine code".
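
Concretely, "run for at least 10 seconds" means a repeat loop along
these lines (just an illustrative sketch with made-up names, not the
actual GUPB harness):

    import time

    def run_long_enough(bench, min_seconds=10.0):
        # Repeat the benchmark until at least min_seconds of wall-clock
        # time have passed, so the JIT gets a chance to warm up; the
        # later iterations then measure hot, JIT-compiled code.
        timings = []
        total = 0.0
        while total < min_seconds:
            t0 = time.time()
            bench()
            elapsed = time.time() - t0
            timings.append(elapsed)
            total += elapsed
        return timings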

Nowadays we tend to look more seriously at the cold or warming-up
performance too, or at least we know that we should look there.  There
are (stalled) plans to set up a second benchmark suite for PyPy
that focuses on this.

You can get an estimate of whether you're looking at cold or hot code:
compare the timings with CPython.  Also, you can set the environment
variable  ``PYPYLOG=jit-summary:-`` and look at the first 2 lines to
see how much time was spent warming up the JIT (or attempting to).
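
For example, with any short script (``bench.py`` is just a placeholder
name here):

    # rough cold-vs-hot estimate: compare wall-clock time on both
    time python bench.py
    time pypy bench.py

    # JIT warm-up summary; the first two lines of the output show how
    # much time was spent warming up the JIT
    PYPYLOG=jit-summary:- pypy bench.py

If PyPy is not clearly faster than CPython there, chances are you are
mostly timing cold or warming-up code.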

Note that we did enable PGO long ago, with modest benefits.  We gave
up when our JIT compiler became good enough.  Maybe now is the time to
try again (and also, PGO itself might have improved in the meantime).

> We’d like to get some input on how to contribute our optimization recipe to the PyPy dev tree, perhaps by creating an item in the PyPy issue tracker?

The best would be to create a pull request so that we can look at your
changes more easily.

> In addition, we would also appreciate any other benchmark or real-world workload that we could use as an alternative to evaluate this.

You can take any Python program that either runs very briefly or does
not run faster on PyPy than on CPython.  For a larger example (with
Python 2.7):

    cd rpython/jit/tl
    python ../../bin/rpython -O2 --source targettlr   # CPython: 24 secs
    pypy ../../bin/rpython -O2 --source targettlr     # PyPy:    39 secs


A bientôt,

Armin.

