[Python-Dev] Profile Guided Optimization active by-default

Matthias Klose doko at ubuntu.com
Mon Aug 24 22:36:21 CEST 2015


The current pgo target trains on one very specific task to generate the feedback.
For my Debian/Ubuntu builds I train on the testsuite minus some problematic
tests. I don't know if this is the best way to do it, but it gave better
results at some time in the past.  What I would like is a benchmark, or a
mixture of benchmarks, on which to enable pgo/pdo. Based on that you could enable
pgo through static decisions derived from autofdo. For that you don't need any
profile runs during your build; you only need to ship the autofdo outcome
together with a Python release. This doesn't give you the same performance as
a GCC pgo build, but it would be a first step. And defining the probe
for any pgo build would be welcome too.
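
A minimal sketch of the kind of training command described above, assuming the
freshly built interpreter is ./python and that regrtest's -x option is used to
exclude tests. The function name and the excluded test names are placeholders
for illustration, not the list used in the Debian/Ubuntu packaging:

```python
import shlex


def training_command(python="./python", excluded=()):
    """Build the argv for a regrtest-based PGO training run.

    `python` is assumed to be the instrumented interpreter from the
    profile-generating build; `excluded` lists the tests to leave out.
    """
    cmd = [python, "-m", "test.regrtest"]
    if excluded:
        # With -x, regrtest treats the remaining arguments as tests to exclude.
        cmd += ["-x", *excluded]
    return cmd


if __name__ == "__main__":
    # Hypothetical names standing in for the "problematic tests".
    skipped = ["test_example_a", "test_example_b"]
    print(" ".join(shlex.quote(arg) for arg in training_command(excluded=skipped)))
```

The resulting command would be run between the instrumented build and the
final optimized build, so that the collected profile reflects the test suite
rather than a single task.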

  Matthias


On 08/22/2015 06:25 PM, Brett Cannon wrote:
> On Sat, Aug 22, 2015, 09:17 Guido van Rossum <guido at python.org> wrote:
> 
> How about we first add a new Makefile target that enables PGO, without
> turning it on by default? Then later we can enable it by default.
> 
> 
> I agree. Updating the Makefile so it's easier to use PGO is great, but we
> should do a release with it as opt-in and go from there.
> 
> Also, I have my doubts about regrtest. How sure are we that it represents a
> typical Python load? Tests are often using a different mix of operations
> than production code.
> 
> That was also my question. You said that "it provides the best performance
> improvement", but compared to what; what else was tried? And what
> difference does it make to e.g. a Django app that is trained on their own
> simulated workload compared to using regrtest? IOW is regrtest displaying
> the best across-the-board performance because it stresses the largest swath
> of Python and thus catches generic patterns in the code but individuals
> could get better performance with a simulated workload?
> 
> -Brett
> 
> 
> On Sat, Aug 22, 2015 at 7:46 AM, Patrascu, Alecsandru <
> alecsandru.patrascu at intel.com> wrote:
> 
> Hi All,
> 
> This is Alecsandru from the Server Scripting Languages Optimization team at
> Intel Corporation.
> 
> I would like to submit a request to turn on Profile Guided Optimization, or
> PGO, as the default build option for Python (both 2.7 and 3.6), given its
> performance benefits on a wide variety of workloads and hardware.  For
> instance, as shown in the attached sample performance results from the Grand
> Unified Python Benchmark, a speedup of >20% was observed.  In addition, we are
> seeing a 2-9% performance boost from OpenStack/Swift, where more than 60% of
> the code is in Python 2.7. Our analysis indicates the performance gain
> was mainly due to a reduction of icache misses and CPU front-end stalls.
> 
> Attached are the Makefile patches, which modify the "all" build target and add
> a new one called "disable-profile-opt". We built and tested these patches for
> Python 2.7 and 3.6 on our Linux machines (CentOS 7/Ubuntu Server 14.04,
> Intel Xeon Haswell/Broadwell with 18/8 cores).  We use the "regrtest" suite for
> training as it provides the best performance improvement.  Some of the test
> programs in the suite may fail, which causes the build to fail.  One solution is
> to disable the specific failing tests using the "-x " flag (as shown in the
> patch).
> 
> Steps to apply the patch:
> 1.  hg clone https://hg.python.org/cpython cpython
> 2.  cd cpython
> 3.  hg update 2.7 (needed for 2.7 only)
> 4.  Copy *.patch to the current directory
> 5.  patch < python2.7-pgo.patch (or patch < python3.6-pgo.patch)
> 6.  ./configure
> 7.  make
> 
> To disable PGO
> 7b. make disable-profile-opt
> 
> In the following, please find our sample performance results from our latest
> Xeon machine, a Xeon Broadwell EP.
> Hardware (HW):      Intel XEON (Broadwell) 8 Cores
> 
> BIOS settings:      Intel Turbo Boost Technology: false
>                     Hyper-Threading: false
> 
> Operating System:   Ubuntu 14.04.3 LTS trusty
> 
> OS configuration:   CPU freq set at fixed: 2.6GHz by
>                         echo 2600000 > /sys/devices/system/cpu/cpu*/cpufreq/scaling_min_freq
>                         echo 2600000 > /sys/devices/system/cpu/cpu*/cpufreq/scaling_max_freq
>                     Address Space Layout Randomization (ASLR) disabled (to
>                     reduce run to run variation) by
>                         echo 0 > /proc/sys/kernel/randomize_va_space
> 
> GCC version:        gcc version 4.8.4 (Ubuntu 4.8.4-2ubuntu1~14.04)
> 
> Benchmark:          Grand Unified Python Benchmark (GUPB)
>                     GUPB Source: https://hg.python.org/benchmarks/
> 
> Python2.7 results:
>     Python source: hg clone https://hg.python.org/cpython cpython
>     Python source: hg update 2.7
>     hg id: 0511b1165bb6 (2.7)
>     hg id -r 'ancestors(.) and tag()': 15c95b7d81dc (2.7) v2.7.10
>     hg --debug id -i: 0511b1165bb6cf40ada0768a7efc7ba89316f6a5
> 
>         Benchmark           Speedup(%)
>         simple_logging      20
>         raytrace            20
>         silent_logging      19
>         richards            19
>         chaos               16
>         formatted_logging   16
>         json_dump           15
>         hexiom2             13
>         pidigits            12
>         slowunpickle        12
>         django_v2           12
>         unpack_sequence     11
>         float               11
>         mako                11
>         slowpickle          11
>         fastpickle          11
>         django              11
>         go                  10
>         json_dump_v2        10
>         pathlib             10
>         regex_compile       10
>         pybench             9.9
>         etree_process       9
>         regex_v8            8
>         bzr_startup         8
>         2to3                8
>         slowspitfire        8
>         telco               8
>         pickle_list         8
>         fannkuch            8
>         etree_iterparse     8
>         nqueens             8
>         mako_v2             8
>         etree_generate      8
>         call_method_slots   7
>         html5lib_warmup     7
>         html5lib            7
>         nbody               7
>         spectral_norm       7
>         spambayes           7
>         fastunpickle        6
>         meteor_contest      6
>         chameleon           6
>         rietveld            6
>         tornado_http        5
>         unpickle_list       5
>         pickle_dict         4
>         regex_effbot        3
>         normal_startup      3
>         startup_nosite      3
>         etree_parse         2
>         call_method_unknown 2
>         call_simple         1
>         json_load           1
>         call_method         1
> 
> Python3.6 results:
>     Python source: hg clone https://hg.python.org/cpython cpython
>     hg id: 96d016f78726 tip
>     hg id -r 'ancestors(.) and tag()': 1a58b1227501 (3.5) v3.5.0rc1
>     hg --debug id -i: 96d016f78726afbf66d396f084b291ea43792af1
> 
>         Benchmark           Speedup(%)
>         fastunpickle        22.94
>         fastpickle          21.67
>         json_load           17.64
>         simple_logging      17.49
>         meteor_contest      16.67
>         formatted_logging   15.33
>         etree_process       14.61
>         raytrace            13.57
>         etree_generate      13.56
>         chaos               12.09
>         hexiom2             12
>         nbody               11.88
>         json_dump_v2        11.24
>         richards            11.02
>         nqueens             10.96
>         fannkuch            10.79
>         go                  10.77
>         float               10.26
>         regex_compile       9.8
>         silent_logging      9.63
>         pidigits            9.58
>         etree_iterparse     9.48
>         2to3                8.44
>         regex_v8            8.09
>         regex_effbot        7.88
>         call_simple         7.63
>         tornado_http        7.38
>         etree_parse         4.92
>         spectral_norm       4.72
>         normal_startup      4.39
>         telco               3.88
>         startup_nosite      3.7
>         call_method         3.63
>         unpack_sequence     3.6
>         call_method_slots   2.91
>         call_method_unknown 2.59
>         iterative_count     0.45
>         threaded_count      -2.79
> 
> Thank you,
> Alecsandru
> 
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at python.org
> https://mail.python.org/mailman/listinfo/python-dev
> 
> 
> Unsubscribe:
> https://mail.python.org/mailman/options/python-dev/guido%40python.org