[issue36044] PROFILE_TASK for PGO build is not a good workload

Neil Schemenauer report at bugs.python.org
Wed Jul 10 23:36:07 EDT 2019

Neil Schemenauer <nas-python at arctrix.com> added the comment:

> Decreasing the total wall time for a default --enable-optimizations build would 
> be a good thing for everyone, provided the resulting interpreter remains 
> "effectively similar" in speed.  If you somehow manage to find something that
> actually speeds up the resulting interpreter, amazing!

I spent quite a lot of time making different PGO builds and comparing them with pyperformance.  The current PGO task is *really* slow: just running the PROFILE_TASK takes 24 minutes on my decently fast PC.

Using this set of tests seems to work pretty well:

PROFILE_TASK=-m test.regrtest --pgo \
        test_collections \
        test_dataclasses \
        test_difflib \
        test_embed \
        test_float \
        test_functools \
        test_generators \
        test_int \
        test_itertools \
        test_json \
        test_logging \
        test_long \
        test_ordered_dict \
        test_pickle \
        test_pprint \
        test_re \
        test_set \
        test_statistics \
        test_struct \
        test_tabnanny

Instead of 24 minutes, the above task takes one and a half minutes, and the pyperformance results seem largely unchanged (comparison below).  Tuning the test list to get the best pyperformance result is a bit dangerous, since it risks optimizing for specific benchmarks; running the whole test suite is perhaps safer in that respect.  I didn't tweak the list too much.  I added test_int, test_long, test_struct and test_itertools as a result of my pyperformance runs.  It is not too surprising that those are important modules.
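
For anyone who wants to check the wall-clock difference on their own machine, here is a rough sketch that builds the same command line and times one run.  The helper names (build_profile_command, time_profile_task) are made up for illustration and are not part of CPython.  It also runs the tests with whatever interpreter you pass in, not the instrumented one the Makefile uses during a real PGO build, so treat the numbers as indicative only.

```python
import subprocess
import sys
import time

# Test modules copied from the PROFILE_TASK list in this message.
PGO_TESTS = [
    "test_collections", "test_dataclasses", "test_difflib", "test_embed",
    "test_float", "test_functools", "test_generators", "test_int",
    "test_itertools", "test_json", "test_logging", "test_long",
    "test_ordered_dict", "test_pickle", "test_pprint", "test_re",
    "test_set", "test_statistics", "test_struct", "test_tabnanny",
]


def build_profile_command(python=sys.executable, tests=PGO_TESTS):
    """Return the argv list for running regrtest in --pgo mode.

    Hypothetical helper: it mirrors the PROFILE_TASK value above,
    prefixed with the interpreter to run it under.
    """
    return [python, "-m", "test.regrtest", "--pgo", *tests]


def time_profile_task(python=sys.executable, tests=PGO_TESTS):
    """Run the task once; return (elapsed wall-clock seconds, exit code)."""
    start = time.perf_counter()
    result = subprocess.run(build_profile_command(python, tests))
    return time.perf_counter() - start, result.returncode


if __name__ == "__main__":
    elapsed, code = time_profile_task()
    print(f"profile task took {elapsed:.1f} s (exit code {code})")
```

Passing tests=[] to time_profile_task gives the equivalent of the current default task (the whole suite in --pgo mode) for comparison.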

I think the set of tests above should do a pretty good job of covering the hot code paths in most Python programs.  So, maybe it is good enough given the massive speedup in build time.

| Benchmark               | task-all | task-short                   |
| 2to3                    | 311 ms   | 315 ms: 1.01x slower (+1%)   |
| chaos                   | 111 ms   | 108 ms: 1.02x faster (-2%)   |
| crypto_pyaes            | 114 ms   | 112 ms: 1.01x faster (-1%)   |
| dulwich_log             | 78.0 ms  | 78.7 ms: 1.01x slower (+1%)  |
| fannkuch                | 470 ms   | 452 ms: 1.04x faster (-4%)   |
| float                   | 118 ms   | 117 ms: 1.01x faster (-1%)   |
| go                      | 253 ms   | 255 ms: 1.01x slower (+1%)   |
| json_dumps              | 12.5 ms  | 11.8 ms: 1.06x faster (-6%)  |
| json_loads              | 26.3 us  | 25.4 us: 1.04x faster (-3%)  |
| logging_format          | 9.53 us  | 9.66 us: 1.01x slower (+1%)  |
| logging_silent          | 198 ns   | 196 ns: 1.01x faster (-1%)   |
| mako                    | 15.2 ms  | 15.8 ms: 1.04x slower (+4%)  |
| meteor_contest          | 98.2 ms  | 96.8 ms: 1.01x faster (-1%)  |
| nbody                   | 135 ms   | 133 ms: 1.01x faster (-1%)   |
| nqueens                 | 97.2 ms  | 96.6 ms: 1.01x faster (-1%)  |
| pathlib                 | 19.4 ms  | 19.7 ms: 1.02x slower (+2%)  |
| pickle                  | 8.10 us  | 9.07 us: 1.12x slower (+12%) |
| pickle_dict             | 23.1 us  | 18.6 us: 1.25x faster (-20%) |
| pickle_list             | 3.64 us  | 2.81 us: 1.30x faster (-23%) |
| pickle_pure_python      | 470 us   | 460 us: 1.02x faster (-2%)   |
| pidigits                | 169 ms   | 173 ms: 1.02x slower (+2%)   |
| python_startup          | 7.94 ms  | 8.02 ms: 1.01x slower (+1%)  |
| python_startup_no_site  | 5.44 ms  | 5.49 ms: 1.01x slower (+1%)  |
| raytrace                | 495 ms   | 490 ms: 1.01x faster (-1%)   |
| regex_dna               | 172 ms   | 179 ms: 1.04x slower (+4%)   |
| regex_effbot            | 2.95 ms  | 2.85 ms: 1.04x faster (-3%)  |
| regex_v8                | 20.7 ms  | 21.5 ms: 1.04x slower (+4%)  |
| richards                | 68.9 ms  | 69.8 ms: 1.01x slower (+1%)  |
| scimark_sparse_mat_mult | 4.57 ms  | 4.29 ms: 1.07x faster (-6%)  |
| spectral_norm           | 134 ms   | 133 ms: 1.01x faster (-1%)   |
| sqlalchemy_declarative  | 161 ms   | 163 ms: 1.01x slower (+1%)   |
| sqlalchemy_imperative   | 30.6 ms  | 31.0 ms: 1.01x slower (+1%)  |
| sqlite_synth            | 2.90 us  | 2.95 us: 1.02x slower (+2%)  |
| sympy_expand            | 422 ms   | 418 ms: 1.01x faster (-1%)   |
| sympy_integrate         | 19.0 ms  | 19.2 ms: 1.01x slower (+1%)  |
| sympy_sum               | 89.6 ms  | 91.7 ms: 1.02x slower (+2%)  |
| telco                   | 6.06 ms  | 6.28 ms: 1.04x slower (+4%)  |
| tornado_http            | 178 ms   | 181 ms: 1.02x slower (+2%)   |
| unpickle_list           | 3.97 us  | 3.78 us: 1.05x faster (-5%)  |
| unpickle_pure_python    | 326 us   | 324 us: 1.00x faster (-0%)   |
| xml_etree_generate      | 90.6 ms  | 91.0 ms: 1.00x slower (+0%)  |
| xml_etree_process       | 72.0 ms  | 71.4 ms: 1.01x faster (-1%)  |

Not significant (15): deltablue; django_template; hexiom; html5lib; logging_simple; regex_compile; scimark_fft; scimark_lu; scimark_monte_carlo; scimark_sor; sympy_str; unpack_sequence; unpickle; xml_etree_parse; xml_etree_iterparse
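
To boil the table down to a single number, one option is the geometric mean of the task-short / task-all time ratios.  The sketch below parses rows in the format shown above and computes it.  The function names and the regular expression are my own illustration, not part of pyperformance, and the rows marked "not significant" are simply left out.

```python
import math
import re

# Multipliers for the time units that appear in the table.
UNITS = {"ns": 1e-9, "us": 1e-6, "ms": 1e-3, "sec": 1.0}

# Matches "| name | 8.10 us | 9.07 us: 1.12x slower (+12%) |".
ROW_RE = re.compile(
    r"^\|\s*(?P<name>\S+)\s*\|\s*(?P<base>[\d.]+) (?P<base_unit>\w+)\s*"
    r"\|\s*(?P<new>[\d.]+) (?P<new_unit>\w+):"
)


def parse_row(line):
    """Return (name, task_all_seconds, task_short_seconds), or None
    if the line is not a data row (for example the header)."""
    m = ROW_RE.match(line)
    if m is None:
        return None
    base = float(m["base"]) * UNITS[m["base_unit"]]
    new = float(m["new"]) * UNITS[m["new_unit"]]
    return m["name"], base, new


def geometric_mean_ratio(lines):
    """Geometric mean of task-short / task-all over the parseable rows.

    A value below 1.0 means task-short is faster on average.
    """
    ratios = []
    for line in lines:
        row = parse_row(line)
        if row is not None:
            _, base, new = row
            ratios.append(new / base)
    if not ratios:
        raise ValueError("no benchmark rows found")
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


if __name__ == "__main__":
    # A few rows copied from the table; paste the full table to
    # summarize every benchmark.
    sample = [
        "| Benchmark               | task-all | task-short                   |",
        "| json_dumps              | 12.5 ms  | 11.8 ms: 1.06x faster (-6%)  |",
        "| pickle                  | 8.10 us  | 9.07 us: 1.12x slower (+12%) |",
        "| pickle_list             | 3.64 us  | 2.81 us: 1.30x faster (-23%) |",
    ]
    print(f"geometric mean ratio: {geometric_mean_ratio(sample):.3f}")
```

The geometric mean is used instead of the arithmetic mean so that a 2x slowdown and a 2x speedup cancel out.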


