[issue45116] Performance regression 3.10b1 and later on Windows

neonene report at bugs.python.org
Fri Sep 10 22:15:36 EDT 2021


neonene <nicesalmon at gmail.com> added the comment:

Thanks for all the suggestions. I focused on the commit I bisected to and the one just before it.

I ran pyperformance with the following 4 functions marked as never inlined:

   _Py_DECREF()
   _Py_XDECREF()
   _Py_IS_TYPE()
   _Py_atomic_load_32bit_impl()

Each rebuild restricts them in one of three ways:

   (1) never inlined in _PyEval_EvalFrameDefault().
   (2) never inlined in the other functions.
   (3) never inlined in any function.


slowdowns                      [4-funcs never inlined section]
--------------------------------------------------------------
Windows x64 PGO (44job)            (*)    (1)    (2)    (3)
rebuild                            none   eval  others  all
--------------------------------------------------------------
b98eba5 (4 funcs inlined in eval)  1.00   1.05   1.09   1.14
PR25244 (    not inlined in eval)  1.06   1.07   1.18   1.17

pyperf compare_to upper lower:
   (*) 1.06x slower  (slower 45, faster  4, not significant  9)
   (1) 1.02x slower  (slower 33, faster 13, not significant 12)
   (2) 1.08x slower  (slower 48, faster  6, not significant  4)
   (3) 1.03x slower  (slower 39, faster  6, not significant 13)


--------------------------------------------------------------
Windows x86 PGO (44job)            (*)    (1)    (2)    (3)
rebuild                            none   eval  others  all
--------------------------------------------------------------
b98eba5 (4 funcs inlined in eval)  1.00   1.03   1.06   1.15
PR25244 (    not inlined in eval)  1.13   1.13   1.22   1.24

pyperf compare_to upper lower:
   (*) 1.13x slower  (slower 54, faster  2, not significant  2)
   (1) 1.10x slower  (slower 47, faster  3, not significant  8)
   (2) 1.14x slower  (slower 54, faster  1, not significant  3)
   (3) 1.08x slower  (slower 43, faster  3, not significant 12)
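
As a rough cross-check of the tables above, the "compare_to upper lower" figures can be approximated by dividing the lower row (PR25244) by the upper row (b98eba5) in each column. This is only a sketch: the numbers are copied from the tables, the names TABLES, COLUMNS and column_gaps are made up for illustration, and pyperf itself works from the raw benchmark results rather than from these rounded values.

```python
# Normalized slowdowns copied from the two tables above
# (b98eba5 with no extra rebuild = 1.00).
# Column order: (*), (1), (2), (3).
TABLES = {
    "x64": {"b98eba5": [1.00, 1.05, 1.09, 1.14],
            "PR25244": [1.06, 1.07, 1.18, 1.17]},
    "x86": {"b98eba5": [1.00, 1.03, 1.06, 1.15],
            "PR25244": [1.13, 1.13, 1.22, 1.24]},
}
COLUMNS = ["(*)", "(1)", "(2)", "(3)"]


def column_gaps(table):
    """Return lower row / upper row for each column, rounded to 2 places."""
    return [round(lower / upper, 2)
            for upper, lower in zip(table["b98eba5"], table["PR25244"])]


for arch, table in TABLES.items():
    print(arch, dict(zip(COLUMNS, column_gaps(table))))
```

Small differences from the reported figures (for example x86 column (2)) are to be expected, since the table entries are already rounded.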


On both x64 and x86, columns (*) and (2) appear to show similar gaps between the two builds.
So I would like to focus simply on the eval loop.

I built PGO with "/d2inlinestats" and "/d2inlinelogfull:_PyEval_EvalFrameDefault" according to the blog.

I have posted the logs. For PR25244, the log is 3x smaller than for the previous commit, and PGO rejects inlining the 4 functions above. I will look into it later.


Correction:
> Before the PR, it took 10x~ longer to link than without __forceinline function.

The current build takes 10x~ less time to link than before.
Before the PR, __forceinline had no impact for me.

----------
Added file: https://bugs.python.org/file50271/b98e-no-inline-in-all.diff

_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue45116>
_______________________________________

