[issue45116] Performance regression 3.10b1 and later on Windows

neonene report at bugs.python.org
Tue Sep 7 14:13:04 EDT 2021


neonene <nicesalmon at gmail.com> added the comment:

@vstinner: __forceinline suggestion

Since PR25244 (mentioned above), link.exe seems to get stuck on python310.dll when __forceinline is used.
Before that PR, linking took about 10x longer with __forceinline functions than without them.
I can confirm this with _Py_DECREF() and _Py_XDECREF() and one training job (the more functions are forced or the more jobs are used, the slower the link).
Have you tried __forceinline with PGO?


> I don't understand how to read the table.

The overhead field is the output of a pyperf command, not a subtraction (the two answers match here only by coincidence).

Example for 3.10rc1 x86 PGO:
     PGO      : pyperf compare_to 3.10a7 left
     patched  : pyperf compare_to 3.10a7 right
     overhead : pyperf compare_to right  left
  give, respectively:
     1.15x slower (slower 52, faster  4, not significant  2)
     1.13x slower (slower 50, faster  4, not significant  4)
     1.02x slower (slower 29, faster 14, not significant 15)
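
A rough sketch of why the overhead is a ratio and not a difference, assuming each summary figure is a timing ratio against the same 3.10a7 baseline. The variable names are only illustrative, and the real figure comes from pyperf comparing the raw results, not from this arithmetic:

```python
# Summary figures quoted above, both relative to the 3.10a7 baseline.
pgo_vs_a7 = 1.15      # "left":  3.10rc1 x86 PGO
patched_vs_a7 = 1.13  # "right": patched build

# Comparing left against right directly behaves like a ratio of the two ratios.
overhead_ratio = pgo_vs_a7 / patched_vs_a7

# Naive subtraction of the two slowdowns, expressed again as a factor.
overhead_subtracted = 1 + (pgo_vs_a7 - patched_vs_a7)

print(f"ratio:       {overhead_ratio:.2f}x")
print(f"subtraction: {overhead_subtracted:.2f}x")

# With larger (made-up) slowdowns the two methods no longer agree.
big_ratio = 2.0 / 1.5
big_subtracted = 1 + (2.0 - 1.5)
print(f"large case:  {big_ratio:.2f}x vs {big_subtracted:.2f}x")
```

Both methods round to the same value here only because the two slowdowns are close to 1; the made-up larger pair shows them diverging.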


> I'm not sure if PGO builds are reproducible,

MSVC does not produce the same code on every build. Inlining (all or nothing) might be quite a special case in the hottest section.
I suspect that the profiler fails to work well only for _PyEval_EvalFrameDefault(), including its branch and alignment optimizations.
So the macro I posted, or inlining, is just for measurement, not a solution.

----------

_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue45116>
_______________________________________

