[issue28618] Decorate hot functions using __attribute__((hot)) to optimize Python

Tue Nov 15 10:50:33 EST 2016

STINNER Victor added the comment:

Serhiy Storchaka:
>> * json: scanstring_unicode()
>
> This doesn't look wise. This is specific to a single extension module and perhaps to a single particular benchmark. Most Python code doesn't use json at all.

Well, I tried different things to make these benchmarks more stable. I didn't say that we should merge hot3.patch as it is :-) It's just an attempt.

> What is the top of "perf report"?

For json_loads, it's:

 14.99%  _json.cpython-37m-x86_64-linux-gnu.so  scanstring_unicode
  8.34%  python                                 _PyUnicode_FromUCS1
  8.32%  _json.cpython-37m-x86_64-linux-gnu.so  scan_once_unicode
  8.01%  python                                 lookdict_unicode_nodummy
  6.72%  python                                 siphash24
  4.45%  python                                 PyDict_SetItem
  4.26%  python                                 _PyObject_Malloc
  3.38%  python                                 _PyEval_EvalFrameDefault
  3.16%  python                                 _Py_HashBytes
  2.72%  python                                 PyUnicode_New
  2.36%  python                                 PyLong_FromString
  2.25%  python                                 _PyObject_Free
  2.02%  libc-2.19.so                           __memcpy_sse2_unaligned
  1.61%  python                                 PyDict_GetItem
  1.40%  python                                 dictresize
  1.24%  python                                 unicode_hash
  1.11%  libc-2.19.so                           _int_malloc
  1.07%  python                                 unicode_dealloc
  1.00%  python                                 free_keys_object

Result produced with:

   $ perf record ./python ~/performance/performance/benchmarks/bm_json_loads.py --worker -v -l 128 -w0 -n 100
   $ perf report

> How does this list intersect with the list of functions in the .text.hot section of a PGO build?

I checked which functions are considered "hot" by a PGO build: I found more than 2,000 functions... I'm not interested in tagging so many functions with _Py_HOT_FUNCTION. I would prefer to only tag something like the top 10 or top 25 functions.
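
To illustrate the "top 10 or top 25" idea, here is a minimal Python sketch that picks the N most expensive symbols from "perf report" text. It assumes the trimmed three-column layout quoted earlier in this message (percentage, shared object, symbol); real "perf report" output may have more columns. The names ROW, top_functions and SAMPLE are made up for this example and are not part of any patch.

```python
import re

# One row of the trimmed report: "<percent>%  <shared object>  <symbol>".
# Assumption: exactly three whitespace-separated columns, as quoted above.
ROW = re.compile(r"^\s*(\d+(?:\.\d+)?)%\s+(\S+)\s+(\S+)\s*$")


def top_functions(report_text, n=10):
    """Return the n rows with the highest overhead as (percent, object, symbol)."""
    rows = []
    for line in report_text.splitlines():
        match = ROW.match(line)
        if match:
            rows.append((float(match.group(1)), match.group(2), match.group(3)))
    # Sort explicitly instead of relying on the input order.
    rows.sort(key=lambda row: row[0], reverse=True)
    return rows[:n]


# First five rows of the json_loads profile quoted in this message.
SAMPLE = """\
 14.99%  _json.cpython-37m-x86_64-linux-gnu.so  scanstring_unicode
  8.34%  python                                 _PyUnicode_FromUCS1
  8.32%  _json.cpython-37m-x86_64-linux-gnu.so  scan_once_unicode
  8.01%  python                                 lookdict_unicode_nodummy
  6.72%  python                                 siphash24
"""

if __name__ == "__main__":
    for percent, obj, symbol in top_functions(SAMPLE, n=3):
        print(f"{percent:6.2f}%  {symbol}  ({obj})")
```

A list produced this way is only a starting point: it reflects a single benchmark, which is exactly the concern raised about scanstring_unicode().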

I don't know the recommendations for tagging functions as hot. I guess that what matters is the total size of the hot functions. Should it be smaller than the L2 cache? Smaller than the L3 cache? I'm talking about instructions, but data also shares these caches...
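
One way to put a number on "the total size of hot functions" is to sum symbol sizes from "nm --print-size" output and compare the total with a cache size. The sketch below is only an illustration: the symbol sizes in NM_SAMPLE are invented, the 256 KiB L2 size is an assumption rather than a measurement of any machine, and hot_code_size is a hypothetical helper name.

```python
def hot_code_size(nm_output, hot_names):
    """Sum the sizes (in bytes) of the named symbols.

    Expects lines in the "nm --print-size" layout:
    <address> <size> <type> <name>, with address and size in hexadecimal.
    Lines without a size column (e.g. undefined symbols) are skipped.
    """
    total = 0
    for line in nm_output.splitlines():
        parts = line.split()
        if len(parts) == 4 and parts[3] in hot_names:
            total += int(parts[1], 16)
    return total


# Invented sizes, for illustration only; not taken from a real build.
NM_SAMPLE = """\
0000000000001000 0000000000000200 T siphash24
0000000000001200 0000000000000100 t lookdict_unicode_nodummy
0000000000001300 0000000000000080 T PyDict_SetItem
                 U malloc
"""

# Assumed L2 cache size; check the real value for the target CPU.
ASSUMED_L2_BYTES = 256 * 1024

if __name__ == "__main__":
    hot = {"siphash24", "lookdict_unicode_nodummy"}
    size = hot_code_size(NM_SAMPLE, hot)
    print(f"hot code: {size} bytes, fits in assumed L2: {size <= ASSUMED_L2_BYTES}")
```

This only counts instruction bytes of the tagged functions; as noted above, data competes for the same caches, so the result is an upper bound on how much code fits, not a guarantee.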

> Make several PGO builds (perhaps on different computers). Is .text.hot section stable?

In my experience, PGO builds don't provide stable performance, but I was never able to write an article on that because of so many bugs :-)

----------

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue28618>
_______________________________________
