[issue25823] Speed-up oparg decoding on little-endian machines

Sat Dec 12 04:53:21 EST 2015

Armin Rigo added the comment:

FWIW, I wrote a trivial C benchmark that loads aligned and misaligned shorts (http://paste.pound-python.org/show/HwnbCI3Pqsj8bx25Yfwp/). On a 2010 laptop, the memcpy() version takes only 65% of the time taken by the version that loads the two bytes separately. On a modern server it takes 75%, and on a recent little-endian PowerPC machine 96%. On aarch64 it takes only 45% (i.e. it is more than twice as fast). All of this is with gcc. It seems that using memcpy() is definitely a win nowadays.
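For readers without access to the paste, here is a minimal sketch of the two loading strategies being compared. It is not the benchmark code itself. The names load_two_bytes(), load_memcpy() and host_is_little_endian() are hypothetical. The sketch assumes the oparg is stored as a little-endian 16-bit value. The memcpy() variant gives the correct value only on a little-endian host, which is why the speed-up is restricted to such machines.

```c
#include <stdint.h>
#include <string.h>

/* Strategy 1: load the two bytes separately and combine them with a shift.
   Portable and independent of host byte order. */
static uint16_t load_two_bytes(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Strategy 2: memcpy() into a local uint16_t. Compilers such as gcc
   typically turn this into a single (possibly misaligned) 16-bit load.
   The result equals the little-endian value only on little-endian hosts. */
static uint16_t load_memcpy(const unsigned char *p)
{
    uint16_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Runtime check of host byte order, used to guard the memcpy() path. */
static int host_is_little_endian(void)
{
    const uint16_t one = 1;
    unsigned char first;
    memcpy(&first, &one, 1);
    return first == 1;
}
```

With a buffer {0x34, 0x12, 0xAB}, both functions return 0x1234 at offset 0 on a little-endian host. Both return 0xAB12 at the misaligned offset 1. Using memcpy() instead of casting the pointer to uint16_t * avoids undefined behaviour on misaligned addresses while still letting the compiler emit one load.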

----------
nosy: +arigo

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue25823>
_______________________________________