[issue31834] BLAKE2: the (pure) SSE2 impl forced on x86_64 is slower than reference

Christian Heimes report at bugs.python.org
Tue Oct 24 03:25:04 EDT 2017


Christian Heimes <lists at cheimes.de> added the comment:

I'm pretty sure that your PR has disabled all SSE optimizations. AFAIK gcc does not enable SSE3 or SSE4 on x86_64 by default; only SSE and SSE2 are predefined:

$ gcc -dM -E - < /dev/null | grep SSE
#define __SSE2_MATH__ 1
#define __SSE_MATH__ 1
#define __SSE2__ 1
#define __SSE__ 1

You have to set a compiler flag like -msse4 to get the newer instruction sets:

$ gcc -msse4 -dM -E - < /dev/null | grep SSE
#define __SSE4_1__ 1
#define __SSE4_2__ 1
#define __SSE2_MATH__ 1
#define __SSE_MATH__ 1
#define __SSE2__ 1
#define __SSSE3__ 1
#define __SSE__ 1
#define __SSE3__ 1
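
The same check can be made from inside a C source file, which is roughly how SSE-specific code paths get selected at compile time. This is only a minimal sketch based on the predefined macros listed above; the function names sse_level and print_sse_report and the numeric encoding are made up for illustration and are not taken from the BLAKE2 sources or from the PR.

```c
#include <stdio.h>

/* Highest SSE level the compiler enabled for this translation unit.
 * Hypothetical encoding for this sketch: 42 = SSE4.2, 41 = SSE4.1,
 * 31 = SSSE3, 30 = SSE3, 20 = SSE2, 10 = SSE, 0 = none. */
int sse_level(void)
{
#if defined(__SSE4_2__)
    return 42;
#elif defined(__SSE4_1__)
    return 41;
#elif defined(__SSSE3__)
    return 31;
#elif defined(__SSE3__)
    return 30;
#elif defined(__SSE2__)
    return 20;
#elif defined(__SSE__)
    return 10;
#else
    return 0;
#endif
}

/* Write a one-line report; returns the fprintf result
 * (number of characters written, or a negative value on error). */
int print_sse_report(FILE *out)
{
    return fprintf(out, "highest SSE level enabled at compile time: %d\n",
                   sse_level());
}
```

Called from a main() and built with the two gcc invocations above, sse_level() should return 20 without extra flags and 42 with -msse4, matching the macro lists.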

----------
nosy: +christian.heimes
resolution: fixed -> 
stage: resolved -> 
status: closed -> open

_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue31834>
_______________________________________

