[issue31834] BLAKE2: the (pure) SSE2 impl forced on x86_64 is slower than reference
Christian Heimes
report at bugs.python.org
Tue Oct 24 03:25:04 EDT 2017
Christian Heimes <lists at cheimes.de> added the comment:
I'm pretty sure that your PR has disabled all SSE optimizations. AFAIK, gcc does not enable SSE3 or SSE4 on x86_64 by default; only SSE and SSE2 are enabled:
$ gcc -dM -E - < /dev/null | grep SSE
#define __SSE2_MATH__ 1
#define __SSE_MATH__ 1
#define __SSE2__ 1
#define __SSE__ 1
You have to set a compiler flag such as -msse4:
$ gcc -msse4 -dM -E - < /dev/null | grep SSE
#define __SSE4_1__ 1
#define __SSE4_2__ 1
#define __SSE2_MATH__ 1
#define __SSE_MATH__ 1
#define __SSE2__ 1
#define __SSSE3__ 1
#define __SSE__ 1
#define __SSE3__ 1
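To make the consequence concrete, here is a minimal sketch, assuming a module that picks its SIMD code path purely from these predefined macros at compile time. The helper names (compile_time_simd_level, simd_rank, report_simd_level) are hypothetical and this is not CPython's or BLAKE2's actual code. The macros reflect the compiler flags used for the build, not what the CPU running the binary supports.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not CPython's code): return the most capable SIMD
   level the compiler enabled for this translation unit. These macros come
   from compiler flags such as -msse4, not from the CPU at run time. */
static const char *compile_time_simd_level(void)
{
#if defined(__SSE4_1__)
    return "SSE4.1";
#elif defined(__SSSE3__)
    return "SSSE3";
#elif defined(__SSE2__)
    return "SSE2";
#else
    return "none";
#endif
}

/* Hypothetical helper: rank a level name so levels can be compared.
   Returns -1 for a name it does not know. */
static int simd_rank(const char *level)
{
    static const char *const order[] = {"none", "SSE2", "SSSE3", "SSE4.1"};
    assert(level != NULL);
    for (size_t i = 0; i < sizeof order / sizeof order[0]; i++) {
        if (strcmp(level, order[i]) == 0)
            return (int)i;
    }
    return -1;
}

/* Print the level selected at compile time; build once with plain gcc and
   once with gcc -msse4 to compare. */
static void report_simd_level(void)
{
    const char *level = compile_time_simd_level();
    printf("compile-time SIMD level: %s (rank %d)\n", level, simd_rank(level));
}
```

Called from main() and built with a gcc configured like the one above, report_simd_level() should report SSE2 with default flags and SSE4.1 once -msse4 is added, matching the two macro listings.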
----------
nosy: +christian.heimes
resolution: fixed ->
stage: resolved ->
status: closed -> open
_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue31834>
_______________________________________
More information about the Python-bugs-list mailing list