[Numpy-svn] [numpy/numpy] 0adcca: ENH: vectorize sqrt ufunc using SSE2
GitHub
noreply at github.com
Sat May 25 11:49:25 EDT 2013
Branch: refs/heads/master
Home: https://github.com/numpy/numpy
Commit: 0adccaaa910ab495e993f453956fd983775604f3
https://github.com/numpy/numpy/commit/0adccaaa910ab495e993f453956fd983775604f3
Author: Julian Taylor <jtaylor.debian at googlemail.com>
Date: 2013-05-25 (Sat, 25 May 2013)
Changed paths:
M numpy/core/code_generators/generate_umath.py
M numpy/core/setup.py
M numpy/core/setup_common.py
M numpy/core/src/private/lowlevel_strided_loops.h
M numpy/core/src/scalarmathmodule.c.src
M numpy/core/src/umath/loops.c.src
M numpy/core/src/umath/loops.h
M numpy/core/src/umath/loops.h.src
M numpy/core/tests/test_umath.py
M numpy/testing/utils.py
Log Message:
-----------
ENH: vectorize sqrt ufunc using SSE2
Specialize the sqrt ufunc for float and double and vectorize it using
SSE2.
Improves performance by a factor of 4/2 for float/double when the loop
is not memory bound due to non-cached data.
Performance is better on all tested machines (AMD Phenom X2,
Intel Xeon 5xxx/7xxx, Core 2 Duo, Core i7).
This version will not set errno on invalid input, but numpy only checks
the FPU flags, so the behavior is the same.
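The errno remark above can be illustrated with a small sketch. It is not
code from this commit: the helper name sse2_sqrt_raises_invalid is
hypothetical, and an x86 compiler with SSE2 enabled (the default on
x86-64) is assumed. It takes the square root of -1.0 with an SSE2
intrinsic and then inspects both errno and the invalid-operation bit of
the MXCSR status register.

```c
#include <errno.h>
#include <emmintrin.h>   /* SSE2 intrinsics; also provides _mm_getcsr/_mm_setcsr */

/* Hypothetical helper, for illustration only.
 * Returns 1 if an SSE2 sqrt of -1.0 raised the invalid-operation flag,
 * 0 otherwise. The value of errno after the operation is written to
 * *errno_after. */
static int sse2_sqrt_raises_invalid(int *errno_after)
{
    volatile double in = -1.0;   /* volatile: keep the compiler from folding */
    volatile double sink;        /* volatile: keep the sqrt from being dropped */

    errno = 0;
    _mm_setcsr(_mm_getcsr() & ~0x3fu);   /* clear the six sticky exception flags */

    sink = _mm_cvtsd_f64(_mm_sqrt_pd(_mm_set1_pd(in)));
    (void)sink;

    *errno_after = errno;                /* the intrinsic never touches errno */
    return (_mm_getcsr() & 0x1u) != 0;   /* MXCSR bit 0 = invalid operation */
}
```

On such a machine the helper is expected to return 1 and leave errno at 0,
which is the situation the log message describes: the status flag is
still raised even though errno is not set.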
In principle the compiler could autovectorize it when setting -ffast-math
(for no errno), specializing the loop for the vectorizable strides,
and giving it some hints (restrict, __builtin_assume_aligned, etc.),
but it's simpler and more reliable to vectorize it by hand.
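As a rough sketch of what vectorizing by hand with SSE2 can look like for
the contiguous double case (this is not the loop added to loops.c.src;
the function name sqrt_f64_sse2 and the peel/main/remainder layout are
assumptions made for illustration):

```c
#include <emmintrin.h>   /* SSE2 intrinsics */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not NumPy's actual loop:
 * out[i] = sqrt(in[i]) for contiguous arrays of n doubles. */
static void sqrt_f64_sse2(double *out, const double *in, size_t n)
{
    size_t i = 0;

    /* Peel: handle single elements until `out + i` is 16-byte aligned.
     * If `out` is not even 8-byte aligned this loop simply processes
     * every element one at a time, which is slow but still correct. */
    while (i < n && ((uintptr_t)(out + i) & 15u) != 0) {
        _mm_store_sd(out + i,
                     _mm_sqrt_sd(_mm_setzero_pd(), _mm_load_sd(in + i)));
        i++;
    }

    /* Main loop: two doubles per iteration. The input may be unaligned,
     * so it is read with an unaligned load; the output store is aligned. */
    for (; i + 2 <= n; i += 2) {
        __m128d v = _mm_loadu_pd(in + i);
        _mm_store_pd(out + i, _mm_sqrt_pd(v));
    }

    /* Remainder: at most one trailing element. */
    for (; i < n; i++) {
        _mm_store_sd(out + i,
                     _mm_sqrt_sd(_mm_setzero_pd(), _mm_load_sd(in + i)));
    }
}
```

A float version would follow the same pattern with _mm_sqrt_ps and four
elements per iteration, which is consistent with the 4/2 ratio quoted
above for float/double.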
Commit: fe69102dd34619ce18cf074ef0e6e46611bc17e7
https://github.com/numpy/numpy/commit/fe69102dd34619ce18cf074ef0e6e46611bc17e7
Author: Julian Taylor <jtaylor.debian at googlemail.com>
Date: 2013-05-25 (Sat, 25 May 2013)
Changed paths:
M numpy/core/setup_common.py
M numpy/core/src/multiarray/einsum.c.src
Log Message:
-----------
MAINT: use sse header macros for einsum sse activation
Commit: 31a550189371ed21f8d38edae02f71f18a729741
https://github.com/numpy/numpy/commit/31a550189371ed21f8d38edae02f71f18a729741
Author: Charles Harris <charlesr.harris at gmail.com>
Date: 2013-05-25 (Sat, 25 May 2013)
Changed paths:
M numpy/core/code_generators/generate_umath.py
M numpy/core/setup.py
M numpy/core/setup_common.py
M numpy/core/src/multiarray/einsum.c.src
M numpy/core/src/private/lowlevel_strided_loops.h
M numpy/core/src/scalarmathmodule.c.src
M numpy/core/src/umath/loops.c.src
M numpy/core/src/umath/loops.h
M numpy/core/src/umath/loops.h.src
M numpy/core/tests/test_umath.py
M numpy/testing/utils.py
Log Message:
-----------
Merge pull request #3341 from juliantaylor/sse2-sqrt
vectorize sqrt ufunc with SSE2
Compare: https://github.com/numpy/numpy/compare/a02457f1d76d...31a550189371