[Numpy-svn] [numpy/numpy] 0adcca: ENH: vectorize sqrt ufunc using SSE2

GitHub noreply at github.com
Sat May 25 11:49:25 EDT 2013


  Branch: refs/heads/master
  Home:   https://github.com/numpy/numpy
  Commit: 0adccaaa910ab495e993f453956fd983775604f3
      https://github.com/numpy/numpy/commit/0adccaaa910ab495e993f453956fd983775604f3
  Author: Julian Taylor <jtaylor.debian at googlemail.com>
  Date:   2013-05-25 (Sat, 25 May 2013)

  Changed paths:
    M numpy/core/code_generators/generate_umath.py
    M numpy/core/setup.py
    M numpy/core/setup_common.py
    M numpy/core/src/private/lowlevel_strided_loops.h
    M numpy/core/src/scalarmathmodule.c.src
    M numpy/core/src/umath/loops.c.src
    M numpy/core/src/umath/loops.h
    M numpy/core/src/umath/loops.h.src
    M numpy/core/tests/test_umath.py
    M numpy/testing/utils.py

  Log Message:
  -----------
  ENH: vectorize sqrt ufunc using SSE2

Specialize the sqrt ufunc for float and double and vectorize it using
SSE2.
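A minimal sketch of what a hand-vectorized contiguous double loop can look like. It assumes x86 SSE2 intrinsics from <emmintrin.h> and the compiler-predefined __SSE2__ macro. The function name sqrt_double_contig is hypothetical, and this is not the code from the commit. The loop peels scalar iterations until the output is 16-byte aligned, takes the square root of two doubles per _mm_sqrt_pd, and finishes any odd element with a scalar _mm_sqrt_sd.

```c
#include <math.h>
#include <stddef.h>
#include <stdint.h>

#ifdef __SSE2__
#include <emmintrin.h>
#endif

/* Hypothetical sketch: out[i] = sqrt(in[i]) for contiguous doubles. */
void sqrt_double_contig(double *out, const double *in, size_t n)
{
    size_t i = 0;
#ifdef __SSE2__
    /* Peel scalar iterations until out + i is 16-byte aligned so the
     * main loop can use aligned stores. If out is not even 8-byte
     * aligned, this loop simply handles everything. */
    for (; i < n && ((uintptr_t)(out + i) & 15) != 0; i++) {
        out[i] = _mm_cvtsd_f64(_mm_sqrt_sd(_mm_setzero_pd(), _mm_set_sd(in[i])));
    }
    /* Main loop: two doubles per 128-bit register; input may be unaligned. */
    for (; i + 2 <= n; i += 2) {
        __m128d v = _mm_loadu_pd(in + i);
        _mm_store_pd(out + i, _mm_sqrt_pd(v));
    }
    /* Scalar tail for a leftover odd element. */
    for (; i < n; i++) {
        out[i] = _mm_cvtsd_f64(_mm_sqrt_sd(_mm_setzero_pd(), _mm_set_sd(in[i])));
    }
#else
    /* Portable fallback when SSE2 is not available. */
    for (; i < n; i++) {
        out[i] = sqrt(in[i]);
    }
#endif
}
```

On non-x86 builds only the plain sqrt() loop is compiled, which needs -lm on most Unix toolchains.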

Improves performance by a factor of 4 for float and 2 for double, unless
the loop is memory bound because the data is not in cache.
Performance was better in every case on all tested machines (AMD Phenom X2,
Intel Xeon 5xxx/7xxx, Core 2 Duo, Core i7).
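One way to read the two factors, though the message does not say so explicitly, is that they match the lane counts of a 128-bit SSE register. A single packed square-root instruction handles four 32-bit floats (_mm_sqrt_ps) or two 64-bit doubles (_mm_sqrt_pd):

```latex
\text{lanes}_{\text{float}} = \frac{128\ \text{bit}}{32\ \text{bit}} = 4,
\qquad
\text{lanes}_{\text{double}} = \frac{128\ \text{bit}}{64\ \text{bit}} = 2
```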

This version does not set errno on invalid input, but NumPy only checks
the FPU flags, so the observable behavior is unchanged.

In principle the compiler could autovectorize the loop when built with
-ffast-math (so that errno is not set), provided the loop is specialized
for the vectorizable strides and given some hints (restrict,
__builtin_assume_aligned, etc.). However, it is simpler and more reliable
to vectorize it by hand.


  Commit: fe69102dd34619ce18cf074ef0e6e46611bc17e7
      https://github.com/numpy/numpy/commit/fe69102dd34619ce18cf074ef0e6e46611bc17e7
  Author: Julian Taylor <jtaylor.debian at googlemail.com>
  Date:   2013-05-25 (Sat, 25 May 2013)

  Changed paths:
    M numpy/core/setup_common.py
    M numpy/core/src/multiarray/einsum.c.src

  Log Message:
  -----------
  MAINT: use SSE header macros for einsum SSE activation
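The idea of gating an SSE code path on a header-availability macro can be sketched as follows. The macro name SKETCH_HAVE_EMMINTRIN_H is hypothetical, because the names NumPy's build generates are not given in this message. Here the macro is derived from __SSE2__ rather than from a build-time header probe. The sum-of-products kernel stands in for one of einsum's contiguous inner loops and is not NumPy's code.

```c
#include <stddef.h>

/* Hypothetical stand-in for a build-time header check: derived here from
 * the compiler's predefined __SSE2__ macro instead of a configure probe. */
#if defined(__SSE2__)
#define SKETCH_HAVE_EMMINTRIN_H 1
#endif

#ifdef SKETCH_HAVE_EMMINTRIN_H
#include <emmintrin.h>
#endif

/* Sum of products of two contiguous double arrays (the "i,i->" case). */
double sum_of_products(const double *a, const double *b, size_t n)
{
    double total = 0.0;
    size_t i = 0;
#ifdef SKETCH_HAVE_EMMINTRIN_H
    /* SSE2 path: accumulate two products per iteration. */
    __m128d acc = _mm_setzero_pd();
    for (; i + 2 <= n; i += 2) {
        acc = _mm_add_pd(acc, _mm_mul_pd(_mm_loadu_pd(a + i),
                                         _mm_loadu_pd(b + i)));
    }
    /* Horizontal add of the two accumulator lanes. */
    double lanes[2];
    _mm_storeu_pd(lanes, acc);
    total = lanes[0] + lanes[1];
#endif
    /* Scalar loop: the whole array without SSE2, the leftover element with it. */
    for (; i < n; i++) {
        total += a[i] * b[i];
    }
    return total;
}
```

The SSE2 path adds the products in a different order than the scalar loop. For values whose partial sums are not exactly representable, the two builds can therefore differ in the last bits.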


  Commit: 31a550189371ed21f8d38edae02f71f18a729741
      https://github.com/numpy/numpy/commit/31a550189371ed21f8d38edae02f71f18a729741
  Author: Charles Harris <charlesr.harris at gmail.com>
  Date:   2013-05-25 (Sat, 25 May 2013)

  Changed paths:
    M numpy/core/code_generators/generate_umath.py
    M numpy/core/setup.py
    M numpy/core/setup_common.py
    M numpy/core/src/multiarray/einsum.c.src
    M numpy/core/src/private/lowlevel_strided_loops.h
    M numpy/core/src/scalarmathmodule.c.src
    M numpy/core/src/umath/loops.c.src
    M numpy/core/src/umath/loops.h
    M numpy/core/src/umath/loops.h.src
    M numpy/core/tests/test_umath.py
    M numpy/testing/utils.py

  Log Message:
  -----------
  Merge pull request #3341 from juliantaylor/sse2-sqrt

vectorize sqrt ufunc with SSE2


Compare: https://github.com/numpy/numpy/compare/a02457f1d76d...31a550189371

