[Numpy-discussion] numexpr with the new iterator
Francesc Alted
faltet at pytables.org
Tue Jan 11 06:58:27 EST 2011
On Tuesday 11 January 2011 06:45:28 Mark Wiebe wrote:
> On Mon, Jan 10, 2011 at 11:35 AM, Mark Wiebe <mwwiebe at gmail.com> wrote:
> > I'm a bit curious why the jump from 1 to 2 threads is scaling so
> > poorly.
> >
> > Your timings have improvement factors of 1.85, 1.68, 1.64, and
> > 1.79. Since the computation is trivial data parallelism, and I
> > believe it's still pretty far off the memory bandwidth limit, I
> > would expect a speedup of 1.95 or higher.
>
> It looks like it is the memory bandwidth which is limiting the
> scalability.
Indeed, this is an increasingly important problem for modern computers.
You may want to read:
http://www.pytables.org/docs/CISE-12-2-ScientificPro.pdf
;-)
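To put the quoted factors in perspective, here is a minimal sketch that converts each 1-to-2-thread improvement factor into a parallel efficiency (speed-up divided by thread count). The helper name parallel_efficiency is made up for illustration; the factors and the 1.95 expectation are the ones quoted above.

```python
# Improvement factors quoted in the thread for going from 1 to 2 threads.
factors = [1.85, 1.68, 1.64, 1.79]
threads = 2
expected_speedup = 1.95  # the speed-up Mark expected to see


def parallel_efficiency(speedup, n_threads):
    """Fraction of ideal linear scaling actually achieved (hypothetical helper)."""
    return speedup / n_threads


efficiencies = [parallel_efficiency(f, threads) for f in factors]
expected_efficiency = parallel_efficiency(expected_speedup, threads)

for f, e in zip(factors, efficiencies):
    print(f"speed-up {f:.2f} -> efficiency {e:.1%} "
          f"(expected at least {expected_efficiency:.1%})")
```

Every measured factor falls short of the expected efficiency, which is consistent with something other than raw compute (here, memory bandwidth) capping the scaling.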
> The slower operations scale much better than faster
> ones. Below are some timings of successively faster operations.
> When the operation is slow enough, it scales like I was expecting...
[clip]
Yeah, for another example on this with more threads, see:
http://code.google.com/p/numexpr/wiki/MultiThreadVM
OTOH, I was curious about the performance of the new iterator with
Intel's VML, but it seems to work decently too:
$ python bench/vml_timing.py (original numexpr, *no* VML support)
*************** Numexpr vs NumPy speed-ups *******************
Contiguous case: 1.72 (mean), 0.92 (min), 3.07 (max)
Strided case: 2.1 (mean), 0.98 (min), 3.52 (max)
Unaligned case: 2.35 (mean), 1.35 (min), 3.31 (max)
$ python bench/vml_timing.py (original numexpr, VML support)
*************** Numexpr vs NumPy speed-ups *******************
Contiguous case: 3.83 (mean), 1.1 (min), 10.19 (max)
Strided case: 3.21 (mean), 0.98 (min), 7.45 (max)
Unaligned case: 3.6 (mean), 1.47 (min), 7.87 (max)
$ python bench/vml_timing.py (new iter numexpr, VML support)
*************** Numexpr vs NumPy speed-ups *******************
Contiguous case: 3.56 (mean), 1.12 (min), 7.38 (max)
Strided case: 2.37 (mean), 0.09 (min), 7.63 (max)
Unaligned case: 3.56 (mean), 2.08 (min), 5.88 (max)
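For readers unfamiliar with the three cases, the following NumPy sketch builds one array of each kind. It is only an illustration of the memory layouts; the array size, the slicing trick and the packed record dtype are assumptions of mine and not necessarily how bench/vml_timing.py constructs its operands.

```python
import numpy as np

n = 1000  # illustrative size only, not the one used by the benchmark

# Contiguous: elements sit back to back in memory.
contiguous = np.arange(n, dtype=np.float64)

# Strided: a view that skips every other element of a larger buffer.
strided = np.arange(2 * n, dtype=np.float64)[::2]

# Unaligned: a float64 field placed right after a 1-byte field in a packed
# record, so each value starts at an address that is not a multiple of 8.
packed = np.dtype([("pad", np.int8), ("f3", np.float64)])
record = np.zeros(n, dtype=packed)
unaligned = record["f3"]

for name, arr in [("contiguous", contiguous),
                  ("strided", strided),
                  ("unaligned", unaligned)]:
    print(name, "strides:", arr.strides,
          "c_contiguous:", arr.flags.c_contiguous,
          "aligned:", arr.flags.aligned)
```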
However, there are a couple of quirks here. 1) The original Numexpr
is generally faster than the iter version. 2) The strided case is
considerably worse for the iter version. I've isolated the tests that
perform worse for the iter version, and here are a couple of samples:
*************** Expression: exp(f3)
numpy: 0.0135
numpy strided: 0.0144
numpy unaligned: 0.0200
numexpr: 0.0020 Speed-up of numexpr over numpy: 6.6584
numexpr strided: 0.1495 Speed-up of numexpr over numpy: 0.0962
numexpr unaligned: 0.0049 Speed-up of numexpr over numpy: 4.0859
*************** Expression: sin(f3)>cos(f4)
numpy: 0.0291
numpy strided: 0.0366
numpy unaligned: 0.0407
numexpr: 0.0166 Speed-up of numexpr over numpy: 1.7518
numexpr strided: 0.1551 Speed-up of numexpr over numpy: 0.2361
numexpr unaligned: 0.0175 Speed-up of numexpr over numpy: 2.3246
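As a sanity check on how to read these figures, the sketch below recomputes each speed-up as the NumPy time divided by the Numexpr time for the same layout. This reading of the "Speed-up" column is my assumption; it reproduces the reported values to within the rounding of the printed timings.

```python
# (numpy time, numexpr time, reported speed-up), copied from the output above.
samples = {
    "exp(f3)": {
        "contiguous": (0.0135, 0.0020, 6.6584),
        "strided":    (0.0144, 0.1495, 0.0962),
        "unaligned":  (0.0200, 0.0049, 4.0859),
    },
    "sin(f3)>cos(f4)": {
        "contiguous": (0.0291, 0.0166, 1.7518),
        "strided":    (0.0366, 0.1551, 0.2361),
        "unaligned":  (0.0407, 0.0175, 2.3246),
    },
}

recomputed = {}
for expr, cases in samples.items():
    for case, (numpy_t, numexpr_t, reported) in cases.items():
        # Speed-up > 1 means numexpr is faster; < 1 means it is slower.
        ratio = numpy_t / numexpr_t
        recomputed[(expr, case)] = ratio
        print(f"{expr:16s} {case:10s} recomputed {ratio:7.4f}  "
              f"reported {reported:7.4f}")
```

Both strided ratios come out below 1, i.e. the iter version is slower than plain NumPy there, while the contiguous and unaligned cases remain comfortably above 1.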
Maybe you can shed some light on what's going on here (shall we discuss
this off-the-list so as to not bore people too much?).
--
Francesc Alted