[Numpy-discussion] testing with amd libm/acml

Francesc Alted francesc at continuum.io
Thu Nov 8 12:59:41 EST 2012


On 11/8/12 6:38 PM, Dag Sverre Seljebotn wrote:
> On 11/08/2012 06:06 PM, Francesc Alted wrote:
>> On 11/8/12 1:41 PM, Dag Sverre Seljebotn wrote:
>>> On 11/07/2012 08:41 PM, Neal Becker wrote:
>>>> Would you expect numexpr without MKL to give a significant boost?
>>> If you need higher performance than what numexpr can give without using
>>> MKL, you could look at code such as this:
>>>
>>> https://github.com/herumi/fmath/blob/master/fmath.hpp#L480
>> Hey, that's cool.  I was a bit disappointed not to find this sort of
>> work out in the open.  It seems that this lacks threading support, but
>> that should be easy to implement by using OpenMP directives.
> IMO this is the wrong place to introduce threading; each thread should
> call expd_v on its chunks. (Which I think is how you said numexpr
> currently uses VML anyway.)

Oh sure, but then you need a blocked engine for performing the 
computations too.  And yes, by default numexpr uses its own threading 
code rather than the existing one in VML (although that can be changed 
by playing with set_num_threads/set_vml_num_threads).  It always struck 
me as a little strange that the internal threading in numexpr was more 
efficient than VML's, but I suppose this is because the latter is 
better optimized for large blocks than for the medium-sized (4 KB) 
blocks that numexpr uses.
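To make the per-chunk threading pattern concrete, here is a minimal
sketch of the approach Dag describes (each thread calls the vector
kernel on its own chunks).  It uses NumPy and np.exp as a stand-in for
a fast kernel like fmath's expd_v; the chunk size and worker count are
illustrative assumptions, not numexpr's actual engine:

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def blocked_exp(x, chunk=4096, workers=4):
    """Evaluate exp(x) block by block, one chunk per task.

    NumPy ufuncs release the GIL while they run, so plain threads can
    overlap work on large arrays.  Each task writes into its own output
    slice, so no locking is needed.
    """
    out = np.empty_like(x)

    def kernel(lo):
        hi = min(lo + chunk, x.shape[0])
        out[lo:hi] = np.exp(x[lo:hi])   # per-chunk vector kernel

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # One task per chunk; threads pick up chunks as they finish.
        list(pool.map(kernel, range(0, x.shape[0], chunk)))
    return out

x = np.linspace(-1.0, 1.0, 100_000)
y = blocked_exp(x)
```

A real blocked engine would also fuse whole expressions per chunk (as
numexpr's VM does) rather than dispatching a single ufunc, but the
threading structure is the same.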

-- 
Francesc Alted
