[Numpy-discussion] The BLAS problem (was: Re: Wiki page for building numerical stuff on Windows)

Julian Taylor jtaylor.debian at googlemail.com
Fri Apr 11 14:38:12 EDT 2014


On 11.04.2014 19:05, Sturla Molden wrote:
> Sturla Molden <sturla.molden at gmail.com> wrote:
> 
>> Making a totally new BLAS might seem like a crazy idea, but it might be the
>> best solution in the long run. 
> 
> To see if this can be done, I'll try to re-implement cblas_dgemm and then
> benchmark against MKL, Accelerate and OpenBLAS. If I can get the
> performance better than 75% of their speed, without any assembly or dark
> magic, just plain C99 compiled with Intel icc, that would be sufficient for
> binary wheels on Windows I think.
> 


Hi,

If you can, also give gcc with Graphite a try. Its loop transformations
should give results similar to manual blocking, provided the compiler is
able to analyze the loop nest. See
http://gcc.gnu.org/gcc-4.4/changes.html
-floop-strip-mine
-floop-block
-floop-interchange
plus a couple of options to tune the parameters.

You may need gcc 4.8 for it to work properly on loops whose iteration
counts are not fixed at compile time.
As far as I know, clang/LLVM has a comparable polyhedral optimizer (Polly).
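
To make "manual blocking" concrete, here is a minimal C99 sketch of the
kind of loop nest in question: a naive matrix product and a hand-tiled
variant of the same computation. The function names, the row-major square
layout and the tile-size parameter are illustrative assumptions, not
taken from any existing BLAS; this is not a tuned dgemm. The naive version
is the sort of loop one would hand to the compiler with the flags listed
above and then compare against the tiled one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Naive C = A * B for row-major n x n matrices (hypothetical helper). */
void matmul_naive(size_t n, const double *a, const double *b, double *c)
{
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            double sum = 0.0;
            for (size_t k = 0; k < n; k++)
                sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] = sum;
        }
    }
}

/* Same product, manually tiled into bs x bs blocks (bs is an assumed
 * tuning parameter). Edge tiles are clipped when bs does not divide n. */
void matmul_blocked(size_t n, size_t bs,
                    const double *a, const double *b, double *c)
{
    assert(bs > 0);
    for (size_t i = 0; i < n * n; i++)
        c[i] = 0.0;

    for (size_t ii = 0; ii < n; ii += bs) {
        size_t i_end = (ii + bs < n) ? ii + bs : n;
        for (size_t kk = 0; kk < n; kk += bs) {
            size_t k_end = (kk + bs < n) ? kk + bs : n;
            for (size_t jj = 0; jj < n; jj += bs) {
                size_t j_end = (jj + bs < n) ? jj + bs : n;
                /* Work on one tile so its data is reused while cached. */
                for (size_t i = ii; i < i_end; i++) {
                    for (size_t k = kk; k < k_end; k++) {
                        double aik = a[i * n + k];
                        for (size_t j = jj; j < j_end; j++)
                            c[i * n + j] += aik * b[k * n + j];
                    }
                }
            }
        }
    }
}

/* Largest absolute difference between the two results on a small
 * integer-valued test input; returns -1.0 if allocation fails. */
double max_abs_diff(size_t n, size_t bs)
{
    double *a = malloc(n * n * sizeof *a);
    double *b = malloc(n * n * sizeof *b);
    double *c1 = malloc(n * n * sizeof *c1);
    double *c2 = malloc(n * n * sizeof *c2);
    double worst = -1.0;

    if (a && b && c1 && c2) {
        for (size_t i = 0; i < n * n; i++) {
            a[i] = (double)(i % 7) - 3.0;
            b[i] = (double)(i % 5) - 2.0;
        }
        matmul_naive(n, a, b, c1);
        matmul_blocked(n, bs, a, b, c2);
        worst = 0.0;
        for (size_t i = 0; i < n * n; i++) {
            double d = c1[i] - c2[i];
            if (d < 0.0)
                d = -d;
            if (d > worst)
                worst = d;
        }
    }
    free(a);
    free(b);
    free(c1);
    free(c2);
    return worst;
}
```

The tiled version only reorders the work; both functions compute the same
product, which max_abs_diff checks on a small input. Whether Graphite
actually reaches the speed of the hand-tiled loop depends on the gcc
version and build, so it has to be benchmarked.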

Cheers,
Julian


