[Numpy-discussion] Openmp support (was numpy's future (1.1 and beyond): which direction(s) ?)

Gnata Xavier xavier.gnata at gmail.com
Mon Mar 24 11:41:28 EDT 2008


> A couple of thoughts on parallelism:
>
> 1. Can someone come up with a small set of cases and time them on
> numpy, IDL, Matlab, and C, using various parallel schemes, for each of
> a representative set of architectures?  We're comparing a benchmark to
> itself on different architectures, rather than seeing whether the
> thread capability is helping our competition on the same architecture.
> If it's mostly not helping them, we can forget it for the time being.
> I suspect that it is, in fact, helping them, or at least not hurting
> them.
>
>   
Well, I could ask some IDL users to provide you with benchmarks.
For C/OpenMP, I have already posted a trivial example.

> 2. Would it slow things much to have some state that the routines
> check before deciding whether to run a parallel implementation or not?
> It could default to single thread except in the cases where
> parallelism always helps, but the user can configure it to multithread
> beyond certain thresholds of, say, number of elements.  Then, in the
> short term, a savvy user can tweak that state to get parallelism for
> more than N elements.  In the longer term, there could be a test
> routine that would run on install and configure the state for that
> particular machine.  When numpy started it would read the saved file
> and computation would be optimized for that machine.  The user could
> always override it.
>
>   
No, it wouldn't cost that much, and that is exactly the way IDL (for
instance) works.

> 3. We should remember the first rule of parallel programming, which
> Anne quotes as "premature optimization is the root of all evil".
> There is a lot to fix in numpy that is more fundamental than speed.  I
> am the first to want things fast (I would love my secondary eclipse
> analysis to run in less than a week), but we have gaping holes in
> documentation and other areas that one would expect to have been
> filled before a 1.0 release.  I hope we can get them filled for 1.1.
> It bears repeating that our main resource shortage is in person-hours,
> and we'll get more of those as the community grows.  Right now our
> deficit in documentation is hurting us badly, while our deficit in
> parallelism is not.  There is no faster way of growing the community
> than making it trivial to learn how to use numpy without hand-holding
> from an experienced user.  Let's explore parallelism to assess when
> and how it might be right to do it, but let's stay focussed on the
> fundamentals until we have those nailed.
>
>   
That is well put and clear.
It is also clear that our deficit in parallelism is not hurting us that
badly. It is a real problem in some communities, such as astronomers and
image-processing people, but the lack of documentation is the more
pressing one; that is true.

Xavier
