[Numpy-discussion] IDL vs Python parallel computing
Julian Taylor
jtaylor.debian at googlemail.com
Thu May 8 04:10:09 EDT 2014
On 08.05.2014 02:48, Frédéric Bastien wrote:
> Just a quick question/possibility.
>
> What about just parallelizing ufuncs with only one input that is C- or
> Fortran-contiguous, like the trigonometric functions? Is there a fast path in
> the ufunc mechanism when the input is Fortran/C-contiguous? If that is the
> case, it would be relatively easy to add an OpenMP pragma to parallelize
> that loop, with a condition on a minimum number of elements.
OpenMP is problematic, as GNU OpenMP deadlocks on fork (multiprocessing).
If we do consider adding support, I think
multiprocessing.pool.ThreadPool could be a good option.
But it is also not difficult for the user to just write a wrapper
function like this:
def parallel_trig(x, func, pool, nthreads):
    x = x.reshape(nthreads, -1)  # assuming 1d and no remainder
    return np.concatenate(pool.map(func, x))  # use functools.partial to pass the out argument
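Fleshed out into a self-contained script, such a wrapper might look like this (a sketch only: the function name, chunking scheme, and thread count are illustrative, not part of NumPy):

```python
import numpy as np
from multiprocessing.pool import ThreadPool

def parallel_trig(x, func, pool, nthreads):
    # Split the 1d array into one chunk per thread (assumes x.size is
    # evenly divisible). Ufuncs like np.sin release the GIL on large
    # inputs, so the chunks can actually run concurrently.
    chunks = x.reshape(nthreads, -1)
    return np.concatenate(pool.map(func, chunks))

nthreads = 4
pool = ThreadPool(nthreads)
x = np.linspace(0.0, np.pi, 1 << 20)  # size divisible by nthreads
y = parallel_trig(x, np.sin, pool, nthreads)
pool.close()
```

Whether this beats the serial call depends on the array size and how memory-bandwidth-bound the function is, as discussed below.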
>
> Anyway, I won't do it. I'm just outlining what I think is the easiest
> case to implement (depending on NumPy internals that I don't know well
> enough), and I think the most frequent one (so possibly a quick fix for
> someone with knowledge of that code).
>
> In Theano, we found on a few CPUs that addition needs a minimum of
> 200k elements for the parallelization of elemwise to be useful. We use
> that number by default for all operations to keep it simple; it is user
> configurable. This guarantees that on the current hardware generation,
> threading doesn't slow things down. I think that is the more important
> point: don't show users a slowdown by default with a new version.
>
> Fred
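The size-threshold heuristic Fred describes can be sketched as a small dispatcher (the cutoff is the 200k figure from his message; the function names are hypothetical, not Theano's actual API):

```python
import numpy as np

MIN_PARALLEL_SIZE = 200_000  # empirical cutoff; below it, threading overhead wins

def maybe_parallel(x, serial_func, parallel_func):
    # Dispatch to the plain serial ufunc for small inputs, so enabling
    # parallelism never slows down small-array code by default.
    if x.size < MIN_PARALLEL_SIZE:
        return serial_func(x)
    return parallel_func(x)

small = np.arange(10.0)
# A small input takes the serial path.
result = maybe_parallel(small, np.sqrt, np.sqrt)
```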
>
>
> On Wed, May 7, 2014 at 2:27 PM, Julian Taylor
> <jtaylor.debian at googlemail.com <mailto:jtaylor.debian at googlemail.com>>
> wrote:
>
> On 07.05.2014 20:11, Sturla Molden wrote:
> > On 03/05/14 23:56, Siegfried Gonzi wrote:
> >
> > A more technical answer is that NumPy's internals do not play very
> > nicely with multithreading. For example, the array iterators used in
> > ufuncs store an internal state. Multithreading would imply excessive
> > contention for this state, as well as induce false sharing of the
> > iterator object. Therefore, a multithreaded NumPy would have
> > performance problems due to synchronization as well as hierarchical
> > memory collisions. Adding multithreading support to the current NumPy
> > core would just degrade the performance. NumPy will not be able to use
> > multithreading efficiently unless we redesign the iterators in NumPy
> > core. That is a massive undertaking which probably means rewriting most
> > of NumPy's core C code. A better strategy would be to monkey-patch
> > some of the more common ufuncs with multithreaded versions.
>
>
> I wouldn't say that the iterator is a problem: the important iterator
> functions are thread-safe, and there is support for multithreaded
> iteration using NpyIter_Copy, so no state is shared between threads.
>
> I'd say the main issue is that there simply aren't many functions worth
> parallelizing in numpy. Most of the commonly used stuff is already
> memory-bandwidth bound with only one or two threads.
> The only things I can think of that would profit are sorting/partitioning
> and the special functions like sqrt, exp, log, etc.
>
> Generic efficient parallelization would require merging operations to
> improve the FLOPS/loads ratio. E.g. numexpr and Theano are able to do
> so and thus also have builtin support for multithreading.
>
> That being said, you can use Python threads with numpy, as (especially in
> 1.9) most expensive functions release the GIL. But unless you are doing
> very flop-intensive work, you will probably have to manually block your
> operations to the last-level cache size if you want to scale beyond one
> or two threads.
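A minimal sketch of that kind of cache blocking (the cache size and the expression are assumptions for illustration; NumPy does not expose the last-level cache size):

```python
import numpy as np
from multiprocessing.pool import ThreadPool

LLC_BYTES = 8 * 1024 * 1024  # assumed last-level cache size

def blocked_expr(x, pool):
    # Evaluate sqrt(x)*x + 1 block by block, so each block's operands and
    # temporaries stay cache-resident across the chained ufunc calls.
    out = np.empty_like(x)
    block = max(1, LLC_BYTES // (x.itemsize * 4))  # leave room for temporaries

    def work(start):
        s = slice(start, start + block)
        # Each thread writes a disjoint slice of the shared output array.
        out[s] = np.sqrt(x[s]) * x[s] + 1.0

    pool.map(work, range(0, x.size, block))
    return out

x = np.random.rand(1_000_000)
pool = ThreadPool(2)
res = blocked_expr(x, pool)
pool.close()
```

Because the ufuncs involved release the GIL on large inputs, the disjoint blocks can execute concurrently on a plain ThreadPool.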