[Numpy-discussion] Cython-based OpenMP-accelerated quartic polynomial solver

Juha Jeronen juha.jeronen at jyu.fi
Wed Sep 30 20:20:13 EDT 2015


On 30.09.2015 19:20, Nathaniel Smith wrote:
> The challenges to providing transparent multithreading in numpy 
> generally are:
>
> - gcc + OpenMP on linux still breaks multiprocessing. There's a patch 
> to fix this but they still haven't applied it; alternatively there's a 
> workaround you can use in multiprocessing (not using fork mode), but 
> this requires every user update their code and the workaround has 
> other limitations. We're unlikely to use OpenMP while this is the case.
>

Ah, I didn't know this. Thanks.
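
For the record, a minimal sketch of what the non-fork workaround might look like, assuming Python 3.4 or later and assuming the "spawn" start method is the one meant. The names `square` and `parallel_squares` are made up for illustration:

```python
import multiprocessing


def square(x):
    # Stand-in for real work. It must live at module level so that
    # it can be pickled and sent to the worker processes.
    return x * x


def parallel_squares(values, n_workers=2):
    # Request a "spawn" context: each worker is a fresh interpreter
    # rather than a forked copy of the parent process.
    ctx = multiprocessing.get_context("spawn")
    with ctx.Pool(processes=n_workers) as pool:
        return pool.map(square, values)


# Intended use, from the entry point of a script:
#
#     if __name__ == "__main__":
#         print(parallel_squares([1, 2, 3]))
```

With "spawn", the main module is re-imported in each worker, so the `__main__` guard is needed and everything passed to the pool must be picklable. That seems to be the sort of change every user would have to make to their own code.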


> - parallel code in general is not very composable. If someone is 
> calling a numpy operation from one thread, great, transparently using 
> multiple threads internally is a win. If they're exploiting some 
> higher-level structure in their problem to break it into pieces and 
> process each in parallel, and then using numpy on each piece, then 
> numpy spawning threads internally will probably destroy performance. 
> And numpy is too low-level to know which case it's in. This problem 
> exists to some extent already with multi-threaded BLAS, so people use 
> various BLAS-specific knobs to manage it in ad hoc ways, but this 
> doesn't scale.

Very good point. I've had both kinds of use cases myself.

It would be nice if there were some way to tell NumPy whether or not to 
use additional threads, but that adds complexity. It's also not a 
good solution, considering that any higher-level code building on NumPy, 
if it is designed to be at all reusable, may find *itself* in either 
role. Only the code that happens to form the top level of a software 
project, at a given point in its development, has the required 
context...
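
As an illustration of the ad hoc knobs that exist today, here is a sketch in which the top-level script caps library threading through environment variables. The helper name `set_library_threads` is hypothetical, and which variable actually takes effect depends on the BLAS/OpenMP backend that NumPy was built against:

```python
import os

# Environment variables read by common threaded backends (OpenMP runtimes,
# OpenBLAS, Intel MKL). Which one matters for a given NumPy build is an
# assumption the caller has to check.
THREAD_ENV_VARS = ("OMP_NUM_THREADS", "OPENBLAS_NUM_THREADS", "MKL_NUM_THREADS")


def set_library_threads(n, environ=os.environ):
    """Set the thread-count variables to n and return what was set."""
    if n < 1:
        raise ValueError("thread count must be at least 1")
    for name in THREAD_ENV_VARS:
        environ[name] = str(n)
    return {name: environ[name] for name in THREAD_ENV_VARS}
```

These variables are typically read when the library is loaded, so the top-level script would call `set_library_threads(1)` before its first `import numpy`. A reusable library in the middle of the stack cannot safely do the same, which is exactly the problem.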

Then again, the matter is further complicated by considering codes that 
run on a single machine, versus codes that run on a cluster. Threads 
being local to each node in a cluster, it may make sense in a solver 
targeted for a cluster to split the problem at the process level, 
distribute the processes across the network, and use the threading 
capability to accelerate computation on each node.
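
A rough sketch of the bookkeeping for that split on one node, assuming the cores are divided evenly among the processes placed there. `threads_per_process` is a made-up helper, not part of NumPy or of any MPI library:

```python
import os


def threads_per_process(n_processes, n_cores=None):
    """Threads each process may use so that the node is not oversubscribed."""
    if n_processes < 1:
        raise ValueError("need at least one process")
    if n_cores is None:
        # os.cpu_count() may return None; fall back to a single core.
        n_cores = os.cpu_count() or 1
    # Integer division, but never fewer than one thread per process.
    return max(1, n_cores // n_processes)
```

For example, `threads_per_process(4, 16)` gives 4 threads to each of 4 processes on a 16-core node, while `threads_per_process(8, 4)` falls back to 1.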

A complex issue with probably no easy solutions :)


  -J



