[Numpy-discussion] The BLAS problem (was: Re: Wiki page for building numerical stuff on Windows)

Fri Apr 11 19:07:02 EDT 2014

On 12/04/14 00:39, Nathaniel Smith wrote:

> The spawn mode is fine and all, but (a) the presence of something in
> 3.4 helps only a minority of users, (b) "spawn" is not a full
> replacement for fork;

It basically does the same thing that multiprocessing does on Windows.
If you want portability to Windows, you must abide by these restrictions
anyway.
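
To make the restrictions concrete, here is a minimal sketch of code written to the spawn rules. The names square and run_with_spawn are made up for illustration. The worker function has to live at module level so the child can import it, and the entry point has to be guarded so the child does not start the pool again when it re-imports the main module.

```python
import multiprocessing as mp


def square(x):
    # Module-level function: a spawned child finds it by importing this module.
    return x * x


def run_with_spawn(values):
    # Ask for the spawn start method explicitly, whatever the platform default is.
    ctx = mp.get_context("spawn")
    with ctx.Pool(processes=2) as pool:
        return pool.map(square, values)


if __name__ == "__main__":
    # Without this guard the child would re-run the pool on import.
    print(run_with_spawn([1, 2, 3]))
```

Nothing from the parent's memory is inherited here; everything the workers need is pickled and sent to them, which is exactly what you have to live with on Windows.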

> with large read-mostly data sets it can be a
> *huge* win to load them into the parent process and then let them be
> COW-inherited by forked children.

The thing is that Python's reference counting breaks COW fork. This has
been discussed several times on the Python-dev list. What happens is that
as soon as the child process updates a refcount, the OS copies the page
that holds it. And because Python touches refcounts all the time, even
when it only reads an object, this copying of COW-marked pages quickly
gets excessive. Effectively, the performance of os.fork in Python will be
close to that of a non-COW fork. A suggested solution is to move the
refcounts out of the PyObject struct, and perhaps keep them in a
dedicated heap. But doing so would be unfriendly to the cache.
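
A small sketch of why a "read" is not really a read in CPython. It does not fork anything; it only shows that binding another name to an object changes the refcount stored inside that object, which is the kind of write that dirties a COW page in a forked child. The variable names are made up for illustration, and the behaviour shown is specific to CPython.

```python
import sys

# Stand-in for a large read-mostly data set loaded in the parent process.
data = [object() for _ in range(1000)]

before = sys.getrefcount(data[0])
alias = data[0]  # a pure "read": bind one more name to the same object
after = sys.getrefcount(data[0])

# The count lives in the object header, so the read above was a memory write.
refcount_delta = after - before
print(refcount_delta)
```

Loop over a big list of objects in a forked child and every one of them gets touched this way, page by page.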

> ATM the only other way to work with
> a data set that's larger than memory-divided-by-numcpus is to
> explicitly set up shared memory, and this is *really* hard for
> anything more complicated than a single flat array.

Not difficult. You just go to my GitHub site and grab the code ;)

(I have some problems running it on my MBP though, not sure why, but it 
used to work on Linux and Windows, and possibly still does.)

https://github.com/sturlamolden/sharedmem-numpy
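
For the single flat array case, the idea can be sketched with the standard library alone. This is not the API of the package linked above; it assumes Python 3.8 or newer, where multiprocessing.shared_memory exists, and the helpers create_block and read_block are hypothetical names. Both handles are opened in one process here to keep the sketch short; a worker process would attach by name in the same way.

```python
from multiprocessing import shared_memory

ITEM_SIZE = 8  # bytes per C double


def create_block(values):
    """Allocate a named shared block and copy the doubles into it."""
    nbytes = len(values) * ITEM_SIZE
    shm = shared_memory.SharedMemory(create=True, size=nbytes)
    # Slice first: some platforms round the block up to a whole page.
    with shm.buf[:nbytes].cast("d") as view:
        for i, v in enumerate(values):
            view[i] = v
    return shm


def read_block(name, n_items):
    """Attach to an existing block by name and return its contents as a list."""
    shm = shared_memory.SharedMemory(name=name)
    try:
        with shm.buf[:n_items * ITEM_SIZE].cast("d") as view:
            return view.tolist()
    finally:
        shm.close()  # detach only; the owner is responsible for unlink()


owner = create_block([1.0, 2.0, 3.0, 4.0])
try:
    result = read_block(owner.name, 4)
finally:
    owner.close()
    owner.unlink()  # free the block once nobody needs it
```

With NumPy, the same buffer can be wrapped without a copy via numpy.ndarray(shape, dtype=..., buffer=shm.buf). Only the block's name, shape and dtype have to be sent to the workers, not the data.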

Sturla