[SciPy-user] shared memory machines

Robert Kern robert.kern at gmail.com
Mon Feb 2 12:29:06 EST 2009


On Mon, Feb 2, 2009 at 04:53, Gael Varoquaux
<gael.varoquaux at normalesup.org> wrote:
> On Mon, Feb 02, 2009 at 12:51:51AM -0600, Robert Kern wrote:
>> On Mon, Feb 2, 2009 at 00:38, Gael Varoquaux
>> <gael.varoquaux at normalesup.org> wrote:
>> > I think I should write empty_shmem, to completely hide the multiprocessing
>> > Array, delete my useless SharedMemArray class, integrate your number of
>> > processor function, and recirculate my code, if it is OK with you. In a
>> > few iterations we can propose this for integration in numpy.
>
>> Here's mine, FWIW. It goes down directly to the multiprocessing.heap
>> code underlying the Array stuff. On Windows, the objects transfer via
>> pickle while under UNIX, they must be inherited. Windows mmap objects
>> can be pickled while UNIX mmap objects can't. Like Sturla says, we'd
>> have to use named shared memory to get around this on UNIX.
>
> Well, you know way more than I do about this. But I fear I am
> misunderstanding something. Does what you are saying mean that an
> 'empty_shmem', that would create a multiprocessing Array, and expose it
> as a numpy array, is bound to fail under Windows?

[These first two paragraphs are basically what Sturla says in his
response. He's faster on the Send button than I am.  :-)]

Almost. On Windows, the subprocesses inherit nothing. All objects must
be passed through pickles. Passing the Array works, but passing the
ndarray won't, because the ndarray pickler passes by value. My
approach registers a new pickler for ndarrays that recognizes my
shared-memory ndarrays and makes a pickle that just references the
shared memory. You could replicate that using Array as the memory
allocator, but I think my approach, which uses the "raw" allocators
underneath Array, is more straightforward.
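
To illustrate the pass-by-value problem, here is a minimal sketch. It is
not the pickler described above; it only shows that a default pickle
round trip of an ndarray wrapped around a RawArray yields an
independent copy. The function name is made up for illustration, and I
am assuming a current NumPy and Python 3.

```python
import pickle
from multiprocessing.sharedctypes import RawArray

import numpy as np


def pickle_roundtrip_shares_memory(n=4):
    """Hypothetical helper: does a pickled ndarray still share memory?"""
    # RawArray allocates zero-initialized memory from multiprocessing's heap.
    raw = RawArray("d", n)
    # Wrap the ctypes array as an ndarray without copying.
    shared = np.frombuffer(raw, dtype=np.float64)

    # The default ndarray pickler serializes the data itself (by value).
    clone = pickle.loads(pickle.dumps(shared))
    clone[0] = 1.0

    # The write to the clone does not reach the original buffer.
    return bool(np.shares_memory(shared, clone)), float(shared[0])


if __name__ == "__main__":
    print(pickle_roundtrip_shares_memory())
```

A pickler that references the shared memory would have to send
something that identifies the block instead of the data itself.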

On UNIX, Arrays and the stuff underneath them don't pickle because the
underlying mmap is not named. We'd need to wrap the appropriate APIs
in order to do this. If you can arrange your program such that the
arrays get inherited, you're fine because you don't need to pickle
anything, but you can't pass these ndarrays through Queues and such.
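
The inheritance case can be sketched roughly as follows. This assumes a
UNIX system and a current Python 3, where the "fork" start method is
requested through multiprocessing.get_context; the names _fill and
demo_inherited are hypothetical. Under fork the Process arguments are
inherited rather than pickled, so the child writes into the same
memory.

```python
import multiprocessing as mp

import numpy as np


def _fill(arr, value):
    # Runs in the child; arr is the inherited view of the shared block.
    arr[:] = value


def demo_inherited(n=4):
    """Hypothetical demo: a forked child writes, the parent sees it."""
    # Assumption: UNIX only. The "fork" start method does not exist on Windows.
    ctx = mp.get_context("fork")
    raw = ctx.RawArray("d", n)
    shared = np.frombuffer(raw, dtype=np.float64)

    # With fork, args are not pickled; the child inherits them as-is.
    proc = ctx.Process(target=_fill, args=(shared, 7.0))
    proc.start()
    proc.join()
    return shared.copy()


if __name__ == "__main__":
    print(demo_inherited())
```

Putting the same ndarray on a Queue would pickle it by value, so the
receiving process would get a private copy.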

I've tried using the shm module, which does wrap those APIs, but I've
never been able to get the memory to actually share unless the
subprocess inherits it.

  http://nikitathespider.com/python/shm/
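
For comparison, here is a sketch of attaching to a named block. It does
not use the shm module above. It assumes Python 3.8 or later, where the
standard library has multiprocessing.shared_memory, and it opens the
second handle in the same process only to keep the example short. The
function name is hypothetical.

```python
from multiprocessing import shared_memory

import numpy as np


def attach_by_name_demo(n=4):
    """Hypothetical demo: two handles to one named block see the same data."""
    owner = shared_memory.SharedMemory(create=True, size=n * 8)
    try:
        a = np.ndarray((n,), dtype=np.float64, buffer=owner.buf)
        a[:] = 0.0

        # Open a second handle using only the name, as another process could.
        other = shared_memory.SharedMemory(name=owner.name)
        try:
            b = np.ndarray((n,), dtype=np.float64, buffer=other.buf)
            b[0] = 3.0
            seen = float(a[0])  # read through the first handle
            del b  # drop the view before closing its buffer
        finally:
            other.close()
        del a
    finally:
        owner.close()
        owner.unlink()  # free the block once nobody needs it
    return seen


if __name__ == "__main__":
    print(attach_by_name_demo())
```

Since the name is an ordinary string, it can be pickled and sent
through a Queue, which is what an unnamed mmap does not allow.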

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
  -- Umberto Eco


