2.6, 3.0, and truly independent interpreters

Patrick Stinson patrickstinson.lists at gmail.com
Wed Oct 29 18:45:17 EDT 2008


If you are dealing with "lots" of data, as in video or sound editing,
you would just keep the data in shared memory and send a reference to
it over IPC to the worker process. Otherwise, if you marshal and send,
you are looking at a temporary doubling of your app's memory footprint
(the data gets copied) plus the marshaling overhead itself.
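For example, something along these lines (just a sketch -- the
doubling worker and the (offset, length) protocol are made up for
illustration -- using multiprocessing.Array so the buffer lives in
shared memory and only a tiny task tuple ever crosses the pipe):

from multiprocessing import Process, Queue, Array

def worker(buf, tasks):
    while True:
        offset, length = tasks.get()   # tiny message over IPC
        if length == 0:                # sentinel: shut down
            break
        # operate on the shared buffer in place -- no copy, no pickling
        for i in range(offset, offset + length):
            buf[i] *= 2.0

if __name__ == '__main__':
    buf = Array('d', 1000000, lock=False)  # 1M doubles in shared memory
    tasks = Queue()
    p = Process(target=worker, args=(buf, tasks))
    p.start()
    tasks.put((0, 1000000))  # pass a "reference", not the data
    tasks.put((0, 0))
    p.join()

The worker mutates the shared array directly; nothing is copied or
marshaled except the two-int task tuple.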

On Fri, Oct 24, 2008 at 3:50 PM, Andy O'Meara <andy55 at gmail.com> wrote:
>
>
>> Are you familiar with the API at all? Multiprocessing was designed to
>> mimic threading in just about every way possible; the only restriction
>> on shared data is that it must be serializable, but even then you can
>> override or customize the behavior.
>>
>> Also, inter-process communication is done via pipes. It can also be
>> done with messages if you want to tweak the manager(s).
>>
>
> I apologize in advance if I don't understand something correctly, but
> as I understand it, everything has to be serialized in order to go
> through IPC.  So when you're talking about thousands of objects,
> buffers, and/or large opaque OS objects (e.g. memory-resident video
> and images), that seems like a pretty rough run-time resource hit.
>
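(To make the trade-off concrete, the pipe-based API being discussed
looks roughly like this -- every object you send() is pickled on one
end and unpickled on the other, which is exactly the copy being
described above for big payloads; the summing worker is just a toy:)

from multiprocessing import Process, Pipe

def worker(conn):
    data = conn.recv()     # unpickled here: a full copy of the payload
    conn.send(sum(data))   # the result is pickled going back
    conn.close()

if __name__ == '__main__':
    parent_conn, child_conn = Pipe()
    p = Process(target=worker, args=(child_conn,))
    p.start()
    parent_conn.send(list(range(10)))  # cheap here, costly for
                                       # hundreds of megs
    print(parent_conn.recv())          # -> 45
    p.join()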
> Please don't misunderstand my comments to suggest that multiprocessing
> isn't great stuff.  On the contrary, it's very impressive and it
> singlehandedly catapults Python *way* closer to efficient CPU-bound
> processing than it ever was before.  All I mean to say is that in the
> case where you're using a shared address space with a worker pthread
> per spare core to do CPU-bound work, it's a really big win not to
> have to serialize stuff.  And in the case of hundreds of megs of data
> and/or thousands of data structure instances, it's a deal breaker to
> serialize and unserialize everything just so that it can be sent
> through IPC.  It's a deal breaker for most performance-centric apps
> because of the unnecessary runtime resource hit and because now all
> those data structures being passed around have to have accompanying
> serialization code written (and maintained) for them.  That's
> actually what I meant when I made the comment that a high-level sync
> object in a shared address space is "better" than sending it all
> through IPC (when the data sets are wild and crazy).  From a C/C++
> point of view, I would venture to say that it's always a huge win to
> just stick those "embarrassingly easy" parallelization cases into a
> thread with a sync object rather than forking and writing all the
> serialization code for IPC.  And in the case of huge data types--
> such as video or image rendering--it makes me nervous to think of
> serializing it all just so it can go through IPC when it could just be
> passed with a pointer change and a single sync object.
>
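(In Python terms, the hand-off described above is nothing more than a
reference assignment guarded by a condition variable -- a sketch, with
the FrameSlot name made up; note that under CPython's GIL a
pure-Python threaded version won't actually run CPU-bound work in
parallel, which is the whole point of this thread:)

import threading

class FrameSlot(object):
    """Hand a big buffer to a worker by swapping a reference."""
    def __init__(self):
        self.cond = threading.Condition()
        self.frame = None                  # the "pointer"

    def put(self, frame):
        with self.cond:
            self.frame = frame             # pointer change, no copy
            self.cond.notify()

    def get(self):
        with self.cond:
            while self.frame is None:
                self.cond.wait()
            frame, self.frame = self.frame, None
            return frame

However big the frame is, put()/get() move only a reference; the sync
object is the Condition, and no serialization code ever gets written.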
> So, if I'm missing something and there's a way to pass data structures
> without serialization, then I'd definitely like to learn more (sorry
> in advance if I missed something there).  When I took a look at
> multiprocessing, my concerns were:
>   - serialization (discussed above)
>   - maturity (are we ready to bet the farm that mp is going to work
> properly on the platforms we need it to?)
>
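(For what it's worth, there is a middle ground here:
multiprocessing.sharedctypes lets you lay structures out in shared
memory so they are never pickled at all -- a sketch, with the Point
type made up, closely following the pattern in the module's docs:)

from multiprocessing import Process
from multiprocessing.sharedctypes import RawArray
from ctypes import Structure, c_double

class Point(Structure):
    _fields_ = [('x', c_double), ('y', c_double)]

def worker(points):
    for p in points:       # each element is a view into shared memory
        p.x, p.y = p.y, p.x

if __name__ == '__main__':
    points = RawArray(Point, [(1.0, 2.0), (3.0, 4.0)])
    w = Process(target=worker, args=(points,))
    w.start()
    w.join()
    print(points[0].x, points[0].y)   # -> 2.0 1.0

Whether that scales to "thousands of data structure instances" with
arbitrary shapes is a fair question, but for fixed-layout buffers it
avoids the serialization hit entirely.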
> Again, I'm psyched that multiprocessing appeared in 2.6 and it's a
> huge, huge step in getting everyone to unlock the power of Python!
> But some of the tidbits described above are additional data points
> for you and others to chew on.  I can tell you they're pretty
> important points for any performance-centric software provider (us,
> game developers--from EA to Ambrosia, and A/V production app
> developers like Patrick).
>
> Andy


