[Python-Dev] Forking and Multithreading - enemy brothers

Ben Walker jaedan31 at gmail.com
Thu Feb 4 17:58:56 CET 2010


Pascal Chambon writes:
> I don't really get it there... it seems to me that multiprocessing only
> requires picklability for the objects it needs to transfer, i.e. those
> given as arguments to the called function, and those put into
> multiprocessing queues/pipes. Global program data needn't be picklable -
> on Windows it gets wholly recreated by the child process, from Python
> bytecode.
>
> So if you're having pickle errors, it must be because the
> "object_from_module_xyz" itself is *not* picklable, maybe because it
> contains references to unpicklable objects. In such a case, properly
> implementing the pickle magic methods inside the object should do it,
> shouldn't it?

I'm also a long-time lurker (and in financial software, coincidentally).
Pascal is correct here. We use a number of C++ libraries wrapped via
Boost.Python to do various calculations. The typical function calls return
wrapped C++ types, and Boost.Python types are not, unfortunately,
pickleable. For a number of technical reasons, and also unfortunately, we
often have to load these libraries in their own process, but we want to
hide this from our users. We accomplish this by pickling the instance data
but importing the types fresh when we unpickle, all implemented in the
magic pickle methods. We would lose any information that was dynamically
added to the type in the remote process, but we simply don't do that. We
very often have many unpickleable objects imported somewhere when we spin
off our processes using the multiprocessing library, and this does not
cause any problems.
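To make the pattern concrete, here is a minimal sketch of the kind of thing
we do. The `WrappedCurve` type below is a hypothetical stand-in for a
Boost.Python-wrapped class, not our actual code; the point is that only
plain instance data crosses the pickle boundary, and the wrapped type is
imported fresh on the other side:

```python
import importlib
import pickle


class WrappedCurve:
    """Hypothetical stand-in for a Boost.Python-wrapped C++ type:
    instances refuse to pickle, like most extension types do."""
    def __init__(self, points):
        self.points = points

    def __reduce__(self):
        raise TypeError("cannot pickle wrapped C++ object")


class Curve:
    """User-facing wrapper: pickles only plain instance data and
    rebuilds the wrapped object from a fresh import on unpickle."""
    def __init__(self, points):
        self._impl = WrappedCurve(points)

    def __getstate__(self):
        # Ship plain data plus the import path of the wrapped type.
        impl_type = type(self._impl)
        return {"points": self._impl.points,
                "module": impl_type.__module__,
                "name": impl_type.__qualname__}

    def __setstate__(self, state):
        # Import the type fresh in the receiving process and rebuild it.
        mod = importlib.import_module(state["module"])
        impl_type = getattr(mod, state["name"])
        self._impl = impl_type(state["points"])


c = pickle.loads(pickle.dumps(Curve([1.0, 2.0])))
print(c._impl.points)  # prints: [1.0, 2.0]
```

As the text above says, anything dynamically added to the wrapped type in
the remote process is lost by the fresh import - we simply never rely on
such additions.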

Jesse Noller <jnoller <at> gmail.com> writes:
> We already have an implementation that spawns a
> subprocess and then pushes the required state to the child. The
> fundamental need for things to be pickleable *all the time* kinda
> makes it annoying to work with.

This requirement puts a fairly large additional strain on working with
unwieldy, wrapped C++ libraries in a multiprocessing environment.
I'm not very knowledgeable about the internals of the system, but would
it be possible to have some kind of fallback system whereby, if an object
fails to pickle, we instead send information about how to import it? This
has all kinds of limitations - it only works for importable things (i.e.
not instances), it can potentially lose information dynamically added to
the object, etc. - but I thought I would throw the idea out there so
someone knowledgeable can decide if it has any merit.
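To illustrate the idea, here is a rough sketch of such a fallback built in
user code on pickle's documented persistent-ID hooks, rather than anything
multiprocessing actually does. All the names are hypothetical, and the
scan over sys.modules is only one guess at how an import path could be
discovered; `sys.stdout` just serves as a handy module-level object that
refuses to pickle:

```python
import importlib
import io
import pickle
import sys
import types

# Types we never want to divert: primitives pickle by value, and
# classes/functions already pickle by reference.
_ATOMIC = (type(None), bool, int, float, complex, str, bytes,
           type, types.FunctionType, types.BuiltinFunctionType)


class ImportFallbackPickler(pickle.Pickler):
    """Hypothetical fallback: objects that refuse to pickle are replaced
    by a persistent id recording how to re-import them. Only works for
    objects reachable as a module-level attribute."""

    def persistent_id(self, obj):
        if isinstance(obj, _ATOMIC):
            return None  # let the default machinery handle these
        try:
            obj.__reduce_ex__(2)  # probe: would default pickling work?
            return None           # yes - pickle it normally
        except Exception:
            # Unpicklable: look for a module attribute that *is* this object.
            for mod_name, mod in list(sys.modules.items()):
                if mod is None:
                    continue
                for attr, value in list(getattr(mod, "__dict__", {}).items()):
                    if value is obj:
                        return ("import", mod_name, attr)
            return None  # not importable either; pickle fails as usual


class ImportFallbackUnpickler(pickle.Unpickler):
    def persistent_load(self, pid):
        tag, module, name = pid
        if tag != "import":
            raise pickle.UnpicklingError("unknown persistent id: %r" % (pid,))
        # Re-import the object by name in the receiving process.
        return getattr(importlib.import_module(module), name)


payload = {"radius": 2.5, "log": sys.stdout}  # sys.stdout is unpicklable
buf = io.BytesIO()
ImportFallbackPickler(buf).dump(payload)
restored = ImportFallbackUnpickler(io.BytesIO(buf.getvalue())).load()
print(restored["log"] is sys.stdout, restored["radius"])  # prints: True 2.5
```

The limitations mentioned above show up directly: an object found nowhere
at module level falls through to the usual pickling error, and anything
re-imported loses state added dynamically on the sending side.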

Ben


More information about the Python-Dev mailing list