[issue23979] Multiprocessing Pool.map pickles arguments passed to workers

Josh Rosenberg report at bugs.python.org
Fri Apr 17 02:02:29 CEST 2015


Josh Rosenberg added the comment:

The nature of a Pool precludes assumptions about the availability of specific objects in a worker process (particularly now that there are alternate start methods, such as spawn and forkserver, besides fork). Since the workers are spun up when the pool is created, objects created or modified after that point would have to be serialized by some mechanism anyway.
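As a rough illustration of the point about objects modified after pool creation, here is a minimal sketch. It assumes a POSIX platform where the "fork" start method is available; the names STATE, read_state and worker_view_after_change are hypothetical and exist only for this example.

```python
import multiprocessing as mp

# Hypothetical module-level state, used only for this illustration.
STATE = {"value": "set before pool"}


def read_state(_):
    """Runs in the worker and reports what the worker's copy of STATE holds."""
    return STATE["value"]


def worker_view_after_change():
    # Assumption: the "fork" start method exists on this platform (POSIX).
    ctx = mp.get_context("fork")
    STATE["value"] = "set before pool"
    with ctx.Pool(processes=1) as pool:
        # The worker was already forked when the Pool was constructed,
        # so this change is visible only in the parent process.
        STATE["value"] = "changed after pool"
        seen_by_worker = pool.map(read_state, [None])[0]
    return seen_by_worker, STATE["value"]


if __name__ == "__main__":
    print(worker_view_after_change())
```

The worker should still report the old value while the parent sees the new one, which is why anything created or changed after the pool exists has to be sent across explicitly.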

Although the Pool class documentation doesn't describe this explicitly, there are multiple references to this behavior (e.g. https://docs.python.org/3/library/multiprocessing.html#all-start-methods mentions inheriting as being more efficient than pickling/unpickling). You have to develop with inheritance in mind, though; pickling, particularly for task dispatch approaches in the "Futures" model, can't be generalized as an inheritance problem when using a producer/consumer based worker model.

Point is, this is expected behavior. You need some means of transferring objects between processes, and pickling is the Python standard serialization method. The inability to serialize a 4+ GB bytes object is a problem, I assume (I don't know if a bug exists for that), but pickling is the only obvious mechanism for transferring it. If you want to avoid pickling and rely on inheritance instead, it's up to you to ensure the root process has created the necessary bytes object prior to creating the Pool, and to convey information to the worker about how to find it in its own memory (say, a key into a dict mapping int keys to your bytes objects).
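A minimal sketch of that last suggestion follows. It assumes a POSIX platform where the "fork" start method is available; the names BLOBS, blob_length and run_inherited_lookup are hypothetical, and the payloads are kept small for the sake of the example.

```python
import multiprocessing as mp

# Hypothetical registry, filled in the parent BEFORE the Pool is created so
# that forked workers inherit it instead of receiving it through pickle.
BLOBS = {}


def blob_length(key):
    """Runs in a worker: only the small int key was pickled, not the bytes."""
    return len(BLOBS[key])


def run_inherited_lookup():
    # Assumption: the "fork" start method exists on this platform (POSIX).
    ctx = mp.get_context("fork")
    BLOBS[0] = b"a" * 1_000_000
    BLOBS[1] = b"b" * 2_000_000
    # The pool is created only after BLOBS has been populated.
    with ctx.Pool(processes=2) as pool:
        # Dispatch keys, not the bytes objects themselves.
        return pool.map(blob_length, sorted(BLOBS))


if __name__ == "__main__":
    print(run_inherited_lookup())  # [1000000, 2000000]
```

Under the spawn or forkserver start methods the workers would not inherit BLOBS this way, so this pattern depends on fork.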

----------
nosy: +josh.r

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue23979>
_______________________________________

