Execute in a multiprocessing child dynamic code loaded by the parent process

Martin Di Paola martinp.dipaola at gmail.com
Sun Mar 6 07:42:08 EST 2022


Hi everyone. Some time ago I implemented a small plugin engine to load
code dynamically.

So far it has worked well, but a few days ago a user told me that he
wasn't able to run a piece of code in parallel on macOS.

He was using multiprocessing.Process to run the code, and on macOS the
default start method for such a process is "spawn". My understanding
is that Python spawns a fresh, independent Python process (the child)
which receives what to execute (the target function) from the parent
process.
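For reference, the start method can be inspected and forced explicitly;
forcing "spawn" is a handy way to reproduce the macOS behaviour on
Linux too (a minimal sketch, independent of the plugin engine):

import multiprocessing as mp

if __name__ == "__main__":
    print(mp.get_start_method())  # 'spawn' is the default on macOS since Python 3.8
    # Force "spawn" explicitly, e.g. to reproduce the issue on Linux:
    ctx = mp.get_context("spawn")
    # ctx.Process(target=...) then behaves like the macOS default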

In pseudocode this would be something like:

modules = loader() # load the plugins (ultimately just Python modules)
objs = init(modules) # initialize the plugins

# One of the plugins wants to execute part of its code in parallel.
# On macOS this fails:
ch = multiprocessing.Process(target=objs[0].sayhi)
ch.start()

The code fails with "ModuleNotFoundError: No module named 'foo'" (where
'foo' is the name of the loaded plugin).
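For completeness, here is a minimal sketch of what loader() might look
like, assuming a hypothetical plugin file plugins/foo.py (the path and
the module name are made up for the example). The module is built from
a file and registered in sys.modules by hand, so it exists only inside
the parent interpreter:

import importlib.util
import sys

def loader():
    # build the module "foo" from a file and register it manually;
    # a spawned child never runs this, so it cannot import "foo"
    spec = importlib.util.spec_from_file_location("foo", "plugins/foo.py")
    mod = importlib.util.module_from_spec(spec)
    sys.modules["foo"] = mod
    spec.loader.exec_module(mod)
    return [mod]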

This is because the parent process sends the child what it needs to
execute (objs[0].sayhi), using pickle as the serialization mechanism.

Because Python does not really serialize code but only enough
information to reload it, the serialization of "objs[0].sayhi" is just
a reference to its module, "foo", and that module cannot be imported
by the child process.
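You can see this by disassembling the pickle of any module-level
function: the stream stores only a module name and a qualified name,
never the function's code. A quick check:

import pickle, pickletools

def sayhi():
    print("hi")

pickletools.dis(pickle.dumps(sayhi))
# the interesting part of the output is something like:
#     SHORT_BINUNICODE '__main__'
#     SHORT_BINUNICODE 'sayhi'
#     STACK_GLOBAL
# i.e. just the reference "__main__.sayhi", no bytecode at all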

So the question is, what would be the alternatives and workarounds?

I came up with a hack: use a trampoline() function to load the plugins
in the child before executing the target function.

In pseudocode it is:

import multiprocessing
import pickle
from multiprocessing.reduction import ForkingPickler

modules = loader() # load the plugins (ultimately just Python modules)
objs = init(modules) # initialize the plugins

def trampoline(target_str):
    loader() # load the plugins again, now that we are in the child process

    # deserialize the real target and call it
    target = pickle.loads(target_str)
    target()

# Serialize the real target function, but make the child call
# trampoline() instead: trampoline() can be accessed by the child,
# so unpickling it will not fail. ForkingPickler.dumps() returns a
# memoryview, which itself cannot be pickled, hence the bytes() call.
target_str = bytes(ForkingPickler.dumps(objs[0].sayhi))
ch = multiprocessing.Process(target=trampoline, args=(target_str,))
ch.start()
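One more detail with "spawn" worth keeping in mind: the child
re-imports the parent's main module during bootstrap, so the top-level
code that spawns the process should sit under an
if __name__ == "__main__": guard (otherwise the child fails while
bootstrapping), while trampoline() itself must stay at module level so
the child can unpickle the reference to it:

def trampoline(target_str):
    ...  # as above, defined at module level

if __name__ == "__main__":
    modules = loader()
    objs = init(modules)
    target_str = bytes(ForkingPickler.dumps(objs[0].sayhi))
    ch = multiprocessing.Process(target=trampoline, args=(target_str,))
    ch.start()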

The hack works, but is this the correct way to do it?

The following gist has the minimal example code that triggers the issue
and its workaround:
https://gist.github.com/eldipa/d9b02875a13537e72fbce4cdb8e3f282

Thanks!
Martin.

