Execute in a multiprocessing child dynamic code loaded by the parent process

Chris Angelico rosuav at gmail.com
Sun Mar 6 07:52:51 EST 2022


On Sun, 6 Mar 2022 at 23:43, Martin Di Paola <martinp.dipaola at gmail.com> wrote:
>
> Hi everyone. Some time ago I implemented a small plugin engine to load
> code dynamically.
>
> So far it has worked well, but a few days ago a user told me that he
> wasn't able to run a piece of code in parallel on macOS.
>
> He was using multiprocessing.Process to run the code, and on macOS the
> default start method for such processes is "spawn". My understanding
> is that Python spawns an independent Python server (the child) which
> receives what to execute (the target function) from the parent process.

> Because Python does not really serialize code but only enough
> information to reload it, the serialization of "objs[0].sayhi" just
> points to its module, "foo".
>

Hmm. This is a route that has some tricky hazards on it. Generally, in
Python code, we can assume that a module is itself, no matter what; it
won't be a perfect clone of itself, it will actually be the same
module.
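
To make the hazard concrete, here is a minimal sketch of what happens
when the same source file ends up loaded twice. It assumes a plugin
engine that executes files by path; load_from_path() and the throwaway
foo.py are hypothetical stand-ins, not your actual loader.

```python
import importlib.util
import os
import tempfile


def load_from_path(name, path):
    """Execute the file at `path` as a brand-new module object."""
    spec = importlib.util.spec_from_file_location(name, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


with tempfile.TemporaryDirectory() as plugin_dir:
    # A throwaway "plugin" file, written only for this demonstration.
    path = os.path.join(plugin_dir, "foo.py")
    with open(path, "w") as f:
        f.write("class Greeter:\n    pass\n")

    first = load_from_path("foo", path)
    second = load_from_path("foo", path)

# Two loads give two distinct module objects and two distinct classes,
# even though the source text is identical.
same_module = first is second
greeter = first.Greeter()
recognised = isinstance(greeter, second.Greeter)
print(same_module, recognised)
```

Both values come out False: the second load is a clone, not the same
module, so identity and isinstance checks across the two stop working.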

If you want to support multiprocessing, I would recommend
disconnecting yourself from the concept of loaded modules, and instead
identify the target by its module name.
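
As a rough illustration of why names are the natural currency here:
pickling a function stores a reference (module name plus attribute
name) rather than the code itself. The sketch below uses json.dumps
purely as a stand-in target, and resolve() is a hypothetical helper.

```python
import importlib
import json
import pickle

# Pickling a function records where to find it, not its bytecode.
payload = pickle.dumps(json.dumps)
has_module_name = b"json" in payload
has_func_name = b"dumps" in payload


def resolve(module_name, attr_name):
    """Look up a target from plain strings, importing the module if needed."""
    module = importlib.import_module(module_name)
    return getattr(module, attr_name)


# Resolving by name yields the very same function object.
resolved = resolve("json", "dumps")
print(has_module_name, has_func_name, resolved is json.dumps)
```

Passing the two strings yourself is the same information the pickle
carries, except that you control how the module gets loaded on the
other side.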

> I came up with a hack: use a trampoline() function to load the plugins
> in the child before executing the target function.
>
> In pseudo code it is:
>
> modules = loader() # load the plugins (Python modules at the end)
> objs = init(modules) # initialize the plugins
>
> def trampoline(target_str):
>     loader() # load the plugins now that we are in the child process
>
>     # deserialize the target and call it
>     target = reduction.loads(target_str)
>     target()
>
> # Serialize the real target function, but have the child call
> # trampoline(). Because the child can import trampoline(), this
> # will not fail
> target_str = reduction.dumps(objs[0].sayhi)
> ch = multiprocessing.Process(target=trampoline, args=(target_str,))
> ch.start()
>
> The hack works but is this the correct way to do it?
>

The way you've described it, it's a hack. Allow me to slightly redescribe it.

import multiprocessing

def invoke(mod, func):
    # I'm assuming that the loader is smart enough to not load
    # a module that's already loaded. Alternatively, load just the
    # module you need, if that's a possibility.
    modules = loader()
    target = getattr(modules[mod], func)
    target()

# With the "spawn" start method the child re-imports this file, so
# keep the process creation behind the __main__ guard.
if __name__ == "__main__":
    modules = loader()
    objs = init(modules)

    ch = multiprocessing.Process(target=invoke, args=("some_module", "sayhi"))
    ch.start()


Written like this, it achieves the same goal, but looks a lot less
hacky, and as such, I would say that yes, this absolutely IS a correct
way to do it. (I won't say "the" correct way, as there are other valid
ways, but there's certainly nothing wrong with this idea.)
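
For something you can actually run, here is a self-contained sketch of
the same idea. It is not your plugin engine: importlib.import_module
stands in for loader(), resolve_target() is a hypothetical helper, and
platform.python_version stands in for a plugin's sayhi. Plugins loaded
from arbitrary paths would need your own loader in its place.

```python
import importlib
import multiprocessing


def resolve_target(module_name, func_name):
    """Load the module inside the current process and fetch the callable."""
    # Stand-in for loader(): import_module returns the already-loaded
    # module if there is one, so repeated calls are cheap.
    module = importlib.import_module(module_name)
    return getattr(module, func_name)


def invoke(module_name, func_name):
    """Child entry point: only plain strings cross the process boundary."""
    target = resolve_target(module_name, func_name)
    print(target())


def main():
    # Ask for "spawn" explicitly so the behaviour matches the macOS default.
    ctx = multiprocessing.get_context("spawn")
    ch = ctx.Process(target=invoke, args=("platform", "python_version"))
    ch.start()
    ch.join()


if __name__ == "__main__":
    main()
```

The child never receives a module or function object, just two names,
so it doesn't matter whether the parent had the plugin loaded at all.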

ChrisA

