[Python-Dev] bpo-34837: Multiprocessing.Pool API Extension - Pass Data to Workers w/o Globals

Sat Sep 29 08:23:49 EDT 2018

On Fri, Sep 28, 2018 at 9:27 PM Michael Selik <mike at selik.org> wrote:

> On Fri, Sep 28, 2018 at 2:11 PM Sean Harrington <seanharr11 at gmail.com>
> wrote:
> > kwarg on Pool.__init__ called `expect_initret`, that defaults to False.
> > When set to True:
> > Capture the return value of the initializer kwarg of Pool
> > Pass this value to the function being applied, as a kwarg.
>
> The parameter name you chose, "initret", is awkward, because nowhere
> else in Python does an initializer return a value. Initializers mutate
> an encapsulated scope. For a class __init__, that scope is an
> instance's attributes. For a subprocess managed by Pool, that
> encapsulated scope is its "globals". I'm using quotes to emphasize
> that these "globals" aren't shared.
>

>> Yes, if you group the "initializer" argument of Pool with Python's other
initializers, then I see your point. And yes, the initializer mutates the
global scope of the worker subprocess. Again, my gripe is not with globals
as such. I want a clear, explicit flow of data from the parent to each
child process, without being forced to route it through globals.

>
> On Fri, Sep 28, 2018 at 4:39 PM Sean Harrington <seanharr11 at gmail.com>
> wrote:
> > On Fri, Sep 28, 2018 at 6:45 PM Antoine Pitrou <solipsis at pitrou.net>
> > wrote:
> >> 3. If you don't like globals, you could probably do something like
> >> lazily-initialize the resource when a function needing it is executed
> >
> > if initializing the resource is expensive, we only want to do this ONE
> > time per worker process.
>
> We must have a different concept of "lazily-initialize". I understood
> Antoine's suggestion to be a one-time initialize per worker process.
>

>> See my earlier response to Antoine: I had missed his point. Lazy
initialization is a valid solution to the problem of "initializing objects
after a worker has been forked", but it does not address the "create a big
object in the parent, pass it to each worker" case.
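For reference, a minimal sketch of lazy, once-per-worker initialization as Michael reads Antoine's suggestion, using `functools.lru_cache` as the memoization mechanism (an assumption; any per-process cache would do). `get_resource` and `lookup` are hypothetical names. As Sean notes, the resource is built inside each worker, so this does not cover an object created in the parent:

```python
import functools
import os
from multiprocessing import Pool


@functools.lru_cache(maxsize=None)
def get_resource():
    # Body runs at most once per process; later calls return the cached object.
    # Hypothetical stand-in for expensive setup (DB connection, model load).
    return {"pid": os.getpid(), "squares": {i: i * i for i in range(1000)}}


def lookup(key):
    # The first task in each worker pays the setup cost; the rest reuse it.
    return get_resource()["squares"][key]


if __name__ == "__main__":
    with Pool(2) as pool:
        print(pool.map(lookup, [1, 2, 3]))
```

If the parent happens to call `get_resource()` before the pool starts, workers created with the fork start method inherit the cached object rather than building their own.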

>
> On Fri, Sep 28, 2018 at 4:39 PM Sean Harrington <seanharr11 at gmail.com>
> wrote:
> > My simple argument is that the developer should not be constrained to
> > make the objects passed globally available in the process, as this MAY
> > break encapsulation for large projects.
>
> I could imagine someone switching from Pool to ThreadPool and getting
> into trouble, but in my mind using threads is caveat emptor. Are you
> worried about breaking encapsulation in a different scenario?
>

>> Without a specific example on hand, imagine a tree of function calls
(including methods of newly created objects) executing in the worker
process that should not necessarily have access to the objects passed from
parent to worker. With the current implementation, every one of them will.
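To make the concern concrete, here is a sketch of the pattern available today, with hypothetical names: the parent hands `big` to each worker through `initializer`/`initargs`, and the initializer parks it in a module global. Nothing scopes it to `task`; any other code that can see the module, such as `unrelated_helper`, can reach it too:

```python
from multiprocessing import Pool

_shared = None  # filled in separately in each worker by init_worker


def init_worker(big_obj):
    # Runs once per worker; the object arrives via initargs
    # (pickled under the spawn/forkserver start methods).
    global _shared
    _shared = big_obj


def task(key):
    # The function that actually needs the data.
    return _shared[key]


def unrelated_helper():
    # Code that never asked for the data can still read it.
    return _shared is not None


if __name__ == "__main__":
    big = {"a": 1, "b": 2}
    with Pool(2, initializer=init_worker, initargs=(big,)) as pool:
        print(pool.map(task, ["a", "b"]))
```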