[Python-Dev] PEP 554 v3 (new interpreters module)

Antoine Pitrou solipsis at pitrou.net
Tue Sep 26 03:04:45 EDT 2017


On Mon, 25 Sep 2017 17:42:02 -0700
Nathaniel Smith <njs at pobox.com> wrote:
> On Sat, Sep 23, 2017 at 2:45 AM, Antoine Pitrou <solipsis at pitrou.net> wrote:
> >> As to "running_interpreters()" and "idle_interpreters()", I'm not sure
> >> what the benefit would be.  You can compose either list manually with
> >> a simple comprehension:
> >>
> >>     [interp for interp in interpreters.list_all() if interp.is_running()]
> >>     [interp for interp in interpreters.list_all() if not interp.is_running()]  
> >
> > There is an inherent race condition in doing that, at least if
> > interpreters are running in multiple threads (which I assume is
> > going to be the overwhelmingly dominant usage model).  That is why
> > I'm proposing all three variants.  
> 
> There's a race condition no matter what the API looks like -- having a
> dedicated running_interpreters() lets you guarantee that the returned
> list describes the set of interpreters that were running at some
> moment in time, but you don't know when that moment was and by the
> time you get the list, it's already out-of-date.

Hmm, you're right of course.

> >> Likewise,
> >> queue.Queue.put() supports blocking, in addition to providing a
> >> put_nowait() method.  
> >
> > queue.Queue.put() never blocks in the usual case (*), which is that
> > of an unbounded queue.  Only bounded queues (created with an
> > explicit non-zero maxsize parameter) can block in Queue.put().
> >
> > (*) and therefore also never deadlocks :-)  
> 
> Unbounded queues also introduce unbounded latency and memory usage in
> realistic situations.

This doesn't seem to pose much of a problem in common use cases, though.
How many Python programs have you seen switch from an unbounded to a
bounded Queue to solve this problem?

Conversely, choosing a buffer size is tricky.  How do you know up front
what size you need?  Is a fixed buffer size even OK, or do you want it
to fluctuate based on current conditions?

And regardless, my point was that a buffer is desirable.  The fact that
send() may block when the buffer is full doesn't change the fact that
it won't block in the common case.
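
To make the distinction concrete, here is a minimal sketch using
nothing but the stdlib queue module:

    import queue

    unbounded = queue.Queue()           # no maxsize: put() returns at once
    for i in range(1000):
        unbounded.put(i)                # never blocks, hence never deadlocks

    bounded = queue.Queue(maxsize=2)    # explicit maxsize: put() can block
    bounded.put("a")
    bounded.put("b")
    try:
        bounded.put("c", timeout=0.1)   # full queue: blocks, then times out
    except queue.Full:
        print("put() blocked on the full bounded queue")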

> There's a reason why sockets
> always have bounded buffers -- it's sometimes painful, but the pain is
> intrinsic to building distributed systems, and unbounded buffers just
> paper over it.

Papering over a problem is sometimes the right answer actually :-)  For
example, most Python programs assume memory is unbounded...

If I'm using a queue or channel to push events to a logging system,
should I really block at every send() call?  Most probably I'd rather
run ahead instead.
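
A sketch of the behaviour I mean, with a bounded queue and a
drop-on-full policy (the policy is only illustrative, not part of any
proposed API):

    import queue

    events = queue.Queue(maxsize=1000)   # bounded, so memory stays bounded

    def log_event(event):
        try:
            events.put_nowait(event)     # never blocks the producer
        except queue.Full:
            pass   # run ahead: dropping a log record beats stalling the app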

> > Also, suddenly an interpreter's ability to exploit CPU time is
> > dependent on another interpreter's ability to consume data in a timely
> > manner (what if the other interpreter is e.g. stuck on some disk I/O?).
> > IMHO it would be better not to have such coupling.  
> 
> A small buffer probably is useful in some cases, yeah -- basically
> enough to smooth out scheduler jitter.

That's not about scheduler jitter, but about catering for activities
which occur at inherently different speeds or rhythms.  Requiring
things to run in lockstep removes a lot of flexibility and makes it
harder to exploit CPU resources fully.

> > More often than you might expect, in complex systems :-)  For example,
> > you could have a recv() loop that also from time to time send()s some
> > data on another queue, depending on what is received.  But if that
> > send()'s recipient also has the same structure (a recv() loop which
> > send()s from time to time), then it's easy to imagine the two getting
> > into a deadlock.  
> 
> You kind of want to be able to create deadlocks, since the alternative
> is processes that can't coordinate and end up stuck in livelocks or
> with unbounded memory use etc.

I am not advocating we make it *impossible* to create deadlocks; I am
just saying we should not make them more *likely* than they need to be.
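
For the record, the scenario above is easy to reproduce with threads
and bounded queues; a minimal sketch (the doubling is contrived, but it
stands in for any workload where sends outpace receives):

    import queue, threading

    a_to_b = queue.Queue(maxsize=1)
    b_to_a = queue.Queue(maxsize=1)

    def loop(inbox, outbox):
        while True:
            item = inbox.get()
            # Each receive triggers two sends, so traffic grows until
            # both queues are full and both threads sit blocked in
            # put(), each waiting for the other to call get().
            outbox.put(item)
            outbox.put(item)

    threading.Thread(target=loop, args=(a_to_b, b_to_a), daemon=True).start()
    threading.Thread(target=loop, args=(b_to_a, a_to_b), daemon=True).start()
    a_to_b.put("seed")   # kick things off; the pair soon deadlocks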

> >> I'm not sure I understand your concern here.  Perhaps I used the word
> >> "sharing" too ambiguously?  By "sharing" I mean that the two actors
> >> have read access to something that at least one of them can modify.
> >> If they both only have read-only access then it's effectively the same
> >> as if they are not sharing.  
> >
> > Right.  What I mean is that you *can* share very simple "data" under
> > the form of synchronization primitives.  You may want to synchronize
> > your interpreters even if they don't share user-visible memory areas.  The
> > point of synchronization is not only to avoid memory corruption but
> > also to regulate and orchestrate processing amongst multiple workers
> > (for example processes or interpreters).  For example, a semaphore is
> > an easy way to implement "I want no more than N workers to do this
> > thing at the same time" ("this thing" can be something such as disk
> > I/O).  
> 
> It's fairly reasonable to implement a mutex using a CSP-style
> unbuffered channel (send = acquire, receive = release). And the same
> trick turns a channel with a fixed-size buffer into a bounded
> semaphore. It won't be as efficient as a modern specialized mutex
> implementation, of course, but it's workable.

We are drifting away from the point I was trying to make here.  I was
pointing out that the claim that nothing can be shared is a lie.
If it's possible to share a small datum (a synchronized counter aka
semaphore) between processes, certainly there's no technical reason
that should prevent it between interpreters.
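
That kind of sharing already works between processes today; a minimal
sketch with multiprocessing (time.sleep() stands in for the disk I/O):

    import multiprocessing, time

    def worker(sem):
        with sem:              # no more than 4 workers in here at once
            time.sleep(0.1)    # stand-in for disk I/O

    if __name__ == "__main__":
        sem = multiprocessing.Semaphore(4)   # the shared "small datum"
        procs = [multiprocessing.Process(target=worker, args=(sem,))
                 for _ in range(16)]
        for p in procs:
            p.start()
        for p in procs:
            p.join()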

By the way, I do think efficiency is a concern here.  Otherwise
subinterpreters don't even have a point (just use multiprocessing).
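
For reference, here is roughly what the bounded-semaphore variant of
Nathaniel's trick looks like, with a bounded queue.Queue standing in
for a fixed-size channel (a sketch only, not the API the PEP proposes):

    import queue

    class ChannelSemaphore:
        # Bounded semaphore from a bounded queue: sending a token
        # acquires (blocking once n tokens are outstanding), and
        # taking a token back releases.
        def __init__(self, n):
            self._q = queue.Queue(maxsize=n)

        def acquire(self):
            self._q.put(None)      # blocks while n holders already exist

        def release(self):
            self._q.get_nowait()   # raises queue.Empty if over-released

        def __enter__(self):
            self.acquire()
            return self

        def __exit__(self, *exc):
            self.release()

    sem = ChannelSemaphore(4)
    with sem:
        pass   # at most 4 threads at a time get here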

> Unfortunately while technically you can construct a buffered channel
> out of an unbuffered channel, the construction's pretty unreasonable
> (it needs two dedicated threads per channel).

And the reverse is quite cumbersome as well.  So we should favour the
construct that's more convenient for users, or provide both.

Regards

Antoine.

