[Python-ideas] The future of Python parallelism. The GIL. Subinterpreters. Actors.

Eric Snow ericsnowcurrently at gmail.com
Fri Jul 13 20:22:24 EDT 2018


On Sun, Jul 8, 2018 at 12:30 PM David Foster <davidfstr at gmail.com> wrote:
> In the past I have personally viewed Python as difficult to use for
> parallel applications, which need to do multiple things simultaneously
> for increased performance:
>
> * The old Threads, Locks, & Shared State model is inefficient in Python
> due to the GIL, which limits CPU usage to only one thread at a time
> (ignoring certain functions implemented in C, such as I/O).
>
> * The Actor model can be used with some effort via the “multiprocessing”
> module, but it doesn’t seem that streamlined and forces there to be a
> separate OS process per line of execution, which is relatively expensive.

Yep, Python's multi-core story is a bit rough (Jython/IronPython
aside).  It's especially hard for folks used to
concurrency/parallelism in other languages.  I'm hopeful that we can
improve the situation.

> I was thinking it would be nice if there was a better way to implement
> the Actor model, with multiple lines of execution in the same process,

FWIW, at this point I'm a big fan of this concurrency model.  I find
it hurts my brain least. :)

> yet avoiding contention from the GIL. This implies a separate GIL for
> each line of execution (to eliminate contention) and a controlled way to
> exchange data between different lines of execution.
>
> So I was thinking of proposing a design for implementing such a system.
> Or at least get interested parties thinking about such a system.
>
> With some additional research I notice that [PEP 554] (“Multiple
> subinterpreters in the stdlib”) appears to be putting forward a design
> similar to the one I described. I notice however it mentions that
> subinterpreters currently share the GIL, which would seem to make them
> unusable for parallel scenarios due to GIL contention.

I'm glad you found PEP 554.  I wanted to keep the PEP focused on
exposing the existing subinterpreter support (and the basic,
CSP-inspired concurrency model), which is why it doesn't go into much
detail about changes to the CPython runtime that will allow GIL-free
multi-core parallelism.  As Nick mentioned, my talk at the language
summit covers my plans.

Improving Python's multi-core story has been the major focus of my
(sadly relatively small) contributions to CPython for several years
now.  I've made slow progress due to limited time, but things are
picking up, especially since I got a job in December at Microsoft that
allows me to work on CPython for part of each week.  On top of that,
several other people are directly helping now (including Emily
Morehouse) and I got a lot of positive feedback for the project at
PyCon this year.

> I'd like to solicit some feedback on what might be the most efficient
> way to make forward progress on efficient parallelization in Python
> inside the same OS process. The most promising areas appear to be:
>
> 1. Make the current subinterpreter implementation in Python have more
> complete isolation, sharing almost no state between subinterpreters. In
> particular not sharing the GIL. The "Interpreter Isolation" section of
> PEP 554 enumerates areas that are currently shared, some of which
> probably shouldn't be.

Right, this is the approach I'm driving.  At this point I have the
project broken down pretty well into manageable chunks.  You're
welcome to join in. :)  Regardless, I'd be glad to discuss it with
you in more depth if you're interested.

> 2. Give up on making things work inside the same OS process and rather
> focus on implementing better abstractions on top of the existing
> multiprocessing API so that the actor model is easier to program
> against. For example, providing some notion of Channels to communicate
> between lines of execution, a way to monitor the number of Messages
> waiting in each channel for throughput profiling and diagnostics,
> Supervision, etc. In particular I could do this by using an existing
> library like Pykka or Thespian and extending it where necessary.

It may be worth a shot.  You should ask Davin Potts (CC'ed) about this.
We discussed this a little at PyCon.  I'm sure he'd welcome help in
improving the multiprocessing module.
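
To make the Channel idea from point 2 a bit more concrete, here is a
rough sketch of an actor that reads from one channel and writes to
another, with a way to check how many messages are waiting.  The names
Channel, Actor, pending() and square_all() are made up for
illustration and are not part of multiprocessing, Pykka or Thespian.
The sketch uses threading and queue.Queue only so that it stays short
and runs anywhere; a multiprocessing version would presumably swap in
multiprocessing.Process and multiprocessing.Queue, at the cost of
pickling every message.

```python
import queue
import threading

_STOP = object()  # hypothetical sentinel telling an actor to shut down


class Channel:
    """A one-way message queue that can also report its backlog."""

    def __init__(self):
        self._q = queue.Queue()

    def send(self, msg):
        self._q.put(msg)

    def recv(self):
        # Blocks until a message is available.
        return self._q.get()

    def pending(self):
        # Approximate number of waiting messages, for diagnostics only.
        return self._q.qsize()


class Actor(threading.Thread):
    """Applies `handler` to each message from `inbox`, sends results to `outbox`."""

    def __init__(self, handler, inbox, outbox):
        super().__init__(daemon=True)
        self.handler = handler
        self.inbox = inbox
        self.outbox = outbox

    def run(self):
        while True:
            msg = self.inbox.recv()
            if msg is _STOP:
                break
            self.outbox.send(self.handler(msg))

    def stop(self):
        # Ask the actor to exit after it drains earlier messages.
        self.inbox.send(_STOP)
        self.join()


def square_all(numbers):
    """Feed a list of numbers through a single squaring actor."""
    inbox, outbox = Channel(), Channel()
    for n in numbers:
        inbox.send(n)
    backlog = inbox.pending()  # messages queued before the actor starts
    actor = Actor(lambda n: n * n, inbox, outbox)
    actor.start()
    results = [outbox.recv() for _ in numbers]
    actor.stop()
    return backlog, results
```

Calling square_all([1, 2, 3]) pushes three messages through one actor
and returns the backlog seen before the actor started along with the
squared values.  Because the actor here is a thread, it still shares
the GIL; the sketch shows the programming model, not a speedup.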

-eric

