[Python-ideas] The future of Python parallelism. The GIL. Subinterpreters. Actors.
Trent Nelson
trent at trent.me
Tue Jul 17 19:24:10 EDT 2018
(Apologies for the slow reply, I'm in the middle of a relocation
at the moment so e-mail access isn't consistent, and will be a
lot worse over the next few weeks.)
On Tue, Jul 10, 2018 at 07:31:49AM -0700, David Foster wrote:
> I was not aware of PyParallel. The PyParallel "parallel thread"
> line-of-execution implementation is pretty interesting. Trent, big
> kudos to you on that effort.
>
> Since you're speaking in the past tense and said "but we're not
> doing it like that", I infer that the notion of a parallel thread
> was turned down for integration into CPython, as that appears to
> have been the original goal.
>
> However I am unable to locate a rationale for why that integration
> was turned down. Was it deemed to be too complex to execute, perhaps
> in the context of providing C extension compatibility? Was there a
> desire to see a similar implementation on Linux as well as Windows?
> Some other reason? Since I presume you were directly involved in the
> discussions, perhaps you have a link to the relevant thread handy?
>
> The last update I see from you RE PyParallel on this list is:
> https://mail.python.org/pipermail/python-ideas/2015-September/035725.html
PyParallel was... ambitious, to say the least. When I started it,
I sort of *hand wavy* envisioned it would lead to something that
I could formally pitch to python-dev. There was a lot of
blissful ignorance of the ensuing complexity in that initial
sentiment, though.
So, nothing was formally turned down by core developers, as I
never really ended up pitching something formal that could be
assessed for inclusion. By the time I'd developed something
that was at least an alpha-level proof-of-concept, I had made
50+ pretty sizable implementation decisions, each of which
would have warranted its own PEP had the work ever made it into
mainline Python.
I definitely think a PyParallel-esque approach (where we play
it fast and loose with what's considered the GIL, how and when
reference counting is done, etc.) is the only viable *performant*
option we have for solving the problem -- i.e. I can't see how
a "remove the GIL, introduce fine-grained locking, use interlocked
ops for ref counts"-type conventional approach will ever yield
acceptable performance.
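(As a minimal illustration of the constraint being discussed --
a hypothetical sketch of my own, not PyParallel code -- here's
the kind of CPU-bound workload that stock CPython threads can't
actually run in parallel: the GIL serializes bytecode execution,
so the threads run correctly but with no compute speedup over a
single thread.)

```python
import threading

def count_up(n, results, idx):
    # Pure-Python CPU-bound loop: it holds the GIL while executing
    # bytecode, so concurrent copies of this are serialized.
    total = 0
    for _ in range(n):
        total += 1
    results[idx] = total

def run_threads(n_threads=4, n=100_000):
    results = [0] * n_threads
    threads = [
        threading.Thread(target=count_up, args=(n, results, i))
        for i in range(n_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # All results are correct -- the GIL gives us memory safety for
    # free -- but wall-clock time is roughly the same as running the
    # loops sequentially on one core.
    return results
```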
But, yeah, I'm not optimistic we'll see a solution actually in
the mainline Python any time soon. I logged about 2500 hours
of development time hacking PyParallel into its initial alpha
proof-of-concept state. It only worked on one operating system,
required intimate knowledge of Python innards (which I lacked at
the start), and exposed a very brittle socket-server oriented
interface to leverage the parallelism (there was no parallel
compute/free-threading type support provided, really).
I can't think of how we'll arrive at something production quality
without it being a multi-year, many-developer (full time, ideally
located in proximity to each other) project. I think you'd really
need a BDFL Guido/Linus/Cutler-type lead driving the whole effort
too, as there will be a lot of tough, divisive decisions that need
to be made.
How would that be funded?! It's almost a bit of a moon-shot type
project. Definitely high-risk. There's no precedent for the PSF
funding such projects, nor for large corporate entities (e.g. Google,
Amazon, Microsoft) to do so. What's the ROI for those companies to take on
so much cost and risk? Perhaps if the end solution only ran on
their cloud infrastructure (Azure, AWS, GCS) -- maybe at least
initially. That... that would be an interesting turn of events.
Maybe we just wait 20 years 'til a NumPy/SciPy/Z3-stack does some
cloud AI stuff to "solve" which parts of an existing program can
be executed in parallel without any user/developer assistance :-)
> David Foster | Seattle, WA, USA
Regards,
Trent.
--
https://trent.me