Future of Pypy?

Sun Feb 22 21:04:09 EST 2015

Steven D'Aprano <steve+comp.lang.python at pearwood.info> writes:
> I'm sorry, but the instant somebody says "eliminate the GIL", they lose
> credibility with me. Yes yes, I know that in *your* specific case you've
> done your research and (1) multi-threaded code is the best solution for
> your application and (2) alternatives aren't suitable.

I don't see what the big deal is.  I hear tons of horror stories about
threads and I believe them, but the thing is, they almost always revolve
around acquiring and releasing locks in the wrong order, forgetting to
lock things, stuff like that.  So I've generally respected the terror
and avoided programming in that style, staying with a message passing
style that may take an efficiency hit but seems to avoid almost all
those problems.  TM also helps with lock hazards and it's a beautiful
idea--I just haven't had to use it yet.  The Python IRC channel seems to
rage against threads and promote Twisted for concurrency, but Twisted
has always reminded me of Java.  I use threads in Python all the time
and haven't gotten bitten yet.

> Writing multithreaded code is *hard*. It is not a programming model
> which comes naturally to most human beings. Very few programs are
> inherently parallelizable, although many programs have *parts* which
> can be successfully parallelized.

Parallel algorithms are complicated and specialized but tons of problems
amount to "do the same thing with N different pieces of data", so-called
embarassingly parallel.  The model is you have a bunch of worker threads
reading off a queue and processing the items concurrently.  Sometimes
separate processes works equally well, other times it's good to have
some shared data in memory instead of communicating through sockets.  If
the data is mutable then have one thread own it and access it only with
message passing, Erlang style.  If it's immutable after initialization
(protect it with a condition variable til initialization finishes) then
you can have read-only access from anywhere.

> if you're running single-thread code on a single-core machine and
> still complaining about the GIL, you have no clue.

Even the Raspberry Pi has 4 cores now, and fancy smartphones have had
them for years.  Single core cpu's are specialized and/or historical.

> for some programs, the simplest way to speed it up is to vectorize the
> core parts of your code by using numpy.  No threads needed.

Nice for numerical codes, not so hot for anything else.

> Where are the people flocking to use Jython and IronPython?

Shrug, who knows, those implementations were pretty deficient from what
heard.

> For removal of the GIL to really make a difference: ...
> - you must be performing a task which is parallelizable and not inherently
> sequential (no point using multiple threads if each thread spends all its
> time waiting for the previous thread);

That's most things involving concurrency these days.

> - the task must be one that moving to some other multi-processing model
> (such as greenlets, multiprocess, etc.) is infeasible;

I don't understand this--there can be multiple ways to solve a problem.

> - your threading bottleneck must be primarily CPU-bound, not I/O bound
> (CPython's threads are already very effective at parallelising I/O tasks);

If your concurrent program's workload makes it cpu-bound even 1% of the
time, then you gain something by having it use your extra cores at those
moments, instead of having those cores always do nothing.

> - and you must be using libraries and tools which prevent you moving to
> Jython or IronPython or some other alternative.

I don't get this at all.  Why should I not want Python to have the same
capabilities?