Future of Pypy?

Mon Feb 23 19:11:37 EST 2015

Ryan Stuart <ryan.stuart.85 at gmail.com> writes:
>     Threads can also share read-only data and you can pass arbitrary
>     objects (such as code callables that you want the other thread to
>     execute--this is quite useful) through Queue.Queue. I don't think
>     you can do that with the multiprocessing module.
>
> These things might be convenient but they are error prone for the
> reasons pointed out.

I don't see the error-proneness, since nothing there involves mutating
shared data.
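
To make the pattern concrete, here is a minimal sketch of handing callables to another thread through a queue. It uses the Python 3 module name `queue` (it was `Queue` in Python 2); the names `worker` and `run_tasks` are made up for illustration.

```python
import queue      # this module is named Queue in Python 2
import threading

def worker(tasks, results):
    # Pull callables off the queue and run them until a None sentinel arrives.
    while True:
        task = tasks.get()
        if task is None:
            break
        results.put(task())

def run_tasks(callables):
    tasks, results = queue.Queue(), queue.Queue()
    t = threading.Thread(target=worker, args=(tasks, results))
    t.start()
    for c in callables:
        tasks.put(c)
    tasks.put(None)   # sentinel: tell the worker to stop
    t.join()
    # A single worker processes tasks in order, so results come back in order.
    return [results.get() for _ in callables]

# Closures built at run time; nothing is pickled because threads share one heap.
squares = run_tasks([lambda n=n: n * n for n in range(4)])
```

The only object both threads touch is the queue itself, which does its own locking.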

> Also, the majority can be achieved via the process approach. For
> example, using fork to take a copy of the current process (including
> the heap) you want to use will give you access to any callables on the
> heap.

What if you want to dynamically construct a callable and send it to
another process?
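
A small sketch of the problem, assuming the callable has to travel by pickling, which is how multiprocessing's queues and pipes move objects between processes. The helper names `make_adder` and `try_pickle` are made up for illustration.

```python
import pickle

def make_adder(k):
    # Build a callable at run time; it exists only as a closure on this heap.
    return lambda x: x + k

def try_pickle(obj):
    # Report whether the standard pickler can serialize obj.
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError):
        return False

can_send_closure = try_pickle(make_adder(3))   # dynamically built callable
can_send_builtin = try_pickle(len)             # importable by name
```

Functions are pickled by reference to their module-level name, so a closure built on the fly has nothing the receiving process could look up.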

> Even if you are extra careful to not touch any shared state in your
> code, you can almost be guaranteed that code higher up the stack, like
> malloc for example, *will* be using shared state.

This isn't the 1980s any more.  Any serious malloc implementation these
days is thread-safe.  People write multi-threaded C programs all the
time and those programs use malloc in more than one thread.

> Even if you aren't sharing state in your code directly, code higher up
> the stack will be sharing state.  That is the whole point of a thread,
> that's what they were invented for.  Using threads safely might well
> be impossible much less verifiable.

You're basically saying it's impossible to write a reliable operating
system, since OSes by nature have to manage shared state.  Of course
there are verified OSes, and some of the early pioneers in concurrency
were the same people who worked on program verification, e.g. Dijkstra,
who introduced semaphores.  Even Erlang uses data sharing under the hood
(ETS tables and large binaries) though its API makes it look like the
data is copied between processes.

What I'd say is that multi-threaded programs tend to have miniature OSes
inside them, so it helps to have had some exposure to OS implementation
techniques if you're going to write this kind of code.  But if you've
had that exposure then it all becomes less scary.

> So when there are other options that are just as viable/functional,
> result in far less risk and are often much quicker to implement
> correctly, why wouldn't you use them?

I should give the multiprocessing module a try sometime (haven't used it
so far because it's relatively new and I'm comfortable with threads).
It has the disadvantages that I noted, though.

> If it were easy to use threads in a verifiably safe manner, then there
> probably wouldn't be a GIL.

Nah, the GIL is just a CPython artifact.  As Steven says, IronPython and
Jython don't have GILs.  Java has no GIL, OCaml has no GIL, GHC has no
GIL, etc.  Someone made a CPython version with no GIL some years ago and
it worked fine and it got a speedup on multiple cores.  The only problem
was that on a single core, it was significantly slower than regular
CPython, specifically because of the overhead of having to lock all the
refcount updates, so it was considered a failure.  Laura Creighton may
have more to say about this, but I've been under the impression that the
main obstacle to getting rid of the CPython GIL is the refcount system
(which is also easy to make mistakes with, by the way).  That's why I
was surprised to hear that PyPy has a GIL.
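
To give a feel for how often refcounts change, here is a rough sketch using `sys.getrefcount`. It assumes a standard CPython build; other implementations (and possibly non-default CPython builds) may report different numbers.

```python
import sys

obj = object()
before = sys.getrefcount(obj)
holders = [obj, obj, obj]   # three new references to the same object
after = sys.getrefcount(obj)

# Each new reference bumped the count; without a GIL every such bump
# would need a lock or an atomic operation.
added = after - before
```

Even storing an object in a list is a write to that object's refcount, which is why removing the GIL means synchronizing such a large number of tiny updates.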