Are threads bad? - was: Future of PyPy?

Mon Feb 23 19:35:31 EST 2015

On Tue Feb 24 2015 at 10:15:40 AM Paul Rubin <no.email at nospam.invalid>
wrote:
>
> I don't see the error-proneness since nothing there seems to set off
> mutation of shared data.
>

I'm not sure what else to say really. It's just a fact of life that threads
by definition run in the same memory space and hence always carry the
possibility of nasty unforeseen problems. They are unforeseen because it is
extremely difficult (maybe impossible?) to map out and understand all the
possible mutations to state. Sure, your code might not be making any
mutations (that you know of), but malloc definitely is [1], and that's just
the tip of the iceberg. Other things like the buffers for stdin and stdout,
DNS resolution etc. all have the same issue.
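
A minimal sketch of the "same memory space" point, assuming a POSIX platform
where the "fork" start method is available. The function names are made up
for illustration. A thread mutates the caller's list in place, while a forked
process only mutates its own copy:

```python
import multiprocessing as mp
import threading


def mutate_in_thread(state):
    # The thread shares the caller's memory, so the append is visible here.
    t = threading.Thread(target=state.append, args=("from thread",))
    t.start()
    t.join()
    return state


def mutate_in_process(state):
    # Assumption: "fork" is available (POSIX only). The child works on a
    # copy of the parent's heap, so the parent's list is left untouched.
    ctx = mp.get_context("fork")
    p = ctx.Process(target=state.append, args=("from process",))
    p.start()
    p.join()
    return state


if __name__ == "__main__":
    print(mutate_in_thread([]))
    print(mutate_in_process([]))
```

This only shows a mutation the code asked for. The harder cases are the ones
nobody asked for, further along the call stack.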

I have no doubt someone can come up with a scenario where they need to use
threads, though I can't come up with one myself. In the work I have done,
processes have sufficed, even for the example of dynamic callables you gave.
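
One way the dynamic callable case might look with processes, as a sketch
only. It assumes a POSIX platform with the "fork" start method, and the
helper names (make_scaler, call_in_forked_process) are hypothetical. Because
the child is a copy of the parent, a closure built at runtime can be called
there without being pickled. Only the result travels back over a pipe:

```python
import multiprocessing as mp


def make_scaler(factor):
    # A closure constructed at runtime; the default pickler cannot
    # serialise a nested function like this one.
    def scale(x):
        return x * factor
    return scale


def _child(conn, func, arg):
    # Runs in the forked child: call the inherited callable, send the result.
    conn.send(func(arg))
    conn.close()


def call_in_forked_process(func, arg):
    ctx = mp.get_context("fork")  # assumption: POSIX platform
    recv_end, send_end = ctx.Pipe(duplex=False)
    p = ctx.Process(target=_child, args=(send_end, func, arg))
    p.start()
    send_end.close()          # the parent only reads
    result = recv_end.recv()  # the result itself must be picklable
    p.join()
    return result


if __name__ == "__main__":
    print(call_in_forked_process(make_scaler(3), 14))
```

Note the limitation: the callable has to exist before the fork. This does not
cover handing a new callable to a process that is already running.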

To borrow from the original article I linked - "Nevertheless I still think
it’s a bad idea to make things harder for ourselves if we can avoid it."

Cheers

[1] Line 70 of malloc/arena.c in glibc:
https://sourceware.org/git/?p=glibc.git;a=blob;f=malloc/arena.c;h=8af51f05eb376ae2ba07e99c8c766a8ae8af425b;hb=bdf1ff052a8e23d637f2c838fa5642d78fcedc33#l70

>
> > Also, the majority can be achieved via the process approach. For
> > example, using fork to take a copy of the current process (including
> > the heap) you want to use will give you access to any callables on the
> > heap.
>
> What if you want to dynamically construct a callable and send it to
> another process?
>
> > Even if you are extra careful to not touch any shared state in your
> > code, you can almost be guaranteed that code higher up the stack, like
> > malloc for example, *will* be using shared state.
>
> This isn't the 1980s any more--any serious malloc implementation these
> days is thread safe.  People write multi-threaded C programs all the
> time and those programs use malloc in more than one thread.
>
> > Even if you aren't sharing state in your code directly, code higher up
> > the stack will be sharing state.  That is the whole point of a thread,
> > that's what they were invented for.  Using threads safely might well
> > be impossible much less verifiable.
>
> You're basically saying it's impossible to write a reliable operating
> system, since OSes by nature have to do that stuff.  Of course there are
> verified OSes, and some of the early pioneers in concurrency were the
> same guys who worked in program verification, e.g. Dijkstra's
> semaphores.  Even Erlang uses data sharing under the hood (ETS tables
> and large binaries) though their API makes it look like the data is
> copied between processes.
>
> What I'd say is that multi-threaded programs tend to have miniature OSes
> inside them, so it helps to have had some exposure to OS implementation
> techniques if you're going to write this kind of code.  But if you've
> had that exposure then it all becomes less scary.
>
> > So when there are other options that are just as viable/functional,
> > result in far less risk and are often much quicker to implement
> > correctly, why wouldn't you use them?
>
> I should give the multiprocessing module a try sometime (haven't used it
> so far because it's relatively new and I'm comfortable with threads).
> It has the disadvantages that I noted, though.
>
> > If it were easy to use threads in a verifiably safe manner, then there
> > probably wouldn't be a GIL.
>
> Nah, the GIL is just a CPython artifact.  As Steven says, IronPython and
> Jython don't have GILs.  Java has no GIL, OCaml has no GIL, GHC has no
> GIL, etc.  Someone made a CPython version with no GIL some years ago and
> it worked fine and it got a speedup on multiple cores.  The only problem
> was that on a single core, it was significantly slower than regular
> CPython, specifically because of the overhead of having to lock all the
> refcount updates, so it was considered a failure.  Laura Creighton may
> have more to say about this, but I've been under the impression that the
> main obstacle to getting rid of the CPython GIL is the refcount system
> (which is also easy to make mistakes with, by the way).  That's why I
> was surprised to hear that PyPy has a GIL.