Are threads bad? - was: Future of Pypy?

Marko Rauhamaa marko at pacujo.net
Tue Feb 24 03:08:18 EST 2015


Chris Angelico <rosuav at gmail.com>:

> Actually, you can quite happily have multiple threads messing with the
> underlying file descriptors, that's not a problem. (Though you will
> tend to get interleaved output. But if you always produce output in
> single blocks of text that each contain one line with a trailing
> newline, you should see interleaved lines that are each individually
> correct. I'm also not sure of any sane way to multiplex stdin -
> merging output from multiple threads is fine, but dividing input
> between multiple threads is messy.) The problem is *buffers* for stdin
> and stdout, where you have to be absolutely sure that you're not
> trampling all over another thread's data structures. If you unbuffer
> your output, it's probably going to be thread-safe.
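As an aside, the one-complete-line-per-write pattern Chris describes can be
sketched like this (using an io.StringIO as a stand-in for stdout so the
result is easy to inspect; under CPython a single write() call is not
normally split mid-string):

```python
import io
import threading

def log_lines(stream, tag, n):
    for i in range(n):
        # One complete line per write call: concurrent writers then
        # interleave whole lines rather than partial fragments.
        stream.write(f"{tag} {i}\n")

buf = io.StringIO()
threads = [threading.Thread(target=log_lines, args=(buf, tag, 50))
           for tag in ("A", "B")]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Every line in buf then starts with either "A" or "B" intact; only the
ordering of lines varies from run to run.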

Here's an anecdote describing one real-life threading problem. We had a
largish multithreading framework (in Java, but I'm setting it in Python
and in a much simplified form).

We were mindful of deadlocks caused by lock reversal, so we had come up
with a policy whereby objects form a layered hierarchy. An object higher
up in the hierarchy was allowed to call methods of objects below while
holding locks. The opposite was not allowed: if an object needed to call
a method of an object above it (through a registered callback), it had
to relinquish all locks before doing so.
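A minimal sketch of that upward-callback rule (names made up for
illustration): the lower object snapshots its callback list under its own
lock, then invokes the callbacks with no locks held.

```python
import threading

class Lower:
    """Lower layer: may be called from above with locks held, but must
    drop its own lock before calling back upward."""

    def __init__(self):
        self.lock = threading.Lock()
        self.callbacks = []

    def register(self, callback):
        with self.lock:
            self.callbacks.append(callback)

    def notify(self):
        with self.lock:
            todo = list(self.callbacks)  # snapshot under the lock
        for cb in todo:                  # call upward with NO locks held
            cb()

lower = Lower()
# The callback re-enters register(); with a non-reentrant Lock this
# would deadlock if notify() still held the lock during the call.
lower.register(lambda: lower.register(lambda: None))
lower.notify()
```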

However, a situation like this arose:

    class App:
        def send_stream(self, sock):
            with self.lock:                # App's lock taken first
                self.register_socket(sock)

                class SocketWrapper:
                    def read(_, count):
                        return sock.recv(count)
                    def close(_):
                        sock.close()
                        # App's lock taken *second*, under Transport's lock
                        with self.lock:
                            self.unregister_socket(sock)

                # SocketWrapper closes over sock, so no constructor is needed
                self.transport.forward_and_close(SocketWrapper())

    class Transport:
        def forward_and_close(self, readable):
            with self.lock:                # Transport's lock taken first
                more = readable.read(1000)
                if more is WOULDBLOCK:
                    self.reschedule(readable)
                elif more:
                    ... # out of scope for the anecdote
                else:
                    # EOF reached; close() takes App's lock while
                    # Transport's lock is still held
                    readable.close()

Now the dreaded lock reversal arises: send_stream() takes App's lock and
then, via forward_and_close(), Transport's lock, while a rescheduled
forward_and_close() takes Transport's lock and then, via
readable.close(), App's lock. If one thread is in send_stream() while
another is in forward_and_close(), each can end up holding one lock and
waiting forever for the other.

So why lock categorically like that? Java has a handy "synchronized"
keyword that wraps the whole method in "with self.lock". Ideally, that
handy idiom could be employed methodically. More importantly, to avoid
locking problems, the methodology should be rigorous and mindless. If
the developer must perform a deep locking analysis at every turn, they
are bound to make mistakes, especially when more than one developer is
involved, with differing intuitions.
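In Python, that synchronized idiom might be approximated with a decorator
(a sketch, assuming each instance carries a .lock attribute):

```python
import functools
import threading

def synchronized(method):
    """Rough analogue of Java's `synchronized`: wrap the whole method
    body in `with self.lock` (assumes the instance has a .lock)."""
    @functools.wraps(method)
    def wrapper(self, *args, **kwargs):
        with self.lock:
            return method(self, *args, **kwargs)
    return wrapper

class Counter:
    def __init__(self):
        self.lock = threading.Lock()
        self.value = 0

    @synchronized
    def increment(self):
        self.value += 1

counter = Counter()
threads = [threading.Thread(target=counter.increment) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Mechanically rigorous, exactly as desired; the trouble is that it says
nothing about the *order* in which different objects' locks are taken.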

Unfortunately, that deep locking analysis *is* required at every turn,
and mistakes *are* bound to happen.


Marko


