Asynchronous programming

Paul Rubin no.email at nospam.invalid
Thu Aug 11 13:50:10 EDT 2016


Steven D'Aprano <steve+python at pearwood.info> writes:
> But what's the point in doing it asynchronously if I have to just wait for
> it to complete?
>   begin downloading in an async thread
>   twiddle thumbs, doing nothing
>   process download

Suppose the remote server is overloaded, so it sends files much more slowly
than your internet connection can receive them.  And you want to
download 10 different files from 10 different such servers.  If you do
them synchronously (wait for each download to finish before starting the
next), it takes much longer than necessary.  What you want is to download
the 10 files simultaneously, using 10 different network connections.
The download procedure has to speak some network protocol over a socket
for each file.
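
To put rough numbers on it: if each of the 10 servers needs T seconds to
trickle out its file, doing them one after another takes about 10T, while
overlapping them takes about T.  The sketch below simulates each slow
server with asyncio.sleep(), using asyncio only as a convenient way to
overlap waits in one process; fake_download, sequential, concurrent and
timed are hypothetical names, not part of any library.

```python
import asyncio
import time

async def fake_download(name: str, seconds: float) -> str:
    # Hypothetical stand-in for a slow server: almost all the time is waiting.
    await asyncio.sleep(seconds)
    return name

async def sequential(names, seconds):
    # Wait for each download to finish before starting the next one.
    return [await fake_download(n, seconds) for n in names]

async def concurrent(names, seconds):
    # Start all downloads at once; total time is roughly that of the slowest.
    return await asyncio.gather(*(fake_download(n, seconds) for n in names))

def timed(coro_fn, names, seconds):
    start = time.perf_counter()
    result = asyncio.run(coro_fn(names, seconds))
    return result, time.perf_counter() - start

if __name__ == "__main__":
    files = [f"file{i}" for i in range(10)]
    for fn in (sequential, concurrent):
        result, elapsed = timed(fn, files, 0.2)
        print(f"{fn.__name__}: {len(result)} files in {elapsed:.2f}s")
```

The sequential run should take roughly ten times as long as the concurrent
one, because the waits add up in one case and overlap in the other.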

How do you deal with the concurrency?  There are many possibilities:

   - use 10 different computers
   - open 10 windows on 1 computer, and start a download in each window
     (i.e., use multiple processes)
   - single, multi-threaded download client (oh nooo!  the thread
     monsters will eat you if you try to do that!!!!)
   - single-threaded client with asynchronous i/o, so it can have
     10 network requests "in the air" simultaneously, using select() or
     epoll() to handle each piece of incoming data as soon as it arrives.

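The last option can be sketched with Python's standard selectors module,
which picks the best mechanism the OS offers (epoll on Linux, kqueue on
BSD/macOS, falling back to select() where nothing better exists).  As an
assumption to keep the example self-contained, local socket pairs stand
in for the 10 server connections, and each "server" sends its whole file
and then closes; read_all_concurrently is a hypothetical helper name.

```python
import selectors
import socket

def read_all_concurrently(socks):
    """Single thread: read every socket until EOF, handling data as it arrives."""
    sel = selectors.DefaultSelector()
    received = {}
    for i, s in enumerate(socks):
        s.setblocking(False)
        sel.register(s, selectors.EVENT_READ, data=i)  # remember which download
        received[i] = b""
    open_count = len(socks)
    while open_count:
        for key, _mask in sel.select():       # wait until some socket is readable
            chunk = key.fileobj.recv(4096)
            if chunk:
                received[key.data] += chunk
            else:                             # peer closed: this download is done
                sel.unregister(key.fileobj)
                key.fileobj.close()
                open_count -= 1
    sel.close()
    return received

if __name__ == "__main__":
    pairs = [socket.socketpair() for _ in range(10)]
    for i, (server, _client) in enumerate(pairs):
        server.sendall(f"contents of file {i}".encode())
        server.close()  # EOF tells the client the file is complete
    result = read_all_concurrently([client for _server, client in pairs])
    print(result[3])
```

A real client would also have to parse each connection's protocol
incrementally, since a readable socket may deliver only part of a message.
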
The multi-thread and multi-process approaches are conceptually simple
since each connection appears to be synchronous and blocking.  They are
both actually async under the covers, but the async i/o and dispatch is
abstracted away by the OS, so the user program doesn't have to worry
about it.
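
For contrast, a minimal sketch of the threaded approach.  It assumes local
socket pairs stand in for network connections, with a hypothetical
slow_server thread on the far end of each that sleeps before sending.
blocking_download is ordinary blocking code; the concurrency comes
entirely from running one copy per thread.  All function names here are
illustrative.

```python
import socket
import threading
import time
from concurrent.futures import ThreadPoolExecutor

def blocking_download(sock):
    """Ordinary blocking code: recv() simply waits until data or EOF arrives."""
    chunks = []
    with sock:
        while True:
            chunk = sock.recv(4096)
            if not chunk:              # EOF: the server closed the connection
                return b"".join(chunks)
            chunks.append(chunk)

def download_all(socks):
    # One worker thread per connection; the OS multiplexes the blocked threads.
    with ThreadPoolExecutor(max_workers=max(1, len(socks))) as pool:
        return list(pool.map(blocking_download, socks))

def slow_server(sock, payload, delay):
    # Hypothetical overloaded server: waits, sends everything, then closes.
    with sock:
        time.sleep(delay)
        sock.sendall(payload)

if __name__ == "__main__":
    pairs = [socket.socketpair() for _ in range(10)]
    for i, (server, _client) in enumerate(pairs):
        threading.Thread(target=slow_server, args=(server, b"file %d" % i, 0.2)).start()
    start = time.perf_counter()
    results = download_all([client for _server, client in pairs])
    print(results[0], f"{time.perf_counter() - start:.2f}s")
```

The total time should be close to one server's delay rather than the sum
of all ten, because time.sleep() and a blocking recv() release the GIL and
let the other threads proceed.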

The in-client async approach is generally the most efficient (OS
processes and threads are expensive), but imposes complexity on the
client program by making it juggle what each connection is doing,
i.e., where it is in the network protocol at any moment.
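
To make "where it is in the network protocol" concrete, here is a sketch
of the per-connection bookkeeping an async client has to do by hand,
written as an explicit state machine.  The protocol is a hypothetical toy
(an ASCII length line ending in a newline, followed by that many body
bytes), and Connection and State are illustrative names.  Because data can
arrive in arbitrary pieces, each connection must remember its state
between chunks.

```python
from dataclasses import dataclass
from enum import Enum, auto

class State(Enum):
    READ_HEADER = auto()
    READ_BODY = auto()
    DONE = auto()

@dataclass
class Connection:
    state: State = State.READ_HEADER
    buffer: bytes = b""
    expected: int = 0
    body: bytes = b""

    def feed(self, chunk: bytes) -> None:
        """Consume whatever bytes arrived and advance through the protocol."""
        self.buffer += chunk
        while True:
            if self.state is State.READ_HEADER:
                line, sep, rest = self.buffer.partition(b"\n")
                if not sep:
                    return                  # header incomplete: wait for more
                self.expected = int(line)
                self.buffer = rest
                self.state = State.READ_BODY
            elif self.state is State.READ_BODY:
                if len(self.buffer) < self.expected:
                    return                  # body incomplete: wait for more
                self.body = self.buffer[:self.expected]
                self.buffer = self.buffer[self.expected:]
                self.state = State.DONE
            else:
                return                      # message complete; ignore extra data

if __name__ == "__main__":
    conns = {"a": Connection(), "b": Connection()}
    # Chunks from two connections arrive interleaved and split at awkward places.
    events = [("a", b"1"), ("b", b"3\nxy"), ("a", b"1\nhello "),
              ("b", b"z"), ("a", b"world")]
    for name, chunk in events:
        conns[name].feed(chunk)
    print(conns["a"].state is State.DONE, conns["a"].body)  # True b'hello world'
    print(conns["b"].state is State.DONE, conns["b"].body)  # True b'xyz'
```

In a real event loop, feed() would be called with whatever recv() returned
for that connection's socket.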

Many ways have been developed over the years to organize client-side
async programs and keep them from getting too confusing:

  - explicit state machines (a struct with a state tag for each
    connection, and a big event loop with a switch statement),
    frequently seen in C programs
  - chained callbacks ("callback hell"), seen in node.js
  - callbacks on objects ("reactor pattern"), used in Twisted Matrix
  - explicit cooperative multitasking (used in RTOSes, classic Forth,
    etc.)
  - lightweight processes or threads handled by the language runtime
    (GHC, Erlang).  This means the user program thinks it's doing
    blocking i/o but it's really not.
  - coroutines (Lua and now Python's asyncio)
  - continuation-based hackery (various Haskell enumeratee libraries)
  - probably more that I don't know about or am forgetting.
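
As one illustration of the coroutine style in Python's asyncio, here is a
sketch of a reader for a hypothetical toy protocol (an ASCII length line,
then that many body bytes).  asyncio.StreamReader, readline(),
readexactly() and feed_data() are real asyncio APIs; normally the event
loop feeds the reader from a socket, but here the demo feeds it by hand to
simulate two connections whose chunks arrive interleaved.  read_message
and demo are illustrative names.  Each coroutine keeps its place in the
protocol implicitly, in its local variables and suspension point, instead
of in an explicit state tag.

```python
import asyncio

async def read_message(reader: asyncio.StreamReader) -> bytes:
    # Straight-line code: each await suspends this coroutine until enough
    # bytes have arrived, and the event loop serves other connections meanwhile.
    header = await reader.readline()
    length = int(header.decode("ascii").strip())
    return await reader.readexactly(length)

async def demo() -> list:
    readers = [asyncio.StreamReader() for _ in range(2)]
    tasks = [asyncio.create_task(read_message(r)) for r in readers]

    # Simulate chunks from two connections arriving interleaved.
    for reader, chunk in [(readers[0], b"1"), (readers[1], b"3\nxy"),
                          (readers[0], b"1\nhello "), (readers[1], b"z"),
                          (readers[0], b"world")]:
        reader.feed_data(chunk)
        await asyncio.sleep(0)  # let the waiting coroutines run

    return await asyncio.gather(*tasks)

if __name__ == "__main__":
    print(asyncio.run(demo()))  # [b'hello world', b'xyz']
```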

I like the Erlang/GHC approach best, but it basically means building a
miniature OS into the language runtime and making sure all the
user-visible i/o calls actually use this "OS" instead of actual system
calls to the underlying kernel.  The Erlang and GHC implementations are
quite complicated while Python is basically a fairly simple interpreter
wrapped around the standard C libraries.
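
A very small taste of that "miniature OS" idea, as a hypothetical toy
rather than anything resembling the real Erlang or GHC schedulers: tasks
are Python generators that call a recv() "system call" which looks
blocking but actually yields a request to the runtime, and run() resumes
each task once the selectors module reports its socket readable.  Socket
pairs stand in for network connections; recv, downloader and run are
illustrative names.

```python
import selectors
import socket

def recv(sock):
    # The toy runtime's "system call": suspend until the runtime hands us data.
    data = yield ("recv", sock)
    return data

def downloader(name, sock, results):
    # Reads like ordinary blocking code; every recv goes through the runtime.
    chunks = []
    while True:
        chunk = yield from recv(sock)
        if not chunk:                 # EOF: the download is complete
            break
        chunks.append(chunk)
    sock.close()
    results[name] = b"".join(chunks)

def run(tasks):
    """Resume each task when the socket it is waiting on becomes readable."""
    sel = selectors.DefaultSelector()
    ready = [(task, None) for task in tasks]  # runnable tasks, value to send in
    waiting = 0
    while ready or waiting:
        while ready:
            task, value = ready.pop()
            try:
                _op, sock = task.send(value)  # run until the next "system call"
            except StopIteration:
                continue                      # task finished
            sel.register(sock, selectors.EVENT_READ, data=task)
            waiting += 1
        for key, _mask in sel.select():
            sel.unregister(key.fileobj)
            waiting -= 1
            # The socket is readable, so this recv() returns immediately.
            ready.append((key.data, key.fileobj.recv(4096)))
    sel.close()

if __name__ == "__main__":
    results = {}
    pairs = [socket.socketpair() for _ in range(3)]
    for i, (server, _client) in enumerate(pairs):
        server.sendall(b"data %d" % i)
        server.close()
    run([downloader(i, client, results) for i, (_server, client) in enumerate(pairs)])
    print(results)
```

A real runtime has to do much more: writes, timers, preemption of tasks
that never block, and spreading work over several cores.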

In Python, the async discussion is basically between various forms of
callbacks and a few different forms of coroutines.  I think best
practices are still not completely settled.
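
A sketch of the two styles side by side in asyncio, assuming a
hypothetical fetch() coroutine that stands in for a network request:
callback_style chains two requests with Future.add_done_callback(), and
coroutine_style does the same thing with await.  The function names are
illustrative.

```python
import asyncio

async def fetch(name: str, delay: float) -> str:
    # Hypothetical stand-in for a network request.
    await asyncio.sleep(delay)
    return f"{name} done"

def callback_style(loop: asyncio.AbstractEventLoop, results: list) -> asyncio.Future:
    """Chain two requests with callbacks: each step schedules the next."""
    finished = loop.create_future()

    def on_first(fut):
        results.append(fut.result())
        second = loop.create_task(fetch("second", 0.01))
        second.add_done_callback(on_second)

    def on_second(fut):
        results.append(fut.result())
        finished.set_result(results)

    first = loop.create_task(fetch("first", 0.01))
    first.add_done_callback(on_first)
    return finished

async def coroutine_style() -> list:
    """The same sequence written as straight-line code with await."""
    return [await fetch("first", 0.01), await fetch("second", 0.01)]

async def main():
    loop = asyncio.get_running_loop()
    a = await callback_style(loop, [])
    b = await coroutine_style()
    return a, b

if __name__ == "__main__":
    print(asyncio.run(main()))
    # (['first done', 'second done'], ['first done', 'second done'])
```

Both produce the same list; the difference is that in the callback version
the control flow, and any error handling, is spread across separate
functions.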

I ignore all this and use threads and take the performance hit.  I find
it simpler and I haven't been eaten by any thread monsters yet (though
there's always a first time).  I figure if I need high performance,
Python isn't the way to do it in the first place: Python is more about
convenience and productivity than performance.  I've had 1000 or so
Python threads on a midsized EC2 instance and it's worked ok.

If you really want to do crazy fast async i/o and you use C++, check out
  http://www.seastar-project.org/

I haven't tried it yet but want to.

Here's a cool paper about the current GHC I/O system:
http://haskell.cs.yale.edu/wp-content/uploads/2013/08/hask035-voellmy.pdf
