Benefits of asyncio

Mon Jun 2 17:49:37 EDT 2014

On Tue, Jun 3, 2014 at 6:45 AM, Paul Rubin <no.email at nospam.invalid> wrote:
>>     - Thread-safe programming is easy to explain but devilishly
>>       difficult to get right.
>
> I keep hearing that but not encountering it.  Yes there are classic
> hazards from sharing mutable state between threads.  However, it's
> generally not too difficult to program in a style that avoids such
> sharing.  Have threads communicate by message passing with immutable
> data in the messages, and things tend to work pretty straightforwardly.

It's more true on some systems than others. The issues of maintaining
"safe" state are very similar in callback systems and threads; the
main difference is that a single-threaded asyncio system becomes
cooperative, where threading systems are (usually) preemptive.

Preemption means you could get a context switch *anywhere*. (In
CPython, I think the rule is that thread switches can happen only
between bytecodes, but a single statement can compile to several
bytecodes, so that's still "anywhere" as far as your code's
concerned.) That means you have to *keep* everything safe, rather
than simply get it safe again.
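
As a minimal sketch of what "keeping it safe" looks like, here is a
shared counter guarded by a lock. The names (run_workers, add_many)
are made up for illustration. Without the lock, the read-add-write in
"counter += 1" can be interrupted partway through and updates can be
lost.

```python
import threading


def run_workers(n_threads, n_increments):
    """Increment one shared counter from several threads, safely."""
    counter = 0
    counter_lock = threading.Lock()

    def add_many():
        nonlocal counter
        for _ in range(n_increments):
            # "counter += 1" is a load, an add and a store, so a thread
            # switch can land in the middle of it. Holding the lock
            # keeps the state safe across the whole update.
            with counter_lock:
                counter += 1

    threads = [threading.Thread(target=add_many) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter


if __name__ == "__main__":
    print(run_workers(4, 10000))
```

The lock has to be held every time the counter is touched, which is
the "keep everything safe" part.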

Cooperative multitasking means your function will run to completion
before any other callback happens (or, at least, will get to a clearly
defined yield point). That means you can muck state up all you like,
and then fix it afterwards. In some ways, that's easier; but it has a
couple of risks: firstly, if your code jumps out early somewhere, you
might forget to fix the shared state, and only find out much later;
and secondly, if your function takes a long time to execute,
everything else stalls.
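
A rough illustration of both points, using asyncio coroutines and
some made-up names (balances, in_progress, transfer, audit). Between
two awaits nothing else runs, so the state can be briefly
inconsistent; the try/finally covers the early-exit risk.

```python
import asyncio

balances = {"a": 100, "b": 0}   # shared state (hypothetical example)
in_progress = set()             # names of transfers currently running


async def transfer(name, amount):
    in_progress.add(name)
    try:
        if amount > balances["a"]:
            # Early exit. Without the finally clause below, this name
            # would be left in in_progress forever.
            return False
        # No await between these two lines, so no other task can run
        # and observe the money "missing".
        balances["a"] -= amount
        balances["b"] += amount
        await asyncio.sleep(0)  # yield point: state is consistent here
        return True
    finally:
        in_progress.discard(name)


async def audit(rounds):
    """Record the total at each of this task's turns."""
    totals = []
    for _ in range(rounds):
        totals.append(balances["a"] + balances["b"])
        await asyncio.sleep(0)
    return totals


async def main():
    return await asyncio.gather(
        transfer("t1", 30), transfer("t2", 500), audit(3))


if __name__ == "__main__":
    print(asyncio.run(main()))
```

The auditor only ever gets a turn at a yield point, so it never sees
a total other than 100.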

So whichever way you do it, you still have to be careful - just
careful of slightly different things. For instance, you might keep
track of network activity as a potentially slow operation, and make
sure you never block a callback waiting for a socket - but you might
do a quick and simple system call, not realizing that it involves a
directory that's mounted from a remote server. With threads, someone
else will get priority as soon as you block, but with asyncio, you
have to be explicit about everything that's done asynchronously.
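
One way to be explicit about it, sketched with the standard
run_in_executor call: hand the possibly-slow call to a worker thread
so the event loop keeps running. The helper names (list_dir, ticker)
are invented for the example, and the directory listed is just the
current one.

```python
import asyncio
import os


async def list_dir(path):
    loop = asyncio.get_running_loop()
    # os.listdir looks quick, but it can block for a long time if path
    # is on a remote mount. Running it in the default thread pool
    # keeps the event loop free for other callbacks.
    return await loop.run_in_executor(None, os.listdir, path)


async def ticker(n):
    """Stand-in for other work that must not be stalled."""
    ticks = 0
    for _ in range(n):
        await asyncio.sleep(0)
        ticks += 1
    return ticks


async def main():
    return await asyncio.gather(list_dir(os.curdir), ticker(3))


if __name__ == "__main__":
    names, ticks = asyncio.run(main())
    print(len(names), "entries;", ticks, "ticks")
```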

Threads are massively simpler if you have a top-down execution model
for a relatively small number of clients. This works really nicely for a
sequence of prompts - you just code it exactly as if you were using
print() and input() and stuff, and then turn print() into a blocking
socket write (or whatever your I/O is done over) and your input() into
a blocking socket read with line splitting, and that's all the changes
you need. (You could even replace the actual print and input
functions, and use a whole block of code untouched.)
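
Something like the following is what I have in mind, as a sketch
only. converse() is written top-down as if it were using input() and
print(); ask and tell are hypothetical stand-ins that do blocking
socket I/O. The demo uses socket.socketpair() so it needs no real
network.

```python
import socket
import threading


def converse(ask, tell):
    """Top-down dialogue, written as if using input() and print()."""
    name = ask("Name? ")
    colour = ask("Favourite colour? ")
    tell("Hello, %s. I like %s too." % (name, colour))


def serve_client(conn):
    """Run converse() over a blocking socket; one thread per client."""
    with conn, \
         conn.makefile("r", encoding="utf-8", newline="\n") as rfile, \
         conn.makefile("w", encoding="utf-8", newline="\n") as wfile:

        def tell(text):              # replaces print()
            wfile.write(text + "\n")
            wfile.flush()

        def ask(prompt):             # replaces input()
            tell(prompt)
            return rfile.readline().rstrip("\n")

        converse(ask, tell)


def demo():
    """Play the client side by hand and return what the server said."""
    server_end, client_end = socket.socketpair()
    worker = threading.Thread(target=serve_client, args=(server_end,))
    worker.start()
    with client_end, \
         client_end.makefile("r", encoding="utf-8", newline="\n") as r, \
         client_end.makefile("w", encoding="utf-8", newline="\n") as w:
        transcript = [r.readline()]
        w.write("Chris\n")
        w.flush()
        transcript.append(r.readline())
        w.write("green\n")
        w.flush()
        transcript.append(r.readline())
    worker.join()
    return transcript


if __name__ == "__main__":
    print(demo())
```

A real server would call accept() in a loop and start one
serve_client thread per connection.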

Async I/O is massively simpler if you have very little state, and
simply react to stimuli. Every client connects, authenticates,
executes commands, and terminates its connection. If all you need to
know is whether the client's authenticated or not (restricted
command set before login), asyncio will be really *really* easy, and
threads are overkill. This is even more true if most of your clients
are going to be massively idle most of the time, with just tiny
queries coming in occasionally and getting responded to quickly.
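
A sketch of that shape using asyncio.Protocol. The command names
(login, ping, upper) and the password are invented for the example;
the only per-client state is the authenticated flag plus a receive
buffer.

```python
import asyncio

PASSWORD = b"sesame"  # hypothetical, for illustration only


class CommandProtocol(asyncio.Protocol):
    def connection_made(self, transport):
        self.transport = transport
        self.authenticated = False   # the only real per-client state
        self.buffer = b""

    def data_received(self, data):
        # React to whatever arrived; reply once per complete line.
        self.buffer += data
        while b"\n" in self.buffer:
            line, self.buffer = self.buffer.split(b"\n", 1)
            self.transport.write(self.handle(line.strip()) + b"\n")

    def handle(self, line):
        if line == b"login " + PASSWORD:
            self.authenticated = True
            return b"ok"
        if line == b"ping":          # allowed before login
            return b"pong"
        if not self.authenticated:   # restricted command set
            return b"error: log in first"
        if line.startswith(b"upper "):
            return line[6:].upper()
        return b"error: unknown command"
```

To serve real clients, the class would be passed as the protocol
factory to loop.create_server(); an idle client costs nothing but its
small protocol object.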

Both have their advantages and disadvantages. Learning both models is,
IMO, worth doing; get to know them, then decide which one suits your
project.

>>    Asyncio makes the prototype somewhat cumbersome to write. However,
>>    once it is done, adding features, stimuli and states is a routine
>>    matter.
>
> Having dealt with some node.js programs and the nest of callbacks they
> morph into as the application gets more complicated, threads have their
> advantages.

I wrote an uberlite async I/O framework for my last job. Most of the
work was done by the lower-level facilities (actual non-blocking I/O,
etc), but basically, what I had was a single callback for each
connection type and a dictionary of state for each connection (with a
few exceptions - incoming UDP has no state, ergo no dict). Worked out
beautifully simple; each run through the callback processed one
logical action (e.g. a line of text arriving on a socket, terminated by
newline), updated state if required, and returned, back to the main
loop. Not all asyncio will fit into that sort of structure, but if it
does fit, this keeps everything from getting out of hand.
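
This is not the original framework, just a toy reconstruction of the
structure under my own assumptions: one callback per connection type,
one state dict per connection, and a main loop that feeds each
callback one logical action at a time. All names here are
hypothetical, and the "events" are a plain list rather than real
non-blocking I/O.

```python
def tcp_line(state, line):
    """Callback for the "tcp" connection type: one line per call."""
    state["lines"] = state.get("lines", 0) + 1
    if line.startswith("name "):
        state["name"] = line[5:]
        return "hello " + state["name"]
    return "%s: line %d" % (state.get("name", "anonymous"), state["lines"])


def udp_packet(state, payload):
    """Callback for incoming UDP: stateless, so state is None."""
    return payload.upper()


CALLBACKS = {"tcp": tcp_line, "udp": udp_packet}
states = {}  # connection id -> state dict


def main_loop(events):
    """Dispatch each (kind, conn_id, data) event to its callback."""
    replies = []
    for kind, conn_id, data in events:
        state = None if kind == "udp" else states.setdefault(conn_id, {})
        replies.append(CALLBACKS[kind](state, data))
    return replies


if __name__ == "__main__":
    print(main_loop([
        ("tcp", 1, "hi"),
        ("tcp", 2, "name Bob"),
        ("udp", None, "ping"),
        ("tcp", 1, "name Al"),
        ("tcp", 1, "x"),
    ]))
```

Each callback does one thing, updates the dict if it needs to, and
returns, so there is no nest of callbacks to grow.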

(Plus, keeping state in a separate dict rather than using closures and
local variables meant I could update code while maintaining state. Not
important for most Python projects, but it was for us.)
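
To show why the separate dict matters, here is a small sketch with
made-up handler names. The handler is looked up on every call, so it
can be replaced while the state carries on. In a real system the
replacement might come from something like importlib.reload(); here
it is just a second function.

```python
state = {"count": 0}  # lives outside any function or closure


def handler_v1(state, line):
    state["count"] += 1
    return "line %d" % state["count"]


def handler_v2(state, line):
    # "Updated" code: same state layout, new behaviour.
    state["count"] += 1
    return "line %d: %s" % (state["count"], line)


handlers = {"text": handler_v1}


def dispatch(kind, line):
    # Looked up by name on every call, so a swapped-in handler takes
    # effect immediately and sees the existing state.
    return handlers[kind](state, line)


first = dispatch("text", "a")
handlers["text"] = handler_v2   # the code update, mid-run
second = dispatch("text", "b")
```

Had the count lived in a closure or a local variable of the old
function, it would have been lost with the old code.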

Both have their merits.

ChrisA