Asynchronous processing is more efficient -- surely not?

Chris Angelico rosuav at gmail.com
Wed Apr 4 07:04:11 EDT 2018


On Wed, Apr 4, 2018 at 8:42 PM, Paul Moore <p.f.moore at gmail.com> wrote:
> IMO,
> async has proved useful for handling certain types of IO bound
> workloads with lower overheads[1] than traditional multi-threaded or
> multi-process designs. Whether it's a good fit for any particular
> application is something you'd have to test, as with anything else.

This I would agree with. There are certain types of tasks that really
lend themselves spectacularly well to async I/O models - mainly those
that are fundamentally reactive and can have inordinate numbers of
connected users. Take a chat room, for example. The client establishes
a connection to a server (maybe IRC, maybe a WebSocket, whatever), and
the server just sits there doing nothing until some client sends a
message. Then the server processes the message, does whatever it
thinks is right, and sends messages out to one or more clients. It's
essential that multiple concurrent clients be supported (otherwise
you're chatting with yourself), so how do the different models hold
up?

1) Multiple independent processes. Abysmal; they have to communicate
with each other, so this would require a measure of persistence (e.g.
a database) and some means of signalling other processes. A pain to
write, and horribly inefficient. You'd probably also need two threads
per process (one to read, one to write).

2) The multiprocessing module. Better than the above because you can
notify other processes using a convenient API, but you still need an
entire process for every connected client. Probably you'll top out at
a few hundred clients, even if they're all quiet. Still need two
threads per process.

3) Native OS threads using the threading module. Vast improvement; the
coding work would be pretty much the same as for multiprocessing, but
instead of a process, you need a thread. Probably would top out at a
few thousand clients, maybe a few tens of thousands, as long as
they're all fairly quiet. Since all state is now shared, you need
only one thread per client (its reading thread), and writing is done
in whichever thread triggered it. Everything that's global is stored
in just one place.
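A minimal sketch of that model (all names hypothetical, and in-memory
queues stand in for real client sockets): one reader thread per client,
shared state kept in one place behind a lock, and broadcasting done in
the thread that received the message:

```python
import queue
import threading

clients = {}                  # name -> outgoing queue.Queue (the shared state)
clients_lock = threading.Lock()

def broadcast(sender, text):
    # Runs in the sender's reader thread -- writing happens in the
    # thread that triggered it, so no dedicated writer thread is needed.
    with clients_lock:
        for outbox in clients.values():
            outbox.put(f"{sender}: {text}")

def reader(name, inbox):
    # One thread per client, blocked on that client's incoming messages.
    while True:
        msg = inbox.get()
        if msg is None:       # sentinel: client disconnected
            with clients_lock:
                del clients[name]
            return
        broadcast(name, msg)

def connect(name):
    inbox, outbox = queue.Queue(), queue.Queue()
    with clients_lock:
        clients[name] = outbox
    threading.Thread(target=reader, args=(name, inbox), daemon=True).start()
    return inbox, outbox

a_in, a_out = connect("alice")
b_in, b_out = connect("bob")
a_in.put("hello")             # alice "sends" a message
print(b_out.get(timeout=1))   # alice: hello
```

A real server would block each reader thread on a socket recv() instead
of a queue, but the shape is the same: N clients, N threads.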

4) Asynchronous I/O with an event loop. Small improvement over the
above; now that there are no OS threads involved, you're limited
only by available memory. You could have tens of millions of connected
clients as long as they're all quiet. Concurrency is now 100% in the
programmer's hands (with explicit yield points), instead of having
automatic yield points any time a thread blocks for any reason; this
restricts context switches to the points where actual I/O is
happening. One blocking DNS lookup can stall the entire event loop,
but 100K connected sockets would be no problem at all.
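The same chat-hub sketch in the async model (again with hypothetical
names and queues in place of sockets) needs no locks at all, because
everything runs on one thread and only switches at await points:

```python
import asyncio

clients = {}   # name -> asyncio.Queue of outgoing messages

async def broadcast(sender, text):
    # Runs on the single event-loop thread: no locks needed, since
    # control can only change hands at an explicit await.
    for outbox in clients.values():
        await outbox.put(f"{sender}: {text}")

async def main():
    clients["alice"] = asyncio.Queue()
    clients["bob"] = asyncio.Queue()
    await broadcast("alice", "hello")   # alice's message fans out to everyone
    return await clients["bob"].get()

result = asyncio.run(main())
print(result)   # alice: hello
```

Each connected client costs a coroutine and a queue rather than a
thread, which is where the "tens of millions of quiet clients" headroom
comes from.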

Async I/O certainly has its place, but the performance advantages
don't really kick in until you're scaling to ridiculous levels of
dormant clients. (If you have a large number of *active* clients, your
actual work is going to be more significant, and you'll need to spend
more time in the CPU, making multiple processes look more attractive.)
Its biggest advantages are in _code simplicity_, not performance. (And
even then, mainly for people who can't wrap their heads around
threads; if you're fluent in threads, the simplicity is comparable, so
there's less of an advantage.)
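The "one DNS lookup can bring you down" caveat is easy to demonstrate.
In this toy example, time.sleep() stands in for any blocking call (a
synchronous DNS resolution, say); run on the loop directly it freezes
every other task, whereas pushed to a worker thread via
asyncio.to_thread() the loop keeps servicing everyone else:

```python
import asyncio
import time

async def heartbeat(ticks):
    # Simulates all the other connected clients being serviced.
    for _ in range(3):
        await asyncio.sleep(0.05)
        ticks.append(time.monotonic())

async def blocking_lookup():
    time.sleep(0.3)   # blocking call run on the loop: stalls everything

async def cooperative_lookup():
    await asyncio.to_thread(time.sleep, 0.3)   # loop stays responsive

async def run(lookup):
    ticks = []
    start = time.monotonic()
    await asyncio.gather(heartbeat(ticks), lookup())
    return ticks[0] - start   # how late the first heartbeat arrived

blocked = asyncio.run(run(blocking_lookup))
smooth = asyncio.run(run(cooperative_lookup))
print(f"first tick: {blocked:.2f}s blocked vs {smooth:.2f}s cooperative")
```

With the blocking version the first heartbeat is delayed the full 0.3s;
with to_thread() it fires on schedule at roughly 0.05s.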

ChrisA


