Asynchronous processing is more efficient -- surely not?

Chris Angelico rosuav at gmail.com
Wed Apr 4 07:21:02 EDT 2018


On Wed, Apr 4, 2018 at 9:02 PM, Richard Damon <Richard at damon-family.org> wrote:
> Asynchronous processing will use a bit more of some processing resources
> to handle the multi-processing, but it can be more efficient at fully
> using many of the resources that are available.
>
> Take your file download example. When you are downloading a file, your
> processor sends out a request for a chunk of the data, then waits for
> the response. The computer on the other end gets that request, sends a
> packet, and then waits. You get that packet, send an acknowledgement,
> and then wait; the other computer gets that acknowledgement and sends
> more data, and then waits, and so on. Even if your pipe to the Internet
> is the limiting factor, there is a fair amount of dead time in this
> operation, so starting another download to fill more of that pipe can
> decrease the total time to get all the data downloaded.

Assuming that you're downloading this via a TCP/IP socket (e.g. from a
web server), the acknowledgements are going to be handled by the OS
kernel, not your process. Plus, TCP allows acknowledgements to stack,
so you're not really spending much time waiting on individual acks. A single
socket is entirely capable of saturating one computer's uplink. I once
proved to my employer that a particular host had gigabit internet by
renting a dozen EC2 instances with 100Mbit uplinks and having each of
them transfer data to the same host concurrently - via one socket
connection each.
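To make that concrete, here's a minimal sketch of a single blocking TCP
download. The application just calls recv() in a loop; every TCP
acknowledgement is handled by the kernel behind the scenes. The
download() helper and the bare HTTP/1.0 request are my own illustrative
inventions, not anything from the post above.

```python
import socket

def download(host, port=80, path="/"):
    """Pull a response over one blocking socket; return bytes received."""
    total = 0
    with socket.create_connection((host, port)) as sock:
        request = "GET {} HTTP/1.0\r\nHost: {}\r\n\r\n".format(path, host)
        sock.sendall(request.encode("ascii"))
        while True:
            chunk = sock.recv(65536)  # blocks until the kernel has data
            if not chunk:
                break  # the server closed the connection
            total += len(chunk)
    return total
```

Note there's no ack handling anywhere in sight: the process only sees a
stream of bytes, which is why one socket can keep the pipe full.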

Much more interesting is operating a high-bandwidth server (let's say,
a web application) that is responding to requests from myriad
low-bandwidth clients. Traditional servers such as Apache's prefork
mode would follow a model like this:

while "more sockets":                     # main accept loop
    newsock = mainsock.accept()           # block until a client connects
    fork_to_subprocess(handler, newsock)  # one process per connection

def handler(sock):
    while "need headers":                 # block until the headers arrive
        sock.receive_headers()
    while "need body":                    # then block until the body arrives
        sock.receive_body()
    generate_response()                   # the only CPU-bound step
    while "response still sending":       # block until fully sent
        sock.send_response()

A threaded model does the same thing, but instead of forking a
subprocess, it spawns a thread. The handler is pretty simple and
straightforward; it reads from the socket until it has everything it
needs, then it sends off a response. Both reading and writing can and
will block, and generating the response is the only part that's really
CPU-bound.
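As a sketch of that threaded variant (serve_threaded is a name I'm
inventing here; mainsock and handler stand in for the pseudocode
above), the only change is what you do with the accepted socket:

```python
import threading

def serve_threaded(mainsock, handler):
    """Accept loop: spawn one thread per connection instead of forking."""
    while True:
        newsock, addr = mainsock.accept()  # blocks until a client connects
        t = threading.Thread(target=handler, args=(newsock,), daemon=True)
        t.start()  # cheaper than a fork, but still one stack per client
```

The handler itself is unchanged - it still blocks happily, because only
its own thread is held up while it waits.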

"Parallelism" here means two things: how many active clients can you
support (throughput), and how many dormant clients can you support
(saturation). In a forked model, you spend a lot of resources spinning
up processes (you can reduce this with process pools and such, at the
expense of code complexity and a slower spin-down when idle); in a
threaded model, you spend far less, but you're still paying a
significant price per connection, and saturation can be a problem. The
beauty of async I/O is that saturation becomes almost completely
insignificant; the cost is that throughput is capped at a single
thread's capabilities.
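Here's roughly what the async version looks like with asyncio. Each
waiting client costs only a suspended coroutine, not a process or a
thread; the echo-style request/response logic is illustrative only,
not anything from the discussion above.

```python
import asyncio

async def handler(reader, writer):
    data = await reader.read(65536)   # yields to the event loop while waiting
    writer.write(b"response to " + data)
    await writer.drain()              # yields again if the send buffer is full
    writer.close()
    await writer.wait_closed()

async def main(port):
    server = await asyncio.start_server(handler, "127.0.0.1", port)
    async with server:
        await server.serve_forever()  # one thread serves every client
```

Ten thousand dormant clients here are ten thousand parked coroutines -
cheap. But generate-response work for every active client runs on the
one thread, which is the throughput cap mentioned above.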

In theory, you could use async I/O with multiple threads pumping the
same set of events. I'm not sure if anyone has ever actually done
this, as it combines the complexities of both models, but it would
maximize both throughput and saturation levels - dormant clients cost
very little, and you're able to use multiple CPU cores. More commonly,
you could run a thread pool, doling out clients to whichever thread is
least busy, and then having each thread run an independent event loop,
which would be fine in the average case. But that's still more
complicated; you still have to think about threads.
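A hedged sketch of that "event loop per worker thread" idea - Worker,
submit and dispatch are names I'm making up for illustration, not an
established API, and the bookkeeping is deliberately simplified:

```python
import asyncio
import threading

class Worker:
    """One thread running its own asyncio event loop."""
    def __init__(self):
        self.loop = asyncio.new_event_loop()
        self.clients = 0
        threading.Thread(target=self.loop.run_forever, daemon=True).start()

    def submit(self, coro):
        self.clients += 1  # simplified: never decremented on disconnect
        return asyncio.run_coroutine_threadsafe(coro, self.loop)

def dispatch(workers, coro):
    """Hand the coroutine to the worker currently handling the fewest clients."""
    worker = min(workers, key=lambda w: w.clients)
    return worker.submit(coro)
```

Each worker's loop handles its own clients' dormancy cheaply, while the
pool spreads the CPU-bound work across cores - at the price of thinking
about threads again.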

> Yes, if your single path of execution can fully use the critical
> resources, then adding asynchronous processing won't help, but rarely
> does it. Very few large machines today are single-threaded; most have
> multiple cores, and often even those cores have the ability to handle
> multiple threads at once. Thus there normally are extra resources that
> the asynchronous processing can make better use of, so even processor
> usage can be improved in many cases.

I'm not sure what tasks would allow you to reduce processor usage this
way. Got an example?

ChrisA


