Benefits of asyncio

Tue Jun 3 05:30:14 EDT 2014

On Tue, Jun 3, 2014 at 7:10 PM, Marko Rauhamaa <marko at pacujo.net> wrote:
> Chris Angelico <rosuav at gmail.com>:
>
>> def request.process(self): # I know this isn't valid syntax
>>     db.act(whatever) # may block but shouldn't for long
>>     db.commit() # ditto
>>     write(self, response) # won't block
>>
>> This works as long as your database is reasonably fast and close
>
> I find that assumption unacceptable.

It is a dangerous assumption.

> The DB APIs desperately need asynchronous variants. As it stands, you
> are forced to delegate your DB access to threads/processes.
>
>> So how do you deal with the possibility that the database will block?
>
> You separate the request and response parts of the DB methods. That's
> how it is implemented internally anyway.
>
> Say no to blocking APIs.

Okay, but how do you handle two simultaneous requests going through
the processing that you see above? You *MUST* separate them onto two
transactions, otherwise one will commit half of the other's work. (Or
are you forgetting Databasing 101 - a transaction should be a logical
unit of work?) And since you can't, with most databases, have two
transactions on one connection, that means you need a separate
connection for each request. Given that the advantages of asyncio
include the ability to scale to arbitrary numbers of connections, it's
not really a good idea to then say "oh but you need that many
concurrent database connections". Most systems can probably handle a
few thousand threads without a problem, though a few million is going
to cause major issues; most databases, however, start getting
inefficient at a few thousand concurrent sessions.
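
A minimal sketch of the "commit half of the other's work" problem,
using Python's sqlite3 module purely as a stand-in database. The table
name and the request labels 'A' and 'B' are made up for illustration,
and the two "requests" are interleaved by hand on one shared connection
rather than by a real event loop:

```python
import sqlite3

# One connection shared by two logical requests (the thing to avoid).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE log (request TEXT, step INTEGER)")
conn.commit()

# Request A starts its unit of work...
conn.execute("INSERT INTO log VALUES ('A', 1)")

# ...request B runs to completion on the same connection...
conn.execute("INSERT INTO log VALUES ('B', 1)")
conn.commit()  # commits B's row AND A's half-finished work

# ...then request A hits an error and tries to undo everything it did.
conn.execute("INSERT INTO log VALUES ('A', 2)")
conn.rollback()  # only undoes step 2; step 1 is already committed

rows = conn.execute(
    "SELECT request, step FROM log ORDER BY request, step"
).fetchall()
print(rows)  # [('A', 1), ('B', 1)]
```

Request A asked for all-or-nothing and got half: its first row survived
the rollback because B's commit had already made it permanent.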

>> but otherwise, you would need to completely rewrite the main code.
>
> That's a good reason to avoid threads. Once you realize you would have
> been better off with an async approach, you'll have to start over. You
> can easily turn a nonblocking solution into a blocking one but not the
> other way around.

Alright. I'm throwing down the gauntlet. Write me a purely nonblocking
web site concept that can handle a million concurrent connections,
where each one requires one query against the database, and one in a
hundred of them require five queries which happen atomically. I can do
it with a thread pool and blocking database queries, and by matching
the thread pool size and the database concurrent connection limit, I
can manage memory usage fairly easily; how do you do it efficiently
with pure async I/O?

ChrisA