[Chicago] Status of wsgi

Thu Oct 11 23:08:47 CEST 2012

As far as tornado being.. "quite bad ... for REST"...  I guess I'll just
say that I've been paid to write REST services using both tornado and
django, and the tornado systems were easier to write, maintain and
scale.

It also happens to win a lot of benchmarks, and one man's
"over-simplification" is another man's lack of bloat.  The underlying code
is quite nice, and a human being can read it.

No.. it is not meant to serve static files, but it's certainly capable
("absolutely miserably"?), and that is a *really* weird criticism to make
of a python web server.

"fast" in the context of tornado actually means.. fast.  As in: requests,
whether natively asynchronous *or* synchronous through WSGI, tend to get
served in fewer milliseconds than they do in most other python frameworks.
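
For anyone who hasn't looked at the synchronous side: a WSGI app is just a callable that takes the request environment and hands back an iterable of bytes. Here's a minimal sketch using only the standard library. The names `app` and `call_app` and the response body are made up for illustration, and nothing in it is tornado-specific.

```python
from wsgiref.util import setup_testing_defaults


def app(environ, start_response):
    """Hypothetical WSGI application: one request in, one entity out."""
    body = b"hello from a plain WSGI app\n"
    headers = [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]


def call_app(wsgi_app, path="/"):
    """Call a WSGI app in-process, without opening a socket."""
    environ = {}
    setup_testing_defaults(environ)  # fills in the keys WSGI requires
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, response_headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = dict(response_headers)

    body = b"".join(wsgi_app(environ, start_response))
    return captured["status"], captured["headers"], body
```

Any WSGI-capable server should be able to host `app` unchanged, which is the portability you give up when you write against a server's native async API instead.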

I think it's very underrated, and I hate to see people saying bad things
about it.  Maybe it's worth a talk in the next month or two?

On Thu, Oct 11, 2012 at 3:00 PM, Tal Liron <tal.liron at threecrickets.com> wrote:

>  On 10/11/2012 01:59 PM, Jordan Bettis wrote:
>
>  Of course you can have a dynamic worker pool. That's the way apache
> works. Given that python has a fairly "big boned" runtime, there's a
> substantial cost there, as well as doing other things like making DB
> connections for the new workers. And anyway it still only partially
> solves the problem. You're still going to run out of memory or file
> descriptors or something eventually. Compare Apache's behavior in the
> face of a slow DoS attack to that of an asynchronous server like nginx.
>
>  My life mission is to dispel this myth (especially because I used to
> believe it myself).
>
> Let's get rid of one myth first: a long time ago, it was the case that
> Linux's single-threaded epoll service was somehow more scalable than more
> simply using threads to read the socket, because thread switching was
> painful. This stopped being true a long time ago: the "fastest" web servers
> (lighttpd) do not use epoll. And, in any case, whether you have a single
> thread *accepting* the connections or not, you'll want a pool of threads
> (or "workers" of some kind) generating content for these connections. I'm
> saying this to point out that there's some confusion as to what counts as
> "async": so let's just get the idea that it has to do with *accepting* the
> connections out of the way.
>
> In a true "async" server, the server calls you to tell you, look, there's
> this new client connection here (the server maintains a pool of
> *information* -- not threads -- about each client). You can then call the
> server at your convenience when you have data to send to the client, or ask
> it to close the connection. (Again, let's forget how the server actually
> implements this internally; it has nothing to do with asynchronicity in the
> sense we are talking about here.) The quality of an async server has a lot
> to do with what kind of information it keeps.
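
(Inline, to make the callback model above concrete: a rough sketch using Python's asyncio streams as a stand-in for "the server". This is not how tornado or nginx is implemented; `on_client` and `demo` are hypothetical names, and the short sleep just stands in for "the data isn't ready yet".)

```python
import asyncio


async def on_client(reader, writer):
    """Called by the event loop once per new client connection."""
    # No thread is parked here while we wait: the loop only keeps a small
    # record (the reader/writer pair) for each open connection.
    await asyncio.sleep(0.01)         # stand-in for "data isn't ready yet"
    writer.write(b"data is ready\n")  # hand data to the server when we have it
    await writer.drain()
    writer.close()                    # ask the server to close the connection
    await writer.wait_closed()


async def demo():
    """Start the server on an ephemeral localhost port and make one request."""
    server = await asyncio.start_server(on_client, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    reader, writer = await asyncio.open_connection("127.0.0.1", port)
    data = await reader.read()        # read until the server closes its side
    writer.close()
    await writer.wait_closed()
    server.close()
    return data
```

Running `asyncio.run(demo())` makes one round trip over localhost and returns whatever the callback wrote.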
>
> Think of it not only in terms of the server but also in terms of your
> application. At some point the server turns things over to your code. So,
> what is your application doing?
>
> For a typical "web" application (REST), each client connection returns an
> entity of some kind. So, you really need to process each client quickly in
> turn. While it's true that NginX or Tornado or Node.js can accept a great
> many connections (it's just a small record of information they keep for
> each, not a thread), if there's no thread (or "worker") ready at your
> application's end to generate an entity, then these connections will queue
> up and your clients will consider your site "down." Async or sync server
> makes no difference: your app is sync because it needs to handle one
> request at a time.
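
(Inline: the queueing described above is easy to put rough numbers on. This is a back-of-the-envelope sketch, not a benchmark. It assumes every request arrives at time zero and takes the same fixed service time; `completion_times` is a made-up helper.)

```python
import heapq


def completion_times(num_requests, num_workers, service_time):
    """Return the time at which each request finishes, in service order.

    Assumes all requests arrive at t=0 and each one occupies a worker for
    exactly `service_time` seconds, however the server accepted it.
    """
    free_at = [0.0] * num_workers  # when each worker is next available
    heapq.heapify(free_at)
    finished = []
    for _ in range(num_requests):
        start = heapq.heappop(free_at)  # earliest available worker
        end = start + service_time
        finished.append(end)
        heapq.heappush(free_at, end)
    return finished


if __name__ == "__main__":
    for workers in (2, 10):
        last = completion_times(10, workers, 0.5)[-1]
        print(f"{workers} workers: last of 10 requests done at {last:.1f}s")
```

With 10 requests at half a second each, two workers finish the last one at 2.5 seconds while ten workers finish everything at 0.5 seconds. How the connections were accepted never enters into the arithmetic.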
>
> So, when does the asynchronous approach make a real difference? Say your
> application is not typical REST, but instead you are streaming video.
> There's no single entity that the clients are waiting for. So, what you can
> do instead is have each of your threads divide their time between the open
> connections. The more load you have, the less data you want to send per
> client when their turn comes (or you can give paying clients more time per
> turn...) A good async server will provide you with statistics about load to
> help you do the right thing and degrade gracefully. The API approach Garret
> mentioned for WSGI is typical: your app can just return a null or otherwise
> tell the server: "Don't return anything to the client right now; in fact,
> don't you worry about it at all, I'll handle the data my own way and close the
> connection." Yes, such an approach enables async, but I wouldn't call it a
> good approach. The architectural burden becomes yours. If you're working
> with Tornado, for example, you're much better off working with its native
> API than using WSGI. Your app won't be portable, but then async rarely is.
>
> There's also a kinda middle ground between these extremes: serving files.
> Text files are usually too small to make a difference, but what if you are
> serving a lot of images? They're big, and sending them to slow clients can
> hold things up if you are using the typical "web" approach of sending them
> everything they need immediately. So, instead you can kinda stream the file
> to them, chunk by chunk, and if you do this well you can degrade
> gracefully. It's async, but with more predictability (because you know the
> size of the files), so it's a use case that has been heavily optimized. For
> example, individual chunks can be cached (mmap files ftw). But this has
> nothing to do with whether the server presents your app with a sync or
> async interface. As I stated, some of the best file servers are synchronous
> servers. They provide only a traditional REST API for your apps, but
> internally they do semi-streaming for files very well.
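
(Inline: the chunk-by-chunk idea above, sketched at the application level over plain WSGI. This is only an illustration of the interface, not what the optimized file servers do internally; `iter_chunks`, `make_file_app` and the 64 KiB chunk size are assumptions of mine.)

```python
import io


def iter_chunks(fileobj, chunk_size=64 * 1024):
    """Yield the file's contents one fixed-size chunk at a time."""
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:
            break
        yield chunk


def make_file_app(open_file, content_type="application/octet-stream"):
    """Build a WSGI app that streams whatever open_file() returns."""
    def app(environ, start_response):
        start_response("200 OK", [("Content-Type", content_type)])
        # Returning a generator lets the server pull chunks as the client
        # drains them, instead of holding the whole file in memory.
        return iter_chunks(open_file())
    return app


if __name__ == "__main__":
    # In-memory stand-in for a large image on disk.
    demo_app = make_file_app(lambda: io.BytesIO(b"x" * 150_000))
    sizes = [len(c) for c in demo_app({}, lambda status, headers: None)]
    print(sizes)
```

The standard library's `wsgiref.util.FileWrapper` does much the same job, and a server is free to swap in something faster underneath, which is the point made above: the app-facing interface can stay synchronous.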
>
> (And actually there's another myth here: that somehow file servers that
> degrade more gracefully will help you scale. Well, do you ever really want
> any individual server of yours to get to the point where it starts to
> degrade performance in any way, let alone degrade gracefully? These days,
> Google and other search indexes will penalize you for degradation. The
> trick is to scale horizontally with cheap VMs, so you *never* hit that
> point in the graph where things start heading south. You don't care if
> you're heading south fast or slowly. So, at the high scale it makes almost
> no difference if you choose Apache or NginX or lighttpd for your REST apps.
> It will matter only if you're limited to one or two servers in your
> cluster.)
>
> As an opposite example, let's consider Tornado. Yes, it can serve files,
> but it does so absolutely miserably. Its devs make it clear that it was
> never their priority to compete with mature web file servers. Instead, the
> goal was to create a good, straightforward and (to be honest) overly simple
> async server. Tornado is great if you want to write a streaming server
> without bells and whistles. But it's quite bad, partly due to its
> over-simplification, for traditional REST. If you're picking Tornado for
> your web application because it's "async" and "fast" you might not be
> understanding what these terms mean in this context. Find a mature sync
> server and make sure *your* app, on your end, never holds up a thread for
> too long.
>
> Over and out.
>
> -Tl