tornado.web ioloop add_timeout eats CPU
Laszlo Nagy
gandalf at shopzeus.com
Tue Sep 4 03:30:07 EDT 2012
> What's wrong is the 1,135,775 calls to "method 'poll' of
> 'select.epoll' objects".
I was afraid you were going to say that. :-)
> With five browsers waiting for messages over 845 seconds, that works
> out to each waiting browser inducing 269 epolls per second.
>
> Almost equally important is what the problem is *not*. The problem is
> *not* spending the vast majority of time in epoll; that's *good* news.
> The problem is *not* that CPU load goes up linearly as we connect more
> clients. This is an efficiency problem, not a scaling problem.
>
> So what's the fix? I'm not a Tornado user; I don't have a patch.
> Obviously Laszlo's polling strategy is not performing, and the
> solution is to adopt the event-driven approach that epoll and Tornado
> do well.
Actually, I have found a way to overcome this problem, and it seems to
be working. Instead of calling add_timeout from every request, I save
the request objects in a list and run a "message distributor" service
in the background that routes messages to clients and finishes their
long-poll requests when needed. The main point is that the "message
distributor" has a single entry point, and it is called back at regular
intervals. So the number of callbacks per second does not increase
with the number of clients. Now the CPU load is about 1% with one
client, and it is the same with 15 clients, while the response time is
unchanged (50-100 ms). It is efficient enough for me.
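The following is a minimal sketch of the "message distributor" idea, not Laszlo's actual code. It uses plain asyncio instead of Tornado so it runs standalone, and the class name MessageDistributor, its methods, and the 50 ms tick interval are all hypothetical. Each long-poll request parks a future, and one periodic tick delivers queued messages and resolves those futures, so the number of timer callbacks per second is set by the interval rather than by the number of clients. In Tornado, the same tick method could be registered once with tornado.ioloop.PeriodicCallback (whose interval is given in milliseconds).

```python
import asyncio
from collections import defaultdict


class MessageDistributor:
    """Hypothetical sketch: one periodic tick routes messages to all waiting long polls.

    Timer callbacks per second = 1 / interval, independent of the client count.
    """

    def __init__(self, interval=0.05):
        self.interval = interval            # seconds between ticks (assumed value)
        self.queues = defaultdict(list)     # client_id -> undelivered messages
        self.waiters = defaultdict(list)    # client_id -> pending long-poll futures
        self.ticks = 0

    def post(self, client_id, message):
        # Called when one browser sends a message to another: just enqueue it.
        self.queues[client_id].append(message)

    def wait(self, client_id):
        # Called by a long-poll handler; the request stays open until
        # tick() resolves the returned future. No per-request timer is created.
        fut = asyncio.get_running_loop().create_future()
        self.waiters[client_id].append(fut)
        return fut

    def tick(self):
        # The single entry point, called back at a fixed interval.
        self.ticks += 1
        for client_id in list(self.queues):
            live = [f for f in self.waiters.pop(client_id, []) if not f.done()]
            if not live:
                continue  # nobody is polling right now; keep the messages queued
            batch = self.queues.pop(client_id)
            for fut in live:
                fut.set_result(list(batch))  # "finish" the long poll

    async def run(self):
        while True:
            self.tick()
            await asyncio.sleep(self.interval)


async def demo():
    dist = MessageDistributor(interval=0.05)
    runner = asyncio.create_task(dist.run())
    # 15 open long polls, all held on one thread as futures.
    polls = {f"client{i}": dist.wait(f"client{i}") for i in range(15)}
    dist.post("client3", "hello from client7")
    reply = await asyncio.wait_for(polls["client3"], timeout=1.0)
    still_waiting = sum(not f.done() for f in polls.values())
    runner.cancel()
    try:
        await runner
    except asyncio.CancelledError:
        pass
    return reply, still_waiting, dist.ticks


reply, still_waiting, ticks = asyncio.run(demo())
print(reply, still_waiting)  # prints: ['hello from client7'] 14
```

A real server would also need to discard the futures of clients that disconnect before any message arrives; this sketch leaves them in place until a message for that client is posted.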
I understand that most people take a different approach: the browser
sends a short poll request every 2 seconds or so. But this is not good
for me, because then it can take up to 2 seconds to send a message from
one browser to another, which is not acceptable in my case. Implementing
long polls with a threaded server would be trivial, but a threaded
server cannot handle 100+ simultaneous (long-running) requests, because
that would require 100+ threads to be running.
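To put rough numbers on the short-polling trade-off, assume each of N browsers polls every T seconds and that a message is posted at a uniformly random moment between two polls (an assumption; network round-trip time is ignored). The extra delivery delay L then satisfies:

$$
0 \le L < T, \qquad \mathbb{E}[L] = \frac{T}{2}, \qquad \text{request rate} = \frac{N}{T}
$$

With T = 2 s the worst case approaches 2 s and the average is 1 s, and 15 clients would together send 7.5 poll requests per second. A distributor that ticks every Δ seconds over already-open long polls instead adds at most Δ of delay and fires 1/Δ callbacks per second regardless of N.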
This central "message distributor" concept seems to be working. About
1-2% CPU overhead is the price I pay for being able to send messages
from one browser to another within 100 ms, which is fine.
I could not have done this without your help.
Thank you!
Laszlo