[Web-SIG] ngx.poll extension (was Re: Are you going to convert Pylons code into Python 3000?)

Graham Dumpleton graham.dumpleton at gmail.com
Fri Mar 7 11:15:59 CET 2008


On 07/03/2008, Manlio Perillo <manlio_perillo at libero.it> wrote:
>  Is it true that Apache can spawn additional processes,

Yes, for the prefork and worker MPMs, but not the winnt MPM on
Windows. See for example the details for worker MPM in:

  http://httpd.apache.org/docs/2.2/mod/worker.html
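
For reference, the directives that control process and thread
spawning under worker MPM look like the following. The directive
names are standard Apache 2.2, but the values here are purely
illustrative, not recommendations:

```apache
# worker MPM: a small number of processes, each running many threads.
# Apache spawns and reaps child processes to stay between the
# spare-thread limits.
<IfModule mpm_worker_module>
    StartServers          2
    ServerLimit          16
    ThreadsPerChild      25
    MinSpareThreads      25
    MaxSpareThreads      75
    MaxClients          400
</IfModule>
```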

> By the way, I know there is an event based worker in Apache.
>  Have you experience with it?

No, I haven't used it. It isn't an event driven system in the sense
you may be thinking of. It still uses threads like worker MPM. The
difference, as I understand it, is that it dedicates a single thread
to managing the client socket connections kept open by keep alive,
rather than tying up a whole thread for each such connection. So, it
is just an improvement over worker and does not implement a full
event driven system.

>  > No matter what technology one uses there will be such trade offs and
>  > they will vary depending on what you are doing. Thus it is going to be
>  > very rare that one technology is always the "right" technology. Also,
>  > as much as people like to focus on raw performance of the web server
>  > for hosting Python web applications, in general the actual performance
>  > matters very little in the greater scheme of things (unless you're
>  > stupid enough to use CGI). This is because that isn't where the
>  > bottlenecks are generally going to be. Thus, that one hosting solution
>  > may for a hello world program be three times faster than another,
>  > means absolutely nothing if that ends up translating to less than 1
>  > percent throughput when someone loads and runs their mega Python
>  > application. This is especially the case when the volume of traffic
>  > the application receives never goes anywhere near fully utilising the
>  > actual resources available. For large systems, you would never even
>  > depend on one machine anyway and load balance across a cluster. Thus
>  > the focus by many on raw speed in many cases is just plain ridiculous
>  > as there is a lot more to it than that.
>
> There is not only the problem of raw speed.
>  There is also a problem of server resources usage.
>
>  As an example, an Italian hosting company imposes strict limits on
>  resource usage for each client.

As would any sane web hosting company.

>  They do not use Apache, since they fear that serving embedded
>  applications limits their control

If they believe that embedded solutions like mod_python are the only
things available for Apache, then I can understand that. There are
other solutions though, such as fastcgi and mod_wsgi daemon mode, so
it isn't necessarily as unmanageable as they may believe. Perhaps
they just don't know what options are available, or don't understand
the technology well enough to manage it. I do admit though that it
would be harder when it isn't your own application and you are
hosting stuff written by a third party.

>  Using Nginx + the wsgi module has the benefit to require less system
>  resources than flup (as an example) and, probably, Apache.

Memory usage is also relative, just like network performance.
Configure Apache correctly and don't load modules you don't need, and
the base overhead of Apache can be reduced quite a lot. For a big
system heavy on media, using a separate media server such as nginx or
lighttpd can be sensible. One can then turn off keep alive on Apache
for the dynamic Python web application, since keep alive doesn't
necessarily help there and will cause the sorts of issues the event
MPM attempts to solve. So, it's manageable and there are known steps
one can take.
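
As an illustrative fragment for the Apache instance handling the
dynamic Python application (the directives are standard Apache, the
rest is just an example of the steps described above):

```apache
# Media is served elsewhere (nginx/lighttpd), so persistent
# connections buy little here and only tie up threads.
KeepAlive Off

# Keep the base footprint down: comment out modules the Python
# application doesn't need.
#LoadModule autoindex_module modules/mod_autoindex.so
#LoadModule userdir_module modules/mod_userdir.so
```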

The real memory usage comes when someone loads up a Python web
application which requires 80-100MB per process at the outset, before
much has even happened. Using another web hosting solution, be it
nginx or even a Python based web server, will not change the fact
that the Python web application itself is chewing up that much
memory.

The one area where memory usage can be a problem with Python web
applications, and which is not necessarily well understood by most
people, is the risk of concurrent requests causing a sudden burst in
memory usage. Imagine a specific URL which needs a large amount of
transient memory, for example one which generates PDFs using
reportlab and PIL. All is okay if the URL only gets hit by one
request at a time, but if multiple requests hit it at the same time,
then memory usage blows out considerably, as each request needs the
large amount of transient memory at the same time and, once
allocated, it will be retained by the process.

So, if one were using worker MPM to keep down the number of overall
processes and thus memory usage, one runs the risk of this sort of
problem occurring. One could stop it occurring by implementing
throttling in the application, that is, putting locking on specific
URLs which consume lots of transient memory so as to restrict the
number of concurrent requests, but frankly I have never heard of
anyone actually doing it.
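
For what it's worth, such throttling is simple to sketch as WSGI
middleware built around a semaphore. This is a hypothetical
illustration: the path prefix and concurrency limit are made-up
values, and a real deployment would also need to think about
responses streamed after the application returns:

```python
import threading

class ThrottleMiddleware:
    """WSGI middleware limiting concurrency on memory-heavy URLs.

    Hypothetical sketch: the paths and limit below are assumptions,
    not part of any standard API.
    """

    def __init__(self, app, paths=("/report/pdf",), max_concurrent=2):
        self.app = app
        self.paths = paths
        self.semaphore = threading.Semaphore(max_concurrent)

    def __call__(self, environ, start_response):
        path = environ.get("PATH_INFO", "")
        if any(path.startswith(p) for p in self.paths):
            # At most max_concurrent requests generate a PDF at once,
            # so the transient memory burst stays bounded. Note the
            # semaphore only covers the application call itself, not
            # consumption of a streamed response iterable.
            with self.semaphore:
                return self.app(environ, start_response)
        return self.app(environ, start_response)
```

Anything not matching one of the configured prefixes passes straight
through with no locking overhead at all.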

The alternative is to use prefork MPM, or similar model, such that
there can only be one active request in the process at a time. But
then you need more processes to handle the same number of requests, so
overall memory usage is high again. For large sites, however, which
can afford lots of memory, using prefork would be the better way to
go, as it at least limits the possibility of individual processes
spiking memory usage unexpectedly, making memory usage more
predictable.
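
To make the trade-off concrete, here is some back-of-the-envelope
arithmetic. Every number below is an illustrative assumption, not a
measurement:

```python
BASE_MB = 100       # assumed baseline footprint per Python process
TRANSIENT_MB = 150  # assumed transient memory for one "heavy" request

def worst_case_mb(processes, heavy_per_process):
    """Worst case resident memory if every process simultaneously
    serves its maximum number of memory-heavy requests (and, as
    described above, retains the allocation afterwards)."""
    return processes * (BASE_MB + heavy_per_process * TRANSIENT_MB)

# worker MPM: 4 processes x 25 threads; in the worst case many heavy
# requests can land in the same process at once.
worker = worst_case_mb(processes=4, heavy_per_process=10)

# prefork MPM: 100 processes, but at most one request (heavy or not)
# per process, so each process's spike is bounded.
prefork = worst_case_mb(processes=100, heavy_per_process=1)
```

Under these assumed numbers prefork's total is larger, but each
prefork process is capped at 250MB, whereas a single worker process
can balloon to 1600MB without warning, which is the predictability
point made above.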

That all said, handling concurrency with an event driven approach
rather than threads will not necessarily isolate you from this
specific problem either: a single event driven process serving many
concurrent requests can suffer exactly the same transient memory
spike. All in all it can be a tough problem. If your web
application's demands are relatively simple then it may never be an
issue, but people are trying to do more and more within the web
application itself, rather than delegating it to separate back end
systems or programs. At the same time they want to use cheap memory
constrained VPS systems. So, lots of fun. :-)

Graham

