[Web-SIG] A query for hosting providers

Remi Delon remi at cherrypy.org
Thu Mar 31 12:56:33 CEST 2005


Ian Bicking wrote:
> Remi Delon wrote:
> 
>>> I'm wondering -- and this is mostly directed to the hosting providers 
>>> (Remi, Sean...) -- what are the problems with providing 
>>> commodity-level hosting for Python programs?  I can think of some, 
>>> but I'm curious what you've encountered and if you have ideas about 
>>> how to improve things.
>>>
>>> Some things I've thought about:
>>> * Long running processes are hard to maintain (assuming we rule out 
>>> CGI).  Code becomes stale, maybe the server process gets in a bad 
>>> state.  Sometimes processes become wedged.  With mod_python this
>>> can affect the entire site.
>>
>> Yes, maintaining long-running processes can be a pain, but that's not 
>> related to python itself, it's true regardless of the language that 
>> was used to write the program.
>>
>>> * Isolating clients from each other can be difficult.  For mod_python 
>>> I'm assuming each client needs their own Apache server.
>>
>> Yes, that's how we ended up setting up our mod_python accounts.
>> We also found stability problems in some of the other mod_* modules 
>> (mod_webkit, mod_skunkweb, ...) and they sometimes crashed the main 
>> Apache server (very bad). So for all the frameworks that support a 
>> standalone HTTP server mode (CherryPy, Webware, SkunkWeb, ...) we now
>> set them up as standalone HTTP servers listening on a local port, and
>> we just use our main Apache server as a proxy to these servers.
>> This allows us to use the trick described on this page: 
>> http://www.cherrypy.org/wiki/BehindApache (look for "autostart.cgi") 
>> to have Apache restart the server automatically if it ever goes down.
>
> On our own servers we've been using CGI connectors (wkcgi, Zope.cgi), 
> which seem fast enough, and of course won't be crashing Apache.

Yeah, but we wanted a somewhat "standard" way of talking to Apache and
most frameworks do come with a small HTTP server, so that works fine for
us and it also completely isolates the process from Apache.
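
As a rough illustration of that setup, here is a minimal standalone backend bound to the loopback interface only, the kind of process the main Apache server would forward requests to (for example with mod_proxy's ProxyPass directive). This is a sketch in modern Python, not any framework's actual built-in server; the handler, the reply text and the default port 8080 are made up for the example.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Hypothetical handler standing in for a framework's application."""

    def do_GET(self):
        body = b"Hello from the backend\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the sketch quiet; a real backend would log somewhere


def make_backend(port=8080):
    # Binding to 127.0.0.1 means only local processes (i.e. the Apache
    # proxy) can reach this server; it never faces the public network.
    return HTTPServer(("127.0.0.1", port), HelloHandler)
```

Calling make_backend().serve_forever() would run it. If this process crashes, only this one site goes down; the Apache front end keeps serving everybody else.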

> Have you looked at Supervisor for long running processes?
>   http://www.plope.com/software/supervisor/
> I haven't had a chance to use it, but it looks useful for this sort of 
> thing.

Well, there are several such supervising tools (daemontools is another
one), but again, they never matched our exact needs. For instance,
sometimes it's OK if a process is down ... it could just be that the
user is working on his site. And also, they usually only watch one
thing: make sure that the process stays up, but there are a million
other things we wanted to watch for. So we just wrote our own scripts.
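
A minimal sketch of the kind of per-site check described above, assuming a hypothetical convention where a customer creates a maintenance flag file when the site is down on purpose. The function names, the flag file and the start command are all illustrative; they are not Python-Hosting.com's actual scripts.

```python
import os
import socket
import subprocess


def is_listening(host, port, timeout=2.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_site(port, start_cmd, maintenance_flag):
    """Restart one customer site if it is down.

    Assumed convention: if the file `maintenance_flag` exists, the customer
    is working on the site and it must be left alone even if it is down.
    """
    if os.path.exists(maintenance_flag):
        return "skipped"
    if is_listening("127.0.0.1", port):
        return "ok"
    # Fire and forget; the next pass of the watchdog re-checks the port.
    subprocess.Popen(start_cmd)
    return "restarted"
```

A cron job could run check_site for every hosted site every few minutes; other checks (CPU, memory, disk) would be added alongside rather than inside a generic supervisor.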


> HTTP does seem like a reasonable way to communicate between servers, 
> instead of all these ad hoc HTTP-like protocols (PCGI, SCGI, FastCGI, 
> mod_webkit, etc).  My only disappointment with that technique is that 
> you lose some context -- e.g., if REMOTE_USER is set, or 
> SCRIPT_NAME/PATH_INFO (you probably have to configure your URLs, since 
> they aren't detectable), mod_rewrite's additional environmental 
> variables, etc.  Hmm... I notice you use custom headers for that 
> (CP-Location), and I suppose other variables could also be passed 
> through... it's just unfortunate because that significantly adds to the 
> Apache configuration, which is something I try to avoid -- it's easy 
> enough to put in place, but hard to maintain.

The CP-Location trick is not needed (I should remove it from this page
as it confuses people).
Have a look at the section called "What are the drawbacks of running
CherryPy behind Apache ?" on this page:
http://www.cherrypy.org/wiki/CherryPyProductionSetup
It summarizes my view on this (basically, there aren't any real 
drawbacks if you're using mod_rewrite with Apache2).
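
If one does want to carry context such as REMOTE_USER or the mount point across the proxy hop, as Ian suggests, the backend side could be a small WSGI middleware like the sketch below. It assumes the front-end Apache has been configured to set two hypothetical headers, X-Forwarded-User and X-Forwarded-Script-Name, and that Apache strips the mount prefix before proxying. The headers are only trusted when the request comes from the proxy's own address.

```python
def proxy_context_middleware(app, trusted_proxy="127.0.0.1"):
    """Wrap a WSGI app so context lost at the HTTP proxy hop is restored.

    The header names are hypothetical; the front-end server must set them.
    """

    def wrapper(environ, start_response):
        # Anyone can send these headers, so only honour them from the proxy.
        if environ.get("REMOTE_ADDR") == trusted_proxy:
            user = environ.get("HTTP_X_FORWARDED_USER")
            if user:
                environ["REMOTE_USER"] = user
            prefix = environ.get("HTTP_X_FORWARDED_SCRIPT_NAME")
            if prefix:
                # The proxy already stripped this prefix from PATH_INFO;
                # restoring SCRIPT_NAME lets the app build absolute URLs.
                environ["SCRIPT_NAME"] = prefix.rstrip("/")
        return app(environ, start_response)

    return wrapper
```

The cost Ian points out remains: every such variable needs a matching line in the Apache configuration.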


>>>  Maybe this isn't as much of a problem these days, as virtualizing 
>>> technologies have improved, and multiple Apache processes isn't that 
>>> big of a deal.
>>> * Setup of frameworks is all over the place.  Setting up multiple 
>>> frameworks might be even more difficult.  Some of them may depend on 
>>> mod_rewrite.  Server processes are all over the place as well.
>>>
>>> But I don't have a real feeling for how to solve these, and I'm sure 
>>> there's things I'm not thinking about.
>>
>> Well, the 2 main problems that I can think of are:
>>     - Python frameworks tend to work as long-running processes, which 
>> have a lot of advantages for your site, but are a nightmare for 
>> hosting providers. There are soooo many things to watch for: CPU usage 
>> (a process can start "spinning"), RAM usage, process crashing, ... But 
>> that is not related to python, and any hosting provider that supports
>> long-running processes faces the same challenge. For instance, we
>> support Tomcat and the problems are the same. For this we ended up 
>> writing a lot of custom monitoring scripts on our own (we couldn't 
>> find exactly what we needed out there). Fortunately, python makes it 
>> easy to write these scripts :-)
>
> Do you do monitoring on a per-process basis (like a supervisor process) 
> or just globally scan through the processes and kill off any bad ones?

We monitor the general health of our servers on various levels and we
monitor the response time of some key sites/services on each of our 
servers to make sure that overall the server is OK.
For each individual site of our customers, we only have scripts that try
to restart the sites if they ever go down, but that's it (if the
customer changed their site and broke it, there isn't much we can do
about it).
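
A response-time probe for a few key sites could be as simple as the following sketch. The function names and the 5-second threshold are assumptions for illustration; a real setup would feed the result into alerting rather than just returning a list.

```python
import time
import urllib.request


def probe(url, timeout=10.0):
    """Fetch url; return (ok, seconds). ok is False on errors or non-2xx."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            resp.read()
            ok = 200 <= resp.status < 300
    except OSError:  # URLError, HTTPError and timeouts all derive from OSError
        ok = False
    return ok, time.monotonic() - start


def check_key_sites(urls, max_seconds=5.0):
    """Return the URLs that are failing or slower than max_seconds."""
    bad = []
    for url in urls:
        ok, elapsed = probe(url)
        if not ok or elapsed > max_seconds:
            bad.append(url)
    return bad
```

A slow reference site is a cheap signal that the whole server is in trouble (swapping, a spinning process), even when every individual process is technically "up".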

> I've thought that a forking server with a parent that monitored children
> carefully would be nice, which would be kind of a per-process monitor. 
> It would mean I'd have to start thinking multiprocess, reversing all my 
> threaded habits, but I think I'm willing to do that in return for really 
> good reliability.

I'm still very much in the "thread pool" camp :-)
I've got CherryPy sites that run in a thread pool mode for months 
without any stability or memory leak problem.
If your process crashes or leaks memory then there's something wrong 
with your program in the first place, and the right way to solve it is 
not to switch to a multiprocess model.
Finally, if you want a monitoring process, it can be a completely 
separate process which allows you to still keep a "thread pool" model 
for your main process.
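
Such a separate watcher could, on Linux, read a process's resident memory from /proc and compare it with a limit, leaving the application itself free to use a thread pool. This is a sketch assuming a Linux /proc filesystem; the function names and the idea of a fixed kB limit are illustrative.

```python
def rss_kb(pid):
    """Resident set size of `pid` in kB, read from /proc (Linux only).

    Returns None if the process does not exist or /proc is unavailable.
    """
    try:
        with open("/proc/%d/status" % pid) as f:
            for line in f:
                if line.startswith("VmRSS:"):
                    # Line looks like "VmRSS:     12345 kB"
                    return int(line.split()[1])
    except OSError:
        return None
    return None  # e.g. kernel threads have no VmRSS line


def over_limit(pid, limit_kb):
    """True if the process currently uses more than limit_kb of RAM."""
    rss = rss_kb(pid)
    return rss is not None and rss > limit_kb
```

A monitoring loop in its own process would call over_limit(pid, limit_kb) every few seconds and alert or restart the site when it returns True.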

>>     - But another challenge (and this one is more specific to Python) 
>> is the number of python versions and third party modules that we have 
>> to support. For instance, at Python-Hosting.com, we have to support 
>> all 4 versions of python: 2.1, 2.2, 2.3 and 2.4, and all of them are 
>> being used by various people. And for each version, we usually have 10 
>> to 20 third-party modules (mysql-python, psycopg, elementtree, 
>> sqlobject, ...) that people need ! We run Red Hat Enterprise 3, but 
>> RPMs for python are not designed to work with multiple python versions 
>> installed, and RPMs for third-party modules are usually nonexistent. As
>> a result, we have to build all the python-related stuff from source. 
>> And some of these modules are sometimes hard to build (the 
>> python-subversion bindings for instance) and you can run into some 
>> library-version-compatibility nightmare. And as if this wasn't enough, 
>> new releases of modules come out everyday ...
>
> For the apps I've been deploying internally -- where we have both a more 
> controlled and less controlled environment than a commercial host -- 
> I've been installing every prerequisite in a per-application location,
> i.e., ``python setup.py install --install-lib=app/stdlib``.  Python 
> module versioning issues are just too hard to resolve, and I'd rather 
> leave standard-packages with only really stable software that I don't 
> often need to update (like mxDateTime), and put everything else next to 
> the application.

Well, we have a mix of both: for all "more or less common" modules, we 
install them system-wide. If someone wants a really "esoteric" module 
that no one else on the server is likely to use, we usually tell them to
install it in their home directory.
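
Whether the private directory sits next to the application (as with Ian's --install-lib=app/stdlib) or in the customer's home directory, the interpreter still has to be told to look there before the system-wide site-packages. A minimal sketch, assuming the layout app_root/stdlib; the function name is hypothetical.

```python
import os
import sys


def use_private_libs(app_root, subdir="stdlib"):
    """Put an application's private library directory first on sys.path.

    Modules installed there (e.g. with setup.py install --install-lib=...)
    then shadow any system-wide copy of the same module.
    """
    libdir = os.path.abspath(os.path.join(app_root, subdir))
    if libdir not in sys.path:
        sys.path.insert(0, libdir)
    return libdir
```

It has to run at the very top of the application's startup script, before any third-party import. Setting PYTHONPATH to the same directory achieves the same effect without code.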

>> I think that this second point is the main challenge and any hosting 
>> provider that is not specialized in python doesn't have the time or 
>> the knowledge to build and maintain all these python versions and 
>> third-party modules. Of course, they could just say "we're going to 
>> support this specific python version with these few third-party 
>> modules and that's it", but experience shows that most people need at 
>> least one or 2 "uncommon" third-party modules for their site so if 
>> that module is missing they just can't run their site ...
> 
> Any reason for all the Python versions?  Well, I guess it's hard to ask 
> clients to upgrade.  If I were to support people in that way, I'd
> probably try to standardize a Python version or two, and some core 
> modules (probably the ones that are harder to build, like database 
> drivers), and ask users to install everything else in their own 
> environment.  But of course when you are in service you have to do what 
> people want you to do...

Well, we very much decide what software/version we support based on 
customer demand ... If enough people want python 2.1, 2.2, 2.3 and 2.4 
(which is the case right now), then we support all of them ...
Recently there was a high demand for a commercial Trac/Subversion 
hosting with backups and HTTPS access, so we came up with such an offer 
and it turned out to be quite successful.
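
With four interpreters installed side by side, a small script can report which third-party modules each one is missing. The sketch below assumes each interpreter is reachable by a path such as /usr/local/bin/python2.3 (hypothetical locations). It simply runs `python -c "import module"` with each interpreter and checks the exit status, so it works no matter which Python version runs the script itself.

```python
import subprocess


def can_import(python, module):
    """Return True if the interpreter at path `python` can import `module`."""
    try:
        result = subprocess.run(
            [python, "-c", "import " + module],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except OSError:  # interpreter not installed at that path
        return False
    return result.returncode == 0


def module_matrix(pythons, modules):
    """Map each interpreter path to the list of modules it is missing."""
    return {py: [m for m in modules if not can_import(py, m)] for py in pythons}
```

For example, module_matrix(["/usr/local/bin/python2.1", "/usr/local/bin/python2.4"], ["MySQLdb", "psycopg", "elementtree"]) would list, per version, what still needs to be built from source.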

>> But above all, I think that the main reason why python frameworks are 
>> not more commonly supported by the big hosting providers is because 
>> the market for these frameworks is very small (apart from Zope/Plone). 
>> For all the "smaller" frameworks (CherryPy, Webware, SkunkWeb, 
>> Quixote, ...) we host less than 50 of each, so the big hosting 
>> providers simply won't bother learning these frameworks and supporting 
>> them for such a small market.
> 
> If they could support all of them at once, do you think it would be more 
> interesting to hosting providers?

Well, if all frameworks came in nicely packaged RPMs and they all 
integrated the same way with Apache (mod_wsgi anyone ?) I guess that 
would be a big step forward ... But you'd still have the problem of all 
the python third-party modules that people need ...
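
The appeal of a single integration point is that every framework would expose the same kind of callable to the server. Below is a minimal application following the WSGI interface (PEP 333), plus a helper that serves it with the standard library's wsgiref server for local testing. The greeting and the port are illustrative, and the sketch makes no assumption about how an Apache module would load such a callable.

```python
from wsgiref.simple_server import make_server


def application(environ, start_response):
    """A minimal WSGI application: any framework exposing a callable like
    this could be hosted the same way."""
    body = ("Hello from %s\n" % environ.get("PATH_INFO", "/")).encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def serve(port=8000):
    # Loopback only; a front-end server would proxy to it or host it directly.
    make_server("127.0.0.1", port, application).serve_forever()
```

Because the hosting side only needs to know "call application(environ, start_response)", the same Apache setup would work for CherryPy, Webware, Quixote and the rest.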

Remi.



