[Web-SIG] A query for hosting providers

Ian Bicking ianb at colorstudy.com
Tue Mar 29 19:27:20 CEST 2005


Remi Delon wrote:
>> I'm wondering -- and this is mostly directed to the hosting providers 
>> (Remi, Sean...) -- what are the problems with providing 
>> commodity-level hosting for Python programs?  I can think of some, but 
>> I'm curious what you've encountered and if you have ideas about how to 
>> improve things.
>>
>> Some things I've thought about:
>> * Long-running processes are hard to maintain (assuming we rule out 
>> CGI).  Code becomes stale, maybe the server process gets in a bad 
>> state.  Sometimes processes become wedged.  With mod_python this can 
>> affect the entire site.
> 
> 
> Yes, maintaining long-running processes can be a pain, but that's not 
> related to python itself; it's true regardless of the language that was 
> used to write the program.
> 
>> * Isolating clients from each other can be difficult.  For mod_python 
>> I'm assuming each client needs their own Apache server.
> 
> 
> Yes, that's how we ended up setting up our mod_python accounts.
> We also found stability problems in some of the other mod_* modules 
> (mod_webkit, mod_skunkweb, ...) and they sometimes crashed the main 
> Apache server (very bad). So for all the frameworks that support a 
> standalone HTTP server mode (CherryPy, Webware, Skunkweb, ...) we now 
> set them up as standalone HTTP servers listening on a local port, and we 
> just use our main Apache server as a proxy to these servers.
> This allows us to use the trick described on this page: 
> http://www.cherrypy.org/wiki/BehindApache (look for "autostart.cgi") to 
> have Apache restart the server automatically if it ever goes down.

On our own servers we've been using CGI connectors (wkcgi, Zope.cgi), 
which seem fast enough, and of course won't be crashing Apache.

Have you looked at Supervisor for long running processes?
   http://www.plope.com/software/supervisor/
I haven't had a chance to use it, but it looks useful for this sort of 
thing.
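
I imagine the core of what autostart.cgi and Supervisor automate isn't 
much more than this kind of loop -- an untested sketch, where the 
command line is made up:

    import os, sys, time

    def run_forever(argv, delay=2):
        # Keep one child process alive; restart it whenever it exits.
        while 1:
            pid = os.fork()
            if pid == 0:
                # Child: become the actual server process.
                try:
                    os.execvp(argv[0], argv)
                finally:
                    os._exit(1)   # only reached if the exec failed
            # Parent: block until the child dies, pause briefly so a
            # crash loop doesn't eat the CPU, then start it again.
            os.waitpid(pid, 0)
            time.sleep(delay)

    if __name__ == '__main__':
        # e.g.  python watchdog.py /usr/local/bin/python2.3 mysite.py
        run_forever(sys.argv[1:])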

HTTP does seem like a reasonable way to communicate between servers, 
instead of all these ad hoc HTTP-like protocols (PCGI, SCGI, FastCGI, 
mod_webkit, etc).  My only disappointment with that technique is that 
you lose some context -- e.g., if REMOTE_USER is set, or 
SCRIPT_NAME/PATH_INFO (you probably have to configure your URLs, since 
they aren't detectable), mod_rewrite's additional environment 
variables, etc.  Hmm... I notice you use custom headers for that 
(CP-Location), and I suppose other variables could also be passed 
through... it's just unfortunate because that significantly adds to the 
Apache configuration, which is something I try to avoid -- it's easy 
enough to put in place, but hard to maintain.
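
On the backend side I suppose a little WSGI middleware could at least 
rebuild the lost context from forwarded headers -- an untested sketch; 
the X-Forwarded-* names here are just my own convention, and Apache 
would still have to be told to set them, which is exactly the 
configuration I'd rather not accumulate:

    def proxied(app):
        # Restore request context that a proxying Apache stripped off.
        def wrapper(environ, start_response):
            script = environ.get('HTTP_X_FORWARDED_SCRIPT_NAME')
            if script:
                environ['SCRIPT_NAME'] = script
                path = environ.get('PATH_INFO', '')
                if path.startswith(script):
                    environ['PATH_INFO'] = path[len(script):]
            user = environ.get('HTTP_X_FORWARDED_USER')
            if user:
                environ['REMOTE_USER'] = user
            return app(environ, start_response)
        return wrapper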

>>  Maybe this isn't as much of a problem these days, as virtualizing 
>> technologies have improved, and multiple Apache processes isn't that 
>> big of a deal.
>> * Setup of frameworks is all over the place.  Setting up multiple 
>> frameworks might be even more difficult.  Some of them may depend on 
>> mod_rewrite.  Server processes are all over the place as well.
>>
>> But I don't have a real feeling for how to solve these, and I'm sure 
>> there's things I'm not thinking about.
> 
> 
> Well, the 2 main problems that I can think of are:
>     - Python frameworks tend to work as long-running processes, which 
> have a lot of advantages for your site, but are a nightmare for hosting 
> providers. There are soooo many things to watch for: CPU usage (a 
> process can start "spinning"), RAM usage, process crashing, ... But that 
> is not related to python and any hosting provider that supports 
> long-running processes faces the same challenge. For instance, we support 
> Tomcat and the problems are the same. For this we ended up writing a lot 
> of custom monitoring scripts on our own (we couldn't find exactly what 
> we needed out there). Fortunately, python makes it easy to write these 
> scripts :-)

Do you do monitoring on a per-process basis (like a supervisor process) 
or just globally scan through the processes and kill off any bad ones? 
I've thought that a forking server with a parent that monitored children 
carefully would be nice, which would be kind of a per-process monitor. 
It would mean I'd have to start thinking multiprocess, reversing all my 
threaded habits, but I think I'm willing to do that in return for really 
good reliability.
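
The shape I have in mind is the classic preforking pattern -- an 
untested sketch, with a throwaway responder standing in for real 
request handling and the numbers picked arbitrarily:

    import os, socket

    def serve(sock):
        # Worker: handle a bounded number of requests, then exit so the
        # parent replaces us before we can get too stale or leaky.
        for i in range(1000):
            conn, addr = sock.accept()
            conn.send('HTTP/1.0 200 OK\r\n\r\nhello\r\n')
            conn.close()

    def prefork(sock, nworkers=4):
        children = {}
        def spawn():
            pid = os.fork()
            if pid == 0:
                serve(sock)
                os._exit(0)
            children[pid] = 1
        for i in range(nworkers):
            spawn()
        while 1:
            # Parent: whenever a child exits (crash, a wedged process
            # killed by a monitor, or the voluntary exit above), fork a
            # replacement.
            pid, status = os.wait()
            if pid in children:
                del children[pid]
                spawn()

    if __name__ == '__main__':
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        s.bind(('127.0.0.1', 8080))
        s.listen(5)
        prefork(s)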

>     - But another challenge (and this one is more specific to Python) is 
> the number of python versions and third party modules that we have to 
> support. For instance, at Python-Hosting.com, we have to support all 4 
> versions of python: 2.1, 2.2, 2.3 and 2.4, and all of them are being 
> used by various people. And for each version, we usually have 10 to 20 
> third-party modules (mysql-python, psycopg, elementtree, sqlobject, ...) 
> that people need ! We run Red Hat Enterprise 3, but RPMs for python are 
> not designed to work with multiple python versions installed, and RPMs 
> for third-party modules are usually nonexistent. As a result, we have to 
> build all the python-related stuff from source. And some of these 
> modules are sometimes hard to build (the python-subversion bindings for 
> instance) and you can run into a library-version-compatibility 
> nightmare. And as if this wasn't enough, new releases of modules come 
> out every day ...
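
Just to make that treadmill concrete, it's roughly the dance below for 
every module, times every interpreter (an illustration only, with 
made-up paths):

    import os

    # Made-up interpreter locations; each one needs its own build and
    # install of each extension module.
    INTERPRETERS = ['/usr/local/bin/python2.1', '/usr/local/bin/python2.2',
                    '/usr/local/bin/python2.3', '/usr/local/bin/python2.4']

    def install_everywhere(srcdir):
        os.chdir(srcdir)
        for python in INTERPRETERS:
            os.system('%s setup.py build' % python)
            os.system('%s setup.py install' % python)

    # install_everywhere('/usr/local/src/some-module-1.0')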

For the apps I've been deploying internally -- where we have both a more 
controlled and less controlled environment than a commercial host -- 
I've been installing every prerequisite in a per-application location, 
i.e., ``python setup.py install --install-lib=app/stdlib``.  Python 
module versioning issues are just too hard to resolve, and I'd rather 
leave site-packages with only really stable software that I don't 
often need to update (like mxDateTime), and put everything else next to 
the application.
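
The application's startup code then just puts that directory first -- a 
minimal sketch, assuming the app/stdlib layout from the install command 
above:

    import os, sys

    # Per-application library directory, installed with
    #   python setup.py install --install-lib=app/stdlib
    here = os.path.dirname(os.path.abspath(__file__))
    sys.path.insert(0, os.path.join(here, 'stdlib'))
    # Anything in stdlib/ now shadows the system site-packages, so the
    # app always runs against the versions it was deployed with.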

> I think that this second point is the main challenge and any hosting 
> provider that is not specialized in python doesn't have the time or the 
> knowledge to build and maintain all these python versions and 
> third-party modules. Of course, they could just say "we're going to 
> support this specific python version with these few third-party modules 
> and that's it", but experience shows that most people need at least one 
> or 2 "uncommon" third-party modules for their site so if that module is 
> missing they just can't run their site ...

Any reason for all the Python versions?  Well, I guess it's hard to ask 
clients to upgrade.  If I were to support people in that way, I'd 
probably try to standardize on a Python version or two, and some core 
modules (probably the ones that are harder to build, like database 
drivers), and ask users to install everything else in their own 
environment.  But of course when you're in a service business you have to do what 
people want you to do...

> But above all, I think that the main reason why python frameworks are 
> not more commonly supported by the big hosting providers is because the 
> market for these frameworks is very small (apart from Zope/Plone). For 
> all the "smaller" frameworks (CherryPy, Webware, SkunkWeb, Quixote, ...) 
> we host less than 50 of each, so the big hosting providers simply won't 
> bother learning these frameworks and supporting them for such a small 
> market.

If they could support all of them at once, do you think it would be more 
interesting to hosting providers?

-- 
Ian Bicking  /  ianb at colorstudy.com  /  http://blog.ianbicking.org

