scaling problems

Mon May 19 21:04:28 EDT 2008

On Mon, May 19, 2008 at 8:47 PM, James A. Donald <jamesd at echeque.com> wrote:
> I am just getting into python, and know little about it, and am
> posting to ask on what beaches the salt water crocodiles hang out.
>
> 1.  Looks to me that python will not scale to very large programs,
> partly because of the lack of static typing, but mostly because there
> is no distinction between creating a new variable and utilizing an
> existing variable, so the interpreter fails to catch typos and name
> collisions.  I am inclined to suspect that when a successful small
> python program turns into a large python program, it rapidly reaches
> ninety percent complete, and remains ninety percent complete forever.

I can assure you that in practice this is not a problem. If you do
proper unit testing then you will catch many, if not all, of the
errors that static typing catches. Tools like PyLint, PyFlakes and
pep8.py will also catch many of those mistakes.
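
To make the typo case concrete, here is a minimal sketch. The function
names are made up for illustration. A misspelled name quietly creates a
new variable, and even a one-line test against a known answer exposes it.
A checker such as PyFlakes would typically also flag the unused name.

```python
def total(prices):
    """Correct version: accumulate into one name."""
    subtotal = 0
    for price in prices:
        subtotal += price
    return subtotal


def total_with_typo(prices):
    """Buggy version: the misspelled name silently creates a new variable."""
    subtotal = 0
    for price in prices:
        subtotl = subtotal + price  # typo: binds 'subtotl', 'subtotal' stays 0
    return subtotal


def passes_unit_test(func):
    """A tiny stand-in for a unit test: compare against a known answer."""
    return func([1, 2, 3]) == 6
```

Here passes_unit_test(total) is true and passes_unit_test(total_with_typo)
is false, so the typo does not survive the first test run.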

> 2.  It is not clear to me how a python web application scales.  Python
> is inherently single threaded, so one will need lots of python
> processes on lots of computers, with the database software handling
> parallel accesses to the same or related data.  One could organize it
> as one python program for each url, and one python process for each
> http request, but that involves a lot of overhead starting up and
> shutting down python processes.  Or one could organize it as one
> python program for each url, but if one gets a lot of http requests
> for one url, a small number of python processes will each sequentially
> handle a large number of those requests.  What I am really asking is:
> Are there python web frameworks that scale with hardware and how do
> they handle scaling?

What is the difference between a process with 10 threads and 10
separate processes running in parallel? Apache is a good example of a
server that may be configured to use multiple processes to handle
requests. And from what I hear it scales just fine.

I think you are looking at the problem wrong. The fundamentals are the
same between threads and processes. You simply have a pool of workers
that handle requests. Any process is capable of handling any request.
The key to scalability is that the processes are persistent and not
forked for each request.
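
A rough sketch of the worker-pool idea, using the standard
concurrent.futures module (my choice for illustration, not something any
particular web framework is claimed to use). The names handle_request and
serve are hypothetical. The pool is created once and its workers stay
alive across requests. Threads are used here to keep the example short;
passing ProcessPoolExecutor as executor_cls should give the same shape
with processes, under the usual multiprocessing caveats such as guarding
the entry point with if __name__ == "__main__".

```python
import os
import threading
from concurrent.futures import ThreadPoolExecutor


def handle_request(request_id):
    """Pretend to serve one request; report which worker handled it."""
    worker = (os.getpid(), threading.get_ident())
    return request_id, worker


def serve(request_ids, executor_cls=ThreadPoolExecutor, workers=4):
    """Hand every request to a fixed pool of long-lived workers."""
    with executor_cls(max_workers=workers) as pool:
        # map() returns results in request order; any worker may take any request.
        return list(pool.map(handle_request, request_ids))


results = serve(range(20))
distinct_workers = {worker for _, worker in results}
```

Twenty requests go in, and distinct_workers never holds more than four
entries, because the same workers are reused instead of one being started
per request.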

> Please don't read this as "Python sucks, everyone should program in
> machine language expressed as binary numbers".  I am just asking where
> the problems are.

The only real problem I have had with process pools is that sharing
resources is harder. It is harder to create things like connection
pools.
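
One common way around this, sketched under my own assumptions rather than
taken from any specific framework, is to give each worker process its own
long-lived connection instead of a shared pool. The names get_connection
and lookup are hypothetical, and an in-memory sqlite3 database stands in
for a real database server.

```python
import os
import sqlite3

_connection = None  # one per process; never shared across processes


def get_connection():
    """Open the connection on first use, then keep reusing it."""
    global _connection
    if _connection is None:
        _connection = sqlite3.connect(":memory:")
    return _connection


def lookup(request_id):
    """Serve a request with this process's own connection."""
    row = get_connection().execute("SELECT ? + 1", (request_id,)).fetchone()
    return os.getpid(), row[0]
```

The cost is one connection per process rather than a pool sized
independently of the worker count, which is the trade-off described above.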

-- 
David
http://www.traceback.org