python simply not scaleable enough for google?

Sat Nov 14 21:42:07 EST 2009

This whole thread has now proceeded to bore me senseless. I'm going to respond 
once with a restatement of what I originally said. Then I'm going to drop it, and
never respond to the thread again. Much of what's below has been said by others 
as well; I'm taking no credit for it, just trying to put it together into a coherent
framework. 

1. The original question is `Is Python scalable enough for Google' (or, I assume,
for any other huge application). That's what I was responding to.

2. `Scalable' can mean performance or productivity/reliability/maintenance quality.
A number of posters conflated those. I'll deal with p/r/m by saying I'm not familiar 
with any study that has taken real enterprise-type programs and compared, e.g., 
Java, Python, and C++ on the p/r/m criteria. Let's leave that issue by saying that 
we all enjoy programming in Python, and Python has pretty much the same feature 
set (notably modules) as any other enterprise language. This just leaves us with
performance. 

3. Very clearly CPython can be improved. I don't take most benchmarks very seriously, 
but we know that CPython interprets bytecode, and thus suffers relative to systems 
that compile into native code, and likely relative to some other interpreted systems. (Lua
has been mentioned, and I recall looking at a presentation by the Lua guys on why they
chose a register-based rather than a stack-based approach.)
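
To make the bytecode point concrete, here is a small sketch using the standard
`dis' module to list the instructions CPython generates for a trivial function.
The function `add' is just an illustration of mine, and the exact opcode names
differ between CPython versions, so treat whatever gets printed as indicative only.

```python
import dis

def add(a, b):
    # A trivial function; CPython compiles it to stack-machine bytecode.
    return a + b

# Collect the opcode names the interpreter loop will dispatch on.
opnames = [ins.opname for ins in dis.get_instructions(add)]
print(opnames)
```

Each of those opcodes is decoded and dispatched by the interpreter at run time,
which is the overhead a native-code compiler avoids.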

4. Extensions such as numpy can produce tremendous improvements in productivity AND
performance. One answer to `is Python scalable' is to rephrase it as `is Python+C 
scalable'. 

5. There are a number of JIT projects being considered, and one or more of these might 
well hold promise. 

6. Following Scott Meyers' outstanding advice (from his Effective C++ books), one should
prefer compile time to runtime wherever possible, if one is concerned about performance. 
An implementation that takes hints from programmers, e.g., that a certain variable is 
not to be changed, or that a given argument is always an int32, can generate special-case
code that is at least in the same ballpark as C, if not as fast. 

This in no way detracts from Python's dynamic nature: these hints would be completely
optional, and would not change the semantics of correct programs. (They might cause
programs running on incorrect data to crash, but if you want performance, you are kind of 
stuck.) These hints would `turn off' features that are difficult to compile into efficient
code, but would do so only in those parts of a program where, for example, it was known that
a given variable contains an int32. Dynamic (hint-free) and somewhat less-dynamic (hinted)
code would coexist. This has been done for other languages, and is not a radically new 
concept. 

Such hints already exist in the language; __slots__ is an example. 
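
As a quick sketch of what __slots__ does today (the class names here are made up
for illustration): declaring it tells the implementation that instances have a
fixed set of attributes, so no per-instance __dict__ is needed, and the price is
that you give up adding attributes on the fly.

```python
class PointDict:
    # Ordinary class: every instance carries its own __dict__.
    def __init__(self, x, y):
        self.x = x
        self.y = y

class PointSlots:
    # The hint: instances only ever have these two attributes.
    __slots__ = ("x", "y")

    def __init__(self, x, y):
        self.x = x
        self.y = y

p = PointDict(1, 2)
q = PointSlots(1, 2)

p.z = 3  # fine: fully dynamic instance

try:
    q.z = 3  # rejected: the dynamic feature has been `turned off' here
except AttributeError:
    slots_rejected_new_attr = True
else:
    slots_rejected_new_attr = False

has_dict = hasattr(q, "__dict__")
print(slots_rejected_new_attr, has_dict)
```

Code that sticks to x and y behaves identically with either class, which is
the sense in which the hint leaves correct programs alone.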

The language, at least as far as Python 3 is concerned, has pretty much all the machinery 
needed to provide such hints. Mechanisms that are recognized specially by a high-performance
implementation (imported from a special module, for example) could include: annotations, 
decorators, metaclasses, and assignment to special variables like __slots__.
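
A rough sketch of how annotations plus a decorator could carry such hints. The
decorator name `hinted' and the `__hints__' attribute are inventions of mine, not
part of any existing implementation. In plain CPython the wrapper below only
checks the declared types at call time (and so makes the call slower); the point
is merely that the information is there for an optimizing implementation to use.

```python
import functools
import inspect

def hinted(func):
    """Hypothetical marker: record the parameter annotations as type hints."""
    sig = inspect.signature(func)
    hints = {
        name: param.annotation
        for name, param in sig.parameters.items()
        if param.annotation is not inspect.Parameter.empty
    }

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # Stand-in for what a compiler would assume: arguments match the hints.
        bound = sig.bind(*args, **kwargs)
        for name, value in bound.arguments.items():
            expected = hints.get(name)
            if expected is not None and not isinstance(value, expected):
                raise TypeError(
                    f"{name} should be {expected.__name__}, "
                    f"got {type(value).__name__}"
                )
        return func(*args, **kwargs)

    wrapper.__hints__ = hints  # where an implementation could look them up
    return wrapper

@hinted
def scale_by(n: int, factor: float) -> float:
    return n * factor

print(scale_by(3, 2.0))
print(scale_by.__hints__)
```

Remove the decorator and the function still means the same thing for correct
inputs, so hinted and hint-free code can live side by side.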

7. No implementation of Python at present incorporates JITting and hints fully. Therefore, 
the answer to `is CPython performance-scalable' is likely `NO'. Another implementation that 
exploited all of the features described here might well have satisfactory performance for 
a range of computation-intensive problems. Therefore, the answer to `is the Python language 
performance-scalable' might be `we don't know, but there are a number of promising implementation
techniques that have been proven to work well in other languages, and may well have tremendous
payoff for Python'. 

-- v