Why is Python slow? (was Re: One Python 2.1 idea)

Sun Dec 24 20:06:48 EST 2000

On 24 Dec 2000 12:37:02 -0800, Aahz Maruch <aahz at panix.com> wrote:
>In article <m38zp6nk8k.fsf at localhost.localdomain>,
>Lieven Marchand  <mal at bewoner.dma.be> wrote:
>>
>> I think Common Lisp shows that getting very good speed in Python is
>> quite feasible. The original CMUCL implementers weren't such a large
>> group. Their highly optimising compiler (which incidentally is also
>> called Python) has on occasion beaten FORTRAN at numerics. I don't
>> know why some people in the Python community think compiling Python is
>> such a problem. Practically all the problems have been tackled and
>> solved 20 years ago in the Lisp community.
>
> One of the reasons that gets brought up less often than it should is
> that Guido is somewhat fanatical about also keeping the CPython
> implementation clean and simple.  I'm pretty sure that -- Ghu forbid it
> become necessary -- I could manage to maintain CPython if I had to.
>
> This leaves less room for optimization than many people think.

Perhaps.

I started to take a look at the Ocaml 3.00 bytecode interpreter the
other day, and it is quite clean and readable. It is also roughly
10-20 times faster than the Python interpreter, measuring just a
basic busy-loop. The only fillips in the Ocaml interpreter that jumped
out at me were a) some macrology to enable gcc's computed-goto (labels
as values) extension when it is available, and b) tagged integers, so
that integer arithmetic does not allocate on the heap.
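
To make point b) concrete, here is a rough sketch of the low-bit
tagging scheme, modelled in Python. Ocaml stores an integer n in a
machine word as 2n+1, so the low bit distinguishes immediate integers
from (aligned, hence even) heap pointers. The helper names below are
made up for illustration, and since Python integers are themselves
heap objects this only models the arithmetic, not the speedup.

```python
# Illustrative model of low-bit integer tagging. All function names
# here are hypothetical; they are not taken from either interpreter.

def tag_int(n):
    """Encode integer n as a tagged word: 2n + 1 (low bit set)."""
    return (n << 1) | 1

def is_int(word):
    """A word with the low bit set is an immediate integer,
    otherwise it would be a pointer to a heap block."""
    return (word & 1) == 1

def untag_int(word):
    """Recover n from 2n + 1 with an arithmetic right shift."""
    return word >> 1

def add_tagged(a, b):
    """(2x + 1) + (2y + 1) - 1 == 2(x + y) + 1, so addition works
    directly on tagged words with one correction."""
    return a + b - 1

def sub_tagged(a, b):
    """(2x + 1) - (2y + 1) + 1 == 2(x - y) + 1."""
    return a - b + 1

if __name__ == "__main__":
    # The busy-loop countdown, done entirely on tagged words.
    n = tag_int(5)
    one = tag_int(1)
    zero = tag_int(0)
    while n != zero:
        n = sub_tagged(n, one)
    print(untag_int(n))
```

The point is that the countdown never needs a fresh object for each
intermediate value, whereas "n = n - 1" in Python generally does.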

The Python benchmark code was:

import time

def loop(n):
    t = time.time()
    while n != 0:
        n = n - 1
    return time.time() - t

def loop2(n):
    t = time.time()
    for i in xrange(n):
        pass
    return time.time() - t

When passed n=10000000, the median runtime for loop was 7.05 seconds,
and for loop2 it was 3.89 seconds.
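
For anyone who wants to repeat the measurement, a minimal sketch of a
harness for taking the median follows. It assumes a modern Python 3,
so it uses range in place of xrange and time.perf_counter in place of
time.time. The median_runtime helper and the repeats=5 default are
illustrative choices, not the setup behind the figures above.

```python
import statistics
import time

def loop(n):
    # Explicit countdown: one comparison and one subtraction per pass.
    start = time.perf_counter()
    while n != 0:
        n = n - 1
    return time.perf_counter() - start

def loop2(n):
    # Let the for statement drive the iteration instead.
    start = time.perf_counter()
    for _ in range(n):
        pass
    return time.perf_counter() - start

def median_runtime(func, n, repeats=5):
    # Hypothetical helper: time func(n) several times, return the median.
    samples = [func(n) for _ in range(repeats)]
    return statistics.median(samples)

if __name__ == "__main__":
    n = 10000000
    print("loop :", median_runtime(loop, n))
    print("loop2:", median_runtime(loop2, n))
```

The median is less sensitive than the mean to the odd run that gets
disturbed by something else on the machine.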

Some equivalent code in Ocaml:

let rec loop n = if n=0 then () else loop(n-1);;

let time n = let t = Sys.time()
              in loop n;
                 Sys.time() -. t;;

With n=10000000, this ran in a median of 0.4 seconds. This is the
performance of the interpreter, mind -- Ocaml also has a compiler
which I didn't exercise. I think this suggests there is room for
*huge* performance improvements in interpreted Python.

I recall running into a webpage where Vladimir Marangozov had a patch
that used gcc's computed labels to do indirect threading in Python.
Anyone know what the performance improvement from that was?

Neel