Why not a Python compiler?

Thu Feb 7 18:25:46 EST 2008

Steven D'Aprano <steve at REMOVE-THIS-cybersource.com.au> writes:

> Be fair -- he's asking what specific features of Python make it
> hard.  That's a reasonable question.

Indeed.  The best explanation I've seen goes something like
this: imagine a hypothetical Python compiler that achieves native
compilation by compiling to Common Lisp and using the CL compiler to
produce native code.  Upon encountering an expression such as:

a + b

the compiler could do little else except translate it to something
like:

(python:add a b)

In order to correctly implement Python addition, python:add needs to
do a lot of work at run-time.  It needs to check for the __add__ method of
one or both operands without assuming what it does, since a
user-defined class is free to define __add__ to do whatever it
pleases.  The compiler could attempt to infer the types of operands,
but that is hard since an expression such as "a = module.SomeClass()"
completely changes meaning if module.SomeClass or
module.SomeClass.__add__ change.  Such changes may seem improbable,
but the fact is that being able to do them is a documented part of the
language, and a lot of code makes good use of it.  Assuming these
things don't happen means the compiler doesn't implement Python.
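
To make the run-time work concrete, here is a rough sketch in Python
of what such a python:add helper would have to do.  The name
python_add and the class SomeClass are made up for illustration, and
the dispatch is simplified: the real rules also give a right-hand
operand's subclass priority, which is left out here.

```python
def python_add(a, b):
    """Simplified sketch of the run-time dispatch behind `a + b`."""
    # Special methods are looked up on the type, not on the instance.
    add = getattr(type(a), "__add__", None)
    if add is not None:
        result = add(a, b)
        if result is not NotImplemented:
            return result
    # Fall back to the reflected method of the right-hand operand.
    radd = getattr(type(b), "__radd__", None)
    if radd is not None:
        result = radd(b, a)
        if result is not NotImplemented:
            return result
    raise TypeError("unsupported operand types for +")


class SomeClass:  # hypothetical user-defined class
    def __init__(self, value):
        self.value = value

    def __add__(self, other):
        return self.value + other.value


a, b = SomeClass(2), SomeClass(3)
first = python_add(a, b)

# Rebinding __add__ on the class changes what the same expression means.
SomeClass.__add__ = lambda self, other: "replaced"
second = python_add(a, b)
```

Nothing about "a" or "b" changed between the two calls, yet the result
did, so the lookup cannot simply be done once at compile-time.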

This applies not only to addition; expressions such as "foo.bar",
which include any method call, would be translated to (python:getattr
foo "bar"), and so on.  Most function calls would have to construct
actual argument tuples, since any function can be replaced with one
that takes *args.
Again, optimizing almost any of this away would change the semantics
of Python.  From the ability to assign to classes, to modules, to
globals(), and to __dict__'s, literally anything can change at
run-time.  *Some* kinds of runtime dispatches can be sped up by
setting up sophisticated caches (one such cache for methods is being
applied to CPython), but getting that right without breaking
correctness is quite tricky.  Besides, the same caches could be used to
speed up CPython too, so they don't constitute an advantage of the
compiler.
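
A small illustration of how much can be rebound underneath a running
program, and hence why "foo.bar" or a plain call cannot be resolved
ahead of time.  All names here (Foo, bar, area, the stand-in module
object) are invented for the example.

```python
import types


class Foo:
    def bar(self):
        return "original"


foo = Foo()
results = [foo.bar()]

# Assigning to the class changes the method for existing instances.
Foo.bar = lambda self: "patched class"
results.append(foo.bar())

# An entry in the instance __dict__ shadows the class attribute.
foo.__dict__["bar"] = lambda: "instance attribute"
results.append(foo.bar())

# A stand-in for an imported module whose function gets replaced.
module = types.ModuleType("module")
module.area = lambda w, h: w * h


def caller():
    # The call site is unchanged; only the binding of module.area moves.
    return module.area(3, 4)


before = caller()

# The replacement takes *args, so the arguments must arrive as a tuple.
module.area = lambda *args: ("args tuple", args)
after = caller()
```

The call site in caller() is identical both times; a compiler that had
passed 3 and 4 in registers to the two-argument version would have no
tuple to hand to the replacement.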

The main determinant of Python's performance isn't the interpreter
overhead, but the amount of work that must be done at run-time and
cannot be moved to compile-time or optimized away.
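
One way to see this in CPython itself is to look at the bytecode for
an addition.  The sketch below uses the standard dis module; the
function name add is made up, and the exact opcode name differs
between CPython versions (BINARY_ADD in older releases, BINARY_OP in
newer ones, as far as I know), so the code only checks the prefix.

```python
import dis


def add(a, b):
    return a + b


# Collect the opcode names the bytecode compiler emitted for `add`.
opnames = [ins.opname for ins in dis.get_instructions(add)]

# The addition itself is one generic instruction that carries no type
# information; all of the dispatch happens when it is executed.
binary_ops = [name for name in opnames if name.startswith("BINARY")]
```

The interpreter loop spends very little on fetching that one
instruction; the cost is in the lookup and dispatch it then has to
perform, which a native compiler would have to perform as well.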