Why not a Python compiler?

Fri Feb 8 08:12:29 EST 2008

On Feb 8, 12:25 am, Hrvoje Niksic <hnik... at xemacs.org> wrote:
> Steven D'Aprano <st... at REMOVE-THIS-cybersource.com.au> writes:
> > Be fair -- he's asking what specific features of Python make it
> > hard.  That's a reasonable question.
>
> Indeed.  The best explanation I've seen explained goes something like
> this: imagine a hypothetical Python compiler that achieves native
> compilation by compiling to Common Lisp and using the CL's compiler to
> produce native code.  Upon encountering the expression such as:
>
> a + b
>
> the compiler could do little else except translate it to something
> like:
>
> (python:add a b)
>
> In order to correctly implement Python addition, python:add needs to
> do a lot of work at run-time.  It needs to check for __add__ method of
> one or both operands without assuming what it does, since a
> user-defined class is free to define __add__ to do whatever it
> pleases.  The compiler could attempt to infer the types of operands,
> but that is hard since an expression such as "a = module.SomeClass()"
> completely changes meaning if module.SomeClass or
> module.SomeClass.__add__ change.  Such changes may seem improbable,
> but fact is that being able to do them is a documented part of the
> language, and a lot of code makes good use of it.  Assuming these
> things don't happen means the compiler doesn't implement Python.

> This applies not only to addition; expressions such as "foo.bar",
> which include any method call, would be translated to (python:getattr
> foo "bar"), and so on.  Most functions would have to construct actual
> tuples, since a function can be replaced with one that takes *args.
> Again, optimizing almost any of this away would change the semantics
> of Python.  From the ability to assign to classes, to modules, to
> globals(), and to __dict__'s, literally anything can change at
> run-time.  *Some* kinds of runtime dispatches can be sped up by

In this respect, CL's is similar to Python. Generic functions are even
more dynamic than Python's methods. You can add a method to a gf
whenever you please. Also, you can assign to classes, change their
structure (add or remove slots), change their metaclass and so on. As
for foo.bar and friends: in CL you have to define an accessor function
if you don't want to use (SLOT-VALUE foo 'bar) all the time (this is
usually done through a shortcut in the DEFCLASS macro).

CL objects are not hash-tables, so you will get an error if you try to
assign to a bogus (by which I mean not present in the class
definition) slot. However, you can implement a slot-missing method
to sanely handle this situation.

You cannot reset slots containing methods in CL (as they do not
exist). However, you should be able to implement
SLOT-VALUE-USING-CLASS and (SETF SLOT-VALUE-USING-CLASS) which would
emulate Python's behavior. Finally, you can use EQL specializers,
which give you object (not class) specific behavior.

The one thing that isn't so easy to emulate is assignment to modules
(though redefinition of a function in some package works as you would
expect).

> setting up sophisticated caches (one such cache for methods is being
> applied to CPython), but getting that right without breaking
> correctness is quite tricky.  Besides the same caches could be used to
> speed up CPython too, so they don't constitute an advantage of the
> compiler.
>
> The main determinant of Python's performance isn't the interpreter
> overhead, but the amount of work that must be done at run-time and
> cannot be moved to compile-time or optimized away.

Well, I am still not convinced that Python is intrinsically
un-compilable :-).

Some optimizations that are nowadays done by hand probably could be
abstracted. Think assigning a method to a local variable before using
it in a loop... Expressing simple loops as C for loops... Tail call
optimization... (Of course, my intuitions about what would be a great
optimization for a Python compiler are vague, so please correct me if
I am drastically wrong.) Note that these are the kind of optimizations
you don't want in your interpreter. When it comes to debugging, the
lack of tail call optimization is a feature, not a bug.

Finally, skimming the CLPython mailing list suggests it actually
works. Presently it is slower than CPython, but after all it is a far
from mature one developer project, so you can treat it as a proof of
concept.

Anyway, I won't be very surprised if in a couple of years your average
c.l.py troll is going to be asking "So I heard that Python is an
interpreted only language, how can it be any good?", and you will be
explaining for the thousandth time: "Since version 4.2 Python has a
fast native code compiler, so...". ;-)

Cheers,

    -- Richard

BTW maybe Dylan could be a better model for Python for compilation? In
many respects it is a lot more similar to Python than Common Lisp. It
is a Lisp-1 (it has a single namespace for functions and variables),
and it seems to be more object oriented than CL.