Why is it impossible to create a compiler that can compile Python to machine code like C?

Stefan Behnel stefan_ml at behnel.de
Fri Mar 1 02:48:34 EST 2013


Steven D'Aprano, 01.03.2013 04:47:
> On Thu, 28 Feb 2013 22:03:09 +0100, Stefan Behnel wrote:
> 
>> The most widely used static Python compiler is Cython
> 
> Cython is not a Python compiler. Cython code will not run in a vanilla 
> Python implementation. It has different keywords and syntax, e.g.:
> 
> cdef inline int func(double num):
>     ...
> 
> which gives SyntaxError in a Python compiler.

That includes Cython itself, if you're compiling a ".py" file. The above is
only valid syntax in ".pyx" files. Two languages, one compiler. Or three
languages, if you want, because Cython supports both Python 2 and Python 3
code in separate compilation modes.

The old model, which you might have learned at school:

* a Python implementation is something that runs Python code

* a Cython implementation is something that does not run Python code

hasn't been generally true since, well, probably forever. Even Cython's
predecessor Pyrex was capable of compiling a notable subset of Python code,
and Cython gained support for pretty much all Python language features
about two years ago. Quoting the project homepage: "the Cython language is
a superset of the Python language".

http://cython.org/

If you don't believe that, just try it yourself. Try to compile some Python
3 code with it, if you find the time. Oh, and pass the "-3" option to the
compiler in that case, so that it knows that it should switch to Python 3
syntax/semantics mode. It can't figure that out from the file extension
(although you can supply the language level of the file in a header comment
tag). And while you're at it, also pass the "-a" option to let it generate
an HTML analysis of your code that highlights CPython interaction and thus
potential areas for manual optimisation.
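
As a small illustration of the header comment tag, here is a sketch of a
file (call it "example.py", a made-up name) whose first line sets the
language level, so that the "-3" option is not needed. The function names
are invented for this example, and the file also runs as-is in plain
CPython 3.

```python
# cython: language_level=3
# The line above is the header comment tag. Plain CPython treats it as an
# ordinary comment, so this file also runs unchanged without Cython.


def mean(values):
    # Python 3 semantics: "/" is true division, even for two ints.
    return sum(values) / len(values)


def describe(values):
    # print() is a function in Python 3, not a statement.
    text = "mean of {} values: {}".format(len(values), mean(values))
    print(text)
    return text


if __name__ == "__main__":
    describe([1, 2])
```

Compiling it with something like "cython -a example.py" should then also
produce the HTML analysis mentioned above.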

The "superset" bit doesn't mean I've stopped fixing bugs that CPython's
regression test suite reveals from time to time. If you want to get an idea
of Cython's compatibility level, take a look at the test results. There are
still about 470 failing tests left out of 26000 in the test suites of Py2.7
and 3.4:

https://sage.math.washington.edu:8091/hudson/job/cython-devel-tests-pyregr/

One reason for a couple of those failures (definitely not all of them) is
that Cython rejects some code at compile time that CPython only rejects at
runtime. That's because the tests were explicitly written for CPython and
assume that the runtime cannot detect some errors before executing the
code. So, in a way, being capable of doing static analysis actually
prevents Cython from being fully CPython compatible. I do not consider that
a bad thing.
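
To make the compile time vs. runtime distinction concrete, the sketch below
shows the CPython side only. It uses a reference to a name that is never
defined, which is an assumption chosen for illustration: the test results do
not say which tests fail for this reason, and "undefined_helper" is a made-up
name. CPython compiles the function happily and only complains once it is
called, whereas a compiler doing static analysis is in a position to reject
this kind of code up front.

```python
# CPython compiles this source without complaint, even though the name
# "undefined_helper" (made up for this example) is never defined anywhere.
source = '''
def broken():
    return undefined_helper(42)
'''

namespace = {}
code = compile(source, "<example>", "exec")  # no error at compile time
exec(code, namespace)                        # defining the function is fine too


def call_and_report(func):
    """Return the name of the exception raised by func(), or None."""
    try:
        func()
    except Exception as exc:
        return type(exc).__name__
    return None


# The error only shows up here, when the function body actually runs.
result = call_and_report(namespace["broken"])
print(result)
```

A test written for CPython that expects the error to appear only at call
time cannot pass if the compiler refuses the module before it ever runs.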

And, BTW, we also compile most of Python's benchmark suite by now:

https://sage.math.washington.edu:8091/hudson/view/bench/

The results are definitely not C-ishly fast, usually only some 10-80%
improvement or so, e.g. only some 35% in the Django benchmark, but some of
the results are quite ok for plain Python code that is not manually
optimised for compilation. Remember, there are lots of optimisations that
we deliberately do not apply, and static analysis generally cannot detect a
lot of dynamic code patterns, runtime-determined types, etc. That's clearly
PyPy's domain, with its own set of pros and cons.

The idea behind Cython is not that it will magically make your plain Python
code incredibly fast. The idea is to make it really, really easy for users
to bring their code up to C speed *themselves*, in the exact spots where
the code really needs it. And yes, as was already mentioned in this thread,
there is a pure Python mode for this that allows you to keep your code in
plain Python syntax while optimising it for compilation. The "Cython
optimised" benchmarks on the page above do exactly that.
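
A minimal sketch of what pure Python mode can look like, assuming the
"cython.locals" decorator for the static type declarations. The function
"harmonic" and the small fallback class are made up for this example; the
fallback is only there so that the file still runs where Cython is not
installed. This is not taken from the benchmarks themselves.

```python
# Pure Python mode sketch: the same file runs in plain CPython and can be
# compiled by Cython, which then uses the declared C types.
try:
    import cython
except ImportError:
    # Hypothetical stand-in so the example runs without Cython installed.
    class _CythonShim:
        int = int
        double = float

        @staticmethod
        def locals(**_types):
            # Ignore the type declarations and return the function unchanged.
            return lambda func: func

    cython = _CythonShim()


@cython.locals(n=cython.int, i=cython.int, total=cython.double)
def harmonic(n):
    """Sum of 1/i for i in 1..n, with the loop variables typed for Cython."""
    total = 0.0
    for i in range(1, n + 1):
        total += 1.0 / i
    return total


if __name__ == "__main__":
    print(harmonic(4))
```

When run uncompiled, the decorator has no effect on the result, so the code
stays valid Python and the types only matter once the module is compiled.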

I wrote a half-rant about static Python compilation in a recent blog post.
It's in English, and you might actually want to read it. I would say that I
can claim to know what I'm talking about.

http://blog.behnel.de/index.php?p=241

Stefan




