Why is it impossible to create a compiler than can compile Python to machinecode like C?

Steven D'Aprano steve+comp.lang.python at pearwood.info
Mon Mar 4 20:35:34 EST 2013


On Mon, 04 Mar 2013 16:36:36 +0000, Grant Edwards wrote:

> On 2013-02-28, kramer65 <kramerh at gmail.com> wrote:
> 
>> I'm using Python for a while now and I love it. There is just one thing
>> I cannot understand. There are compilers for languages like C and C++.
>> why is it impossible to create a compiler that can compile Python code
>> to machinecode?
> 
> The main issue is that python has dynamic typing.  The type of object
> that is referenced by a particular name can vary, and there's no way (in
> general) to know at compile time what the type of object "foo" is.
> 
> That makes generating object code to manipulate "foo" very difficult.

That's only a limitation with a static compiler that tries to generate 
fast machine code at compile time. A JIT compiler can generate fast 
machine code at runtime. The idea is that the time you save by running 
more optimized code is greater than the time it costs to generate that 
code each time you run. If not, then you just run the normal byte-code 
you would have run, and you've lost very little.

This has been a very successful approach. Psyco worked very well, and 
it's successor PyPy seems to work even better. PyPy, on average, runs 
about 5-6 times faster than CPython, and for some tasks about 50 times 
faster. That makes it broadly speaking as fast as Java and approaching C, 
and even in some circumstances even beat static C compilers. (Admittedly 
only under very restrictive circumstances.)

The downside is that JIT compilers need a lot of memory. To oversimplify, 
you might take source code like this:

x = a + b

A static compiler knows what types a and b are, and can turn it into a 
single fairly compact piece of machine code. Using my own crappy invented 
machine code notation:

ADD a, b
STORE x


A JIT compiler has to generate a runtime check and one or more fast 
branches, plus a fallback:


CASE (a and b are both ints):
    ADD a, b
    STORE x
CASE (a and b are both strings):
    COPY a, x
    CONCATENATE b, x
OTHERWISE:
    execute byte code to add two arbitrary objects


The above is probably nothing like the way PyPy actually works. Anyone 
interested in how PyPy really works should spend some time reading their 
website and blog.

http://pypy.org/


I can especially recommend this to give you a flavour for just how 
complicated this sort of thing can become:

http://morepypy.blogspot.com.au/2011/01/loop-invariant-code-motion.html



-- 
Steven



More information about the Python-list mailing list