a basic bytecode to machine code compiler

Thu Mar 31 18:33:36 EDT 2011

I was looking at the list of bytecode instructions that Python uses and 
I noticed how much it looked like assembly. So I figured it can't be too 
hard to convert this to actual machine code, to get at least a small 
boost in speed.
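
To see the resemblance for yourself, the standard dis module can list the 
bytecode of a code object. This is only a small sketch: it uses 
dis.get_instructions, which needs Python 3.4 or later (on 3.2, dis.dis(code) 
prints a similar listing), and the exact instructions vary between Python 
versions.

```python
import dis

# Compile the same snippet used in the usage example further down.
code = compile('print("Hello World!")', '<string>', 'exec')

# Collect (opcode name, argument description) pairs; the exact
# instructions differ between Python versions.
instructions = [(ins.opname, ins.argrepr) for ins in dis.get_instructions(code)]

for opname, argrepr in instructions:
    print(opname, argrepr)
```

Each line is one stack-machine instruction (load a name, load a constant, 
call, and so on), which is what makes a one-to-one translation look feasible.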

And so I whipped up a proof of concept, available at 
https://github.com/Rouslan/nativecompile

I'm aware that PyPy already has a working JIT compiler, but I figure it 
will be a long time before they have a version of Python that is ready 
for everybody to use, so this could be useful in the meantime.

I wrote this for the latest stable version of Python, and it relies on 
some functionality that is only available since Python 3.2.

The basic usage is:

 >>> import nativecompile
 >>> bcode = compile('print("Hello World!")','<string>','exec')
 >>> mcode = nativecompile.compile(bcode)
 >>> mcode()
 Hello World!

This compiler does absolutely nothing clever. The only differences 
between the bytecode version and the compiled version are that there is no 
interpreter loop and that the real stack is used instead of an array.
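
As a rough illustration of what gets removed, here is a toy stack machine 
with an interpreter loop and an array for its stack. It is not CPython's 
actual loop: the PUSH/ADD/RETURN opcodes and the run() function are made up 
for this sketch.

```python
def run(program):
    """Interpret a list of (opcode, argument) pairs on a toy stack machine."""
    stack = []  # the value stack is an array (a Python list here)
    pc = 0      # index of the next instruction
    while pc < len(program):  # the interpreter loop: fetch, decode, execute
        op, arg = program[pc]
        pc += 1
        if op == "PUSH":
            stack.append(arg)
        elif op == "ADD":
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif op == "RETURN":
            return stack.pop()
        else:
            raise ValueError("unknown opcode: %r" % (op,))
    raise ValueError("program ended without RETURN")


result = run([("PUSH", 2), ("PUSH", 3), ("ADD", None), ("RETURN", None)])
print(result)
```

A compiler in the style described above would instead emit the body of each 
branch back to back as machine code, so the fetch and decode steps disappear 
and the pushes and pops act on the CPU stack.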

Most of it is written in Python itself. There is one module written in C 
that does the things that cannot easily be done in pure Python, such as 
getting the addresses of API functions and executing the newly created code.
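
For the address lookup alone, the standard ctypes module can get reasonably 
close from pure Python. This is a sketch of an alternative, not what the C 
module in nativecompile does, and PyNumber_Add is just an arbitrary API 
function picked for the example.

```python
import ctypes

# ctypes.pythonapi exposes the C API of the running interpreter.
func = ctypes.pythonapi.PyNumber_Add

# Casting the function pointer to void* yields its address as an integer.
address = ctypes.cast(func, ctypes.c_void_p).value

print(hex(address))
```

An address like this is what generated machine code would need in order to 
call into the interpreter. Running the generated code is a separate problem 
that this sketch does not attempt.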

So far I have only implemented a few bytecode instructions, and only 
32-bit x86 is supported. I have only tested this on Linux. It 
might work on Windows, but only if you can run programs without any sort 
of data execution prevention (I can fix that if anyone wants). And I'm 
sure more optimized machine code can be generated (such as rearranging 
the error checking code to work better with the CPU's branch predictor).

Since so few instructions are implemented, I haven't done any benchmarks.

What do people think? Would I be wasting my time going further with this?