[Cython] Yet another Python to C compiler

Fri Aug 12 10:17:07 CEST 2011

Vitja Makarov, 12.08.2011 08:49:
> Recently I've found one more Python to C compiler that translates
> Python bytecode into C source.
> And the author claims 100% Python compatibility.

That's clearly incorrect. From the code, it appears that at least the 
builtins are static, which means that it's *not* Python compatible (even 
less so than Cython currently is). I'm also not sure about the integer type 
handling - worth some investigation, but it looks somewhat sloppy.
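
To illustrate why static builtins break compatibility, here is a minimal 
sketch in plain Python. The function names are made up for illustration 
and have nothing to do with the 2c code base. It only shows that Python 
resolves builtins at call time, so a compiler that binds `len` at compile 
time would miss the rebinding:

```python
import builtins

def count_items(seq):
    # `len` is looked up when the call runs: module globals first,
    # then the builtins module.
    return len(seq)

def builtin_rebinding_demo():
    before = count_items([1, 2, 3])
    original_len = builtins.len
    try:
        # Rebinding a builtin at runtime is legal Python.
        builtins.len = lambda seq: 42
        after = count_items([1, 2, 3])
    finally:
        # Always restore the real builtin.
        builtins.len = original_len
    return before, after
```

Code compiled with a statically bound `len` would return 3 for both calls, 
whereas the interpreter sees the replacement on the second call.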

I mean, it's well known that you can generate fast code by deviating from 
Python semantics. Shedskin excels in that. The 2c authors mostly seem to be 
using the same tricks that Cython uses as well, namely static builtins, as 
well as optimistic optimisations for likely types based on usage patterns. 
And a huge amount of special casing, maybe even more than Cython currently 
applies.

They also seem to make excessive use of CPython internals and internal 
APIs. Not sure if that's a good idea. Specifically, they didn't care a bit 
about Python 3 compatibility, it seems.

> The project is a single 800KB python file.

That's just plain sick.

> http://code.google.com/p/2c-python/
>
> I was surprised to find that 2c beats Cython in some benchmarks.
> For instance, it's about 2 times faster than Cython in the pystone test

PyStone is known to be a particularly bad benchmark.

The other benchmark results are somewhat surprising and (IMHO) hint mostly 
at a lack of Python compatibility. Again, it's well known that you can make 
specific benchmarks fast by deviating from Python semantics in general.

For example, Cython runs richards.py ~70% faster, whereas they claim ~90%. 
Cython is ~50% faster on slowpickle, they claim ~80%. Not really that much 
of a difference actually, and easily achieved by tuning the language 
semantics to the code.

> Think we should investigate performance differences and make cython faster.

You could start by contacting the authors. From the project site, it 
appears that it's basically Russian(?)-only.

My guess is that they simply use more special casing and slightly better 
type inference than Cython currently does. Look at these, for example:

http://code.google.com/p/2c-python/source/browse/2c.py?r=23d5c350a56e21d5a3e12e153d1fbe91ae1f5d56#15583

http://code.google.com/p/2c-python/source/browse/2c.py?r=23d5c350a56e21d5a3e12e153d1fbe91ae1f5d56#15869

They infer some more return types of builtin methods and their compiler 
knows about some stdlib modules (such as math):

http://code.google.com/p/2c-python/source/browse/2c.py?r=23d5c350a56e21d5a3e12e153d1fbe91ae1f5d56#578

Overriding external modules statically means deviating from Python 
semantics. Cython would want to require user interaction for this, e.g. an 
explicit external .pxd file.
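
As a small sketch of what goes wrong when a compiler hard-wires a stdlib 
module, consider the following plain Python. The helper names are 
hypothetical. The point is only that module attributes are looked up at 
runtime and may be replaced:

```python
import math

def hypotenuse(a, b):
    # `math.sqrt` is an attribute lookup on the module object at call time.
    return math.sqrt(a * a + b * b)

def module_patching_demo():
    before = hypotenuse(3.0, 4.0)
    original_sqrt = math.sqrt
    try:
        # Replacing a module attribute at runtime is legal Python.
        math.sqrt = lambda x: -1.0
        after = hypotenuse(3.0, 4.0)
    finally:
        # Always restore the real function.
        math.sqrt = original_sqrt
    return before, after
```

A compiler that replaces `math.sqrt` with a direct C `sqrt()` call would 
never notice the patched function.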

Basically, I think that Cython could do a lot better with control flow 
driven type inference. Another thing is that it would be nice to extend the 
type system so that it knows about data types in Python containers.
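
A minimal example of the kind of code I have in mind, written as plain 
Python (the function is made up for illustration). A flow-insensitive 
analysis has to treat `result` as a generic object because it is assigned 
both an int and a float. Following the control flow shows that each branch 
has exactly one type. Similarly, `values` is just "a list" to the compiler, 
with no knowledge of what it contains:

```python
def mean_or_zero(values):
    n = len(values)                # always an int
    if n == 0:
        result = 0                 # int on this path
    else:
        result = sum(values) / n   # float on this path (true division)
    return result
```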

What we should definitely do is to use Mark's fused types for 
optimisations, e.g. when default arguments hint at a specific input type, 
or even just when we find a function call inside the module with a specific 
combination of input types.
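
Very roughly, and only as a plain Python analogy rather than how fused 
types are actually implemented, specialising on input types could look like 
the sketch below. All names are hypothetical. The dict lookup stands in for 
a dispatch that a compiler would generate itself, and the default argument 
`b=1` is the kind of hint that suggests an (int, int) specialisation:

```python
def add_generic(a, b):
    # Fallback that works for any pair of objects supporting `+`.
    return a + b

def add_int_int(a, b):
    # Stand-in for a version a compiler could emit for (int, int) inputs.
    return a + b

# Hypothetical table of specialisations keyed by exact argument types.
SPECIALISATIONS = {(int, int): add_int_int}

def pick_impl(a, b):
    # Exact type match only, so subclasses fall back to the generic path.
    return SPECIALISATIONS.get((type(a), type(b)), add_generic)

def add(a, b=1):
    return pick_impl(a, b)(a, b)
```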

Also, I would expect that eventually optimising the CyFunction type would 
give us another bit of performance.

Stefan