[Cython] Yet another Python to C compiler
Stefan Behnel
stefan_ml at behnel.de
Fri Aug 12 10:17:07 CEST 2011
Vitja Makarov, 12.08.2011 08:49:
> Recently I've found one more Python to C compiler, that translates
> python bytecode into C source.
> And author says about 100% Python compatibility.
That's clearly incorrect. From the code, it appears that at least the
builtins are static, which means that it's *not* Python compatible (even
less so than Cython currently is). I'm also not sure about the integer type
handling - worth some investigation, but it looks somewhat sloppy.
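To illustrate why static builtins break compatibility, here is a minimal sketch in plain Python. The function names are made up for this example and it makes no claim about what 2c actually emits. CPython resolves a builtin name at call time, so rebinding it changes the behaviour of code that is already defined; a compiler that binds `len` at compile time would return 5 in both cases.

```python
import builtins


def total_length(items):
    # `len` is resolved on every call: module globals first, then builtins.
    return sum(len(item) for item in items)


def run_demo():
    before = total_length(["ab", "cde"])
    original_len = builtins.len
    try:
        # Rebind the builtin; CPython sees the new object on the next lookup.
        builtins.len = lambda obj: 1
        after = total_length(["ab", "cde"])
    finally:
        # Always restore the real builtin.
        builtins.len = original_len
    return before, after


result = run_demo()
print(result)
```

With dynamic lookup, the second call sees the replacement and the two results differ.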
I mean, it's well known that you can generate fast code by diverging from
Python semantics. Shedskin excels at that. The 2c authors mostly seem to be
using the same tricks that Cython uses as well, namely static builtins and
optimistic optimisations for likely types based on usage patterns, plus a
huge amount of special casing, maybe even more than Cython currently applies.
They also seem to make excessive use of CPython internals and internal
APIs. Not sure if that's a good idea. Specifically, they didn't care a bit
about Python 3 compatibility, it seems.
> The project is a single 800KB python file.
That's just plain sick.
> http://code.google.com/p/2c-python/
>
> I was surprised to find that 2c beats Cython in some benchmarks.
> For instance, it's about 2 times faster than Cython in the pystone test.
PyStone is known to be a particularly bad benchmark.
The other benchmark results are somewhat surprising and (IMHO) hint mostly
at a lack of Python compatibility. Again, it's well known that you can make
specific benchmarks fast by diverging from Python semantics in general.
For example, Cython runs richards.py ~70% faster, whereas they claim ~90%.
Cython is ~50% faster on slowpickle, they claim ~80%. Not really that much
of a difference actually, and easily achieved by tuning the language
semantics to the code.
> Think we should investigate performance differences and make cython faster.
You could start by contacting the authors. From the project site, it
appears that it's basically Russian(?)-only.
My guess is that they simply use more special casing and slightly better
type inference than Cython currently does. Look at these, for example:
http://code.google.com/p/2c-python/source/browse/2c.py?r=23d5c350a56e21d5a3e12e153d1fbe91ae1f5d56#15583
http://code.google.com/p/2c-python/source/browse/2c.py?r=23d5c350a56e21d5a3e12e153d1fbe91ae1f5d56#15869
They infer some more return types of builtin methods and their compiler
knows about some stdlib modules (such as math):
http://code.google.com/p/2c-python/source/browse/2c.py?r=23d5c350a56e21d5a3e12e153d1fbe91ae1f5d56#578
Overriding external modules statically means diverging from Python
semantics. Cython would want to require user interaction for this, e.g. an
explicit external .pxd file.
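The same point can be shown for stdlib modules with a small sketch (the `hypotenuse` function is only an illustration, not code from either project). In CPython, `math.sqrt` is an attribute lookup on the module object at every call, so a program may legally replace it. A compiler that hard-wires the C `sqrt()` would silently ignore the replacement.

```python
import math


def hypotenuse(a, b):
    # `math.sqrt` is looked up on the module object on every call.
    return math.sqrt(a * a + b * b)


def run_demo():
    before = hypotenuse(3, 4)
    original_sqrt = math.sqrt
    try:
        # Legal (if unusual) Python: replace the module attribute at runtime.
        math.sqrt = lambda x: -1.0
        after = hypotenuse(3, 4)
    finally:
        math.sqrt = original_sqrt
    return before, after


result = run_demo()
print(result)
```

Requiring an explicit declaration from the user makes this trade-off an opt-in decision rather than a silent change in semantics.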
Basically, I think that Cython could do a lot better with control flow
driven type inference. Another thing is that it would be nice to extend the
type system so that it knows about data types in Python containers.
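As a rough picture of what control flow driven inference means, here is a toy model in Python. The tuple-based statement format and the `infer`/`merge` helpers are invented for this sketch and have nothing to do with Cython's actual implementation. Each assignment replaces what is known about a name, and the two branches of an `if` are analysed separately and merged at the join point.

```python
def merge(env_a, env_b):
    """Join two branch environments: union the possible types per name."""
    merged = {}
    for name in env_a.keys() | env_b.keys():
        # A name assigned in only one branch may be unbound after the join.
        merged[name] = env_a.get(name, {"unbound"}) | env_b.get(name, {"unbound"})
    return merged


def infer(statements, env=None):
    """Return a mapping of variable name -> set of possible type names."""
    # Copy the incoming environment so branches do not affect each other.
    env = {name: set(types) for name, types in (env or {}).items()}
    for stmt in statements:
        kind = stmt[0]
        if kind == "assign":
            _, name, type_name = stmt
            env[name] = {type_name}  # an assignment overrides earlier knowledge
        elif kind == "if":
            _, then_body, else_body = stmt
            env = merge(infer(then_body, env), infer(else_body, env))
        else:
            raise ValueError("unknown statement kind: %r" % (kind,))
    return env


program = [
    ("assign", "x", "int"),
    ("if",
     [("assign", "x", "float"), ("assign", "y", "int")],
     [("assign", "y", "int")]),
]
result = infer(program)
print(result)
```

In this model `y` is known to be an `int` after the branch, while `x` may be either `int` or `float`. Inside the first branch, however, `x` is known to be exactly `float`, which is the kind of information a flow-insensitive analysis loses.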
What we should definitely do is to use Mark's fused types for
optimisations, e.g. when default arguments hint at a specific input type,
or even just when we find a function call inside the module with a specific
combination of input types.
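A sketch of the default argument idea, in plain Python rather than Cython and with a hypothetical helper name (`hinted_types`): the types of the default values are collected as candidate specialisations. A compiler could generate a fast path for these types and keep a generic fallback for everything else.

```python
import inspect


def hinted_types(func):
    """Map parameter names to the type of their default value, if any."""
    hints = {}
    for name, param in inspect.signature(func).parameters.items():
        default = param.default
        # Skip parameters without a default; None says nothing about the type.
        if default is inspect.Parameter.empty or default is None:
            continue
        hints[name] = type(default)
    return hints


def scale(values, factor=1.0, repeat=1, label=None):
    # Example function: only `factor` and `repeat` carry a usable hint.
    return [v * factor for v in values] * repeat


result = hinted_types(scale)
print(result)
```

The hint is only optimistic: callers may still pass other types, so a specialised version would need a type check in front of it.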
Also, I would expect that eventually optimising the CyFunction type would
give us another bit of performance.
Stefan