ANN: Shed Skin 0.2, an experimental (restricted) Python-to-C++ compiler

Bearophile bearophileHUGS at lycos.com
Sun Jul 26 21:09:47 EDT 2009


William Dode':
> I updated the script (python, c and java) with your unrolled version
> + some little things.
[...]
> c 1.85s
> gcj 2.15s
> java 2.8s
> python2.5 + psyco 3.1s
> unladen-2009Q2 145s (2m45)
> python2.5 254s (4m14s)
> python3.1 300s (5m)
> ironpython1.1.1 680s (11m20)

Sorry for being late, I was away.

In your last C version this code is useless, because the C compiler is
able to perform such a simple optimization by itself (but Python
probably isn't, so if you want the code to be similar in all
versions it may be better to keep it):

shift_0=shift[0];
shift_1=shift[1];
shift_2=shift[2];
shift_3=shift[3];
shift_4=shift[4];
shift_5=shift[5];
shift_6=shift[6];
shift_7=shift[7];


This part in the Python code is useless:

shift_0 = shift[0]
shift_1 = shift[1]
shift_2 = shift[2]
shift_3 = shift[3]
shift_4 = shift[4]
shift_5 = shift[5]
shift_6 = shift[6]
shift_7 = shift[7]

Because later you copy values locally anyway:

def solve(nb, x, y,
        SIDE=SIDE, SQR_SIDE=SQR_SIDE, circuit=circuit,
        shift_0=shift_0,
        shift_1=shift_1,
        shift_2=shift_2,
        shift_3=shift_3,
        shift_4=shift_4,
        shift_5=shift_5,
        shift_6=shift_6,
        shift_7=shift_7,
        ):

So doing something like this is probably enough:

def solve(nb, x, y,
        SIDE=SIDE, SQR_SIDE=SQR_SIDE, circuit=circuit,
        shift_0=shift[0],
        shift_1=shift[1],
        shift_2=shift[2],
        shift_3=shift[3],
        shift_4=shift[4],
        shift_5=shift[5],
        shift_6=shift[6],
        shift_7=shift[7],
        ):
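
To make the default-argument trick concrete, here is a minimal sketch
(not the real solve(): the loop body and the values in shift are made
up for illustration, and SIDE = 6 is only assumed from the 0-5 square
mentioned later). Default values are evaluated once, when the def
statement runs, and then behave like local variables, so the loop
avoids the global lookup and the list indexing:

```python
SIDE = 6
# Hypothetical offsets, only for illustration (not the ones in the real script).
shift = [-2, -1, 1, 2, 2, 1, -1, -2]

def work_globals(n):
    # Every iteration looks up the globals 'shift' and 'SIDE' and indexes the list.
    total = 0
    for _ in range(n):
        total += shift[0] + shift[7] + SIDE
    return total

def work_defaults(n, SIDE=SIDE, shift_0=shift[0], shift_7=shift[7]):
    # The defaults were evaluated once, at definition time; inside the
    # function they are plain local variables.
    total = 0
    for _ in range(n):
        total += shift_0 + shift_7 + SIDE
    return total

if __name__ == "__main__":
    print(work_globals(1000), work_defaults(1000))
```

Both functions return the same result. The price is that the defaults
are a snapshot: if shift is modified after the def, work_defaults keeps
using the old values. Whether the difference is measurable depends on
the interpreter, so it is worth timing it (for example with timeit)
before relying on it.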

In low-level languages like C unrolling has to be done with care, to
avoid slowing down the code.

I have tried your latest C version using your compiler options: with my
MinGW based on GCC 4.3.2 it crashes at runtime, while with LLVM-GCC
it runs in 1.31 seconds. The D version is a bit less optimized than
your last C versions, yet using DMD it runs in 1.08-1.10 seconds.
Let's see if someone is able to write a C version faster than that D
code :-)

Have you compiled/read my D version? In the D version you may
have missed that I used an extra trick: unsigned integers. A negative
coordinate wraps around to a very large value, so it needs just two
tests (one per coordinate) to see if a position is in the 0-5, 0-5
square :-)
Note that Pyd, the Python-D bridge, may still work with the latest DMD
version (and it does work if you use a slightly older DMD compiler):
http://pyd.dsource.org/
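
The unsigned trick mentioned above can be sketched in Python too, as an
emulation only: Python integers have no unsigned type, so here a 32-bit
mask plays the role of the cast to unsigned that D or C would do for
free. The function names, MASK and SIDE = 6 are assumptions made for
this example, they are not taken from the D code:

```python
SIDE = 6
MASK = 0xFFFFFFFF  # emulates reinterpreting a 32-bit int as unsigned

def in_square_signed(x, y, side=SIDE):
    # The obvious version: four comparisons.
    return 0 <= x < side and 0 <= y < side

def in_square_unsigned(x, y, side=SIDE):
    # After masking, a negative value becomes a huge positive one
    # (-1 becomes 4294967295), so a single '<' per coordinate is enough.
    return (x & MASK) < side and (y & MASK) < side

if __name__ == "__main__":
    for x, y in [(0, 0), (5, 5), (-1, 3), (3, 6)]:
        print(x, y, in_square_signed(x, y), in_square_unsigned(x, y))
```

In Python the mask costs more than the comparison it saves, so this is
only a way to see why the trick is correct (for coordinates that fit in
32 bits), not a way to speed up the Python version.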

Bye,
bearophile


