Sudoku solver

BartC bc at freeuk.com
Mon Mar 30 06:54:52 EDT 2015


On 29/03/2015 08:57, Christian Gollwitzer wrote:
> Am 29.03.15 um 05:06 schrieb Steven D'Aprano:

[OT: Competing with compiled C code]

>> I'm not one of those people who think that C is by definition the fastest
>> language conceivable. (People who believe this sometimes make an exception
>> for hand-crafted assembly, which is ironic since these days the best C
>> optimizing compilers can generate faster, tighter code than human assembly
>> programmers.)


> Defeating a C compiler is possible in (rare) cases, I wouldn't really
> count the examples you've posted, since they don't actually compute
> anything useful. However, for instance the Intel C compiler is able to
> replace a loop like this:
>
> for (int i=0; i<N; i++) {
>      for (int j=0; j<N; j++) {
>          for (int k=0; k<N; k++) {
>              z[i][k]+=x[i][j]*y[j][k];
>          }
>      }
> }
>
> with a call to an optimized BLAS matrix multiplication in "some"
> (=impractical) cases.
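
For reference, that loop nest computes Z += X*Y, so the substituted call
would presumably be something along these lines (a sketch only, assuming
row-major N x N double matrices and the standard CBLAS interface):

    #include <cblas.h>

    /* Hypothetical BLAS equivalent of the quoted loop nest: Z += X * Y,
       i.e. dgemm with alpha = 1 and beta = 1. */
    void matmul_blas(int N, const double *x, const double *y, double *z)
    {
        cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                    N, N, N,
                    1.0, x, N,    /* A = x */
                         y, N,    /* B = y */
                    1.0, z, N);   /* C = z, accumulated into (beta = 1) */
    }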

I have an interpreter for my language with two implementations: one in C, 
and one in my own static language. The latter can also optionally use a 
byte-code dispatcher written in assembly, which handles each byte-code 
itself when it can and otherwise passes it on to the high-level code.
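
Roughly, the idea is this (sketched in C rather than assembly, with
made-up opcode names; it is not the real dispatcher's code):

    typedef unsigned char bytecode;

    /* High-level fallback for anything the fast path doesn't handle. */
    void handle_complex_op(bytecode op, const bytecode **pc, long **sp);

    void dispatch(const bytecode *pc, long *sp)
    {
        for (;;) {
            bytecode op = *pc++;
            switch (op) {
            case 0:                 /* HALT */
                return;
            case 1:                 /* PUSHI: push the next byte as an integer */
                *++sp = *pc++;
                break;
            case 2:                 /* ADDI: add the top two stack slots */
                sp[-1] += sp[0];
                --sp;
                break;
            default:                /* everything else goes to high-level code */
                handle_complex_op(op, &pc, &sp);
                break;
            }
        }
    }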

This combination of HLL+ASM can be up to double the speed of the 
C/gcc/O3 version (which itself has to use gcc extensions) when executing 
benchmarks. On real programs, it can still be up to 40-50% faster (i.e. 
the C version takes 40-50% longer to execute).
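
(The sort of gcc extension involved here is typically "labels as values",
i.e. computed goto, which lets each handler jump straight to the next
opcode. A minimal illustration, again with made-up opcodes:

    typedef unsigned char bytecode;

    long run(const bytecode *pc, long *sp)
    {
        /* Byte 0 = halt, 1 = pushi, 2 = addi. */
        static void *labels[] = { &&op_halt, &&op_pushi, &&op_addi };
    #define DISPATCH() goto *labels[*pc++]

        DISPATCH();

    op_pushi:                       /* push the next byte as an integer */
        *++sp = *pc++;
        DISPATCH();

    op_addi:                        /* add the top two stack slots */
        sp[-1] += sp[0];
        --sp;
        DISPATCH();

    op_halt:
        return *sp;                 /* result is left on top of the stack */
    }
)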

(My own static language by itself is currently 30% slower than C/gcc/O3 
when used for the interpreter, but I'm working on it... For small, tight 
benchmarks, though, it is still totally outclassed by gcc.)

> Just look at the contrived PyPy benchmarks, for a very tuned selected
> sample you can be almost twice as fast as C - for a random algorithm
> like the Sudoku solver, not specifically tuned, C is 30x faster. It
> still boils down to the classic rules: static unboxed types and static
> or preallocated memory makes your code fast. In many cases, though,
> programming in Python can free programmer time, which can lead in turn
> to better algorithms that outperform the wisdom of the C compiler.

I'm quite impressed with PyPy. While its implementation seems (to me) 
fantastically complicated compared with the very simple concept of a 
byte-code interpreter, it does the job. For simple integer benchmarks, it 
can be double the speed of my HLL+ASM interpreter running the same 
algorithm (but not in Python).

However, I believe that CPython could be a bit faster than it is, even 
with all the dynamic features that are supposed to keep it slow.

(If Python weren't so huge, and if I understood it a lot better than I do, 
I'd be tempted to have a go myself. For example, I could change the 
syntax of my interpreted language so that it looks like Python. Then I 
could take a simple benchmark in this 'Python', run it with both CPython 
and my interpreter, and mine would likely be faster. Yet mine still uses 
a byte-code dispatch loop, and it is still dynamically typed.

So what extra stuff is going on in CPython?)
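
Part of the answer, presumably, is that every value in CPython is a
heap-allocated, reference-counted PyObject, so even a single integer add
means a run-time dispatch on the operand types plus reference-count
traffic. Roughly (a paraphrase using the public C API, not CPython's
actual eval-loop code):

    #include <Python.h>

    /* Approximately what one add byte-code costs the eval loop:
       PyNumber_Add dispatches on the operand types at run time (and may
       allocate a new object for the result); the loop then drops its
       references to the operands. */
    static PyObject *binary_add(PyObject *left, PyObject *right)
    {
        PyObject *result = PyNumber_Add(left, right);
        Py_DECREF(left);
        Py_DECREF(right);
        return result;      /* NULL on error, with an exception set */
    }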

-- 
Bartc


