[Tutor] Loop comparison
Stefan Behnel
stefan_ml at behnel.de
Fri Apr 16 10:52:31 CEST 2010
Alan Gauld, 16.04.2010 10:29:
> "Stefan Behnel" wrote
>> import cython
>>
>> @cython.locals(result=cython.longlong, i=cython.longlong)
>> def add():
>> result = 0
>> for i in xrange(1000000000):
>> result += i
>> return result
>>
>> print add()
>>
>> This runs in less than half a second on my machine, including the time
>> to launch the CPython interpreter. I doubt that the JVM can even start
>> up in that time.
>
> I'm astonished at these results. What kind of C are you using. Even in
> assembler I'd expect the loop/sum to take at least 3s
> on a quad core 3GHz box.
>
> Or is cython doing the precalculation optimisations you mentioned?
Nothing surprising in the C code:
__pyx_v_result = 0;
for (__pyx_t_1 = 0; __pyx_t_1 < 1000000000; __pyx_t_1+=1) {
__pyx_v_i = __pyx_t_1;
__pyx_v_result += __pyx_v_i;
}
> And if so when does it do them? Because surely, at some stage, it still
> has to crank the numbers?
Cython does a bit of constant folding (which it can benefit from on
internal optimisation decisions), but apart from that, the mantra is to
just show the C compiler explicit C code that it can understand well, and
let it do its job.
> (We can of course do some fancy math to speed this particular sum up
> since the result for any power of ten has a common pattern, but I
> wouldn't expect the compiler optimiser to be that clever)
In this particular case, the C compiler really stores the end result in the
binary module, so I assume that it simply applies Little Gauß as an
optimisation in combination with loop variable aliasing.
Stefan
More information about the Tutor
mailing list