[Numpy-discussion] arange(start, stop, step) and floating point (Ticket #8)
Travis Oliphant
oliphant.travis at ieee.org
Thu Feb 9 22:28:02 EST 2006
Sasha wrote:
>Well, my results are different.
>
>SVN r2087:
>
>> python -m timeit -s "from numpy import arange" "arange(10000.0)"
>
>10000 loops, best of 3: 21.1 usec per loop
>
>SVN r2088:
>
>> python -m timeit -s "from numpy import arange" "arange(10000.0)"
>
>10000 loops, best of 3: 25.6 usec per loop
>
>I am using gcc version 3.3.4 with the following flags: -msse2
>-mfpmath=sse -fno-strict-aliasing -DNDEBUG -g -O3.
>
>The timing is consistent with the change in the DOUBLE_fill loop:
>
>r2087:
> 1b8f0: f2 0f 11 08 movsd %xmm1,(%eax)
> 1b8f4: f2 0f 58 ca addsd %xmm2,%xmm1
> 1b8f8: 83 c0 08 add $0x8,%eax
> 1b8fb: 39 c8 cmp %ecx,%eax
> 1b8fd: 72 f1 jb 1b8f0 <DOUBLE_fill+0x30>
>
>r2088:
> 1b9d0: f2 0f 2a c2 cvtsi2sd %edx,%xmm0
> 1b9d4: 42 inc %edx
> 1b9d5: f2 0f 59 c1 mulsd %xmm1,%xmm0
> 1b9d9: f2 0f 58 c2 addsd %xmm2,%xmm0
> 1b9dd: f2 0f 11 00 movsd %xmm0,(%eax)
> 1b9e1: 83 c0 08 add $0x8,%eax
> 1b9e4: 39 ca cmp %ecx,%edx
> 1b9e6: 7c e8 jl 1b9d0 <DOUBLE_fill+0x20>
>
>
>
Nice to see some real hacking on this list :-)
>My change may be worth committing because the C code is shorter and
>arguably more understandable (at least by Fortran addicts :-).
>Travis?
>
>
Yes, I think it's worth submitting. Most of the advice favoring
pointer arithmetic for fast C code dates from when processors spent
more time computing than fetching memory. Now it seems to be all about
fetching memory intelligently.

The buffer[i]= style is even recommended by the AMD optimization book
Sasha pointed out.
So, I say go ahead unless somebody can point out something we are missing...
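
To make the two loop styles concrete, here is a minimal pure-Python sketch
of what the two disassembly listings above appear to do. It is not the
actual DOUBLE_fill source; the names fill_accumulate and fill_indexed are
made up for illustration, and plain Python floats stand in for C doubles.

```python
def fill_accumulate(start, delta, n):
    """r2087-style loop: store a running value, then add delta to it."""
    out = [0.0] * n
    value = start
    for i in range(n):
        out[i] = value
        value += delta  # rounding error can build up step by step
    return out


def fill_indexed(start, delta, n):
    """r2088-style loop: compute buffer[i] = start + i*delta from the index."""
    out = [0.0] * n
    for i in range(n):
        out[i] = start + i * delta  # one multiply and one add per element
    return out


if __name__ == "__main__":
    a = fill_accumulate(0.0, 0.1, 11)
    b = fill_indexed(0.0, 0.1, 11)
    # Show the last element of each fill and the largest elementwise gap.
    print(repr(a[-1]), repr(b[-1]))
    print(max(abs(x - y) for x, y in zip(a, b)))
```

With start=0.0, delta=0.1 and n=11 the two fills agree to within rounding
but are not bit-for-bit identical, which is the floating-point side of this
ticket: the indexed form does not carry rounding error from one element to
the next.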
-Travis