[Numpy-discussion] String sort

Francesc Altet faltet at carabos.com
Wed Feb 13 12:21:56 EST 2008


A Wednesday 13 February 2008, Bruce Southey escrigué:
> Hi,
> I added gcc 4.2 from the openSUSE 10.1 repository so I now have both
> the 4.1.2 and 4.2.1 compilers installed. But still have
> glibc-2.4-31.1 installed. I see your result with 4.2.1 but not with
> 4.1.2 so I think that there could be a difference in the compiler
> flags. I don't know enough about those to help but I can test any
> suggestions.
>
> $ gcc --version
> gcc (GCC) 4.1.2 20070115 (prerelease) (SUSE Linux)
> $ gcc -O3 sort-string-bench.c -o sort412
> $ ./sort412
> Benchmark with 1000000 strings of size 15
> C qsort with C style compare: 0.630000
> C qsort with Python style compare: 0.640000
> NumPy newqsort: 0.360000
>
> $ gcc-4.2 --version
> gcc-4.2 (GCC) 4.2.1 (SUSE Linux)
> $ gcc-4.2 -O3 sort-string-bench.c -o sort421
> $ ./sort421
> Benchmark with 1000000 strings of size 15
> C qsort with C style compare: 0.620000
> C qsort with Python style compare: 0.610000
> NumPy newqsort: 0.550000
>
> This is  the same as:
> $ gcc-4.2 -O2 -finline-functions sort-string-bench.c -o sort421
> $ ./sort421
> Benchmark with 1000000 strings of size 15
> C qsort with C style compare: 0.710000
> C qsort with Python style compare: 0.700000
> NumPy newqsort: 0.550000
>
> (NumPy newqsort with -O2 alone is 0.60000)
>
> For completeness, 4.1.2 using '-O2' versus '-O2 -finline-functions'
> is NumPy newqsort: 0.620000 vs NumPy newqsort: 0.500000

That's really interesting.  Let me remember my figures for our Opteron:

3) SuSe LE 10.3 (gcc 4.2.1, -O3, AMD Opteron @ 2 GHz)
C qsort with C style compare: 0.640000
C qsort with Python style compare: 0.600000
NumPy newqsort: 0.590000

Yours is running at a clock rate of 2.66 GHz and mine at 2 GHz, so that 
makes yours a 33% faster.  With this, I'd expect, for newqsort in your 
machine and using gcc 4.2.1 something like 0.59/1.33 = 0.44s.  Of 
course, this is not the case, and you are getting 0.55s, which is only 
a 7% faster.  This mostly reflects the fact that newqsort is bounded by 
other things than CPU speed (most probably, memory latency, but I might 
be wrong).

But the most important thing is that it turns out that gcc 4.2.1 is 
doing a much worse job in optimizing newqsort than gcc 4.1.2 (at least 
on Opterons).  That is unfortunate, because the similar performance of 
C qsort and newqsort on SuSe 10.3 made me think that it was due to the 
fact that SuSe speed-up the C qsort.  I see now that this is not the 
case, and the problem seems that gcc 4.2.1 is not able to optimize 
newqsort as much as 4.1.2.

So, it is becoming more and more clear that newqsort is potentially much 
faster that C qsort: you have seen a 2x of speedup, Chuck a 3x and me 
up to a 3.8x.  The only issue seems to find a good enough compiler (or 
find the correct flags) to take advantage of all of its potential, 
which doesn't seem an easy thing indeed :-/

Cheers,

-- 
>0,0<   Francesc Altet     http://www.carabos.com/
V   V   Cárabos Coop. V.   Enjoy Data
 "-"



More information about the NumPy-Discussion mailing list