[Numpy-discussion] String sort
Francesc Altet
faltet at carabos.com
Wed Feb 13 12:21:56 EST 2008
A Wednesday 13 February 2008, Bruce Southey escrigué:
> Hi,
> I added gcc 4.2 from the openSUSE 10.1 repository so I now have both
> the 4.1.2 and 4.2.1 compilers installed. But still have
> glibc-2.4-31.1 installed. I see your result with 4.2.1 but not with
> 4.1.2 so I think that there could be a difference in the compiler
> flags. I don't know enough about those to help but I can test any
> suggestions.
>
> $ gcc --version
> gcc (GCC) 4.1.2 20070115 (prerelease) (SUSE Linux)
> $ gcc -O3 sort-string-bench.c -o sort412
> $ ./sort412
> Benchmark with 1000000 strings of size 15
> C qsort with C style compare: 0.630000
> C qsort with Python style compare: 0.640000
> NumPy newqsort: 0.360000
>
> $ gcc-4.2 --version
> gcc-4.2 (GCC) 4.2.1 (SUSE Linux)
> $ gcc-4.2 -O3 sort-string-bench.c -o sort421
> $ ./sort421
> Benchmark with 1000000 strings of size 15
> C qsort with C style compare: 0.620000
> C qsort with Python style compare: 0.610000
> NumPy newqsort: 0.550000
>
> This is the same as:
> $ gcc-4.2 -O2 -finline-functions sort-string-bench.c -o sort421
> $ ./sort421
> Benchmark with 1000000 strings of size 15
> C qsort with C style compare: 0.710000
> C qsort with Python style compare: 0.700000
> NumPy newqsort: 0.550000
>
> (NumPy newqsort with -O2 alone is 0.60000)
>
> For completeness, 4.1.2 using '-O2' versus '-O2 -finline-functions'
> is NumPy newqsort: 0.620000 vs NumPy newqsort: 0.500000
That's really interesting. Let me remember my figures for our Opteron:
3) SuSe LE 10.3 (gcc 4.2.1, -O3, AMD Opteron @ 2 GHz)
C qsort with C style compare: 0.640000
C qsort with Python style compare: 0.600000
NumPy newqsort: 0.590000
Yours is running at a clock rate of 2.66 GHz and mine at 2 GHz, so that
makes yours a 33% faster. With this, I'd expect, for newqsort in your
machine and using gcc 4.2.1 something like 0.59/1.33 = 0.44s. Of
course, this is not the case, and you are getting 0.55s, which is only
a 7% faster. This mostly reflects the fact that newqsort is bounded by
other things than CPU speed (most probably, memory latency, but I might
be wrong).
But the most important thing is that it turns out that gcc 4.2.1 is
doing a much worse job in optimizing newqsort than gcc 4.1.2 (at least
on Opterons). That is unfortunate, because the similar performance of
C qsort and newqsort on SuSe 10.3 made me think that it was due to the
fact that SuSe speed-up the C qsort. I see now that this is not the
case, and the problem seems that gcc 4.2.1 is not able to optimize
newqsort as much as 4.1.2.
So, it is becoming more and more clear that newqsort is potentially much
faster that C qsort: you have seen a 2x of speedup, Chuck a 3x and me
up to a 3.8x. The only issue seems to find a good enough compiler (or
find the correct flags) to take advantage of all of its potential,
which doesn't seem an easy thing indeed :-/
Cheers,
--
>0,0< Francesc Altet http://www.carabos.com/
V V Cárabos Coop. V. Enjoy Data
"-"
More information about the NumPy-Discussion
mailing list