[Numpy-discussion] String sort

Charles R Harris charlesr.harris at gmail.com
Wed Feb 13 13:44:05 EST 2008


On Feb 13, 2008 10:56 AM, Francesc Altet <faltet at carabos.com> wrote:

> On Wednesday 13 February 2008, Charles R Harris wrote:
> > OK,
> >
> > The new quicksorts are in svn. Francesc, can you check them out?
> >
>
> Looks good here.  However, you seem to keep using your own copy_string()
> instead of plain memcpy().  In previous benchmarks, I've seen that
> copy_string() is faster than memcpy() only when the block to be copied
> is short.
>

Yes, I noticed that your benchmark program crossed over to using memcpy at
16 chars, and I will probably add that feature. I was being conservative to
start with.

<snip>


> Finally, you will also have noticed the indirect sort line in the plot.
> I was curious about when this method would beat a direct sort.
> Judging by the plot, the crossover point is around strings of 128 bytes
> (much more, in fact, than I initially thought), and the gain becomes
> very significant (around 40% faster) at 256 bytes.  So perhaps it would
> make sense to add the possibility of choosing the indirect method when
> sorting such large strings.  This, of course, would require more memory
> for the indices, but using 4 or 8 additional bytes (depending on whether
> we are on 32-bit or 64-bit) when each string takes 200 bytes doesn't
> seem too crazy.  In any case, it would be nice to document this in
> docstrings.
>

It would be easy to add this feature, but for the moment I think the best
thing is to document it.
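
For anyone who wants to try the indirect method from Python in the meantime, here is a minimal sketch. It assumes a recent NumPy (for `np.random.default_rng`); the array size and the 256-byte width are illustrative choices, not the benchmark's settings, and no timing claim is made here.

```python
import numpy as np

# Illustrative data (an assumption, not the benchmark's input):
# 1000 fixed-width byte strings of 256 bytes each, letters a-z only.
rng = np.random.default_rng(0)
raw = rng.integers(ord("a"), ord("z") + 1, size=(1000, 256), dtype=np.uint8)
strings = raw.view("S256").ravel()

# Direct sort: the 256-byte items themselves are moved around.
direct = np.sort(strings)

# Indirect sort: only the indices are sorted, then the strings are
# gathered once. The cost is one extra integer (np.intp, i.e. 4 or 8
# bytes) per element for the index array.
order = np.argsort(strings)
indirect = strings[order]
```

Both paths give the same sorted array; which one is faster for a given string length is something to measure on your own machine.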

Another fairly easy change that could be made is to support strided arrays.
That might speed up sorting of non-contiguous arrays and sorts on axes other
than -1. The only reason it isn't there now is that I originally wrote the
sorting routines for numarray, and numarray's upper-level interface passed
contiguous arrays to the sort functions.
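
To show what "non-contiguous" means here, the following small Python sketch sorts one column of a C-ordered 2-D array by hand. The explicit copy-in/copy-out is only an illustration of the general pattern a contiguous-only sort routine implies; it is an assumption for the sake of the example, not a description of what the NumPy internals do.

```python
import numpy as np

a = np.array([[9, 4, 7, 1],
              [2, 8, 5, 6],
              [3, 0, 11, 10]], dtype=np.int64)

# A column of a C-ordered array is a strided view: consecutive
# elements sit one full row (4 * 8 = 32 bytes) apart in memory.
col = a[:, 1]
stride_bytes = col.strides[0]

# Pattern for a sort routine that only accepts contiguous data:
buf = np.ascontiguousarray(col)   # copy in (col is not contiguous)
buf.sort()                        # sort the contiguous buffer in place
col[...] = buf                    # copy the result back into the view
```

A sort that understood strides could work on `col` directly and skip both copies.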


> Be warned: I'd like to stress that these are my figures for my _own
> laptop_.  It would be nice if you could verify all of this on other
> architectures (your Core2 machine seems different enough).  I can run
> the benchmarks on Windows (installed on the same laptop) too.  Tell me
> if you are interested in me doing this.
>

It's easy enough to test if you compile from svn: just add your new copy
function and change the name in this line:

   #copy=copy_string, copy_ucs4#

to use your function instead of copy_string.

Chuck