[Numpy-discussion] String sort

Charles R Harris charlesr.harris at gmail.com
Sat Feb 9 16:55:52 EST 2008


On Feb 9, 2008 2:42 PM, Charles R Harris <charlesr.harris at gmail.com> wrote:

>
>
> On Feb 9, 2008 2:29 PM, Francesc Altet <faltet at carabos.com> wrote:
>
> > Chuck,
> >
> > One more thing on this.  I've been doing some benchmarking with my
> > opt_memcpy() macro in the quicksort_string function, and I should say
> > that while it is definitely more efficient than my system memcpy for
> > small values of n (the number of bytes to copy), this does not hold
> > for all values of n.  For example, for n<16, opt_memcpy() can be more
> > than 4x faster than system memcpy (and this is why I naively thought
> > that it would be faster in general).  However, for n>80, memcpy beats
> > opt_memcpy by between 25% and 100% (depending on whether n is divisible
> > by 2, 4 or 8).  This is on my Linux system (Ubuntu 7.10), but perhaps
> > with Windows the behaviour can be different.
> >
> > I think I would be able to come up with a routine that can offer a
> > balance between opt_memcpy and system memcpy, but that would take some
> > time.  So, until I (or anybody else) do more research on this, I think
> > it would be safer if you use system memcpy for string sorting in NumPy.
> >
>
> The memcpy in newer compilers is actually pretty good. For integers and
> such it sometimes compiles inline using integer assignments, but I was loath
> to make it the default implementation until gcc >= 4.1.x became more
> common. However, strings might be a good place to use it.
>

I'm also thinking that at some point it becomes more efficient to do an
indirect sort followed by take than to move all those big strings around.
But I guess we won't know where that point is until we have both versions
available.
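
For illustration only, here is a minimal Python-level sketch of the two
strategies being compared. It is not the C implementation discussed in this
thread; the helper names direct_sort and indirect_sort, the string width of
64 bytes, and the array size are assumptions chosen for the example. It only
shows that an indirect sort (argsort) followed by take gives the same result
as sorting the strings directly; which one is faster for a given string
width would still have to be measured.

```python
import numpy as np


def direct_sort(strings):
    """Sort the string array itself; elements are moved during the sort."""
    return np.sort(strings)


def indirect_sort(strings):
    """Sort an index array first, then gather the strings once with take."""
    order = np.argsort(strings)      # permutation that sorts the strings
    return np.take(strings, order)   # a single pass that moves each string


# Hypothetical test data: 1000 random byte strings, each 64 bytes wide.
rng = np.random.default_rng(0)
raw = rng.integers(ord("a"), ord("z") + 1, size=(1000, 64), dtype=np.uint8)
strings = raw.view("S64").reshape(-1)

# Both strategies must agree on the final ordering.
same = np.array_equal(direct_sort(strings), indirect_sort(strings))
```

Timing the two helpers (for example with timeit) over a range of string
widths would be one way to look for the crossover point.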

Chuck