[Numpy-discussion] String sort

Charles R Harris charlesr.harris at gmail.com
Mon Feb 11 09:47:13 EST 2008


On Feb 11, 2008 2:58 AM, Francesc Altet <faltet at carabos.com> wrote:

> A Monday 11 February 2008, Charles R Harris escrigué:
> > On Feb 8, 2008 5:29 AM, Francesc Altet <faltet at carabos.com> wrote:
> > > Hi,
> > >
> > > I'm a bit confused that the sort method of a string character
> > > doesn't
> > >
> > > allow a mergesort:
> > > >>> s = numpy.empty(10, "S10")
> > > >>> s.sort(kind="merge")
> > >
> > > TypeError: desired sort not supported for this type
> > >
> > > However, by looking at the numpy sources, it seems that the only
> > > implemented method for sorting array strings is "merge" (I presume
> > > because it is stable).  So, perhaps the message above should be
> > > fixed.
> >
> > The only available method is the default quicksort. The mergesort is
> > for argsort and was put in for lexsort to use.
>
> That's good to know.  However, I'm curious about where it is the
> specific quicksort implementation for strings/unicode.  I've had a look
> at _sortmodule.c.src, but I can only find a quicksort implementation
> for:
>

The default is the C qsort, it is called from PyArray_Sort in
multiarraymodule.c. The type specific sorts, when they exist, are also
called from there through _new_sort. To see what type specific sorts are
registered, look at the end of _sortmodule.c.src. You can write a new sort,
and if it isn't registered it won't be used. So commenting out the
registration is a good way to get back to the default.


>
>
> The version you are testing is your own or the one that I provided?
> Here are the timings for my laptop:
>

They turned out to be essentially identical, except I used len instead of ss
;) I also used inlined functions for the copy and swap as they are safer
with the arguments. Comparison with the macro versions showed no difference
there.


>
> In [55]: a = np.random.rand(10000).astype('S8')
>
> In [56]: %timeit a.copy().sort()
> 100 loops, best of 3: 3.82 ms per loop
>
> In [57]: %timeit newqsort(a.copy())
> 100 loops, best of 3: 3.29 ms per loop
>
> Here, the difference in performance has been reduced to a mere 15%
> (still favouring newqsort).  So, with this, it seems like the
> performance of the original sorting in NumPy only suffers a lot when
> running in old processors (eg. Pentium 4), while the performance is
> reasonable with newer ones (Opteron).


It could also depend on the C library that comes with the compiler. I'm
running on a E6600, but numpy compiles everything for the i386, which might
make a difference also.

Chuck
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20080211/34e992e9/attachment.html>


More information about the NumPy-Discussion mailing list