[Numpy-discussion] String sort

Sat Feb 9 08:40:56 EST 2008

A Friday 08 February 2008, Charles R Harris escrigué:
> On Feb 8, 2008 8:58 AM, Francesc Altet <faltet at carabos.com> wrote:
> > A Friday 08 February 2008, Charles R Harris escrigué:
> > > > Also, in the context of my work in indexing, and because of the
> > > > slowness of the current implementation in NumPy, I've ended
> > > > with an implementation of the quicksort method for 1-D array
> > > > strings.  For moderately large arrays, it is about 2.5x-3x
> > > > faster than the (supposedly) mergesort version in NumPy, not
> > > > only due to the quicksort, but also because I've implemented a
> > > > couple of macros for efficient string swapping and copy.  If
> > > > this is of interest for NumPy developers, tell me and I will
> > > > provide the code.
> > >
> > > I have some code for this too and was going to merge it. Send
> > > yours along and I'll get to it this weekend.
> >
> > Ok, great.  I'm attaching it.  Tell me if you need some
> > clarification on the code.
>
> I ran a few timing tests. On my machine strncmp is about 100x faster
> than opt_strncmp, but  sSWAP (with some fixes), is about 10x faster
> then using the memcpy in a recent compiler. Does this match with your
> experience.

Well, I've run some more exhaustive tests on my laptop (Pentium 4 @ 2 
GHz, Ubuntu 7.10, gcc 4.1.3, using -O3 optlevel) with the next sa1 
array:

numpy.random.seed(1)
nelem = 10000
a = numpy.random.rand(nelem)
sa1 = a.astype('S16')

And I've chosen the next benchmark:

/* start:  the start of data for sa1 array
   ss:  the length of the string type (16)
   num: the number of elements in sa1 (10000)    
 */
int sort_S(char *start, int ss, npy_intp num)
{
  char *pl = start;
  int a = 0;
  npy_intp i, j;

  for (i=0; i<num; i++)
    for (j=0; j<num; j++)
      a += opt_strncmp(pl+i*ss, pl+j*ss, ss);

  return a;
}

So, when I choose the next implementation of opt_strncmp:

static int inline opt_strncmp1(char *a, char *b, intp n) {
  intp i;
  for (i=0; i<n; i++) {
     if (a[i] > b[i]) return i+1;
     if (a[i] < b[i]) return -(i+1);
  }
  return 0;
}

I get a time of 1.70 s.  When using the next implementation:

static int inline opt_strncmp2(char *a, char *b, intp n) {
  intp i;
  for (i=0; i<n; i++) {
    if (a[i] != b[i])
      return (((unsigned char *)a)[i] - ((unsigned char *)b)[i]);
  }
  return 0;
}

I get a time of 1.24 s.  The original strncmp gives me 2.05 seconds with 
this setup.  So, in short and using my laptop:

original strncmp: 2.05 s
opt_stncmp1:      1.70 s ; speed-up: 20%
opt_stncmp2:      1.24 s ; speed-up: 65%

Just in case, I've also run the benchmark in an Opteron server (2 GHz, 
SuSe 10.1, gcc 4.2.1, using -O3 optlevel).  Here are the results:

original strncmp: 1.25 s
opt_stncmp1:      1.61 s ; speed-up: -30%
opt_stncmp2:      1.03 s ; speed-up:  20%

So, I was wrong: opt_stncmp2 is quite more efficient than opt_stncmp1, 
and definitely better than the original strncmp.  So, I would say that 
using opt_stncmp2 is always the best bet.  I'm not certain why are you 
seeing that original strncmp is 100x faster than opt_strncmpX :-/

Cheers,

-- 
>0,0<   Francesc Altet     http://www.carabos.com/
V   V   Cárabos Coop. V.   Enjoy Data
 "-"