[Numpy-discussion] String sort

Francesc Altet faltet at carabos.com
Thu Feb 14 13:44:55 EST 2008


A Thursday 14 February 2008, Charles R Harris escrigué:
> On Thu, Feb 14, 2008 at 10:46 AM, Francesc Altet <faltet at carabos.com> 
wrote:
> > Looking forward to see the new qsort for strings in NumPy (the
> > specific version for merge sort is very welcome too!).
>
> I could never figure out what the merge sort was good for. I did the
> indirect version in numarray because I needed a stable sort to
> implement lexsort, which was my original aim. I just added the direct
> version for completeness. If you have a use for it, I would love to
> hear it.

Well, I must to confess that I've not used merge sorts yet, but I'd like 
to test them in the context of my PSI (Partially Sorted Indexes, see 
[1] for a white paper on a concrete implementation) work.  My hope is 
that, as a merge sort keeps the order of indices of elements that are 
equal (this is what 'stable' means), this would allow better 
compression rates for indices (and hence, less I/O effort to bring the 
indices from disk into memory and ultimately allowing for faster lookup 
speed).  This will probably be only important when one have data 
distributions with rather low cardinality, but these scenarios can be 
more frequent/important than one may think.

[1] http://www.carabos.com/docs/OPSI-indexes.pdf

-- 
>0,0<   Francesc Altet     http://www.carabos.com/
V   V   Cárabos Coop. V.   Enjoy Data
 "-"



More information about the NumPy-Discussion mailing list