[Numpy-discussion] searchsorted() and memory cache

Stéfan van der Walt stefan at sun.ac.za
Sat May 10 11:31:39 EDT 2008


Hi Andrew

2008/5/9 Andrew Straw <strawman at astraw.com>:
> I've got a big array (25 million int64s) that searchsorted()
> takes a long time to grind through. After a bit of digging in the
> literature and the numpy source code, I believe that searchsorted() is
> implementing a classic binary search, which is pretty bad in terms of
> cache misses. There are several modern implementations of binary search
> which arrange items in memory such that cache misses are much more rare.
> Clearly making such an indexing arrangement would take time, but in my
> particular case, I can spare the time to create an index if searching
> was faster, since I'd make the index once but do the searching many times.
>
> Is there an implementation of such an algorithm that works easily with
> numpy? Also, can you offer any advice, suggestions, and comments to me
> if I attempted to implement such an algorithm?
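
As an aside, one arrangement of the kind you describe is the
breadth-first ("Eytzinger") layout, where the sorted array is reordered
so that the top levels of the implicit search tree sit in just a few
cache lines.  A rough pure-Python/NumPy sketch of the idea (the names
are mine, and this is not the code linked below):

    import numpy as np

    def eytzinger(sorted_a):
        """Reorder a sorted array into breadth-first (Eytzinger) order.

        The element at 1-based index k has its children at 2k and 2k+1,
        so the first levels of the search tree share a few cache lines.
        """
        n = len(sorted_a)
        out = np.empty(n, dtype=sorted_a.dtype)
        it = iter(sorted_a)

        def fill(k):              # in-order walk of the implicit tree
            if k <= n:
                fill(2 * k)
                out[k - 1] = next(it)
                fill(2 * k + 1)

        fill(1)
        return out

    def eytzinger_lower_bound(b, x):
        """Index (into the Eytzinger-ordered array) of the first element >= x.

        Returns -1 if every element is smaller than x.  If the rank in
        the original sorted order is needed, store it alongside each
        element when building the index.
        """
        n = len(b)
        k = 1
        res = -1
        while k <= n:
            if b[k - 1] >= x:
                res = k - 1       # remember the last time we went left
                k = 2 * k
            else:
                k = 2 * k + 1
        return res

Building the index in pure Python like this is slow, of course, but as
you say it only has to be done once; the search loop is the part that
would go into Cython.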

I found Francesc Altet's Pyrex implementation at

http://mail.python.org/pipermail/python-list/2007-November/466503.html

I modified it for use with Cython and added some tests:

https://code.launchpad.net/~stefanv/+junk/my_bisect

That may be a good starting point for further experimentation.  As it
stands, it is already about 10 times faster than the built-in version
(since it can assume we're working with int64s, no special type
checking is needed).
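
If you want to time it against the built-in on your own data, something
along these lines should do (the commented-out module and function
names are placeholders, not necessarily what the branch exports):

    import timeit
    import numpy as np

    n = 25 * 10**6
    haystack = np.sort(np.random.randint(0, 2**40, size=n, dtype=np.int64))
    needles = np.random.randint(0, 2**40, size=10000, dtype=np.int64)

    def run_numpy():
        # one scalar binary search per needle, as in the original use case
        for x in needles:
            np.searchsorted(haystack, x)

    # from my_bisect import bisect_left_int64   # placeholder import/name
    # def run_cython():
    #     for x in needles:
    #         bisect_left_int64(haystack, x)

    print("np.searchsorted:", timeit.timeit(run_numpy, number=3), "s")
    # print("cython bisect:  ", timeit.timeit(run_cython, number=3), "s")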

Regards
Stéfan
